Cybernetics and Control Theory
Corresponding Member of the USSR Academy of Sciences I. M. Gelfand, I. I. Pyatetskii-Shapiro,
M. L. Tsetlin
On Some Classes of Games and Games of Automata
Problems concerning the collective behavior of automata arise naturally in constructing mathematical models of the simplest forms of collective behavior. The collective behavior of automata is generated by their interaction, and a convenient way of specifying such interaction is the language of game theory \((^{1,2})\). Using this language narrows the class of forms of behavior under study, but it nevertheless leads to a number of meaningful models. In constructing such models we used designs of automata that behave expediently in stationary random environments (“in a game with nature”), as well as the definition of a game of automata given in \((^{6,7})\). The zero-sum game of two automata was studied in \((^{5,7})\).
Attempts to construct games modeling relatively uncomplicated forms of collective behavior naturally lead to the singling out of certain relatively simple classes of games. This note is devoted to the description of such classes and of the concepts arising here.
1. Let \(N\) players \(A_1, \ldots, A_N\) participate in a game \(\Gamma\). Player \(A_k\), \(k = 1, \ldots, N\), has at his disposal \(n_k\) strategies \(f_1^k, \ldots, f_{n_k}^k\). A play \(f\) of the game \(\Gamma\) is the tuple \(f = (f_{i_1}^1, \ldots, f_{i_N}^N)\) of strategies chosen by the players. On the set of plays, \(N\) functions \(m_k(f)\), \(k = 1, \ldots, N\), are given. The function \(m_k(f)\) is called the payoff function of player \(A_k\); its value is the gain (the mathematical expectation of the gain) of player \(A_k\) in the play \(f\).
In studying the behavior of collectives consisting of a large number of players, it is natural to single out those classes of games for which the payoff functions depend only on a small number of parameters. Let us consider, in particular, such games in which the payoff function of each player is determined only by the choice of strategies of a small number of players—his “neighbors.” We give examples of such games.
Example 1. Games on a circle.
1a. The payoff function of a player depends only on his strategy and on the strategy of his left neighbor.
1b. The payoff function of a player depends on his strategy and on the strategies of his left and right neighbors.
Example 2. A game on the plane. The players \(A_{11}, A_{12}, \ldots, A_{NN}\) participate in the game; the payoff function of player \(A_{ik}\) is determined by his strategy and by the strategies of the players \(A_{i+1,k}\), \(A_{i-1,k}\), \(A_{i,k+1}\), \(A_{i,k-1}\).
Analogously, a game can be specified using an arbitrary difference scheme.
Games with a bounded number of neighbors are conveniently specified by means of special game graphs. To construct such a graph, a vertex \(k\) is associated with each player \(A_k\). If the payoff function \(m_k(f)\) depends on the strategy of player \(A_i\), then an arrow is drawn from vertex \(i\) to vertex \(k\).
The game graphs of Example 1 are shown in Fig. 1 (a and b).
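A game graph of this kind can be stored simply as a set of arrows. The following minimal sketch builds the arrows for the two circle games of Example 1. It rests on assumptions that are not fixed by the text: players are numbered \(0,\ldots,N-1\) around the circle, the "left" neighbor of player \(k\) is taken to be player \(k+1\) (mod \(N\)), and the dependence of a payoff on the player's own strategy is not drawn. The names `circle_game_graph` and `in_neighbors` are hypothetical.

```python
def circle_game_graph(n, both_neighbors):
    """Arrows (i, k): the payoff of player k depends on the strategy of player i != k."""
    arrows = set()
    for k in range(n):
        arrows.add(((k + 1) % n, k))      # left neighbor (assumed to be k+1 mod n)
        if both_neighbors:
            arrows.add(((k - 1) % n, k))  # right neighbor, game 1b only
    return arrows


def in_neighbors(arrows, k):
    """Players whose strategies enter the payoff function of player k."""
    return sorted(i for (i, j) in arrows if j == k)


graph_1a = circle_game_graph(5, both_neighbors=False)
graph_1b = circle_game_graph(5, both_neighbors=True)
```

With this representation, a game with a bounded number of neighbors is one in which `in_neighbors` returns a short list for every vertex, whatever the total number of players.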
2. Suppose that the game \(\Gamma\) has a play \(f^0\) in which no player gains by changing his strategy while the remaining players keep theirs. We shall call such a game a Nash game and the play \(f^0\) a Nash play. If \(f^0=(f^1_{i_1},\ldots,f^N_{i_N})\), then the inequalities
\[ m_k(f^0)\geq m_k(f^1_{i_1},\ldots,f^{k-1}_{i_{k-1}}, f^k_j, f^{k+1}_{i_{k+1}},\ldots,f^N_{i_N}) \tag{1} \]
hold for all \(k=1,\ldots,N\) and \(j=1,\ldots,n_k\).
Example 3. The players \(A_1\) and \(A_2\), each having two strategies, 1 and 2, participate in the game. Denote by \(m_{ik}\) the payoff of a player using his \(i\)-th strategy against the \(k\)-th strategy of his partner. If \(m_{11}=m_{22}=0.25\), \(m_{12}=0.9\), \(m_{21}=-0.1\), then the play \(f=(1,1)\) is a Nash play. Indeed, a change of strategy by either player reduces his payoff from \(m_{11}=0.25\) to \(m_{21}=-0.1\).
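Inequalities (1) can be checked mechanically for a small game. The sketch below does this for the two-player game of Example 3 by trying every unilateral change of strategy. The function names `payoff` and `is_nash` and the numbering of the players as 0 and 1 are illustrative choices, not notation from the text.

```python
from itertools import product

# Example 3: M[(i, k)] is the payoff of a player using strategy i against strategy k.
M = {(1, 1): 0.25, (2, 2): 0.25, (1, 2): 0.9, (2, 1): -0.1}


def payoff(player, play):
    """Payoff of `player` (0 or 1) in the two-player play (s0, s1)."""
    own, other = play[player], play[1 - player]
    return M[(own, other)]


def is_nash(play, strategies=(1, 2)):
    """Inequalities (1): no player gains by a unilateral change of strategy."""
    for player in range(len(play)):
        for s in strategies:
            deviated = play[:player] + (s,) + play[player + 1:]
            if payoff(player, deviated) > payoff(player, play):
                return False
    return True


nash_plays = [f for f in product((1, 2), repeat=2) if is_nash(f)]
```

For these numbers the enumeration leaves only the play \((1,1)\): in \((2,2)\) either player can raise his payoff from 0.25 to 0.9 by switching to strategy 1.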
Fig. 1
Suppose now that in a Nash game the players can choose their strategies so that the payoff functions of all players are simultaneously maximized. We shall call such games \(K\)-games. In \(K\)-games there exists a play \(f^0\) such that
\[ m_k(f^0)\geq m_k(f) \]
for all \(k=1,\ldots,N\) and for all plays \(f\). We shall call the play \(f^0\) a \(K\)-play. \(K\)-games admit complete solidarity of interests of all players. Let us give the simplest example of such a game.
Example 4. Let \(m_k(f)=i_1+\cdots+i_N\) for every \(k\), where \(f=(f^1_{i_1},\ldots,f^N_{i_N})\). Obviously, \(f^0=(f^1_{n_1},\ldots,f^N_{n_N})\) is a \(K\)-play.
For an arbitrary game one can construct a corresponding \(K\)-game in the following way. Let \(m_k(f)\) be the payoff functions of some game \(\Gamma\), and define the payoff functions \(M_k(f)\) of the \(K\)-game \(\Gamma_K\) by the relations
\[ M_k(f)=\frac{1}{N}\sum_{j=1}^{N} m_j(f), \qquad k=1,\ldots,N. \tag{2} \]
The functions \(M_k(f)\) coincide for all players and therefore attain their maximum simultaneously at some \(K\)-play. We shall return to this procedure when considering automaton games.
3. In this section we shall consider a class of games in which all participants have equal rights, the so-called homogeneous games. A homogeneous game is specified by a single payoff function, which substantially simplifies its description. Homogeneous games with a small number of “neighbors” are of particular interest.
Let us give definitions. We shall say that a mapping \(g\) of the game \(\Gamma\) onto itself is given if the following are given: 1) a one-to-one mapping of the set of players onto itself, \(gA_i=A_{gi}\); 2) for each \(i\), a one-to-one mapping \(f^i_k\to f^{gi}_{gk}\) of the set of strategies of player \(A_i\) onto the set of strategies of player \(A_{gi}\). The mapping \(g\) determines, in a natural way, a mapping \(f\to gf\) of the set of plays onto itself.
An automorphism of a game is a mapping \(g\) preserving the payoff functions, i.e. \(m_k(f)=m_{gk}(gf)\) for all players \(A_k\) and all plays \(f\).
It is easy to see that the automorphisms of the game \(\Gamma\) form a group \(G_\Gamma\). A game is called homogeneous if this group is transitive on the set of players. In other words, a game is called homogeneous if for any pair of players \(A_i\) and \(A_k\) there is an automorphism \(g\in G_\Gamma\) such that \(g(A_i)=A_k\).
4. In this section we shall consider homogeneous games of identical automata, hereafter called homogeneous automaton games. An automaton game, understood in the sense of \((^{5\text{--}7})\), consists of a sequence of repeated plays. Each participant in an automaton game receives information only about his own gain or loss in the given play. This information is used for choosing actions (strategies) in subsequent plays. It is assumed that the automata have no a priori information about the game in which they participate. The payoff functions of the game then describe the interaction of the automata. For ergodic games of automata there exists a final distribution of probabilities of plays. If \(p(f)\) is the final probability of the play \(f\) in an ergodic homogeneous game of automata \(\Gamma\), then \(p(gf)=p(f)\) for every \(g\in G_\Gamma\), so that in a homogeneous game of automata the mathematical expectations of the payoffs of all automata coincide.
In a homogeneous game, together with each play \(f\) it is natural to consider the set \(\{gf\}\) of all plays of the form \(gf\), \(g \in G_\Gamma\); this set is invariant with respect to the group \(G_\Gamma\). For each player, his average payoff \(U(f)\) over the invariant set is equal to the arithmetic mean of the payoffs of all players in the play \(f\). We shall call the quantity \(U(f)\) the price of the play. The maximal price of a play will be called the maximal payoff, and the minimal price of a play the minimal payoff. For example, the game of Example 3 is homogeneous and has three invariant sets: the first consists of the play \((1,1)\), the second of the play \((2,2)\), and the third of the plays \((1,2)\) and \((2,1)\). The maximal payoff is \(\tfrac12(0.9-0.1)=0.4\), attained on the third set, and the minimal payoff is \(0.25\).
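The invariant sets and prices quoted for Example 3 can be reproduced by direct enumeration. The sketch below uses only the automorphism that exchanges the two players (with strategy labels kept), which is enough to obtain the three invariant sets named in the text. The names `payoffs`, `swap`, and `price` are illustrative.

```python
from itertools import product

# Example 3: M[(i, k)] is the payoff of a player using strategy i against strategy k.
M = {(1, 1): 0.25, (2, 2): 0.25, (1, 2): 0.9, (2, 1): -0.1}


def payoffs(play):
    """(m_1(f), m_2(f)) for the play f = (a, b)."""
    a, b = play
    return (M[(a, b)], M[(b, a)])


def swap(play):
    """The mapping g that exchanges the two players and keeps strategy labels."""
    return (play[1], play[0])


def price(play):
    """U(f): arithmetic mean of the payoffs of all players in the play f."""
    p = payoffs(play)
    return sum(p) / len(p)


plays = list(product((1, 2), repeat=2))

# g is an automorphism: m_k(f) == m_{gk}(gf) for both players and all plays.
swap_is_automorphism = all(
    payoffs(f)[0] == payoffs(swap(f))[1] and payoffs(f)[1] == payoffs(swap(f))[0]
    for f in plays
)

# Orbits of the plays under {identity, swap}, with the price of each orbit.
invariant_sets = {frozenset({f, swap(f)}) for f in plays}
prices = {tuple(sorted(s)): price(next(iter(s))) for s in invariant_sets}

maximal_payoff = max(prices.values())
minimal_payoff = min(prices.values())
```

The price is constant on each invariant set, so evaluating it at any one representative of the set is sufficient.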
In a computer simulation of this example, the average payoff (for automata with sufficiently large memory capacity) was close to 0.25. Generally speaking, automata that receive information only about their own gains and losses in individual plays of the game do not attain the maximal payoff. The point is that if the play \(f\) at which the maximal payoff is attained is not a Nash play, then at least one of the players will change his strategy, and the total payoff will decrease.
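An experiment of this kind might be set up as in the following sketch. It is not the simulation reported above, and two ingredients are assumptions brought in from outside this note: the automaton is taken to be a two-action automaton with linear tactics and memory depth `depth`, and a play with expected payoff \(m\) is assumed to produce a unit gain with probability \((1+m)/2\) and a unit loss otherwise. The class and function names are hypothetical, and no particular numerical outcome is claimed.

```python
import random

# Example 3: M[(i, k)] is the payoff of a player using strategy i against strategy k.
M = {(1, 1): 0.25, (2, 2): 0.25, (1, 2): 0.9, (2, 1): -0.1}


class LinearTacticsAutomaton:
    """Two-action automaton with memory depth `depth` (an assumed construction)."""

    def __init__(self, depth, rng):
        self.depth = depth
        self.action = rng.choice((1, 2))
        self.level = 1  # 1 is the shallowest state, `depth` the deepest

    def update(self, won):
        if won:
            self.level = min(self.level + 1, self.depth)  # move deeper on a gain
        elif self.level > 1:
            self.level -= 1                               # move shallower on a loss
        else:
            self.action = 3 - self.action                 # switch action at the shallowest state


def simulate(depth=8, steps=100_000, seed=0):
    """Average realized payoff per automaton per play."""
    rng = random.Random(seed)
    a = LinearTacticsAutomaton(depth, rng)
    b = LinearTacticsAutomaton(depth, rng)
    total = 0.0
    for _ in range(steps):
        fa, fb = a.action, b.action
        # Both expected payoffs are computed from the current play before any update.
        for automaton, m in ((a, M[(fa, fb)]), (b, M[(fb, fa)])):
            won = rng.random() < (1 + m) / 2  # assumed coding of the payoff
            total += 1.0 if won else -1.0
            automaton.update(won)
    return total / (2 * steps)


average_payoff = simulate()
```

Each automaton sees only its own gains and losses, as the text requires; the payoff table enters only through the win probabilities.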
Let us now consider the invariant set of plays generated by a Nash play. If the maximal payoff is attained on this set, then the set is called a Moore set, and the plays generating it are called Moore plays.
The game of Example 3 has no Moore set. However, the maximal payoff 0.4 would be attained if the players agreed among themselves to play only the plays \((1,2)\) and \((2,1)\) with equal probabilities.
For homogeneous games of automata one can propose a procedure which, in a certain sense, replaces such an agreement. We shall call this procedure the common purse*. It is equivalent to constructing, from the given homogeneous game \(\Gamma\), a homogeneous game \(\Gamma'\) with the same players and strategies as in \(\Gamma\), in which the payoff functions are defined by formula (2). A game with a common purse \(\Gamma'\) may be understood as the game \(\Gamma\) in which the automata receive information not only about their own payoffs but also about the payoffs of all the other playing automata, and use this information to determine their own behavior. This additional information about the game allows the automata to raise the payoff to the maximum. In the game \(\Gamma'\), the payoffs of all players are the same in every play. Therefore, the game \(\Gamma'\) necessarily has a Moore play \((^8)\).
For the game \(\Gamma\) described in Example 3, the game \(\Gamma'\) is specified by the following payoff functions: \(m'_{11}=m_{11}=0.25\); \(m'_{22}=m_{22}=0.25\); \(m'_{12}=m'_{21}=\tfrac12(m_{12}+m_{21})=0.4\). In computer simulation the maximal payoff 0.4 is attained (if the memory of the playing automata is sufficiently large). Other examples of achieving the maximal payoff by applying the common purse are given in article \((^8)\).
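The effect of the common purse on the two-player game with \(m_{11}=m_{22}=0.25\), \(m_{12}=0.9\), \(m_{21}=-0.1\) can be checked by applying formula (2) and listing the Nash plays before and after. The sketch below does this by enumeration; the helper names `payoffs`, `common_purse`, and `is_nash` are illustrative.

```python
from itertools import product

# M[(i, k)] is the payoff of a player using strategy i against strategy k.
M = {(1, 1): 0.25, (2, 2): 0.25, (1, 2): 0.9, (2, 1): -0.1}


def payoffs(play):
    """Payoffs (m_1(f), m_2(f)) in the original game."""
    a, b = play
    return (M[(a, b)], M[(b, a)])


def common_purse(play):
    """Formula (2): every player receives the arithmetic mean of all payoffs."""
    p = payoffs(play)
    mean = sum(p) / len(p)
    return (mean,) * len(p)


def is_nash(play, payoff_fn):
    """No player raises his own payoff by a unilateral change of strategy."""
    for k in range(len(play)):
        for s in (1, 2):
            deviated = play[:k] + (s,) + play[k + 1:]
            if payoff_fn(deviated)[k] > payoff_fn(play)[k]:
                return False
    return True


plays = list(product((1, 2), repeat=2))
purse_table = {f: common_purse(f)[0] for f in plays}
nash_original = [f for f in plays if is_nash(f, payoffs)]
nash_purse = [f for f in plays if is_nash(f, common_purse)]
```

In the original game the only Nash play is \((1,1)\), with price 0.25; after pooling, the Nash plays are \((1,2)\) and \((2,1)\), where the pooled payoff 0.4 is maximal.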
In a number of important cases the maximal payoff is attained if the game contains a Moore play and the automata form an asymptotically optimal sequence in the sense of \((^3)\); if, however, the game has no Moore play, then the average payoff of the players coincides with the maximal price of a Nash play.
* The definition of the common purse arose in connection with models in the work of M. L. Tsetlin, S. L. Ginzburg, and V. Yu. Krylov \((^8)\).
5. Let us now describe homogeneous games on a circle, the graphs of which are shown in Figs. 1a and 1b. Each of the players has only two actions, 1 and 2. For the game in Fig. 1a, the payoff of each player is determined by his strategy and by the strategy of his left neighbor:
\[ m_k\left(f_{i_1}^{1},\ldots,f_{i_N}^{N}\right)=m\left(f_{i_k}^{k},f_{i_{k+1}}^{k+1}\right), \quad k=1,\ldots,N, \]
where \(A_{N+1}\) is understood to be \(A_1\).
If the number \(N\) is even, then this game always has a Nash play. Indeed, the numbers \(m(\lambda_1,\lambda_2)\), \(\lambda_1,\lambda_2=1,2\), satisfy at least one of the following groups of inequalities:
\[ \begin{array}{ll} \text{I.} & m(1,1)\geq m(2,1);\\[1mm] \text{II.} & m(2,2)\geq m(1,2);\\[1mm] \text{III.} & m(1,1)<m(2,1)\ \text{ and }\ m(2,2)<m(1,2). \end{array} \]
For case I the Nash play has the form \(f^0=(1,1,\ldots,1)\); for case II \(f^0=(2,2,\ldots,2)\); finally, for case III \(f^0=(1,2,1,2,\ldots,1,2)\).
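The three cases translate directly into a construction that can be verified by checking every unilateral deviation. In the sketch below, `m[(own, neighbor)]` stands for \(m(\lambda_1,\lambda_2)\), players are numbered \(0,\ldots,N-1\), and the neighbor of player \(k\) is player \(k+1\) (mod \(N\)); the function names and the sample payoff values are illustrative assumptions.

```python
def nash_play_on_circle(m, n):
    """Nash play for the circle game of Fig. 1a with an even number n of players."""
    if n % 2:
        raise ValueError("the construction is stated for even n")
    if m[(1, 1)] >= m[(2, 1)]:   # case I
        return (1,) * n
    if m[(2, 2)] >= m[(1, 2)]:   # case II
        return (2,) * n
    return (1, 2) * (n // 2)     # case III


def is_nash(play, m):
    """A player's payoff depends only on his own action and that of player k+1."""
    n = len(play)
    for k in range(n):
        own, neighbor = play[k], play[(k + 1) % n]
        if m[(3 - own, neighbor)] > m[(own, neighbor)]:
            return False
    return True


# Illustrative payoffs falling under case III.
m = {(1, 1): 0.0, (2, 1): 1.0, (2, 2): 0.0, (1, 2): 1.0}
play = nash_play_on_circle(m, 6)
play_is_nash = is_nash(play, m)
```

The alternating play of case III closes up around the circle only when \(N\) is even, which is where the parity condition enters.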
For the game whose graph is shown in Fig. 1b, the payoff of each of the players is given by the formulas
\[ m_k\left(f_{i_1}^{1},\ldots,f_{i_N}^{N}\right) = m\left(f_{i_{k-1}}^{k-1}, f_{i_k}^{k}, f_{i_{k+1}}^{k+1}\right), \quad k=1,\ldots,N, \tag{3} \]
and, consequently, the game is determined by eight numbers \(m(\lambda_1,\lambda_2,\lambda_3)\), where \(\lambda_i=1,2,\ i=1,2,3\). This game may also fail to have Nash plays.
A sequence \(f=(\lambda_1,\ldots,\lambda_N)\) that is a Nash play must have the following property: if the triple \(\lambda_1,\lambda_2,\lambda_3\) occurs in the sequence, then the triple \(\lambda_1,\mu,\lambda_3\), where \(\mu\ne\lambda_2\), does not occur in it. Indeed, if both triples occurred, the Nash inequalities would give both \(m(\lambda_1,\lambda_2,\lambda_3)\geq m(\lambda_1,\mu,\lambda_3)\) and the reverse inequality, which is possible only in the degenerate case where these two payoffs coincide. From this remark it follows that Nash plays can have only the following form: either 1) all \(\lambda\)’s are equal to one another; or 2) ones occur only singly, and twos occur in runs of no more than two; or, finally, 3) twos occur only singly, and ones occur in runs of no more than two.
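This structural statement can be tested by brute force on small circles. The sketch below enumerates all plays of the game of Fig. 1b for a given table `m[(left, own, right)]`, keeps the Nash plays, and checks each against the three admissible forms, with runs counted cyclically. It assumes payoffs in general position (all eight numbers distinct, as with random values); the function names are illustrative.

```python
import itertools
import random


def is_nash(play, m):
    """Nash condition for the game of Fig. 1b; m is keyed by (left, own, right)."""
    n = len(play)
    for k in range(n):
        left, own, right = play[k - 1], play[k], play[(k + 1) % n]
        if m[(left, 3 - own, right)] > m[(left, own, right)]:
            return False
    return True


def cyclic_runs(play):
    """Map each action to the lengths of its runs around the circle."""
    n = len(play)
    if len(set(play)) == 1:
        return {play[0]: [n]}
    # Rotate so that the sequence starts at a run boundary.
    start = next(k for k in range(n) if play[k] != play[k - 1])
    rotated = play[start:] + play[:start]
    runs = {1: [], 2: []}
    for action, group in itertools.groupby(rotated):
        runs[action].append(len(list(group)))
    return runs


def has_allowed_form(play):
    """Forms 1)-3): constant, or one action only singly and the other in runs of at most two."""
    runs = cyclic_runs(play)
    if len(runs) == 1:
        return True
    ones, twos = max(runs[1]), max(runs[2])
    return (ones == 1 and twos <= 2) or (twos == 1 and ones <= 2)


def nash_plays(m, n):
    return [f for f in itertools.product((1, 2), repeat=n) if is_nash(f, m)]


rng = random.Random(1)
m = {t: rng.random() for t in itertools.product((1, 2), repeat=3)}  # illustrative payoffs
all_nash_plays_allowed = all(has_allowed_form(f) for f in nash_plays(m, 8))
```

The check confirms the necessary condition only; for a particular table of eight numbers the list returned by `nash_plays` may well be empty.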
Received 8 VI 1963
REFERENCES
1. R. D. Luce, H. Raiffa, Games and Decisions, IL, 1961.
2. J. McKinsey, Introduction to the Theory of Games, Moscow, 1960.
3. M. L. Tsetlin, Automation and Remote Control, No. 10 (1961).
4. M. L. Tsetlin, Uspekhi Mat. Nauk, 18, No. 4 (1963).
5. M. L. Tsetlin, Dokl. Akad. Nauk SSSR, 149, No. 1 (1963).
6. M. L. Tsetlin, V. Yu. Krylov, Dokl. Akad. Nauk SSSR, 149, No. 2 (1963).
7. M. L. Tsetlin, V. Yu. Krylov, Automation and Remote Control, 24, No. 7 (1963).
8. M. L. Tsetlin, S. L. Ginzburg, V. Yu. Krylov, Automation and Remote Control, 24, No. 12 (1963).