Reports of the Academy of Sciences of the USSR

M. L. TSETLIN, V. Yu. KRYLOV

Submitted 1963-01-01 | RussiaRxiv: ru-196301.72352 | Translated from Russian

Volume 149, No. 2

CYBERNETICS AND CONTROL THEORY

EXAMPLES OF GAMES OF AUTOMATA

(Presented by Academician M. V. Keldysh on 16 X 1962)

In game theory (see, for example, \((^{1-4})\)), when players choose one strategy or another it is customary to assume that the game is completely specified (for example, by a system of payoff functions) and that the players work out their strategies using this a priori information and any computational means; the strategy (usually mixed) does not change in the course of the game.

In this note we consider the behavior of players in a game whose conditions (including the number of players and their possible strategies) are not known to them in advance. It is assumed that the game is repeated many times. In a single play each of the players chooses one of the pure strategies available to him; the choice of strategy determines the probability of one or another outcome of the play. For each player, the outcome of a play is his unit win or loss; information about the win or loss in the given play is used by the player to choose strategies in subsequent plays.

It is assumed that the participants in the game are finite automata possessing expedient behavior in random environments (see \((^{5,6})\)). In \((^{5,6})\) the properties of these environments were assumed to be constant or to change independently of the actions of the automata (“a game with nature”).

In this note we shall give a definition of a game of automata, present the simplest examples, and consider a zero-sum game of two automata with linear tactics, for which we shall compute the value of the mathematical expectation of the payoff, to a certain extent analogous to the von Neumann value of a game.

I. Consider finite automata \(A^1, \ldots, A^\nu\). Suppose that the output variable \(f^i(t)\) of the automaton \(A^i\) \((t = 1, 2, \ldots;\ i = 1, \ldots, \nu)\) takes the values \(f^i = 1, \ldots, k_i\), which we shall call its strategies. Let \(m_i\) be the number of states of the automaton \(A^i\), and let its input variable \(s^i(t)\) be able to take only two values: 0 or 1, the first of which we shall call a win, and the second a loss of the automaton \(A^i\). The definition of the automaton \(A^i\) coincides with that given in \((^{5,6})\).

By a play \(f(t)\) we mean the set \(f = (f^1, \ldots, f^\nu)\) of strategies chosen by the automata \(A^1, \ldots, A^\nu\) at time \(t\). By the outcome \(s(t+1)\) of the play \(f(t)\) we mean the set \(s = (s^1, \ldots, s^\nu)\) of values of the input variables of the automata \(A^1, \ldots, A^\nu\) at time \(t+1\).

We shall say that the automata \(A^1, \ldots, A^\nu\) participate in the game \(\Gamma\), if for each play \(f(t)\) the probability \(P(f; s)\) of its outcome \(s(t+1)\) is specified.

Thus, for example, the equality \(P(1, \ldots, 1;\ 1, \ldots, 1) = 1\) means that if every automaton chooses its first strategy, then all of them necessarily lose.
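As a minimal sketch of this definition, a game can be stored as a table that maps each play \(f\) to a probability distribution over outcomes \(s\), from which one outcome is drawn per play. The table `GAME`, its numerical entries, and the helper `sample_outcome` are illustrative assumptions and do not appear in the original.

```python
import random

# Hypothetical game of two automata: each play f = (f1, f2) maps to a
# distribution over outcomes s = (s1, s2), where 0 is a win and 1 is a loss.
GAME = {
    (1, 1): {(1, 1): 1.0},                    # both automata surely lose
    (1, 2): {(0, 1): 0.7, (1, 0): 0.3},
    (2, 1): {(0, 1): 0.4, (1, 0): 0.6},
    (2, 2): {(0, 0): 0.5, (1, 1): 0.5},
}

def sample_outcome(game, play, rng):
    """Draw the outcome s(t+1) of the play f(t) according to P(f; s)."""
    outcomes = list(game[play].keys())
    weights = list(game[play].values())
    return rng.choices(outcomes, weights=weights, k=1)[0]

rng = random.Random(0)
print(sample_outcome(GAME, (1, 1), rng))  # the only possible outcome is (1, 1)
```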

II. Consider an example of a game of two automata with linear tactics \(L_{4,4}\) and \(L_{5,5}\).* These automata have 4 and 5 strategies, respectively. In the event of a loss after the use of the \(i\)-th strategy, the \((i+1)\)-st is chosen, whereas in the event of a win they continue to use the \(i\)-th one. After a loss resulting from the use of the \(k\)-th (last) strategy, the first one is chosen.

* The definition of the automaton \(L_{kn,k}\) is given in \((^6)\).

Consider an example of a game of these automata, specified by the matrices:

\[ \begin{pmatrix} 1 & 0 & 1 & 1 & 0\\ 1 & 0 & 1 & 1 & 1\\ 1 & 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 1 & 0 & 0 & 1\\ 1 & 1 & 1 & 1 & 0\\ 1 & 1 & 1 & 1 & 1\\ 0 & 1 & 1 & 0 & 1 \end{pmatrix}. \]

Here, at the intersection of the \(i\)-th row and the \(k\)-th column of the first (respectively, second) matrix is the value of the probability of a loss by the first (respectively, second) automaton. Thus, for example, for the game specified by the matrices given above, if each player applies his first strategy \(f=(1,1)\), then the first player loses and the second wins with probability one, i.e. the outcome \(s=(1,0)\) is realized. In the next play the first player chooses the second strategy, and the second the first, \(f=(2,1)\), predetermining the outcome of the second play \(s=(1,1)\), and so on. Thus the following sequence of plays \(f\) is determined:
\((1,1)\), \((2,1)\), \((3,2)\), \((3,3)\), \((4,4)\), \((1,4)\), \((2,4)\), \((3,5)\), \((3,1)\), \((4,2)\), \((4,3)\), \((4,4)\), \((1,4)\), and so on.

We see that, starting with \((4,4)\), the sequence of plays generates a cycle of length 7. The same cycle is reached by the sequence of plays beginning with \(f=(1,5)\). The remaining sequences of plays end with the play \(f=(4,1)\), whose outcome is \(s=(0,0)\). In this play both automata win and, consequently, no longer change strategies (a stable point).

The presence of cycles and stable points is characteristic of all games for which the probabilities \(P(f;s)\) take the values 0 or 1.
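The example of Section II can be traced mechanically. The sketch below encodes the two loss matrices, applies the rule "keep the strategy after a win, advance cyclically after a loss", and follows the plays until one repeats. The function names `next_play` and `trajectory` are illustrative and not part of the original.

```python
# Loss matrices from the example: entry [i][k] is 1 if the automaton loses
# in the play f = (i + 1, k + 1) and 0 if it wins (indices are 0-based here).
LOSS_FIRST = [
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0],
]
LOSS_SECOND = [
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 0, 1],
]

def next_play(play):
    """Keep the strategy after a win; move to the next one (cyclically) after a loss."""
    i, k = play[0] - 1, play[1] - 1
    s1, s2 = LOSS_FIRST[i][k], LOSS_SECOND[i][k]
    return ((i + s1) % 4 + 1, (k + s2) % 5 + 1)

def trajectory(start):
    """Follow plays from `start` until one repeats; return (transient, cycle)."""
    seen, path, play = {}, [], start
    while play not in seen:
        seen[play] = len(path)
        path.append(play)
        play = next_play(play)
    first = seen[play]
    return path[:first], path[first:]

transient, cycle = trajectory((1, 1))
print(transient)    # plays before the cycle is entered
print(cycle)        # the cycle itself, starting at (4, 4)

# Stable points are plays that map to themselves (both automata win).
stable = [(i, k) for i in range(1, 5) for k in range(1, 6)
          if next_play((i, k)) == (i, k)]
print(stable)
```

Calling `trajectory` on any of the 20 possible starting plays shows whether that play leads to the cycle or to the stable point.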

III. Let us now consider a game of two automata that corresponds to the case, studied in detail in game theory, of a two-person zero-sum game. Namely, suppose that for any \(f^1\) and \(f^2\)

\[ P(f^1,f^2;1,1)=P(f^1,f^2;0,0)=0;\qquad P(f^1,f^2;0,1)+P(f^1,f^2;1,0)=1. \]

Then in each play one of the players wins, the other loses, and the quantity
\(p_{f^1f^2}=P(f^1,f^2;0,1)\) (respectively, \(q_{f^1f^2}=1-p_{f^1f^2}\)) has the meaning of the probability of a unit win (respectively, a loss) by the first player in the play \(f=(f^1,f^2)\). The matrix \(\|a_{ik}\|=\|p_{ik}-q_{ik}\|\) of mathematical expectations of the first player’s payoff corresponds to the matrix of a two-person zero-sum game.
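Since \(q_{ik}=1-p_{ik}\), each entry of this matrix is determined by the win probability alone. The numerical value below is purely illustrative and is not taken from the original:

\[ a_{ik}=p_{ik}-q_{ik}=2p_{ik}-1, \qquad \text{for example } p_{ik}=0.7 \ \Rightarrow\ a_{ik}=0.4. \]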

Let the automata \(B_{Mn,M}\) and \(B_{Nn,N}\) take part in the game. The automaton \(B_{Mn,M}\) has \(Mn\) states \(\varphi^i_\alpha\) \((i=1,\ldots,M;\ \alpha=1,\ldots,n)\). In the state \(\varphi^i_\alpha\), the \(i\)-th strategy out of \(M\) possible strategies is realized. State transitions are carried out as follows: in the event of a win \((s=0)\), the states \(\varphi^i_\alpha\) \((\alpha=2,\ldots,n)\) pass into the states \(\varphi^i_{\alpha-1}\), while the states \(\varphi^i_1\) pass into themselves.

In the event of a loss \((s=1)\), the states \(\varphi^i_\alpha\) \((\alpha=1,\ldots,n-1)\) pass into the states \(\varphi^i_{\alpha+1}\). The states \(\varphi^i_n\), upon a loss, pass with equal probabilities \(1/M\) into the states \(\varphi^j_n\) \((j=1,\ldots,M)\). Note that the automata \(B_{Mn,M}\) differ from automata with linear tactics only in the way strategies are changed.
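The transition rules just described can be sketched as a small class. This is only an illustration under stated assumptions: the class name `BAutomaton`, its method names, and the choice of initial state \(\varphi^1_n\) are not specified in the original.

```python
import random

class BAutomaton:
    """Sketch of the automaton B_{Mn,M}: M strategies, n states per strategy.

    The state phi^i_alpha is stored as the pair (i, alpha),
    with i in 1..M and alpha in 1..n.
    """

    def __init__(self, num_strategies, depth, rng=None):
        self.M = num_strategies
        self.n = depth
        self.rng = rng if rng is not None else random.Random()
        self.i = 1          # initial strategy (an assumption)
        self.alpha = depth  # initial index alpha = n (an assumption)

    def strategy(self):
        """Strategy realized in the current state."""
        return self.i

    def update(self, s):
        """Apply the input s: 0 is a win, 1 is a loss."""
        if s == 0:
            # Win: phi^i_alpha -> phi^i_{alpha-1}; phi^i_1 passes into itself.
            self.alpha = max(1, self.alpha - 1)
        elif self.alpha < self.n:
            # Loss with alpha < n: phi^i_alpha -> phi^i_{alpha+1}.
            self.alpha += 1
        else:
            # Loss in phi^i_n: pass to phi^j_n with j uniform over 1..M.
            self.i = self.rng.randint(1, self.M)

automaton = BAutomaton(num_strategies=3, depth=2, rng=random.Random(0))
automaton.update(0)   # win: alpha goes from 2 to 1
automaton.update(0)   # win in alpha = 1: the state is unchanged
automaton.update(1)   # loss: alpha goes back to 2
print(automaton.strategy(), automaton.alpha)
```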

For the game of automata \(B_{Mn,M}\) and \(B_{Nn,N}\) defined in this way, one can compute the quantity \(W(n)\), the mathematical expectation of the payoff of the first automaton, which is to a certain extent analogous to the value of the game in the sense of von Neumann. We shall therefore call this quantity the value of the game \(\Gamma\) for the automata \(B_{Mn,M}\) and \(B_{Nn,N}\).

We shall show that the limit \(W=\lim_{n\to\infty} W(n)\) exists and has the following properties.

\(1^\circ\). If the matrix \(\|a_{ik}\|\) contains at least one row consisting of nonnegative entries (a column consisting of nonpositive entries), then \(W\) is the harmonic mean of the elements of that row (column) whose smallest element has the greatest value (whose largest element has the smallest value).

\(2^\circ\). If the condition of case \(1^\circ\) is not satisfied, then \(W=0\).
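Properties \(1^\circ\) and \(2^\circ\) can be read as a simple rule for computing \(W\) from the matrix \(\|a_{ik}\|\). The sketch below is one possible reading. The function names, the convention that a harmonic mean containing a zero entry is taken as 0, and the tie-breaking (the first qualifying row or column) are assumptions, and the sample matrices are illustrative.

```python
def harmonic_mean(values):
    """Harmonic mean of same-sign numbers; taken as 0 if any entry is 0."""
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)

def limit_value(a):
    """Limiting value W for a payoff matrix a (a list of rows)."""
    # Case 1 (rows): rows consisting of nonnegative entries.
    rows = [row for row in a if all(v >= 0 for v in row)]
    if rows:
        best = max(rows, key=min)   # row whose smallest element is greatest
        return harmonic_mean(best)
    # Case 1 (columns): columns consisting of nonpositive entries.
    cols = [col for col in zip(*a) if all(v <= 0 for v in col)]
    if cols:
        best = min(cols, key=max)   # column whose largest element is smallest
        return harmonic_mean(best)
    # Case 2: neither kind of row or column exists.
    return 0.0

print(limit_value([[0.8, 0.4], [-0.2, 0.6]]))   # harmonic mean of 0.8 and 0.4
print(limit_value([[0.4, -0.4], [-0.4, 0.4]]))  # case 2: a draw
```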

In case \(1^\circ\) the behavior of the first player coincides with the behavior prescribed by the cautious tactics of game theory—the first player chooses a strategy giving him the greatest guaranteed payoff. The second player, however, does not minimize (as in game theory) the payoff of the first player, but is satisfied only with the harmonic mean. We note in passing that case \(1^\circ\) (the presence of a row of nonnegative elements) provides certain advantages for the first player. Case \(2^\circ\) shows that, in the absence of such an explicit advantage for one of the players, the automata play to a draw (\(W=0\)). The absence of a priori information about the conditions of the game leads to the automata playing, so to speak, “more crudely,” not being able to use the finer properties of the matrix. However, even in this case the quantity \(W\) is enclosed between the upper and lower values of the game: \(\max_i \min_k a_{ik} \leq W \leq \min_k \max_i a_{ik}\).
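The behavior described here can be explored by direct simulation. The following sketch plays the zero-sum game between two automata of the type \(B_{Mn,M}\) and \(B_{Nn,N}\) and averages the payoff of the first one. It is a Monte Carlo estimate rather than the authors' computation; the function name `simulate`, the starting states, and the matrix of win probabilities are illustrative assumptions. For the matrix used, the first row of \(\|a_{ik}\|\) is \((0.8,\ 0.4)\), so by property \(1^\circ\) one would expect the estimate for moderately large \(n\) to lie near the harmonic mean \(8/15 \approx 0.533\).

```python
import random

def simulate(p, n, steps, seed=0):
    """Estimate W(n) by playing the zero-sum game with two B-type automata.

    p[i][k] is the probability that the first automaton wins the play (i, k).
    Returns the average payoff of the first automaton per play.
    """
    rng = random.Random(seed)
    M, N = len(p), len(p[0])
    i, k = 0, 0            # current strategies (0-based); the start is arbitrary
    alpha, beta = n, n     # state indices of the first and second automaton
    total = 0
    for _ in range(steps):
        first_wins = rng.random() < p[i][k]
        total += 1 if first_wins else -1
        if first_wins:
            alpha = max(1, alpha - 1)      # winner moves toward state 1
            if beta < n:
                beta += 1                  # loser moves toward state n
            else:
                k = rng.randrange(N)       # loss in state n: redraw the strategy
        else:
            beta = max(1, beta - 1)
            if alpha < n:
                alpha += 1
            else:
                i = rng.randrange(M)
    return total / steps

p = [[0.9, 0.7],
     [0.4, 0.8]]           # illustrative win probabilities, not from the paper
estimate = simulate(p, n=12, steps=200_000, seed=1)
print(estimate)            # compare with 8/15
```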

The properties \(1^\circ\) and \(2^\circ\) can be obtained in the following way. Denote by \(\Psi_{\alpha\beta}^{ik}\) \((i=1,\ldots,M;\ k=1,\ldots,N;\ \alpha,\beta=1,\ldots,n)\) that state of the system of automata in which the first automaton \(B_{Mn,M}\) is in state \(\varphi_\alpha^i\), and the second \(B_{Nn,N}\) in state \(\varphi_\beta^k\). It can be shown that the transition probabilities of the states of the system are described by a stochastic matrix determined by the matrix \(\|a_{ik}\|\) of the game \(\Gamma\) and by the structure of the automata. Thus, the behavior of this system is described by a finite ergodic Markov chain, so that the computation of the value \(W(n)\) reduces to finding the final probabilities \(r_{\alpha\beta}^{ik}\) of the states of the system \(\Psi_{\alpha\beta}^{ik}\). Then

\[ W(n)=\sum_{i,k,\alpha,\beta} a_{ik} r_{\alpha\beta}^{ik}. \tag{1} \]

To determine the quantities \(r_{\alpha\beta}^{ik}\) we have the following system of equations:

\[ r_{\alpha\beta}^{ik} = p_{ik} r_{\alpha+1\,\beta-1}^{ik} + q_{ik} r_{\alpha-1\,\beta+1}^{ik} \quad (\alpha,\beta=2,\ldots,n-1); \tag{2} \]

\[ r_{1\beta}^{ik} = p_{ik}\bigl(r_{1\,\beta-1}^{ik}+r_{2\,\beta-1}^{ik}\bigr); \qquad r_{\alpha 1}^{ik} = q_{ik}\bigl(r_{\alpha-1\,1}^{ik}+r_{\alpha-1\,2}^{ik}\bigr) \quad (\alpha,\beta=2,\ldots,n); \tag{3} \]

\[ r_{n\beta}^{ik} = q_{ik} r_{n-1\,\beta+1}^{ik} + \frac{1}{M}\sum_{\gamma=1}^{M} q_{\gamma k} r_{n\,\beta+1}^{\gamma k} \quad (\beta=1,\ldots,n-1); \tag{4} \]

\[ r_{\alpha n}^{ik} = p_{ik} r_{\alpha+1\,n-1}^{ik} + \frac{1}{N}\sum_{\gamma=1}^{N} p_{i\gamma} r_{\alpha+1\,n}^{i\gamma} \quad (\alpha=1,\ldots,n-1); \tag{5} \]

\[ r_{nn}^{ik}=0;\qquad r_{11}^{ik}=0;\qquad \sum_{i,k,\alpha,\beta} r_{\alpha\beta}^{ik}=1. \tag{6} \]

It can be shown that the final probabilities \(r_{\alpha\beta}^{ik}\) are different from zero only in the case \(\alpha+\beta=n+1\). Using further the substitution \(r_{\alpha\,n+1-\alpha}^{ik}=b_{ik}+c_{ik}\lambda_{ik}^{\,n-\alpha}\), where \(\lambda_{ik}=p_{ik}/q_{ik}\), from conditions (4) and (5) for \(b_{ik}\) and \(c_{ik}\) we have

\[ p_{ik}b_{ik}+q_{ik}c_{ik}=\sigma_k;\qquad q_{ik}b_{ik}+p_{ik}\lambda_{ik}^{\,n-1}c_{ik}=\tau_i, \tag{7} \]

where

\[ \sigma_k=\frac{1}{M}\sum_{\gamma=1}^{M} q_{\gamma k} r_{n1}^{\gamma k}, \qquad \tau_i=\frac{1}{N}\sum_{\delta=1}^{N} p_{i\delta} r_{1n}^{i\delta}. \tag{8} \]

Summing the first of these equations with respect to the index \(i\), and the second with respect to the index \(k\), and substituting into the resulting equalities the expressions for \(b_{ik}\) in terms of \(\sigma_k\) and \(\tau_i\), we obtain the following system, which relates \(\sigma_k\) and \(\tau_i\):

\[ \sigma_k \sum_{\gamma=1}^{M} \frac{\lambda_{\gamma k}^{n}(\lambda_{\gamma k}-1)} {\lambda_{\gamma k}^{n+1}-1} = \sum_{\gamma=1}^{M} \tau_\gamma \frac{\lambda_{\gamma k}-1} {\lambda_{\gamma k}^{n+1}-1}, \qquad \tau_i \sum_{\delta=1}^{N} \frac{\lambda_{i\delta}-1} {\lambda_{i\delta}^{n+1}-1} = \sum_{\delta=1}^{N} \sigma_\delta \frac{\lambda_{i\delta}^{n}(\lambda_{i\delta}-1)} {\lambda_{i\delta}^{n+1}-1}. \tag{9} \]

Passing in these formulas to the limit as \(n \to \infty\), we can find the limiting values of the quantities \(\sigma_k, \tau_i\), and then also of the final probabilities \(r_{\alpha\, n+1-\alpha}^{ik}\).
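The final probabilities can also be approximated numerically, which gives a check on formula (1) and on the statement that they vanish unless \(\alpha+\beta=n+1\). The sketch below builds the transitions of the system directly from the rules of the automata and applies repeated multiplication by the transition matrix (power iteration). This is a numerical stand-in, not the analytic substitution used above; the function names, the uniform starting distribution, the iteration count, and the sample probabilities are assumptions.

```python
def final_probabilities(p, n, iterations=1000):
    """Approximate r[(i, k, alpha, beta)] by iterating the Markov chain."""
    M, N = len(p), len(p[0])
    states = [(i, k, a, b) for i in range(M) for k in range(N)
              for a in range(1, n + 1) for b in range(1, n + 1)]
    r = {s: 1.0 / len(states) for s in states}   # uniform start (an assumption)
    for _ in range(iterations):
        new = dict.fromkeys(states, 0.0)
        for (i, k, a, b), mass in r.items():
            if mass == 0.0:
                continue
            win, lose = p[i][k], 1.0 - p[i][k]
            # First automaton wins (probability p_ik); the second loses.
            a_win = max(1, a - 1)
            if b < n:
                new[(i, k, a_win, b + 1)] += mass * win
            else:
                for k2 in range(N):       # loss in state n: uniform redraw
                    new[(i, k2, a_win, n)] += mass * win / N
            # First automaton loses (probability q_ik); the second wins.
            b_win = max(1, b - 1)
            if a < n:
                new[(i, k, a + 1, b_win)] += mass * lose
            else:
                for i2 in range(M):
                    new[(i2, k, n, b_win)] += mass * lose / M
        r = new
    return r

def value(p, n, iterations=1000):
    """Formula (1): W(n) = sum of a_ik * r, with a_ik = 2 * p_ik - 1."""
    r = final_probabilities(p, n, iterations)
    return sum((2.0 * p[i][k] - 1.0) * mass
               for (i, k, a, b), mass in r.items())

p = [[0.9, 0.7]]           # M = 1, N = 2; illustrative numbers
print(value(p, n=6))       # compare with the harmonic mean of 0.8 and 0.4
```

The same functions accept larger matrices, although the number of states grows as \(MNn^2\) and more iterations may be needed before the probabilities settle.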

The authors gratefully acknowledge numerous conversations with I. M. Gel'fand, I. I. Pyatetskii-Shapiro, and M. A. Evgrafov, which proved very useful in the preparation of this work.

Received 4 X 1962

References

1. D. Blackwell, M. A. Girshick, Theory of Games and Statistical Decisions, IL, 1958.
2. R. D. Luce, H. Raiffa, Games and Decisions, IL, 1961.
3. S. Vajda, in the collection Linear Inequalities and Related Questions, IL, 1959.
4. J. McKinsey, Introduction to the Theory of Games, Moscow, 1960.
5. M. L. Tsetlin, Dokl. Akad. Nauk SSSR, 139, No. 4 (1961).
6. M. L. Tsetlin, Avtomatika i Telemekhanika, 22, No. 10 (1961).
