CYBERNETICS AND CONTROL THEORY
M. L. TSETLIN
A NOTE ON THE GAME OF A FINITE AUTOMATON WITH A PARTNER USING A MIXED STRATEGY
(Presented by Academician M. V. Keldysh on 16 X 1962)
The example of the behavior of an automaton that constitutes the content of this note relates to the case of a two-person zero-sum game, for which the well-known theorem of von Neumann holds (see, for example, (¹–⁴)). Assuming that one of the players uses an arbitrary mixed strategy, while the other is a finite automaton, we indicate a construction of such an automaton which, with a sufficiently large memory capacity (and with one further condition, determined by the game matrix), “plays no worse” than its partner. In particular, if the opponent uses an optimal strategy, then the automaton’s payoff tends, as the capacity of its memory grows, to the von Neumann value of the game. The note uses the concepts and results of works (⁵, ⁶).
Consider a rectangular game \(\Gamma\), given by the matrix \(A=\|a_{ik}\|\), \(i=1,\ldots,M;\ k=1,\ldots,N\). Without loss of generality, one may assume \(|a_{ik}|\leqslant 1\). We shall assume that the game is repeated many times, and that the outcome of a play in which the first player applies his \(i\)-th strategy against the \(k\)-th strategy of the second player is a unit win of the first player (loss of the second) with probability \(p_{ik}=\frac12(1+a_{ik})\) and a unit loss of the first player (win of the second) with probability \(q_{ik}=\frac12(1-a_{ik})\); the assumption \(|a_{ik}|\leqslant 1\) ensures that \(p_{ik}\) and \(q_{ik}\) are indeed probabilities. In such a play the mathematical expectation of the payoff of the first player is \(a_{ik}=p_{ik}-q_{ik}\).
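The passage from a matrix entry to a random unit payoff can be illustrated by a minimal Python sketch. The function names `outcome_probabilities` and `play_once` are hypothetical and do not come from the note; the payoff is encoded here as +1 (win of the first player) or -1 (loss), which is an assumption made only for this illustration.

```python
import random

def outcome_probabilities(a_ik):
    """Return (p_ik, q_ik) for one matrix entry, assuming |a_ik| <= 1."""
    if abs(a_ik) > 1:
        raise ValueError("the entry must satisfy |a_ik| <= 1")
    return (1 + a_ik) / 2, (1 - a_ik) / 2

def play_once(a_ik, rng):
    """Simulate one play: +1 is a unit win of the first player, -1 a unit loss."""
    p_ik, _ = outcome_probabilities(a_ik)
    return 1 if rng.random() < p_ik else -1

p, q = outcome_probabilities(0.5)
print(p, q, p - q)  # the difference p - q recovers the entry a_ik
```

Averaging `play_once(a_ik, rng)` over many plays with `rng = random.Random(seed)` should approach \(a_{ik}\), in agreement with \(a_{ik}=p_{ik}-q_{ik}\).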
Suppose that the first player in the game \(\Gamma\) is a finite automaton \(\mathfrak A\), given by its canonical equations
\[
\varphi(t+1)=\Phi(\varphi(t),\,s(t+1)),
\]
\[
f(t)=F(\varphi(t)),\qquad t=1,2,\ldots
\tag{1}
\]
Let the input variable \(s(t)\) be able to take only two values, 0 and 1. We shall call the value \(s(t)=0\) a win, and \(s(t)=1\) a loss of the automaton \(\mathfrak A\) in the play carried out at time \(t-1\). The output variable \(f(t)\) may take \(M\) values \(f_1,\ldots,f_M\). We shall say that in the play carried out at time \(t\) the automaton uses the \(i\)-th pure strategy if \(f(t)=f_i\). The values \(\varphi_1,\ldots,\varphi_m\) of the variable \(\varphi(t)\) will be called the states of the automaton, and the number \(m\) the capacity of its memory.* The choice of strategies as a function of the outcomes of the plays already made is determined by the structure of the automaton, given by equations (1).
Suppose that the opponent of the automaton \(\mathfrak A\) in the game \(\Gamma\) is a device implementing some mixed strategy, i.e. using in each play its \(k\)-th pure strategy \((k=1,\ldots,N)\) with probability \(x_k\), \(x_1+\cdots+x_N=1\). By the definition of a mixed strategy, the quantities \(x_k\) are functions of the matrix \(A\) of the game \(\Gamma\) and do not depend on the behavior of the partner.
Let, in the play carried out at time \(t\), the automaton \(\mathfrak A\) use its \(i\)-th pure strategy. Define the probability \(p_i\) of its win \((s(t+1)=0)\) and the probability \(q_i\) of its loss \((s(t+1)=1)\):
\[ p_i=\sum_{k=1}^{N} p_{ik}x_k,\qquad q_i=\sum_{k=1}^{N} q_{ik}x_k,\qquad i=1,\ldots,M. \tag{2} \]
* The strategies, wins, and losses for the automaton described here coincide with the actions, non-penalties, and penalties for the automata described in (⁵, ⁶).
Thus, the game \(\Gamma\) of the automaton \(\mathfrak A\) with an opponent who has chosen a mixed strategy is described by the behavior of a finite automaton in a stationary random environment \(C=C(q_1,\ldots,q_M)\) in the sense of (⁵, ⁶).
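As an illustration of formula (2), the following sketch reduces a game matrix and a mixed strategy of the opponent to the win and loss probabilities that define the environment \(C(q_1,\ldots,q_M)\). The function name `environment` and the numerical matrix are assumptions chosen for the example, not data from the note.

```python
def environment(A, x):
    """Return the lists (p, q) of formula (2) for matrix A and mixed strategy x."""
    if abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("the probabilities x_k must sum to 1")
    # p_i = sum_k p_ik x_k with p_ik = (1 + a_ik) / 2
    p = [sum((1 + a_ik) / 2 * x_k for a_ik, x_k in zip(row, x)) for row in A]
    # q_i = sum_k q_ik x_k with q_ik = (1 - a_ik) / 2
    q = [sum((1 - a_ik) / 2 * x_k for a_ik, x_k in zip(row, x)) for row in A]
    return p, q

A = [[0.5, -0.5], [-0.25, 0.25]]   # hypothetical 2 x 2 game
x = [0.25, 0.75]                   # hypothetical mixed strategy of the opponent
p, q = environment(A, x)
print(p, q)
```

From the automaton's point of view only the numbers \(q_i\) matter: once \(x\) is fixed, the opponent's individual pure strategies are no longer visible.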
Now let the automaton playing the game \(\Gamma\) be the automaton \(L_{Mn,M}\) with a linear tactic (see (⁵, ⁶)). This automaton has \(Mn\) states \(\varphi^i_\alpha\) \((i=1,\ldots,M;\ \alpha=1,\ldots,n)\), and in the state \(\varphi^i_\alpha\) it uses the \(i\)-th pure strategy. The state transitions depend on the value of the input variable \(s\) as follows. For \(s=0\) (win) the state \(\varphi^i_\alpha\) goes into \(\varphi^i_{\alpha-1}\) (for \(\alpha=2,\ldots,n\)), and the state \(\varphi^i_1\) goes into itself. For \(s=1\) (loss) the state \(\varphi^i_\alpha\) goes into \(\varphi^i_{\alpha+1}\) (for \(\alpha=1,\ldots,n-1\)), the state \(\varphi^i_n\) goes into \(\varphi^{i+1}_n\) (for \(i=1,\ldots,M-1\)), and the state \(\varphi^M_n\) goes into \(\varphi^1_n\). It then follows without difficulty from the results of (⁶) that the mathematical expectation \(W(n)\) of the payoff of the automaton \(L_{Mn,M}\) in the game \(\Gamma\) is expressed by the formula
\[ W(n)=\sum_{i=1}^{M}(1-\lambda_i^n)\bigg/\sum_{i=1}^{M}\frac{1-\lambda_i^n}{a_i}, \tag{3} \]
where
\[ a_i=\sum_{k=1}^{N} a_{ik}x_k,\qquad \lambda_i=\frac{p_i}{q_i}. \tag{4} \]
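The transition rule of the linear-tactic automaton and formula (3) can be compared numerically. The sketch below is an illustration under stated assumptions, not the derivation of (⁶): it encodes a state \(\varphi^i_\alpha\) as a pair `(i, alpha)`, assumes \(0<|a_i|<1\) for every \(i\), uses \(p_i=\frac12(1+a_i)\) (which follows from (2) because the \(x_k\) sum to 1), and finds the stationary distribution of the resulting Markov chain by plain iteration. All function names are hypothetical.

```python
def step(i, alpha, s, n, M):
    """One transition of the automaton; states are 1-based pairs (i, alpha)."""
    if s == 0:                       # win: move deeper; the state alpha = 1 stays put
        return i, max(alpha - 1, 1)
    if alpha < n:                    # loss inside a branch: move toward alpha = n
        return i, alpha + 1
    return (i % M) + 1, n            # loss at alpha = n: next strategy, cyclically

def expected_payoff(a, n):
    """Formula (3), with lambda_i = p_i / q_i = (1 + a_i) / (1 - a_i)."""
    lam = [(1 + a_i) / (1 - a_i) for a_i in a]
    num = sum(1 - l ** n for l in lam)
    den = sum((1 - l ** n) / a_i for l, a_i in zip(lam, a))
    return num / den

def stationary_payoff(a, n, sweeps=5000):
    """Mean payoff under the stationary distribution, found by iteration."""
    M = len(a)
    states = [(i, al) for i in range(1, M + 1) for al in range(1, n + 1)]
    pi = {st: 1 / len(states) for st in states}
    for _ in range(sweeps):
        nxt = dict.fromkeys(states, 0.0)
        for (i, al), mass in pi.items():
            p_i = (1 + a[i - 1]) / 2             # win probability of strategy i
            nxt[step(i, al, 0, n, M)] += mass * p_i
            nxt[step(i, al, 1, n, M)] += mass * (1 - p_i)
        pi = nxt
    return sum(mass * a[i - 1] for (i, _), mass in pi.items())

a = [0.2, -0.3]                      # hypothetical mean payoffs a_1, a_2
print(expected_payoff(a, 3), stationary_payoff(a, 3))
```

For small \(n\) the two numbers should agree to within rounding error; for large \(n\) the iteration converges slowly, so the closed form is the practical way to evaluate \(W(n)\).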
Passing to the limit as \(n\to\infty\), it is not hard to verify that
\[ W=\lim_{n\to\infty}W(n)=\max(a_1,\ldots,a_M) \tag{5} \]
provided that
\[ \max(a_1,\ldots,a_M)\geq 0. \tag{6} \]
If, however, all the numbers \(a_1,\ldots,a_M\) are negative, then
\[ W=M\,(a_1^{-1}+\ldots+a_M^{-1})^{-1}\,*. \]
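Both limits can be checked directly from formula (3). The following sketch assumes, beyond what the note states, that \(0<|a_i|<1\) for every \(i\) and that in the first case the maximum is positive and attained at a single index \(j\); the boundary case \(\max(a_1,\ldots,a_M)=0\) is not covered by it. Since \(x_1+\cdots+x_N=1\), formula (2) gives \(p_i=\frac12(1+a_i)\) and \(q_i=\frac12(1-a_i)\), so that
\[
\lambda_i=\frac{1+a_i}{1-a_i},\qquad \lambda_i>1 \iff a_i>0,
\]
and \(\lambda_i\) increases with \(a_i\). If \(a_j>0\) is the largest of the \(a_i\), then \(\lambda_j>1\) and \(\lambda_i/\lambda_j<1\) for \(i\neq j\). Dividing the numerator and the denominator of (3) by \(\lambda_j^n\) gives
\[
W(n)=\frac{\sum_{i=1}^{M}\bigl(\lambda_j^{-n}-(\lambda_i/\lambda_j)^n\bigr)}{\sum_{i=1}^{M}\bigl(\lambda_j^{-n}-(\lambda_i/\lambda_j)^n\bigr)/a_i}\;\longrightarrow\;\frac{-1}{-1/a_j}=a_j\qquad (n\to\infty).
\]
If all \(a_i<0\), then \(0<\lambda_i<1\), so \(\lambda_i^n\to 0\) and
\[
W(n)\;\longrightarrow\;\frac{M}{\sum_{i=1}^{M}a_i^{-1}}\qquad (n\to\infty).
\]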
If the opponent’s mixed strategy is optimal and inequality (6) holds, then the value \(W\) coincides with the value of the game in von Neumann’s sense; thus, for sufficiently large \(n\), the automaton \(L_{Mn,M}\) plays no worse than its partner, who has chosen the optimal strategy. Moreover, unlike its partner, the automaton has no a priori information about the matrix \(A\) of the game \(\Gamma\); it obtains all the necessary information in the course of the game itself.
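A small numerical example may make this statement concrete. The \(3\times 2\) matrix below is hypothetical and chosen for illustration: its third row is dominated by the first, the opponent's optimal mixed strategy is \((\frac12,\frac12)\), and the value of the game is \(0.125\). The function names are likewise assumptions, and formula (3) is evaluated assuming \(0<|a_i|<1\).

```python
def mean_payoffs(A, x):
    """a_i = sum_k a_ik x_k: mean payoff of pure strategy i against x."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]

def payoff_formula(a, n):
    """Formula (3), with lambda_i = (1 + a_i) / (1 - a_i)."""
    lam = [(1 + a_i) / (1 - a_i) for a_i in a]
    num = sum(1 - l ** n for l in lam)
    den = sum((1 - l ** n) / a_i for l, a_i in zip(lam, a))
    return num / den

A = [[0.5, -0.25], [-0.5, 0.75], [-0.5, -0.25]]  # hypothetical game, value 0.125
x_opt = [0.5, 0.5]                                # opponent's optimal strategy
a = mean_payoffs(A, x_opt)                        # [0.125, 0.125, -0.375]
for n in (1, 5, 20, 60):
    print(n, payoff_formula(a, n))
```

With a small memory the automaton spends a noticeable share of time on the dominated third strategy, so its payoff is well below the value of the game; as \(n\) grows, the printed payoff approaches \(0.125\) from below.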
In the case when condition (6) is not satisfied, i.e., when the mathematical expectation \(a_i\) of the payoff of the automaton is negative for each of its possible strategies, the automaton plays somewhat more crudely, not minimizing the payoff of its opponent, but attaining only the harmonic mean of the quantities \(a_1,\ldots,a_M\).
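The difference between the two regimes can be seen in a short sketch of the limiting payoff. The function name `limiting_payoff` and the sample values are assumptions made for this illustration; the harmonic-mean branch assumes that no \(a_i\) equals zero, which holds when all of them are negative.

```python
def limiting_payoff(a):
    """Limit W of the payoff as n grows: formula (5) under condition (6),
    otherwise the harmonic mean of a_1, ..., a_M."""
    best = max(a)
    if best >= 0:
        return best
    return len(a) / sum(1 / a_i for a_i in a)

print(limiting_payoff([0.2, -0.3]))    # condition (6) holds: the maximum
print(limiting_payoff([-0.1, -0.5]))   # all negative: the harmonic mean
```

In the second call the best pure strategy would lose \(0.1\) per play on average, whereas the harmonic mean is \(-1/6\): the automaton loses more than it would by holding to its best strategy, which is the sense in which it plays more crudely.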
Received 4 X 1962
CITED LITERATURE
\(^1\) D. Blackwell, M. A. Girshick, Theory of Games and Statistical Decisions, IL, 1958. \(^2\) R. D. Luce, H. Raiffa, Games and Decisions, IL, 1961. \(^3\) S. Vajda, in the collection Linear Inequalities, IL, 1959. \(^4\) J. McKinsey, Introduction to the Theory of Games, M., 1960. \(^5\) M. L. Tsetlin, DAN, 139, No. 4 (1961). \(^6\) M. L. Tsetlin, Automation and Remote Control, 22, No. 10 (1961).
\(*\) That is, \(W\) is the harmonic mean of the numbers \(a_1,\ldots,a_M\).