Reports of the Academy of Sciences of the USSR

I. V. ROMANOVSKII

Submitted 1964-01-01 | RussiaRxiv: ru-196401.67548 | Translated from Russian

Reports of the Academy of Sciences of the USSR
1964. Volume 157, No. 5

CYBERNETICS AND CONTROL THEORY

I. V. ROMANOVSKII

DERANDOMIZATION OF OPTIMAL STRATEGIES IN ANTAGONISTIC GAMES WITH BLUFFING

(Presented by Academician V. I. Smirnov on 21 III 1964)

The literature contains many examples of antagonistic games in which the results of the actions of an “intermediary” (random moves of nature) are known to the players to varying degrees. Games with bluffing are one example: the situation depends on a parameter that is a random variable with a given distribution function, and only one of the players learns the realized value of this parameter. In many such games (see, for example, (¹)) the optimal strategies that are constructed turn out to be nonrandomized. Apparently, the explanation is that the uncertainty introduced into the game by the “intermediary” is itself sufficient to carry out the required randomization. In support of this view one may note that, in most such cases, the set of parameter values determined by the “intermediary” is sufficiently rich in comparison with the sets of alternatives available to the players.

In the present note we attempt to find a general pattern behind this phenomenon. We consider two game models: a game with bluffing and a game with imperfect transmission of information.

The first of them is defined as follows. The “intermediary” chooses a random number \(x\), uniformly distributed on \([0, 1]\). Player I, knowing the realization \(x\) obtained by the “intermediary,” chooses one of his pure strategies \(r_i,\ i = 1, 2, \ldots, m\). Simultaneously player II, who does not know \(x\), chooses one of his pure strategies \(s_j,\ j = 1, 2, \ldots, n\). The payoff from player II to player I in this situation is \(K_{ij}(x)\), where, for every pair \(i, j\), \(K_{ij}(x)\) is a given continuous function of \(x\).

A mixed strategy of player II is, as usual, a probability vector \(\mathfrak q = (q_1, \ldots, q_n)\). A mixed strategy of player I should be defined as a probability measure on the set of his pure strategies, which in our case are the measurable functions of \(x\) taking the values \(1, 2, \ldots, m\). However, it can be shown that the game we have defined admits an analogue of Kuhn’s theorem (²) on the equivalence of mixed strategies to the corresponding behavior strategies (player I has perfect recall in this game). It is therefore sufficient to consider only behavior strategies of player I.

By a behavior strategy \(\mathfrak p\) of player I we shall mean a collection of \(m\) nonnegative Lebesgue-measurable functions \(p_i(x)\) on \([0,1]\) for which \(\sum_{i=1}^{m} p_i(x) \equiv 1\); here \(p_i(x)\) is the probability with which player I chooses \(r_i\) after learning \(x\).

The payoff of player I when he uses strategy \(\mathfrak p\) and player II uses strategy \(\mathfrak q\) is equal to

\[ K(\mathfrak p,\mathfrak q)=\int_{0}^{1}\sum_{i=1}^{m}\sum_{j=1}^{n} p_i(x) q_j K_{ij}(x)\,dx. \tag{1} \]
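As an illustration of formula (1), the following minimal sketch approximates the payoff by the midpoint rule. The function name `payoff`, the grid size, and the 2×2 demo data are assumptions made for this example only and do not come from the note.

```python
def payoff(K, p, q, n_grid=2000):
    """Midpoint-rule approximation of the payoff in formula (1).

    K[i][j] is a function x -> K_ij(x), p[i] is a function x -> p_i(x),
    and q is a list of n probabilities.
    """
    total = 0.0
    for k in range(n_grid):
        x = (k + 0.5) / n_grid                  # midpoint of the k-th cell of [0, 1]
        total += sum(p[i](x) * q[j] * K[i][j](x)
                     for i in range(len(p)) for j in range(len(q)))
    return total / n_grid


# Hypothetical 2x2 data: K_ij(x) = x when i == j, and 0 otherwise.
K_demo = [[lambda x: x, lambda x: 0.0],
          [lambda x: 0.0, lambda x: x]]
p_demo = [lambda x: 0.5, lambda x: 0.5]         # player I ignores x and mixes evenly
q_demo = [0.5, 0.5]

value = payoff(K_demo, p_demo, q_demo)
print(round(value, 6))                          # the integrand is x/2, so about 0.25
```

For this demo data the integrand reduces to \(x/2\), so the approximation should agree with the exact value \(1/4\) up to rounding.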

The following minimax theorem holds:

Theorem 1. There exist strategies \(\mathfrak p_0\) and \(\mathfrak q_0\) such that, for all \(\mathfrak p\) and \(\mathfrak q\),

\[ K(\mathfrak p,\mathfrak q_0)\leq K(\mathfrak p_0,\mathfrak q_0)\leq K(\mathfrak p_0,\mathfrak q). \tag{2} \]

The proof of this theorem can easily be carried out by the usual method of constructing approximate finite \(\varepsilon\)-games \({}^{(3)}\), for which one can use Kuhn’s theorem and the minimax theorem for finite games.

Theorem 2. For every behavior strategy \(\mathfrak p\) of player I there exists a pure strategy of this player that is no worse than \(\mathfrak p\).

Proof. We must show that, for the given \(\mathfrak p\), there is a pure strategy \(i(x)\) such that, for every \(j\), the inequality

\[ \int_0^1 \sum_{i=1}^{m} p_i(x)K_{ij}(x)\,dx \leq \int_0^1 K_{i(x)j}(x)\,dx \tag{3} \]

will hold.

For this we use a lemma that may be regarded as a generalization of the theorem on partitions of an interval similar with respect to a finite number of continuous measures \({}^{(4)}\).

Lemma. Let \(m\) measurable nonnegative functions \(p_i(x)\) be given on \([0,1]\), for which \(\sum_i p_i(x)\equiv 1\). There exist measurable sets \(A_1,A_2,\ldots,A_m\) such that

\[ A_{i_1}\cap A_{i_2}=\varnothing \quad \text{for } i_1\ne i_2, \qquad \bigcup_i A_i=[0,1] \]

and, for any \(j\),

\[ \int_0^1 \sum_{i=1}^{m} p_i(x)K_{ij}(x)\,dx \leq \sum_{i=1}^{m}\int_{A_i} K_{ij}(x)\,dx . \tag{4} \]

We shall briefly present the proof of the lemma. Its idea is taken from the proof of the lemma in \({}^{(5)}\) and consists in successively transforming the functions \(p_i\) in such a way that the set of \(x\) at which the \(p_i\) take values different from 0 and 1 is successively reduced. For this purpose the sets

\[ B_{i_1i_2}=\{x\mid 0<p_{i_1}(x)<1,\;0<p_{i_2}(x)<1\} \]

are formed.

If one of these sets has positive measure, then the functions \(p_{i_1}(x)\) and \(p_{i_2}(x)\) can be changed so that the values of the corresponding integrals in (4) do not decrease, the sum \(p_{i_1}(x)+p_{i_2}(x)\) and the remaining functions \(p_i(x)\) remain unchanged, and the measure of \(B_{i_1i_2}\) decreases. After this, the existence of the sets \(A_i\) is proved without particular difficulty.

Setting \(i(x)=i\) for \(x\in A_i\) turns the right-hand side of (4) into \(\int_0^1 K_{i(x)j}(x)\,dx\), so (3) follows from (4) and the theorem is proved. From Theorems 1 and 2 it follows directly that among the optimal strategies of player I there is at least one pure strategy.
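The effect described by Theorem 2 can be observed numerically. The sketch below is not the construction used in the lemma: it swaps in a simpler interleaving scheme that cuts \([0,1]\) into many small cells and splits each cell into consecutive pieces with relative lengths \(p_1,\ldots,p_m\). This yields a pure strategy whose integrals only approximately match those of the behavior strategy, rather than satisfying (4) exactly. All names (`purify`, `integral`) and the demo functions are hypothetical.

```python
def purify(p, n_cells):
    """Return a pure strategy i(x) that imitates the behavior strategy p.

    Each of n_cells equal cells is cut into m consecutive pieces whose relative
    lengths are p_1, ..., p_m evaluated at the cell midpoint.
    """
    def i_of_x(x):
        k = min(int(x * n_cells), n_cells - 1)  # index of the cell containing x
        mid = (k + 0.5) / n_cells
        t = x * n_cells - k                     # relative position inside the cell
        cum = 0.0
        for i, p_i in enumerate(p):
            cum += p_i(mid)
            if t < cum:
                return i
        return len(p) - 1                       # guard against rounding in the sum
    return i_of_x


def integral(f, n_grid=20000):
    """Midpoint rule on [0, 1]."""
    return sum(f((k + 0.5) / n_grid) for k in range(n_grid)) / n_grid


# Hypothetical continuous payoffs K_ij(x) and behavior strategy p_i(x), m = n = 2.
K_demo = [[lambda x: x, lambda x: 1.0 - x],
          [lambda x: x * x, lambda x: 0.5]]
p_demo = [lambda x: x, lambda x: 1.0 - x]
i_demo = purify(p_demo, n_cells=200)

gaps = []
for j in range(2):
    mixed = integral(lambda x: sum(p_demo[i](x) * K_demo[i][j](x) for i in range(2)))
    pure = integral(lambda x: K_demo[i_demo(x)][j](x))
    gaps.append(abs(mixed - pure))
    print(j, round(mixed, 4), round(pure, 4))
```

Because the \(K_{ij}\) are continuous, the gap between the two integrals shrinks as the number of cells grows. The lemma is stronger: it asserts an exact partition for which no integral decreases.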

Remark 1. The same arguments carry over without difficulty to the case in which the set of parameters is an arbitrary closed convex subset of a finite-dimensional Euclidean space and the distribution on it is any continuous distribution.

Remark 2. A game in which player II knows the result of player I’s move may be regarded as a special case of the game considered. In this version of the game player II also has a pure optimal strategy.

Another game model in which the same effect is observed is that of games with imperfect transmission of information (see (6)). We mean the following antagonistic game.

Player I chooses one of his pure strategies \(r_i,\ i=1,\ldots,m\). After this, the “intermediary” draws a realization \(x\) of a continuous random variable distributed on \([0,1]\) with density \(f_i(x)\). Player II, knowing \(x\), chooses one of his pure strategies \(s_j,\ j=1,\ldots,n\). In this situation the payoff to player I is \(K_{ij}\), which does not depend on \(x\).

The mixed strategies of player I in this game are defined, as usual, by a probability vector \(\mathbf p=(p_1,\ldots,p_m)\). The mixed strategies of player II, which here too should be introduced as probability measures on the set of all pure strategies, will, as before, be replaced by behavior strategies \(\mathbf q\), each defined as a collection of \(n\) measurable nonnegative functions \(q_j(x)\) whose sum is identically equal to 1.

When strategies \(\mathbf p\) and \(\mathbf q\) are used, the payoff of player I is equal to

\[ K(\mathbf p,\mathbf q)=\sum_i p_i \int_0^1 f_i(x)\sum_j K_{ij}q_j(x)\,dx. \tag{5} \]

The minimax theorem for this game was proved in (6). From the form of the payoff function \(K(\mathbf p,\mathbf q)\) it easily follows that in this game, too, one may restrict oneself to behavior strategies of player II.

Theorem 3. Whatever the behavior strategy \(\mathbf q\) of player II, there exists a pure strategy \(j(x)\) of this player that will be no worse than the strategy \(\mathbf q\), i.e., for every \(i\)

\[ \int_0^1 f_i(x)\sum_j K_{ij}q_j(x)\,dx \ge \int_0^1 f_i(x)K_{ij(x)}\,dx. \tag{6} \]

The proof follows directly from the following assertion (formulated analogously to the lemma).

There exist measurable sets \(A_j\), forming a partition of \([0,1]\), such that for every \(i\)

\[ \int_0^1 f_i(x)\sum_{j=1}^n K_{ij}q_j(x)\,dx \ge \sum_{j=1}^n K_{ij}\int_{A_j} f_i(x)\,dx. \tag{7} \]

From this theorem, as in the preceding model, we obtain a proof of the existence of an optimal pure strategy for player II.
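A weaker but easily computed fact can be sketched in code: against one fixed mixed strategy \(\mathbf p\) of player I, player II has a pure best reply \(j(x)\) obtained by minimizing \(\sum_i p_i f_i(x) K_{ij}\) pointwise in \(x\). Theorem 3 asserts more, namely a pure strategy that is no worse for every \(i\) simultaneously. The function names, the densities, and the payoff matrix below are assumptions made for this example.

```python
def payoff5(p, f, Kmat, q, n_grid=4000):
    """Midpoint-rule approximation of the payoff in formula (5)."""
    total = 0.0
    for k in range(n_grid):
        x = (k + 0.5) / n_grid
        total += sum(p[i] * f[i](x) * Kmat[i][j] * q[j](x)
                     for i in range(len(p)) for j in range(len(q)))
    return total / n_grid


def best_reply(p, f, Kmat):
    """Pure strategy j(x) minimizing sum_i p_i f_i(x) K_ij pointwise in x."""
    n = len(Kmat[0])

    def j_of_x(x):
        cost = [sum(p[i] * f[i](x) * Kmat[i][j] for i in range(len(p)))
                for j in range(n)]
        return cost.index(min(cost))
    return j_of_x


def as_behavior(j_of_x, n):
    """Write a pure strategy as indicator functions q_j(x)."""
    return [lambda x, j=j: 1.0 if j_of_x(x) == j else 0.0 for j in range(n)]


# Hypothetical data: densities 2x and 2(1 - x); player II pays 1 when j != i.
f_demo = [lambda x: 2.0 * x, lambda x: 2.0 * (1.0 - x)]
K_demo = [[0.0, 1.0],
          [1.0, 0.0]]
p_demo = [0.5, 0.5]

j_star = best_reply(p_demo, f_demo, K_demo)
pure_value = payoff5(p_demo, f_demo, K_demo, as_behavior(j_star, 2))
mixed_value = payoff5(p_demo, f_demo, K_demo, [lambda x: 0.5, lambda x: 0.5])
print(round(pure_value, 4), round(mixed_value, 4))
```

In this demo the pointwise cost of the pure reply is \(\min(x, 1-x)\), which integrates to \(1/4\), while the even mixture pays \(1/2\), so the pure reply is strictly better for player II here.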

Leningrad State University
named after A. A. Zhdanov

Received 19 III 1964

REFERENCES

1. S. Karlin, Mathematical Methods and Theory in Games, Programming and Economics, 2, London, 1959.
2. H. W. Kuhn, Contributions to the Theory of Games, 2, Princeton, 1953, p. 193.
3. A. Wald, Statistical Decision Functions, N. Y., 1950.
4. A. A. Lyapunov, Izv. AN SSSR, Ser. Mat., 3, 465 (1950).
5. Yu. V. Linnik, I. V. Romanovskii, V. N. Sudakov, DAN, 155, No. 6 (1964).
6. I. V. Romanovskii, Teor. Veroyatn. i ee Primenen., 7, No. 1, 89 (1962).
