UDC 518.92
MATHEMATICS
Submitted 1970-01-01 | RussiaRxiv: ru-197001.76059 | Translated from Russian

E. R. SMOLYAKOV

DIFFERENTIAL GAMES IN MIXED STRATEGIES

(Presented by Academician N. N. Krasovskii on 18 VII 1969)

In an \(n\)-dimensional Euclidean space the motion of a material point is governed by the system of equations

\[ R_i(y', y, u, v, t) \equiv y_i' - f_i(y, u, v, t)=0,\qquad i=1,2,\ldots,n, \tag{1} \]

where \(u\) and \(v\) are elements of the closed sets \(U\) and \(V\), respectively. Suppose that at each instant \(t\) player \(u\) chooses \(u\) from \(U\) in such a way as to ensure the minimum of the functional

\[ I=\int_{t_0}^{T} f_0(y,u,v,t)\,dt+\Phi[y(t_0),t_0,y(T),T], \tag{2} \]

while satisfying the boundary conditions

\[ G_k[y(t_0),t_0,y(T),T]=0,\qquad k=1,2,\ldots,p. \tag{3} \]

Player \(v\), choosing \(v\) from \(V\), seeks to maximize \(I\). We shall be interested mainly in cases in which such a game has no value in pure strategies. In these cases the game is treated as a “program” game, that is, one without synthesis of feedback controls. For clarity of exposition we take the sets \(U\) and \(V\) to be closed intervals \([u_0,u_1]\), \([v_0,v_1]\) of the real line. The results generalize readily to the case where \(U\) and \(V\) are arbitrary finite-dimensional measurable sets.

On the \(\sigma\)-algebra of subsets of the open set \(S_u \supset U\) define a nonnegative countably additive set function \(\varphi(i_u)\), \(i_u \subset S_u\). We subject the measure \(\varphi(i_u)\) to the conditions: \(\varphi(U)=1\), \(\varphi(S_u \setminus U)=0\). The interval \([u_0,u_1]\) is \(\varphi\)-measurable, as is every set closed relative to \(S_u\). Similarly, introduce a \(\psi\)-measure on the open set \(S_v \supset V\).

The generating functions (point functions) \(\varphi(u,t)\), \(\psi(v,t)\) of the measures \(\varphi(i_u,t)\), \(\psi(i_v,t)\) satisfy the following conditions: a) \(\varphi(u,t)\), \(\psi(v,t)\) do not decrease; b) they are upper semicontinuous at every point \(u\in S_u\), \(v\in S_v\) at which they are defined; c)

\[ \int_U d\varphi=\varphi(u_1+0,t)-\varphi(u_0-0,t)=1, \]

\[ \int_V d\psi=\psi(v_1+0,t)-\psi(v_0-0,t)=1, \]

where the integrals are understood in the Lebesgue–Stieltjes sense \((^{1,2})\).

Since two point functions differing by a constant correspond to the same measure, one may set \(\varphi(u_0-0,t)=\psi(v_0-0,t)=0\). The point functions \(\varphi(u,t)\), \(\psi(v,t)\) satisfying a)–c) may be regarded as distribution functions of random variables \(u\in U\), \(v\in V\).
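As a concrete illustration of conditions a)–c), a measure concentrated at finitely many points of \(U\) can be encoded by its atoms and weights. The sketch below is an illustration only and is not taken from the original paper: the names `AtomicMeasure`, `cdf`, and `integrate` are hypothetical, and only purely atomic measures are covered.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AtomicMeasure:
    """Probability measure on [lo, hi] carried by finitely many atoms (illustrative)."""
    atoms: Sequence[float]
    weights: Sequence[float]
    lo: float = 0.0
    hi: float = 1.0

    def __post_init__(self) -> None:
        if len(self.atoms) != len(self.weights):
            raise ValueError("atoms and weights must have equal length")
        if any(w < 0 for w in self.weights):
            raise ValueError("weights must be nonnegative")
        if abs(sum(self.weights) - 1.0) > 1e-12:
            raise ValueError("weights must sum to 1 (condition c)")
        if any(a < self.lo or a > self.hi for a in self.atoms):
            raise ValueError("atoms must lie in [lo, hi]")

    def cdf(self, u: float) -> float:
        """Point function: total weight of atoms <= u (nondecreasing, right-continuous)."""
        return sum(w for a, w in zip(self.atoms, self.weights) if a <= u)

    def integrate(self, g: Callable[[float], float]) -> float:
        """Lebesgue-Stieltjes integral of g against the measure: sum of g(atom) * weight."""
        return sum(g(a) * w for a, w in zip(self.atoms, self.weights))


# Two examples: a fair "coin" on the endpoints and a point mass at 0.5.
coin = AtomicMeasure(atoms=(0.0, 1.0), weights=(0.5, 0.5))
point = AtomicMeasure(atoms=(0.5,), weights=(1.0,))
```

Here `cdf` vanishes to the left of the interval and equals 1 at its right endpoint, which corresponds to the normalization \(\varphi(u_0-0,t)=0\) together with condition c).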

We shall say that player \(u\) uses a mixed strategy if at each instant \(t\) he organizes his behavior in accordance with the distribution function \(\varphi(u,t)\) specified at each instant \(t\) on the set of his admissible states \(U\ni u\).

Now let us formulate the following game, more general than (1)–(3). Suppose that the motion of a material point is governed by the system of equations

\[ \int_U\int_V R_i(y',y,u,v,t)\,d\varphi(u,t)\,d\psi(v,t)=0,\qquad i=1,2,\ldots,n, \tag{4} \]

where \(R_i\) are the functions from (1), and \(y_i(t)\), \(i=1,\ldots,n\), are absolutely continuous.
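For measures with finitely many atoms, the double integral in (4) reduces to a weighted double sum, so that (4) becomes an ordinary differential equation with an averaged right-hand side. The following is a minimal sketch under that assumption, for scalar \(y\) and time-independent measures; the names `averaged_rhs` and `integrate_euler` are hypothetical, and the explicit Euler scheme is chosen only for brevity. The sample right-hand side is borrowed from the example (13) of this paper.

```python
from typing import Callable, Sequence, Tuple

Atoms = Sequence[Tuple[float, float]]  # (control value, probability) pairs
Rhs = Callable[[float, float, float, float], float]  # f(y, u, v, t)


def averaged_rhs(f: Rhs, y: float, t: float, phi: Atoms, psi: Atoms) -> float:
    """Right-hand side of (4) solved for y': f averaged over both atomic measures."""
    return sum(p * q * f(y, u, v, t) for u, p in phi for v, q in psi)


def integrate_euler(f: Rhs, y0: float, t0: float, T: float,
                    phi: Atoms, psi: Atoms, steps: int = 1000) -> float:
    """Explicit Euler integration of y' = averaged_rhs from t0 to T; returns y(T)."""
    h = (T - t0) / steps
    y, t = y0, t0
    for _ in range(steps):
        y += h * averaged_rhs(f, y, t, phi, psi)
        t += h
    return y


# Sample data: f = (v - u)^2, u fixed at 0.5, v equal to 0 or 1 with probability 0.5.
f_example: Rhs = lambda y, u, v, t: (v - u) ** 2
phi_example: Atoms = [(0.5, 1.0)]
psi_example: Atoms = [(0.0, 0.5), (1.0, 0.5)]
y_T = integrate_euler(f_example, y0=1.0, t0=0.0, T=2.0,
                      phi=phi_example, psi=psi_example)
```

With these sample data the averaged right-hand side is the constant 0.25, so `y_T` agrees with \(y^0+0.25\,T\) up to rounding.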

Player \(u\), by choosing the function \(\varphi(u,t)\), seeks to minimize the functional

\[ \overline{I}=\int_{t_0}^{T}\left[\iint_{UV} f_0(y,u,v,t)\,d\varphi(u,t)\,d\psi(v,t)\right]dt+\Phi \tag{5} \]

and to satisfy the boundary conditions (3). Player \(v\), choosing \(\psi(v,t)\), wishes to maximize the functional (5). Naturally, it is assumed that \(\varphi(u,t)\) and \(\psi(v,t)\) satisfy conditions a)–c) formulated above.

In the present work we restrict ourselves to functions \(f_i(y,u,v,t)\), \(i=0,1,\ldots,n\), that are Lebesgue measurable in \(t\) on \((t_0,T)\), bounded, continuous in \(y,u,v\), and continuously differentiable with respect to each of these variables (continuous mixed derivatives up to third order are assumed) on \(S_u\times S_v\times S_y\), where \(S_y\) is an open set of admissible values of \(y(t)\). Such functions are Lebesgue–Stieltjes integrable, and the result of integration (for example, in (4)) does not depend on the order of integration, as is easily shown by means of the corresponding theorems from \((^{1,2})\).

By an extremum in the game we shall mean any extremum (in the usual sense) among all extrema that arise when the players pursue either opposite or common goals (for example, when both players seek to minimize (5)). Necessary conditions for an extremum in the game (3)–(5) are obtained by the methods set forth in \((^3)\). Everywhere below the integrals are considered in the Riemann sense, unless otherwise specified.
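The Riemann integrals in the conditions that follow contain \(\varphi\) and \(\psi\) themselves rather than \(d\varphi\) and \(d\psi\). One way to see where expressions of this kind come from, offered here as a sketch and not as the derivation of \((^3)\), is to integrate the Lebesgue–Stieltjes integrals by parts under the normalization \(\varphi(u_0-0,t)=\psi(v_0-0,t)=0\). For a function \(g(u,v)\) with the smoothness assumed above (the symbol \(g\) is introduced only for this illustration),

\[ \int_U g\,d\varphi = g(u_1,v)-\int_U g'_u\,\varphi\,du, \]

\[ \iint_{UV} g\,d\varphi\,d\psi = \iint_{UV} g''_{uv}\,\varphi\psi\,du\,dv -\int_U g'_u(v_1)\,\varphi\,du -\int_V g'_v(u_1)\,\psi\,dv + g(u_1,v_1). \]

The second identity follows by applying the first one in \(u\) and then in \(v\). With \(g=f'_{iy_k}\) its right-hand side has the same form as the bracket in (6).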

Necessary conditions for a weak extremum are given by relations (6), (10), (11). The Lagrange multipliers \(\lambda_i(t)\), \(i=0,1,\ldots,n\), \(\lambda_0=\mathrm{const}\leqslant 0\), satisfy the equations:

\[ \lambda_k'=-\sum_{i=0}^{n}\lambda_i \left[ \iint_{UV} f''_{iuvy_k}\varphi\psi\,du\,dv -\int_U f''_{iuy_k}(v_1)\varphi\,du -\int_V f''_{ivy_k}(u_1)\psi\,dv +f'_{iy_k}(u_1,v_1) \right], \quad k=1,2,\ldots,n. \tag{6} \]

The distribution functions \(\varphi(u,t)\), \(\psi(v,t)\) for each \(t\) on the open intervals \((u_0,u_1)\subset U\), \((v_0,v_1)\subset V\), respectively, satisfy the relations

\[ \sum_{i=0}^{n} \left[ f_i(u_1,v)-\int_U f'_{iu}\varphi\,du \right]_{v_0}^{v_1} \lambda_i(t)=0, \tag{7} \]

\[ \sum_{i=0}^{n} \left[ f_i(u,v_1)-\int_V f'_{iv}\psi\,dv \right]_{u_0}^{u_1} \lambda_i(t)=0, \tag{8} \]

which are very weak extremum conditions, having no analogue in the calculus of variations and not being fulfilled at arbitrary extrema.

For given \(\varphi\) and \(\psi\), the values \(u_{\mathrm{av}}(t)\), \(v_{\mathrm{av}}(t)\) satisfying the conditions

\[ \sum_{i=0}^{n}\lambda_i(t) \left[ \int_V f''_{iuv}\psi\,dv-f'_{iu}(v_1) \right]=0, \qquad \sum_{i=0}^{n}\lambda_i(t) \left[ \int_U f''_{iuv}\varphi\,du-f'_{iv}(u_1) \right]=0, \tag{9} \]

will be called mean controls.

The functions \(\lambda_i(t)\), \(i=1,2,\ldots,n\), and

\[ \mathcal{H}(\lambda(t),y(t),t)\equiv \sum_{i=1}^{n}\lambda_i y_i' +\lambda_0\iint_{UV} f_0\,d\varphi\,d\psi \]

(here the integrals are in the Lebesgue–Stieltjes sense) are continuous everywhere on the interval \((t_0,T)\), except at its endpoints, at which the relations

\[ -[\mathcal H]^{t_0}=-\lambda_0\Phi'_{t_0}+\sum_{k=1}^{p} l_k G'_{kt_0},\qquad [\mathcal H]^T=-\lambda_0\Phi'_T+\sum_{k=1}^{p} l_k G'_{kT}, \tag{10} \]

\[ \lambda_i(t_0)=-\lambda_0\Phi'_{y_i(t_0)}+\sum_{k=1}^{p} l_k G'_{ky_i(t_0)},\qquad i=1,2,\ldots,n, \tag{11a} \]

\[ -\lambda_i(T)=-\lambda_0\Phi'_{y_i(T)}+\sum_{k=1}^{p} l_k G'_{ky_i(T)},\qquad i=1,2,\ldots,n, \tag{11b} \]

hold, where \(l_k\) are certain constants.

Relations (6) and (11b) are used in deriving the following condition for a strong extremum (a necessary condition for the existence of a value in the game (3)–(5)):

\[ \max_{\bar\varphi}\min_{\bar\psi} \left\{ \sum_{i=0}^{n} \left[ \iint_{U\,V} f''_{iuv}\bar\varphi\bar\psi\,du\,dv -\int_U f'_{iu}(v_1)\bar\varphi\,du -\int_V f'_{iv}(u_1)\bar\psi\,dv \right]\lambda_i(t) \right\} = \]

\[ = \sum_{i=0}^{n} \left[ \iint_{U\,V} f''_{iuv}\varphi\psi\,du\,dv -\int_U f'_{iu}(v_1)\varphi\,du -\int_V f'_{iv}(u_1)\psi\,dv \right]\lambda_i(t) \tag{12} \]

(where it is assumed that \(\max_{\bar\varphi}\min_{\bar\psi}\{\cdot\}=\min_{\bar\psi}\max_{\bar\varphi}\{\cdot\}\)).

Remark. If the integral equations (9) are solvable for \(\varphi(u,t)\) and \(\psi(v,t)\) and the set of solutions contains the optimal pair, then instead of (12) we obtain the following conditions, which make it possible to split the game problem into separate variational problems:

\[ \max_{\bar\varphi} \left\{ \sum_{i=0}^{n} \left[ \int_V \left( \int_U f''_{iuv}\bar\varphi\,du - f'_{iv}(u_1) \right)\psi\,dv - \int_U f'_{iu}(v_1)\bar\varphi\,du \right]\lambda_i(t) \right\} = -\sum_{i=0}^{n} \left[ \int_U f'_{iu}(v_1)\varphi\,du \right]\lambda_i(t), \tag{12a} \]

\[ \min_{\bar\psi} \left\{ \sum_{i=0}^{n} \left[ \int_U \left( \int_V f''_{iuv}\bar\psi\,dv - f'_{iu}(v_1) \right)\varphi\,du - \int_V f'_{iv}(u_1)\bar\psi\,dv \right]\lambda_i(t) \right\} = -\sum_{i=0}^{n} \left[ \int_V f'_{iv}(u_1)\psi\,dv \right]\lambda_i(t). \tag{12b} \]

Because conditions (7), (8) are so weak, a game problem may decompose into independent variational problems, as many as the dimension of the set \(U\times V\), even when the conditions of the remark are not satisfied. The following example confirms this.

Example. Let the motion of a material point in the space \(\{y,t\}\) obey the equation

\[ y'=(v-u)^2,\qquad 0\le t\le T, \tag{13} \]

with the initial condition \(y(0)=y^0\) and free right endpoint \(y(T)\), \(T\) being fixed. Here \(u\in U=[0,1]\), \(v\in V=[0,1]\). The player \(u\), choosing \(u(t)\), seeks to minimize the functional

\[ I=\int_0^T y\,dt, \tag{14} \]

and player \(v\), on the contrary, seeks to maximize it by the choice of \(v(t)\). As Berkovitz showed \((^{4})\), this game has no value in pure strategies. Let us find the solution of this game in mixed strategies, using the results obtained above. We formulate an auxiliary problem of the form (3)–(5), replacing equation (13) by the following:

\[ y'=\iint_{UV}(v-u)^2\,d\varphi\,d\psi \]

(the integrals being in the Lebesgue–Stieltjes sense).
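The absence of a value in pure strategies can be illustrated on the right-hand side of (13) alone. The sketch below is an illustration under stated assumptions, not Berkovitz's argument: it compares the min-max and max-min of \((v-u)^2\) over a uniform grid on \([0,1]\times[0,1]\), and the function names and the grid size are hypothetical choices.

```python
from typing import List


def rate(u: float, v: float) -> float:
    """Right-hand side of (13) for fixed pure controls u and v."""
    return (v - u) ** 2


def grid(n: int) -> List[float]:
    """Uniform grid on [0, 1] with n + 1 points, endpoints included."""
    return [k / n for k in range(n + 1)]


def upper_value(n: int = 100) -> float:
    """min over u of max over v: the minimizer commits to u first."""
    pts = grid(n)
    return min(max(rate(u, v) for v in pts) for u in pts)


def lower_value(n: int = 100) -> float:
    """max over v of min over u: the maximizer commits to v first."""
    pts = grid(n)
    return max(min(rate(u, v) for u in pts) for v in pts)


gap = upper_value() - lower_value()
```

On this grid the min-max is attained at \(u=0.5\) and equals 0.25, while the max-min is 0 because the minimizer can always match \(u=v\); the positive gap is what the passage to mixed strategies is meant to close.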

The behavior of player \(u\) is determined by the function \(\varphi(u,t)\), and that of player \(v\) by \(\psi(v,t)\). From (6) and (11b), taking \(\lambda_0=-1\), we find \(\lambda(t)=t-T\le 0\). Conditions (7)–(9) give

\[ \int_0^1 \varphi\,du=0.5,\qquad \int_0^1 \psi\,dv=0.5. \tag{15} \]
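To see how (7) produces the first relation in (15), one can substitute the data of the example directly. The following is a worked sketch, under the reading that \(f_0=y\) (the integrand of (14)), \(f_1=(v-u)^2\), \(u_0=v_0=0\), \(u_1=v_1=1\). The term with \(i=0\) drops out because \(f_0\) does not depend on \(u\) or \(v\), and since \(f_1(u_1,v)=(v-1)^2\), \(f'_{1u}=-2(v-u)\), condition (7) reads

\[ \left[(v-1)^2+2\int_0^1 (v-u)\,\varphi\,du\right]_{v=0}^{v=1}\lambda(t) = \left(2\int_0^1\varphi\,du-1\right)\lambda(t)=0. \]

For \(t<T\) the factor \(\lambda(t)=t-T\) is nonzero, which gives \(\int_0^1\varphi\,du=0.5\); relation (8) yields the condition on \(\psi\) in the same way. Integration by parts also shows that \(\int_U u\,d\varphi=1-\int_0^1\varphi\,du=0.5\), so (15) fixes the mean value of each player's control at 0.5.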

If (15) is substituted into (12), then (12) splits into two conditions (the substitution is possible precisely because of the weakness of (15)):

\[ \max_{\bar\varphi}\left[\int_0^1 u\bar{\varphi}\,du\right]=\int_0^1 u\varphi\,du,\qquad \min_{\bar\psi}\left[\int_0^1 v\bar{\psi}\,dv\right]=\int_0^1 v\psi\,dv. \]

Thus, the game in mixed strategies has been reduced to the following two variational problems.

Problem 1. Find a nondecreasing function \(\varphi(u,t)\) and the corresponding trajectory \(z(u,t)\) satisfying the relations

\[ z'_u=\varphi(u,t),\qquad 0\le \varphi\le 1,\qquad z(0)=0,\qquad z(1)=0.5, \]

and minimizing the functional

\[ I_1=-\int_0^1 u\varphi\,du. \]

Problem 2. Find a nondecreasing function \(\psi(v,t)\) and the corresponding trajectory \(z(v,t)\) satisfying the relations

\[ z'_v=\psi(v,t),\qquad 0\le \psi\le 1,\qquad z(0)=0,\qquad z(1)=0.5, \]

and minimizing the functional

\[ I_2=\int_0^1 v\psi\,dv. \]

The solutions of Problems 1 and 2 are the functions

\[ \varphi(u,t)= \begin{cases} 0 & \text{for } u<0.5,\\ 1 & \text{for } u\ge 0.5; \end{cases} \qquad \psi(v,t)= \begin{cases} 0 & \text{for } v<0,\\ 0.5 & \text{for } 0\le v<1,\\ 1 & \text{for } v\ge 1. \end{cases} \]

Since the \(\varphi\)-measure of the point \(u=0.5\) is 1, the mean control \(u_{\mathrm{av}}(t)=0.5\), which satisfies (9), is an optimal strategy for player \(u\). The optimal behavior for player \(v\) is to choose, at each instant \(t\), the values \(v=1\) and \(v=0\) with probability 0.5 each (that is, at each instant \(t\) along the trajectory \(v\) must “toss a coin”). It is readily verified that the behavior of the players thus obtained yields a saddle point in the game (13)–(14).
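The saddle-point property can be checked numerically for the averaged right-hand side. The sketch below is a hedged illustration, not the verification intended by the author: it tests only pure deviations on a grid, which is enough here because the averaged rate is linear in each player's distribution. The names `expected_rate` and `cost` are hypothetical.

```python
from typing import List, Tuple

Mixed = List[Tuple[float, float]]  # (control value, probability) pairs


def expected_rate(phi: Mixed, psi: Mixed) -> float:
    """Averaged right-hand side of (13): expectation of (v - u)^2."""
    return sum(p * q * (v - u) ** 2 for u, p in phi for v, q in psi)


def cost(y0: float, T: float, rate: float) -> float:
    """Functional (14) for constant y' = rate: integral of y0 + rate * t over [0, T]."""
    return y0 * T + rate * T ** 2 / 2


phi_star: Mixed = [(0.5, 1.0)]              # u = 0.5 with probability 1
psi_star: Mixed = [(0.0, 0.5), (1.0, 0.5)]  # coin toss between v = 0 and v = 1
value = expected_rate(phi_star, psi_star)

pts = [k / 100 for k in range(101)]
# Best pure deviation of the minimizer against psi_star.
best_u_deviation = min(expected_rate([(u, 1.0)], psi_star) for u in pts)
# Best pure deviation of the maximizer against phi_star.
best_v_deviation = max(expected_rate(phi_star, [(v, 1.0)]) for v in pts)
```

All three quantities coincide at 0.25: neither player gains by deviating, and along the resulting trajectory \(y(t)=y^0+0.25\,t\) the functional (14) equals \(y^0T+T^2/8\), which is what `cost(y0, T, value)` computes.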

Received 11 VII 1969

REFERENCES

\(^{1}\) N. Dunford, J. T. Schwartz, Linear Operators. General Theory, Moscow, 1962.
\(^{2}\) E. Kamke, The Lebesgue–Stieltjes Integral, Moscow, 1959.
\(^{3}\) E. R. Smolyakov, Dissertation, Some Variational Problems in the Dynamics of Spacecraft Flight, Institute of Applied Mathematics, Academy of Sciences of the USSR, 1968.
\(^{4}\) L. D. Berkovitz, in: Advances in Game Theory, No. 52, 1964.
