UDC 518-9
MATHEMATICS
Submitted 1970-01-01 | RussiaRxiv: ru-197001.82500 | Translated from Russian

L. A. PETROSYAN

DIFFERENTIAL GAMES WITH INCOMPLETE INFORMATION

(Presented by Academician L. S. Pontryagin on 27 IV 1970)

1. Informal description. Consider a differential game with prescribed duration \(T\) and equations of motion \(\dot{x}=f(x,u,v)\), where \(x \in R^n\), \(u \in U \subset R^e\), \(v \in V \subset R^k\). The control parameter \(u\) is at the disposal of player \(P\) (the minimizing player), and the control parameter \(v\) is at the disposal of player \(E\) (the maximizing player). The motion starts from an initial state \(x_0\), which is assumed to be known to both players. A certain number \(l>0\) is fixed, which we shall call the information lag. On the time interval \([0,l]\), player \(P\) knows only the initial state \(x_0\) and his own choices \(u(\tau)\); thereafter, for \(l \le t \le T\), at each instant of time player \(P\) knows the state of the process at time \(t-l\), \(x(t-l)\), and his own choices \(u(\tau)\), \(\tau \le t\). Player \(E\), at each instant of time \(t\), knows the state of the process at that instant, \(x(t)\). Let \(x(t)\) be some trajectory realized in the course of the game. The payoff of player \(P\) is defined as a certain functional of \(x(t)\), \(F(x_0;x(t))\). The payoff of \(E\) is equal to \(-F\).

2. Discrete model. Let a partition of the interval \([0,T]\) with step \(\delta\) be given. Let \(t_k\) be the partition instants, where \(t_k-t_{k-1}=\delta\). Replace the equation of motion for the continuous case by the difference equation

\[ x_{k+1}=x_k+\delta f(x_k,u_k,v_k). \tag{1} \]

For simplicity we shall assume that \(l\) is a multiple of \(\delta\); in index expressions such as \(x_{k-l}\), the lag is counted in steps, i.e. \(l\) stands for \(l/\delta\). The game proceeds as follows. At each step \(k\), player \(E\) chooses a control \(v_k\), knowing \(x_0,\ldots,x_k\). Player \(P\), for \(0 \le k \le l/\delta\), chooses his control \(u_k\) knowing the state \(x_0\) and his own choices at the previous steps \(i<k\); at subsequent steps \(P\) chooses his control knowing the delayed state \(x_{k-l}\), the prehistory \(x_0,\ldots,x_{k-l-1}\), and his own previous choices. The game ends at time \(T\). Player \(P\) receives the payoff \(F(x_0,x_1,\ldots,x_{T/\delta})\), and \(E\) receives \(-F\).
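The information pattern just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the state is a scalar, and the names `play`, `strategy_p`, `strategy_e` and `lag_steps` (the lag counted in steps, \(l/\delta\)) are hypothetical and do not come from the paper.

```python
def play(f, x0, delta, n_steps, lag_steps, strategy_p, strategy_e):
    """Run the difference scheme x_{k+1} = x_k + delta * f(x_k, u_k, v_k).

    strategy_p(seen_states, own_controls) -> u_k, where seen_states is
    (x_0, ..., x_{k - lag_steps}), or just (x_0,) while k <= lag_steps.
    strategy_e(states) -> v_k, where states is (x_0, ..., x_k).
    """
    xs = [x0]   # realized states x_0, x_1, ...
    us = []     # P's own past controls u_0, ..., u_{k-1}
    for k in range(n_steps):
        last_seen = max(k - lag_steps, 0)          # index of the newest state P knows
        u = strategy_p(tuple(xs[:last_seen + 1]), tuple(us))
        v = strategy_e(tuple(xs))                  # E knows the current state
        xs.append(xs[-1] + delta * f(xs[-1], u, v))
        us.append(u)
    return xs
```

With `lag_steps = 2`, for instance, player \(P\) sees only \(x_0\) at steps 0, 1, 2 and first sees \(x_1\) at step 3, while player \(E\) always receives the full prehistory.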

Let us define strategies in the game. By a strategy of \(P\) we shall mean a rule that assigns to each information set of the player some control \(u \in U\), i.e. the strategies of player \(P\) are all possible measurable functions \(u(x_0,\ldots,x_{k-l},u_0,\ldots,u_{k-1})\) with values in \(U\). Similarly, a strategy of \(E\) assigns to each information set of player \(E\) some control \(v \in V\), i.e. the strategies of \(E\) are all possible measurable functions \(v(x_0,\ldots,x_k)\) with values in \(V\). For simplicity, the players’ strategies will be denoted by \(u(I_k^P)\) and, respectively, \(v(I_k^E)\), understanding by \(I_k^P\) the current information available to player \(P\), and by \(I_k^E\) that available to player \(E\).

It is shown in the usual way that any initial condition \(x_0\) and fixed pair of strategies \(u(I)\), \(v(I)\) uniquely determine a play, i.e. the sequence \(x_0,\ldots,x_{T/\delta}\), and consequently also the payoff \(K(x_0;u(I),v(I))=F(x_0,\ldots,x_{T/\delta})\). Denote the set of all strategies of player \(P\) by \(A\), and the set of all strategies of player \(E\) by \(B\).

Definition. The set of plays \(x_0,\ldots,x_k\) that are obtained for fixed \(x_0,\ldots,x_{k-l}\) and controls \(u_0,\ldots,u_{k-1}\) for all possible choices of controls of player \(E\) at the steps \(k-l\leq i<k\) is called an information set of player \(P\) of the \(k\)-th level. It is obvious that information sets of the same level do not intersect. An information set of level \(k\) of player \(E\) coincides with some sequence \(x_0,\ldots,x_k\) (is a singleton).
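To make the definition concrete, the following sketch enumerates an information set of player \(P\) when player \(E\) has only finitely many controls. The finite control set `v_choices`, the scalar state and the function name `information_set` are assumptions made purely for illustration; the paper itself works with continuum control sets.

```python
from itertools import product

def information_set(f, known_states, own_controls, v_choices, delta):
    """All plays (x_0, ..., x_k) compatible with what P knows at level k.

    known_states = [x_0, ..., x_{k-l}]  (states already revealed to P)
    own_controls = [u_0, ..., u_{k-1}]  (P's own past choices)
    E's hidden choices v_{k-l}, ..., v_{k-1} range over v_choices.
    """
    k = len(own_controls)
    start = len(known_states) - 1            # index k - l of the last revealed state
    plays = set()
    for hidden in product(v_choices, repeat=k - start):
        xs = list(known_states)
        for i, v in zip(range(start, k), hidden):
            xs.append(xs[-1] + delta * f(xs[-1], own_controls[i], v))
        plays.add(tuple(xs))
    return plays
```

For example, with `f = lambda x, u, v: u + v`, `known_states = [0.0]`, `own_controls = [1.0, 1.0]`, `v_choices = (-1.0, 1.0)` and `delta = 0.5`, the set contains four plays, one for each pair of hidden choices of \(E\).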

We shall denote the class of all information sets of level \(k\) for player \(P\) by \(J_k^P\); similarly, we shall denote the class of all information sets of player \(E\) of level \(k\) by \(J_k^E\). We shall regard all \(J_k^E\) and all \(J_k^P\) as standard measurable spaces. (A space is called standard if it is finite or countable with a discrete structure, or if it is isomorphic to the unit interval.)

By a solution of an antagonistic game it is natural to understand an equilibrium situation, i.e., a pair of strategies \(u^*(I)\in A\), \(v^*(I)\in B\) such that for all \(u(I)\in A\), \(v(I)\in B\) one has

\[ K(x_0;u(I),v^*(I))\geq K(x_0;u^*(I),v^*(I))\geq K(x_0;u^*(I),v(I)). \tag{2} \]

However, it is well known \((^1)\) that in games with incomplete information such a situation exists only in exceptional cases, and for our problem the concept of solution (2) proves unsuitable.

In what follows we shall do the same as von Neumann did in the solution of matrix games: we shall extend the concept of strategies by including in it the possibility of random choice.
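A standard matrix game shows why such an extension is needed. The sketch below uses the matching-pennies matrix, which is an illustrative choice and does not appear in the paper: in pure strategies the lower and upper values differ, so no pure equilibrium exists, whereas the randomization \((1/2,1/2)\) guarantees each player the payoff 0.

```python
from fractions import Fraction

# Matching pennies (illustrative assumption): rows maximize, columns minimize.
A = [[1, -1],
     [-1, 1]]

def lower_value(a):
    """Payoff the maximizer can guarantee with a pure strategy (max-min)."""
    return max(min(row) for row in a)

def upper_value(a):
    """Payoff the minimizer can guarantee with a pure strategy (min-max)."""
    return min(max(row[j] for row in a) for j in range(len(a[0])))

def expected_payoff(a, p, q):
    """Expected payoff when rows are drawn with probabilities p, columns with q."""
    return sum(p[i] * q[j] * a[i][j]
               for i in range(len(p)) for j in range(len(q)))

HALF = [Fraction(1, 2), Fraction(1, 2)]
PURE = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
```

Here `lower_value(A)` and `upper_value(A)` differ, while `expected_payoff(A, HALF, q)` and `expected_payoff(A, p, HALF)` vanish for every pure `p`, `q` in `PURE`.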

3. Mixed strategies and behavioral strategies. We shall assume that the sets \(U\) and \(V\) are copies of the unit interval, i.e., that each of them is isomorphic to the unit interval: there exists a one-to-one correspondence that is measurable in both directions. One may also allow the sets \(U,V\) to change from step to step, and write \(U_k,V_k\).

Denote by \(\Omega\) the measure space obtained from the unit interval by equipping it with Lebesgue measure. All our sample spaces will be isomorphic to \(\Omega\). We may further assume that all \(J_k^P,J_k^E\) are copies of the unit interval, since if any of them is finite or countable, one can always add to it a continuum of identical elements. Elements of the Cartesian products \(\times_k J_k^P\) and \(\times_k J_k^E\) will be denoted by

\[ I^{P(E)}=(I_1^{P(E)},\ldots,I_{T/\delta}^{P(E)}). \]

Definition. A mixed strategy of \(P\) is a sequence \(m=(m_1,\ldots,m_{T/\delta})\) of measurable mappings \(m_i:\Omega\times J_i^P\to U\), where \(\Omega\) is the fixed sample space. A behavioral strategy is a mixed strategy \(b=(b_1,\ldots,b_{T/\delta})\) such that for \(i\neq k\), \(b_i(\cdot,I_i^P)\) and \(b_k(\cdot,I_k^P)\) are independent random variables (\(I_i^P\in J_i^P\) and \(I_k^P\in J_k^P\) are arbitrary). The definition for \(E\) is analogous.

Each triple \((\omega;m,v(I))\), consisting of an element of the sample space, a mixed strategy, and an opponent’s strategy, uniquely determines an element \(u(\omega;m,v(I))\) of \(\times_i U_i\); the sequence \(u=(u_1,\ldots,u_{T/\delta})\) is determined recursively by the equality \(u_i=m_i(\omega,I_i^P)\), where \(I_i^P\in J_i^P\). Roughly speaking, this is precisely the sequence of controls actually chosen in the course of the game. In addition, each pair \((m,v(I))\) uniquely determines a measure \(\mu\) on \(\times_i U_i\); for any measurable set \(\overline{B}\subset \times_i U_i\) it is defined by the equality

\[ \mu(\overline{B})=\mu(\overline{B};m,v(I))=\lambda\{\omega : u(\omega;m,v(I))\in \overline{B}\} \]

(here \(\lambda\) is the measure on \(\Omega\)). Thus, \(\mu\) is the distribution of the random variable \(u(\cdot;m,v(I))\). Two mixed strategies are called equivalent if for each \(v(I)\in B\) they determine the same distribution on \(\times_i U_i\).

The following theorem is an analogue of Kuhn’s theorem \((^1)\) for a differential game with an information lag.

Theorem. In a discrete game with an information lag, every mixed strategy has an equivalent behavioral strategy.
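The idea behind the passage from mixed to behavioral strategies can be seen in a finite toy case. The sketch below is not the construction used in the paper: it assumes a player who moves twice with binary controls, remembers his own first choice, and receives no other information, so the opponent is omitted entirely. A pure strategy is a triple `(u0, u1_after_0, u1_after_1)`, and a mixed strategy is a probability distribution over such triples; all names are hypothetical.

```python
from fractions import Fraction

def realized(pure):
    """Controls (u0, u1) actually played under a pure strategy."""
    u0, after0, after1 = pure
    return (u0, after1 if u0 == 1 else after0)

def distribution_mixed(mixed):
    """Distribution of (u0, u1) induced by a mixed strategy."""
    dist = {}
    for pure, prob in mixed.items():
        key = realized(pure)
        dist[key] = dist.get(key, Fraction(0)) + prob
    return dist

def to_behavioral(mixed):
    """Marginal law of u0 and conditional law of u1 given the remembered u0."""
    joint = distribution_mixed(mixed)
    first = {a: sum((p for (u0, _), p in joint.items() if u0 == a), Fraction(0))
             for a in (0, 1)}
    second = {a: {b: joint.get((a, b), Fraction(0)) / first[a] for b in (0, 1)}
              for a in (0, 1) if first[a] > 0}
    return first, second

def distribution_behavioral(first, second):
    """Distribution of (u0, u1) when each step is randomized independently."""
    return {(a, b): first[a] * q
            for a, row in second.items() for b, q in row.items()
            if first[a] * q > 0}
```

For any mixed strategy in this toy setting, `distribution_behavioral(*to_behavioral(mixed))` coincides with `distribution_mixed(mixed)`, which is the kind of equivalence the theorem asserts for the game with an information lag.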

4. Equilibrium situation in behavioral strategies. The preceding theorem makes it possible to seek an equilibrium situation in the class of behavioral strategies. It follows from it that any equilibrium situation in the class of behavioral strategies is also an equilibrium situation in mixed strategies.

Theorem. Suppose that the following conditions are satisfied: a) the payoff function is a continuous function of the play; b) each of the information sets is a compact set; c) for each player, the class of all information sets of a given level is compact in the Hausdorff metric. Then in a discrete game with an information lag there exists an equilibrium situation in behavioral strategies.

In the proof of the theorem, auxiliary subgames are introduced by analogy with the finite case (see \((^2)\)). By induction on the length of the subgame it is shown that each of the subgames possesses an equilibrium situation in behavioral strategies. The transition from a subgame to the main game is carried out in the same way as in \((^2)\). To derive the functional equation for the values of the subgames, let us define a subgame.

Consider some information set of player \(P\) of level \(k\), and fix this state of information \(I_k^P\). Suppose that player \(P\) also knows the probability distribution of the choices \(v_{k-l},\ldots,v_{k-1}\) of player \(E\), which we denote by \(p_k(\cdot)\). The subgame proceeds as follows. The moves \(v_{k-l},\ldots,v_{k-1}\) are drawn at random according to the distribution \(p_k(\cdot)\) and are reported to player \(E\), but not to \(P\). After this, \(E\) chooses \(v_k\), the choice \(v_{k-l}\) is reported to player \(P\), and \(P\) chooses \(u_k\). The choice \(u_k\) is reported to both players after it has been made, but, in accordance with the information conditions, the choice \(v_k\) is kept secret from player \(P\) until the latter has made the choice \(u_{k+(l-1)}\). After this the moves \(v_{k+1}\) and \(u_{k+1}\) are made in the same manner, with the choice \(v_{k-l+1}\) announced to player \(P\). This sequence continues until all random moves have been announced, after which the game continues according to the information scheme adopted for the original game. From the construction of \(p_k(\cdot)\) it is clear that \(p_k(\cdot)\) is simply a probability distribution on \(I_k^P\). The payoff function is that of the original game. We shall denote the subgame by

\[ \Gamma_k=\Gamma_k[I_k^P;\,p_k(\cdot)]. \]

Let \(V[I_k^P;\,p_k(\cdot)]\) be the value of the subgame \(\Gamma_k\); then it can be shown that the function \(V[I_k^P;\,p_k(\cdot)]\) is continuous in \(I_k^P,p_k(\cdot)\) (closeness between information sets of one level is understood in the Hausdorff metric), and that it satisfies the following functional equation:

\[ V[I_k^P;\,p_k(\cdot)] = \max_{F(v_k\mid v_{k-l},\ldots,v_{k-1})} \min_{\hat u_k} \int V[I_{k+1}^P;\,p_{k+1}(\cdot\mid \hat v_{k-l})]\,dF_{v_{k-l}}. \tag{3} \]

Here \(F(v_k\mid v_{k-l},\ldots,v_{k-1})\) is the behavioral strategy of player \(E\) at the \(k\)-th step, and \(p_{k+1}(\cdot\mid \hat v_{k-l})\) is the distribution on \(I_{k+1}^P\) for fixed \(\hat v_{k-l}\) (the choice \(v_{k-l}\) became known to \(P\) after the \(k\)-th step), which is induced by the distribution \(p_k(\cdot)\) and the behavioral strategy \(F(v_k\mid v_{k-l},\ldots,v_{k-1})\). The functional equation (3) can be used to find an optimal behavioral strategy of player \(E\).

Let us note that up to now we have considered discrete games.

Definition. Suppose that, under arbitrary refinement of the partition of the time interval \([0,T]\) with the step \(\delta\) tending to zero, the values of the discrete games with an information lag have a limit. Then this limiting value will be called the generalized value of the differential game with an information lag.

5. Example. Consider a game with an information lag, in which player \(P\) has the lag \(l\). The motions of the players are independent:

\[ \dot{x}=u,\quad \dot{y}=v,\quad |u|\leq 1,\quad |v|\leq \lambda,\quad \lambda<1. \]

\(P\) controls the parameter \(u\), \(E\) controls the parameter \(v\). The game has prescribed duration \(T\), and the payoff is equal to

\[ K(x_0,y_0;u(I),v(I))=-\rho(x(T),y(T)). \]

The optimal strategy \(u^*\) of player \(P\) is pure. At each instant of time \(t\), the strategy \(u^*\) chooses the direction of motion toward the point \(y(t-l)\) for \(t>l\), and toward the point \(y_0\) for \(0\leq t\leq l\). If there exists an instant \(t'<T\) such that \(x(t')=y(t'-l)\), then on the interval \(t'\leq t\leq T\), \(P\) strives to maintain the equality \(x(t)=y(t-l)\).

The optimal strategy \(v^*\) of player \(E\) is mixed. On the time interval \(0\leq t\leq T-l\), the strategy \(v^*\) chooses the direction away from the point \(x(t)\). At the instant \(T-l\), \(E\) chooses, with probabilities \((1/2,\,1/2)\), one of the two directions perpendicular to the segment \([x(T-l),\,y(T-l)]\) if \(x(T-l)\ne y(T-l)\), and one of any two opposite directions if \(x(T-l)=y(T-l)\). Then, on the interval \(T-l\leq t\leq T\), he keeps the direction selected by this random mechanism at time \(T-l\).

If, in the equilibrium situation, there exists an instant \(t'<T\) such that \(x(t')=y(t'-l)\), then the value of the game does not depend on \(T\) and is equal to \(\lambda l\).
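The claim can be checked numerically on a discretized version of the example. The sketch below is only an illustration under stated assumptions: the motion takes place in the plane, Euler steps of size `delta` are used, the coin flip of \(E\) is replaced by an explicit argument `sign` (either outcome gives the same distance here), and the names `simulate` and `step_toward` are hypothetical. Because \(P\) reacts one step late in the discrete scheme, the terminal distance only approaches \(\lambda l\) as `delta` decreases.

```python
import math

def step_toward(p, target, max_dist):
    """Move p toward target by at most max_dist, without overshooting."""
    dx, dy = target[0] - p[0], target[1] - p[1]
    d = math.hypot(dx, dy)
    if d <= max_dist:
        return target
    return (p[0] + max_dist * dx / d, p[1] + max_dist * dy / d)

def unit(dx, dy):
    """Unit vector along (dx, dy); an arbitrary fixed direction if it is zero."""
    d = math.hypot(dx, dy)
    return (dx / d, dy / d) if d > 0 else (1.0, 0.0)

def simulate(x0, y0, T, lag, lam, delta, sign=1):
    """Terminal distance |x(T) - y(T)| when P uses u* and E uses v*."""
    n, L = round(T / delta), round(lag / delta)
    x, ys, final_dir = x0, [y0], None
    for k in range(n):
        y = ys[-1]
        if k < n - L:                       # E runs directly away from x(t)
            v = unit(y[0] - x[0], y[1] - x[1])
        else:                               # last stretch: fixed perpendicular direction
            if final_dir is None:
                bx, by = unit(y[0] - x[0], y[1] - x[1])
                final_dir = (-sign * by, sign * bx)
            v = final_dir
        # P heads for the delayed position y(t - l), or y_0 while t <= l
        x = step_toward(x, ys[max(k - L, 0)], delta)
        ys.append((y[0] + delta * lam * v[0], y[1] + delta * lam * v[1]))
    return math.hypot(x[0] - ys[-1][0], x[1] - ys[-1][1])

if __name__ == "__main__":
    for delta in (0.1, 0.01):
        print(delta, simulate((0.0, 0.0), (1.0, 0.0), 10.0, 1.0, 0.5, delta))
```

With \(\lambda=0.5\) and \(l=1\), the printed distances should lie close to \(\lambda l=0.5\) and remain essentially unchanged if \(T\) is increased, in agreement with the statement that the value does not depend on \(T\).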

Leningrad State University
named after A. A. Zhdanov

Received
15 IV 1970

REFERENCES

  1. H. W. Kuhn, Positional games and the problem of information, Positional Games, “Nauka,” 1967.
  2. H. E. Scarf, L. S. Shapley, Games with incomplete information, Applications of Game Theory in Military Affairs, Moscow, 1961.
