UDC 62.505

F. L. CHERNOUSKO

Submitted 1969-01-01 | RussiaRxiv: ru-196901.50579 | Translated from Russian

CYBERNETICS AND CONTROL THEORY

ON DIFFERENTIAL GAMES WITH DELAYED INFORMATION

(Presented by Academician A. Yu. Ishlinskii on 31 X 1968)

In the theory of differential games it is usually assumed \((^1)\) that each of the controlling sides knows exactly, at any moment of time, the phase coordinates of both players. In the present paper it is assumed that one of the sides receives information about the phase coordinates of the other side with a delay. In applied problems such a delay may be caused by the time required to obtain and process information. The paper considers one class of differential games with delayed information and shows that each game of this class is equivalent to a certain differential game without delay. This makes it possible to apply known approaches to the solution of differential games with delayed information.

1. Statement of the problem. Consider a differential game of two controlled systems, which we shall call the sides (or players) \(X\) and \(Y\). The system \(X\) is characterized by an \(n\)-dimensional vector of phase coordinates \(x = (x_1, \ldots, x_n)\) and an \(m\)-dimensional vector of control functions \(u = (u_1, \ldots, u_m)\), while the system \(Y\) is characterized by an \(s\)-dimensional vector of phase coordinates \(y = (y_1, \ldots, y_s)\) and an \(r\)-dimensional vector of control functions \(v = (v_1, \ldots, v_r)\). The dimensions \(n, m, s, r\) are arbitrary.

The motion of the systems is described by the vector differential equations

\[ dx/dt = f(x,u,t), \qquad dy/dt = g(y,v,t). \tag{1,1} \]

Here \(t\) is time, \(f = (f_1, \ldots, f_n)\) and \(g = (g_1, \ldots, g_s)\) are given vector functions. The motion is considered on the time interval \([t_0, T]\). The initial instant \(t_0\) is given, and the instant \(T\) at which the game ends is determined from the condition

\[ h(x(T), y(T), T) = 0, \tag{1,2} \]

where \(T\) is the smallest root of equation \((1,2)\) for which \(T > t_0\).

The control functions are subject to the constraints

\[ u(t) \in U, \qquad v(t) \in V \tag{1,3} \]

for all \(t \ge t_0\). Here \(U, V\) are closed sets in \(m\)-dimensional and \(r\)-dimensional spaces, respectively. The functional (payoff) is given in the form

\[ J = F(x(T), y(T), T), \tag{1,4} \]

where \(F(x,y,t)\) is a known function. The side \(X\) seeks to minimize, and \(Y\) to maximize, the functional \(J\). In particular, if the payoff is the duration of the process (as in pursuit problems), then \(F = t\) or \(F = -t\), depending on whether \(X\) is the pursuing or the pursued side.

In addition to the conditions and constraints introduced above, which are standard in the theory of differential games, we adopt the following assumption concerning information. Suppose that at each moment \(t\) the side \(X\) learns the current value of its own phase vector \(x(t)\), as well as the value of the phase vector of side \(Y\) at the instant \(t-\tau\), i.e. the vector \(y(t-\tau)\). Here the constant \(\tau>0\) characterizes the delay of information for side \(X\) and is equal to the time needed by side \(X\) to obtain and process the data of measurements or observations. We shall regard the delay as sufficiently small (less than the duration of the process), i.e. \(\tau<T-t_0\). In addition, side \(X\) is assumed to know in advance the functions \(f, g, h, F\) and the domains \(U, V\) in relations (1,1)–(1,4), i.e. the rules and the objective of the game. We shall solve the problem for side \(X\), i.e. seek the control of side \(X\). No special assumptions need be made about the information available to side \(Y\), since one should reckon with the control of side \(Y\) that is worst for \(X\).

In view of the delay of information, the initial value of the phase vector \(y(t_0)=y^0\) of side \(Y\) will become known to side \(X\) only at the instant \(t_0+\tau\). Therefore it is natural to seek the control law \(u(t)\) of side \(X\) only for instants \(t_0+\tau\leq t\leq T\), specifying the initial conditions in the form

\[ x(t_0+\tau)=x^0,\qquad y(t_0)=y^0. \tag{1,5} \]

On the interval \([t_0,t_0+\tau]\) the control of side \(X\) must be chosen from a priori considerations independent of the arrival of information, and therefore it may be regarded as known in advance.

Thus, the problem reduces to determining the optimal control law \(u\) of side \(X\) as a function of time \(t\) on the interval \([t_0+\tau,T]\) and of the current measurement data, i.e. of the vectors \(x(t)\) and \(y(t-\tau)\). Side \(Y\) is assumed to act in the manner worst for \(X\).
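The information structure just described can be sketched in discrete time. The following is a minimal illustration only, assuming a fixed sampling step such that \(\tau\) corresponds to a whole number of steps; the class name `DelayedObserver` and its method `observe` are hypothetical and are introduced solely for this sketch.

```python
from collections import deque


class DelayedObserver:
    """Fixed-step delay line (hypothetical helper, not from the paper).

    Side X calls observe(y_now) once per step and receives the value of y
    recorded `delay_steps` steps earlier, i.e. a discrete analogue of y(t - tau).
    """

    def __init__(self, delay_steps):
        if delay_steps < 0:
            raise ValueError("delay_steps must be non-negative")
        self.delay_steps = delay_steps
        # Keep only the samples needed to look back delay_steps steps.
        self._buffer = deque(maxlen=delay_steps + 1)

    def observe(self, y_now):
        self._buffer.append(y_now)
        if len(self._buffer) <= self.delay_steps:
            # Fewer than delay_steps steps have elapsed: nothing has arrived yet,
            # which mirrors the interval [t0, t0 + tau] in the text.
            return None
        return self._buffer[0]
```

During the first `delay_steps` calls the observer returns `None`, which corresponds to the interval \([t_0,t_0+\tau]\) on which side \(X\) must act from a priori considerations.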

2. The basic equation and initial conditions. Denote by \(S(x,y,t)\) the minimal guaranteed value of the functional \(J\) from (1,4), which can be obtained under the optimal control of side \(X\) and under the control of side \(Y\) worst for \(X\), if at the instant \(t\) the values \(x=x(t)\) and \(y=y(t-\tau)\) are known. The minimal value of the functional \(J\) under the initial conditions (1,5) can be expressed through the function \(S\) in the form \(S(x^0,y^0,t_0+\tau)\).

Let us derive a differential equation for the function \(S(x,y,t)\), assuming that it has continuous partial derivatives with respect to all arguments. From the very definition of the function \(S\) it follows that it will not change along the trajectory if side \(X\) chooses the optimal control and the control of side \(Y\) is the worst for \(X\). Consequently,

\[ \min_{u\in U}\max_{v\in V}\frac{dS}{dt} \equiv \min_{u\in U}\max_{v\in V} \left[ \frac{\partial S}{\partial t} +\left(\frac{\partial S}{\partial x},\frac{dx}{dt}\right) +\left(\frac{\partial S}{\partial y},\frac{dy}{dt}\right) \right]=0, \]

\[ \frac{\partial S}{\partial x} = \left( \frac{\partial S}{\partial x_1},\ldots,\frac{\partial S}{\partial x_n} \right), \qquad \frac{\partial S}{\partial y} = \left( \frac{\partial S}{\partial y_1},\ldots,\frac{\partial S}{\partial y_s} \right). \tag{2,1} \]

Here \(dS/dt\) is the total derivative with respect to time, \(\partial S/\partial x\) and \(\partial S/\partial y\) are the gradient vectors with respect to the variables \(x\) and \(y\), respectively, and parentheses denote the scalar product of vectors. Substituting relations (1,1) into equation (2,1), taking into account that \(x\) in (2,1) refers to the instant \(t\), while \(y\) refers to the instant \(t-\tau\), we then obtain the differential equation

\[ \frac{\partial S}{\partial t} + \min_{u\in U} \left( \frac{\partial S}{\partial x}, f(x,u,t) \right) + \max_{v\in V} \left( \frac{\partial S}{\partial y}, g(y,v,t-\tau) \right) =0. \tag{2,2} \]
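As an illustration only (the specific dynamics are an assumption and do not appear in the original text), consider simple motion of both sides: \(f = u\) with \(|u|\le\alpha\) and \(g = v\) with \(|v|\le\beta\), where \(\alpha,\beta>0\) are hypothetical speed bounds. The extrema in (2,2) are then computed explicitly,

\[ \min_{|u|\le\alpha}\left(\frac{\partial S}{\partial x},u\right) = -\alpha\left|\frac{\partial S}{\partial x}\right|, \qquad \max_{|v|\le\beta}\left(\frac{\partial S}{\partial y},v\right) = \beta\left|\frac{\partial S}{\partial y}\right|, \]

and equation (2,2) takes the form

\[ \frac{\partial S}{\partial t} - \alpha\left|\frac{\partial S}{\partial x}\right| + \beta\left|\frac{\partial S}{\partial y}\right| = 0. \]

Since \(g\) does not depend on time in this example, the shift \(t-\tau\) does not alter the equation itself, and the delay can enter only through the condition imposed on \(S\) at the end of the game.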

The initial condition for equation (2,2) will be the condition satisfied by the function \(S\) at the end of the game. Denote by \(D(y,t)\) the reachability domain for side \(Y\) over time \(\tau\) under the initial condition \(y(t)=y\). In other words, the domain \(D(y,t)\) is the set of vectors \(y(t+\tau)\) that are obtained under the condition \(y(t)=y\) and all possible admissible controls \(v\in V\) on the interval \([t,t+\tau]\), provided that the functions \(y,v\) are subject to the second equation (1,1).
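The definition of \(D(y,t)\) can be explored numerically. The sketch below is an assumption-laden illustration, not a method from the paper: it builds an inner approximation of the reachability domain by explicit Euler integration of the second equation (1,1) under randomly sampled piecewise-constant controls. The function names, the example dynamics (planar simple motion \(dy/dt=v\), \(|v|\le\beta\)) and the constants `BETA`, `TAU` are all hypothetical.

```python
import math
import random


def reach_samples(g, y0, t, tau, sample_control, n_steps=20, n_samples=500, seed=0):
    """Sample endpoints y(t + tau) starting from y(t) = y0.

    Each sample uses a random piecewise-constant admissible control and
    explicit Euler steps, so the result is only an inner approximation of D(y0, t).
    """
    rng = random.Random(seed)
    dt = tau / n_steps
    endpoints = []
    for _ in range(n_samples):
        y = list(y0)
        for k in range(n_steps):
            v = sample_control(rng)            # control value on this sub-interval
            dy = g(y, v, t + k * dt)           # right-hand side of dy/dt = g(y, v, t)
            y = [yi + dt * di for yi, di in zip(y, dy)]
        endpoints.append(tuple(y))
    return endpoints


BETA, TAU = 2.0, 0.5  # hypothetical speed bound and delay


def simple_motion(y, v, t):
    # Assumed example dynamics: dy/dt = v.
    return v


def sample_disk(rng):
    # Uniform sample from the disk |v| <= BETA (the assumed set V).
    angle = rng.uniform(0.0, 2.0 * math.pi)
    radius = BETA * math.sqrt(rng.random())
    return (radius * math.cos(angle), radius * math.sin(angle))


endpoints = reach_samples(simple_motion, (1.0, -1.0), 0.0, TAU, sample_disk)
max_dist = max(math.hypot(p[0] - 1.0, p[1] + 1.0) for p in endpoints)
```

For these assumed dynamics \(D(y,t)\) is the disk of radius \(\beta\tau\) centred at \(y\), so every sampled endpoint lies within distance `BETA * TAU` of the starting point. Random controls tend to cluster near the centre, so the samples do not trace the boundary of the domain.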

Let us derive the initial condition first in the case where the time \(T\) at which the game ends is fixed and equal to \(T_0 > t_0\), i.e., the function \(h\) in (1,2) has the form \(h = T_0 - t\). Then the function \(S\) at the end of the process, calculated for the case worst for \(X\), will be determined as the result of maximizing the function \(F\) from (1,4). We obtain, evidently,

\[ S(x,y,T_0)=F_0(x,y,T_0)\equiv \max_{z\in D(y,T_0-\tau)} F(x,z,T_0). \tag{2,3} \]
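For intuition, the maximization in (2,3) can be evaluated numerically in a simple case. The sketch below rests on assumptions that are not in the original text: side \(Y\) has planar simple motion \(dy/dt=v\), \(|v|\le\beta\), so that \(D(y,T_0-\tau)\) is a disk of radius \(\beta\tau\) around \(y\), and the payoff is the terminal distance \(F=|x-z|\). Under these assumptions the maximum equals \(|x-y|+\beta\tau\). All names and constants in the code are hypothetical.

```python
import math


def terminal_payoff_delayed(F, x, y, radius, n_angles=3600):
    """Approximate max of F(x, z) over the disk |z - y| <= radius.

    Only the boundary circle is sampled; this is valid here because the
    assumed payoff is convex in z, so its maximum lies on the boundary.
    """
    best = -math.inf
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        z = (y[0] + radius * math.cos(theta), y[1] + radius * math.sin(theta))
        best = max(best, F(x, z))
    return best


def distance(x, z):
    # Assumed payoff: Euclidean distance between the two players.
    return math.hypot(x[0] - z[0], x[1] - z[1])


BETA, TAU = 1.0, 0.5               # hypothetical speed bound and delay
x, y = (0.0, 0.0), (3.0, 4.0)      # x = x(T_0), y = y(T_0 - tau)
numeric = terminal_payoff_delayed(distance, x, y, BETA * TAU)
closed_form = distance(x, y) + BETA * TAU
```

In this example the delay adds to the last observed distance exactly the distance that side \(Y\) can cover during the time \(\tau\).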

Passing to the consideration of the general case, let us assume, without loss of generality, that \(h>0\) for \(t<T\) (otherwise we simply multiply the function \(h\) by \(-1\)). For simplicity let us suppose that the game has the property that it is always advantageous for side \(Y\) to finish the game as late as possible, and for side \(X\) as early as possible. Such a case occurs, for example, for pursuit games, where \(F=t\). Then the time of termination of the game worst for \(X\) will be determined from the condition

\[ h_0(x,y,t)\equiv \max_{z\in D(y,t-\tau)} h(x,z,t)=0 \quad (x=x(t),\, y=y(t-\tau)). \tag{2,4} \]

If, on the other hand, it is always advantageous for side \(Y\) to finish the game as early as possible, and for side \(X\) as late as possible (for example, this occurs when \(F=-t\)), then instead of (2,4) we shall have

\[ h_0(x,y,t)\equiv \min_{z\in D(y,t-\tau)} h(x,z,t)=0, \quad (x=x(t),\, y=y(t-\tau)). \tag{2,5} \]

In both cases the initial condition for the function \(S\) takes the form (here again the calculation is made for the case worst for \(X\))

\[ S(x,y,t)=F_0(x,y,t)\quad \text{for } h_0(x,y,t)=0, \tag{2,6} \]

where the notation

\[ F_0(x,y,t)= \max_{z\in D_1(x,y,t-\tau)} F(x,z,t) \tag{2,7} \]

has been introduced. Here \(D_1(x,y,t-\tau)\) is the set of \(s\)-dimensional vectors \(z\) satisfying the conditions \(z\in D(y,t-\tau)\) and \(h(x,z,t)=0\). It follows from relations (2,4), (2,5) that for every point of the hypersurface \(h_0(x,y,t)=0\) the set \(D_1\) will be nonempty: this set will contain the point \(z\) delivering the extremum in (2,4), (2,5). We note that for pursuit games, in which \(F=t\) or \(F=-t\), we simply have \(F_0=F\).
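A worked illustration, under assumptions that are not in the original text: let \(n=s\), let side \(Y\) have simple motion \(dy/dt=v\), \(|v|\le\beta\), let the game end on capture, \(h(x,y,t)=|x-y|-l\) with a hypothetical capture radius \(l>0\), and let \(F=t\) (side \(X\) pursues). Then \(D(y,t-\tau)\) is the ball of radius \(\beta\tau\) centred at \(y\), and (2,4) gives

\[ h_0(x,y,t)=\max_{|z-y|\le\beta\tau}\bigl(|x-z|-l\bigr)=|x-y|+\beta\tau-l. \]

The hypersurface \(h_0=0\) is therefore \(|x(t)-y(t-\tau)|=l-\beta\tau\), which is nonempty only if \(l\ge\beta\tau\). In this sketch the delay reduces the effective capture radius by the distance that side \(Y\) can cover in time \(\tau\). For \(x\ne y\) the set \(D_1\) consists of the single point \(z=y+\beta\tau\,(y-x)/|y-x|\), and \(F_0=F=t\), in agreement with the remark on pursuit games.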

Thus the problem posed has been reduced to the Cauchy problem for the nonlinear first-order partial differential equation (2,2) with initial condition (2,6). The functions \(h_0\), \(F_0\) in the various cases are determined by equalities (2,3)—(2,5), (2,7). Solving this Cauchy problem in the direction of decreasing time, i.e. in the domain \(h_0(x,y,t)\geq 0\), we obtain the function \(S(x,y,t)\). It is assumed that the solution \(S(x,y,t)\) exists and has the required smoothness properties; otherwise difficulties arise that require special consideration. Simultaneously with the solution \(S(x,y,t)\), equation (2,2) yields the function \(u(x,y,t)\), which gives the solution of the synthesis problem for the optimal control of side \(X\). To determine the optimal control of side \(Y\), it is necessary to specify the nature of player \(Y\)’s information.

3. Equivalence of games with information delay and games without delay. Equation (2,2) and condition (2,6) coincide with the basic equation and the initial condition for a differential game without information delay \((^1)\), for which the equations of motion, constraints, game termination condition, and payoff functional have the form

\[ d\xi/dt=f(\xi,u,t),\qquad d\eta/dt=g(\eta,v,t-\tau),\qquad u(t)\in U,\quad v(t)\in V, \tag{3,1} \]

\[ h_0(\xi(T),\eta(T),T)=0,\qquad J=F_0(\xi(T),\eta(T),T). \]

Here \(\xi(t)=x(t)\) and \(\eta(t)=y(t-\tau)\) are phase coordinates; the remaining notation in (3,1) is the same as above. The initial conditions (1,5) take the form \(\xi(t_0+\tau)=x^0,\ \eta(t_0+\tau)=y^0\).
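The change of variables \(\eta(t)=y(t-\tau)\) can be checked numerically. The sketch below is an illustration under assumed data: the scalar right-hand side `g`, the control history `v_of` (with values in \([-1,1]\)) and all constants are hypothetical. It integrates the original equation for \(y\) from \(t_0\) and the shifted equation for \(\eta\) from \(t_0+\tau\), interpreting the control in the \(\eta\)-equation as the control of side \(Y\) applied at the instant \(t-\tau\), and compares the two paths.

```python
import math


def g(y, v, t):
    # Assumed time-dependent dynamics of side Y: dy/dt = g(y, v, t).
    return -y + v + math.sin(t)


def v_of(t):
    # Assumed control history of side Y, with values in [-1, 1].
    return math.cos(3.0 * t)


def euler(rhs, z0, t_start, n_steps, dt):
    """Explicit Euler integration of dz/dt = rhs(z, t); returns the whole path."""
    z, path = z0, [z0]
    for k in range(n_steps):
        t = t_start + k * dt
        z = z + dt * rhs(z, t)
        path.append(z)
    return path


t0, tau, dt, n = 0.0, 0.3, 0.01, 200   # hypothetical constants
y0 = 1.0

# Original system: y(t) on the grid t0 + k*dt.
y_path = euler(lambda y, t: g(y, v_of(t), t), y0, t0, n, dt)

# Shifted system: eta(t) on the grid t0 + tau + k*dt, with g evaluated at t - tau.
eta_path = euler(lambda e, t: g(e, v_of(t - tau), t - tau), y0, t0 + tau, n, dt)

# eta at t0 + tau + k*dt should reproduce y at t0 + k*dt.
max_gap = max(abs(a - b) for a, b in zip(y_path, eta_path))
```

The two paths agree up to rounding error, which reflects the fact that the delayed observation \(y(t-\tau)\) evolves as an ordinary state variable whose right-hand side is evaluated at the shifted time.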

Thus, the differential game (1,1)–(1,4) with information delay is equivalent to a certain differential game (3,1) without information delay. Therefore, for solving a game with information delay one may apply all known approaches and results of the theory of differential games without delay. In particular, differential games with delay can be solved by integrating the equations of optimal trajectories (the characteristics of equation (2,2); see \((^1)\)). It is only necessary first to compute the functions \(h_0, F_0\), which in the cases considered above are defined by the equalities (2,3)–(2,5), (2,7).

The equivalence of games with information delay to games without delay is valid for a somewhat broader class of problems than (1,1)–(1,4). For example, it holds in the presence of restrictions on the phase coordinates and controls of the form

\[ \{x(t),u(t)\}\in M_x(t);\quad \{y(t),v(t)\}\in M_y(t);\quad t\geq t_0, \tag{3,2} \]

where \(M_x, M_y\) are sets in \((n+m)\)-dimensional and \((s+r)\)-dimensional spaces, respectively. An analogous assertion (on the equivalence of a game with information delay to some game without delay) is also valid for certain multistep games. For the equivalence to hold, it is necessary that the phase coordinates and controls of one side not enter into the equations of motion or into the restrictions of the other side (see (1,1), (1,3), (3,2)).

Institute for Problems in Mechanics
Academy of Sciences of the USSR
Moscow

Received 29 X 1968

REFERENCES

\({}^{1}\) R. Isaacs, Differential Games, Moscow, 1967.
