UDC 519.3+62.50
MATHEMATICS
V. A. YAKUBOVICH
ON THE SYNTHESIS OF OPTIMAL CONTROLS IN A LINEAR DIFFERENTIAL GAME WITH A QUADRATIC PAYOFF FUNCTIONAL
(Presented by Academician L. S. Pontryagin on 20 IV 1970)
1°. By \(R_k\) below we denote \(k\)-dimensional Euclidean space. Suppose that the states \(x\) and \(y\) and the controls \(u_1\) and \(u_2\) of two players at a fixed moment of time are described by vectors \(x \in R_{n_1}\), \(u_1 \in R_{m_1}\) for the first player and by vectors \(y \in R_{n_2}\), \(u_2 \in R_{m_2}\) for the second player, and that the equation for the change over time of the players’ states has the form
\[ \frac{dz}{dt}=Az+bu,\quad \text{where } z=\left\|\begin{matrix}x\\ y\end{matrix}\right\|,\quad u=\left\|\begin{matrix}u_1\\ u_2\end{matrix}\right\|. \tag{1} \]
Here \(A, b\) are constant real matrices of orders \(n \times n\) and \(n \times m\), where \(n=n_1+n_2\), \(m=m_1+m_2\). Below it is assumed that system (1) is controllable, i.e., that the rank of the \(n \times mn\)-matrix \(\|b, Ab, \ldots, A^{n-1}b\|\) is equal to \(n\). Let \(z_0 \in R_n\) be a given vector. We shall call the functions \(u_1=u_1(z,t)\), \(u_2=u_2(z,t)\) admissible controls if on \((0,\infty)\) there exists an absolutely continuous function \(z(t)\) (called the corresponding solution), satisfying equation (1) almost everywhere, in which \(u_j=u_j[z(t),t]\), and such that (I) \(z(0)=z_0\), (II) \(|z(t)|\in L_2(0,\infty)\), (III) \(|u[z(t),t]|\in L_2(0,\infty)\).
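The controllability assumption can be checked mechanically. The following is a minimal sketch in Python, assuming the matrices have integer or rational entries and are stored as lists of rows; the helper names `matmul`, `rank` and `is_controllable` are illustrative and do not come from the paper.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def is_controllable(A, b):
    """True if the n x mn matrix ||b, Ab, ..., A^(n-1) b|| has rank n."""
    n = len(A)
    blocks, block = [], b
    for _ in range(n):
        blocks.append(block)
        block = matmul(A, block)
    # Concatenate the n blocks side by side, row by row.
    kalman = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    return rank(kalman) == n

# Hypothetical example: a double integrator driven through its second state.
print(is_controllable([[0, 1], [0, 0]], [[0], [1]]))  # True
```

Exact rational arithmetic is used here so that the rank decision does not depend on a floating-point tolerance.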
Let \(\mathfrak F(z,u)\) be a real quadratic form of its arguments, with \(\mathfrak F(0,u)=u_1^*\gamma_1u_1-u_2^*\gamma_2u_2\), where \(\gamma_1>0\), \(\gamma_2>0\) are matrices of orders \(m_1\times m_1\) and \(m_2\times m_2\).* Suppose that the first player seeks to minimize, and the second player to maximize, the functional
\[ J(u_1,u_2)=\int_0^\infty \mathfrak F\{z(t),u[z(t),t]\}\,dt, \tag{2} \]
where \(z\) is the corresponding solution. (For admissible \(u_1,u_2\), the integral (2) obviously converges.) The admissible controls \(u_1^0,u_2^0\) are called optimal if, for any controls \(u_1,u_2\) such that the pairs \((u_1^0,u_2)\), \((u_1,u_2^0)\) are admissible, one has***
\[ J(u_1^0,u_2)\leq J(u_1^0,u_2^0)\leq J(u_1,u_2^0). \]
* Here and below, the notation \(C>0\) (\(C\geq 0\)), where \(C\) is a matrix of order \(k\times k\), means that for any vector \(w\ne0\) of order \(k\) one has \(w^*Cw>0\) (\(w^*Cw\geq0\)). The asterisk denotes Hermitian conjugation (in particular, transposition in the case of real vectors and matrices and complex conjugation in the case of numbers). By \(I_k\) below we denote the identity \(k\times k\) matrix.
** Of practical interest is the case in which \(\mathfrak F(z,u)=(x-y)^*G(x-y)+u_1^*\gamma_1u_1-u_2^*\gamma_2u_2\), where \(G\geq0\), \(\gamma_1>0\), \(\gamma_2>0\). In this case, loosely speaking, the first player seeks to minimize and the second to maximize the quadratic deviation between them, while both players also seek to minimize their costs of control.
*** We emphasize that, by virtue of the definitions introduced, the values of the control \(u_1^0\) at a fixed moment of time in the pairs \((u_1^0,u_2)\) and \((u_1^0,u_2^0)\) are, generally speaking, different. (An analogous assertion is of course also valid for \(u_2^0\).) Therefore the problem under consideration, unlike analogous problems of optimal control, is not equivalent to a problem in which the admissible and optimal controls are functions of time alone.
Below a sufficient condition is established for the existence of optimal controls \(u_1^0, u_2^0\), and an algorithm for constructing \(u_1^0, u_2^0\) is given. (The functions \(u_1^0, u_2^0\) turn out to be linear functions of \(z\).)
Let \(\lambda\) be a complex variable, \(\omega\) a real variable, and let \(B(\lambda)\) be some matrix or scalar polynomial
\(B(\lambda)=B_0\lambda^N+\cdots+B_N\). By \(B(\lambda)^\nabla\) we shall denote the polynomial
\(B(\lambda)^\nabla=[B(-\lambda^*)]^*=B_0^*(-\lambda)^N+\cdots+B_N^*\).
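For a scalar polynomial the operation \(B\mapsto B^\nabla\) amounts to conjugating each coefficient and changing the sign of the coefficients of odd powers. A small Python sketch, with the coefficient-list convention (highest power first) and the names `nabla` and `evaluate` chosen here purely for illustration:

```python
def nabla(coeffs):
    """Coefficients of B^nabla for scalar B(lam) = coeffs[0]*lam^N + ... + coeffs[N]."""
    N = len(coeffs) - 1
    # The coefficient of lam^(N-k) becomes conj(B_k) * (-1)^(N-k).
    return [complex(c).conjugate() * (-1) ** (N - k) for k, c in enumerate(coeffs)]

def evaluate(coeffs, lam):
    """Horner evaluation of a polynomial given by its coefficients."""
    value = 0
    for c in coeffs:
        value = value * lam + c
    return value

# Hypothetical example: (lam + 2)^nabla = -lam + 2.
print(nabla([1, 2]) == [-1, 2])  # True
```

On the imaginary axis \(\lambda=i\omega\) one has \(-\lambda^*=\lambda\), so \(B(i\omega)^\nabla\) is simply the complex conjugate of \(B(i\omega)\).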
Let \(\alpha(\lambda)\), \(\beta(\lambda)\) be scalar polynomials. By \(\operatorname{rem}(\beta/\alpha)\) below is denoted the remainder obtained by dividing \(\beta(\lambda)\) by \(\alpha(\lambda)\). If \(B(\lambda)=\|\beta_{jk}(\lambda)\|\) is a matrix polynomial, then
\(\operatorname{rem}(B/\alpha)=\|\operatorname{rem}(\beta_{jk}/\alpha)\|\).
By \(\langle \alpha,\beta\rangle\) is denoted the greatest common divisor of the polynomials \(\alpha(\lambda)\) and \(\beta(\lambda)\), and by \(\langle \alpha,B\rangle\) the greatest common divisor of the polynomials \(\alpha(\lambda)\) and all \(\beta_{jk}(\lambda)\).
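Both operations are ordinary polynomial arithmetic. The sketch below implements the remainder and the greatest common divisor for scalar polynomials with rational coefficients; the names `trim`, `poly_rem` and `poly_gcd`, and the choice to normalize the divisor to be monic, are assumptions made for this illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients (coefficients are highest power first)."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(beta, alpha):
    """Remainder obtained by dividing beta(lam) by alpha(lam)."""
    r = trim([Fraction(c) for c in beta])
    a = trim([Fraction(c) for c in alpha])
    while len(r) >= len(a) and r != [0]:
        f = r[0] / a[0]
        # Subtract f * lam^(deg r - deg a) * alpha; the leading term cancels.
        r = [x - f * y for x, y in zip(r, a + [0] * (len(r) - len(a)))]
        r = trim(r[1:]) if len(r) > 1 else [Fraction(0)]
    return r

def poly_gcd(alpha, beta):
    """Monic greatest common divisor by the Euclidean algorithm."""
    a = trim([Fraction(c) for c in alpha])
    b = trim([Fraction(c) for c in beta])
    while b != [0]:
        a, b = b, poly_rem(a, b)
    return [c / a[0] for c in a]

# Hypothetical examples.
print(poly_rem([1, 0, 1], [1, -1]) == [2])            # (lam^2 + 1) mod (lam - 1) = 2
print(poly_gcd([1, 0, -1], [1, -2, 1]) == [1, -1])    # common divisor lam - 1
```

For a matrix polynomial, `poly_rem` would be applied to each entry, and the divisor \(\langle\alpha,B\rangle\) is obtained by folding `poly_gcd` over \(\alpha\) and all entries.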
Extend the form \(\mathfrak F\) to complex values of the arguments \(z,u\) so that it remains Hermitian, and introduce the notation
\[ A_\lambda=\lambda I_n-A,\qquad \delta(\lambda)=\det A_\lambda,\qquad Q(\lambda)=\delta(\lambda)A_\lambda^{-1},\qquad \Gamma=\left\|\begin{array}{cc} \gamma_1 & 0\\ 0 & -\gamma_2 \end{array}\right\|, \]
\[ \mathfrak F(A_{i\omega}^{-1}bu,u)=u^*\Pi(i\omega)u,\qquad \Phi(i\omega)=|\delta(i\omega)|^2\Pi(i\omega). \tag{3} \]
Here \(\Pi(i\omega)=\Pi(i\omega)^*\) is the matrix of order \(m\) of the Hermitian form
\(\mathfrak F(A_{i\omega}^{-1}bu,u)\). In [1] the following propositions are established:
Lemma. \(\Phi(\lambda)\) is a polynomial with leading term \((-1)^n\lambda^{2n}\Gamma\),
\[
\det\Phi=\delta^{m-1}(\delta^\nabla)^{m-1}\det\Gamma\,\varphi,\qquad
\Phi^{-1}=\Omega/(\delta\delta^\nabla\varphi),
\]
where \(\varphi\) and \(\Omega\) are scalar and matrix polynomials,
\[
\varphi=\varphi^\nabla,\qquad \Omega=\Omega^\nabla,\qquad
\varphi(\lambda)=\lambda^{2n}(-1)^n[1+O(\lambda^{-1})],
\qquad \lambda\to\infty .
\]
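As an illustration that is not taken from [1], consider the scalar case \(n=1\), \(m_1=m_2=1\), with \(A=a\), \(b=\|1,\ 1\|\) and \(\mathfrak F(z,u)=g|z|^2+\gamma_1|u_1|^2-\gamma_2|u_2|^2\), where \(a\) and \(g\) are assumed to be real numbers and \(\gamma_1>0\), \(\gamma_2>0\). Under these assumptions a direct computation gives
\[ \delta(\lambda)=\lambda-a,\qquad \delta(\lambda)^\nabla=-\lambda-a,\qquad \Phi(\lambda)=(a^2-\lambda^2)\Gamma+g\left\|\begin{matrix}1&1\\ 1&1\end{matrix}\right\|, \]
\[ \det\Phi=\delta\,\delta^\nabla\det\Gamma\,\varphi,\qquad \varphi(\lambda)=a^2-\lambda^2+g\left(\frac{1}{\gamma_1}-\frac{1}{\gamma_2}\right). \]
The leading terms are \(-\lambda^2\Gamma\) and \(-\lambda^2\), in agreement with the lemma for \(n=1\). In this example \(\varphi(i\omega)=\omega^2+a^2+g(\gamma_1^{-1}-\gamma_2^{-1})\), which is nonzero for all real \(\omega\) exactly when \(a^2+g(\gamma_1^{-1}-\gamma_2^{-1})>0\).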
Theorem 1. Suppose that
a) \(\langle\delta,\delta^\nabla\rangle=1\), \(\langle\varphi,\delta\delta^\nabla\rangle=1\), \(\langle\varphi,\Omega\rangle=1\);
b) \(\varphi(i\omega)\ne0\), \(\forall \omega\), and, consequently, \(\varphi(\lambda)\) admits the factorization
\[
\varphi(\lambda)=\psi(\lambda)\cdot\psi(\lambda)^\nabla,
\]
where \(\psi(\lambda)\) is a Hurwitz polynomial*;
c) there exists an \(n\times m\) matrix \(h\) satisfying the identity
\[
h^*q(\lambda)=\delta(\lambda)\Omega_0(\lambda),\qquad \forall\lambda,\quad
\text{where }\quad
q(\lambda)=\operatorname{rem}\left(\frac{Qb\Omega}{\psi\delta}\right),\qquad
\Omega_0=\operatorname{rem}\left(\frac{\Omega}{\psi}\right),
\tag{4}
\]
(which, after equating the coefficients at identical powers of \(\lambda\), becomes a system of linear equations with respect to the elements of the matrix \(h\)), or, equivalently to (4), such that the matrix
\[
(I_m-h^*A_\lambda^{-1}b)\,\frac{\Omega(\lambda)}{\psi(\lambda)}
\]
is a polynomial. Then for each \(z(0)=z_0\) optimal controls \(u_1^0,u_2^0\) exist and have the form
\[
u_1^0=h_1^*z,\qquad u_2^0=h_2^*z,
\]
where \(h_1,h_2\) are constant matrices of orders \(n\times m_1\), \(n\times m_2\), composed of the first \(m_1\) and the last \(m_2\) columns of the matrix \(h\):
\[
h=\|h_1,h_2\|.
\]
The matrix \(K=A+bh^*\) of the “synthesized” system is Hurwitz, and
\[
\det(\lambda I_n-K)=\psi(\lambda).
\]
The payoff functional for the optimal controls has the form
\[
J(u_1^0,u_2^0)=-z_0^*Hz_0,
\]
where \(H=H^*\). The matrix \(H\) is determined from the equation
\[
A^*H+HA=F_0-h\Gamma h^*,
\]
where \(F_0=F_0^*\) is the matrix of the form \(\mathfrak F(z,0)\).
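The equation for \(H\) is linear and can be solved by rewriting it as a system of \(n^2\) equations in the entries of \(H\). The sketch below does this in Python for real \(A\); the function names `solve_linear` and `solve_lyapunov` and the sample data are assumptions made for illustration, not part of the paper. The system is uniquely solvable when no two eigenvalues \(\lambda_i,\lambda_j\) of \(A\) satisfy \(\lambda_i+\lambda_j^*=0\), which is what the condition \(\langle\delta,\delta^\nabla\rangle=1\) expresses.

```python
from fractions import Fraction

def solve_linear(M, rhs):
    """Solve M x = rhs exactly by Gauss-Jordan elimination (M assumed nonsingular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]

def solve_lyapunov(A, C):
    """Return H with A^T H + H A = C for real square matrices A and C."""
    n = len(A)
    M = [[Fraction(0)] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            row = M[i * n + j]
            for k in range(n):
                row[k * n + j] += A[k][i]   # (A^T H)[i][j] = sum_k A[k][i] * H[k][j]
                row[i * n + k] += A[k][j]   # (H A)[i][j]   = sum_k H[i][k] * A[k][j]
    rhs = [C[i][j] for i in range(n) for j in range(n)]
    x = solve_linear(M, rhs)
    return [x[i * n:(i + 1) * n] for i in range(n)]

# Hypothetical scalar data: A = 1, F_0 = 6, h = ||-6, 3||, Gamma = diag(1, -2),
# so that F_0 - h Gamma h^* = 6 - (36 - 18) = -12.
print(solve_lyapunov([[1]], [[-12]]) == [[-6]])  # True
```

Here the right-hand side \(C\) stands for \(F_0-h\Gamma h^*\), which must be formed beforehand from the matrix \(h\).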
Remarks. 1. It can be shown that the matrix polynomial \(Qb\Omega\) is divisible by \(\delta\), i.e., that \(q(\lambda)=\delta(\lambda)q_0(\lambda)\), where \(q_0(\lambda)\) is a polynomial. Therefore the identity (4) is transformed into
\[
h^*q_0(\lambda)=\Omega_0(\lambda).
\]
2. Making the substitution \(u=v+a^*z\) in system (1), where \(a\) is some constant \(n\times m\) matrix, we obtain a new system
\[ dz/dt=A_1z+bv, \]
where \(A_1=A+ba^*\), and a new form
\[ \mathfrak F_1(z,v)=\mathfrak F(z,v+a^*z). \]
It can be shown that, for any choice of the matrix \(a\), the new polynomial \(\varphi(\lambda)\) coincides with the old one, and also that, for a suitable choice of the matrix \(a\), the conditions
\[ \langle\delta,\delta^\nabla\rangle=1,\qquad \langle\varphi,\delta\delta^\nabla\rangle=1,\qquad \langle\varphi,\Omega\rangle=1 \]
will be satisfied for the new system. It can also be shown that, when the condition \(\varphi(i\omega)\ne0\) is fulfilled, a matrix \(h\) satisfying identity (4) can be found for the new system*. Thus the single condition \(\varphi(i\omega)\ne 0,\ \forall\omega\), is sufficient for the existence of optimal controls. The optimal controls \(u_1^0, u_2^0\) still have the form \(u_1^0=h_1^*z,\ u_2^0=h_2^*z\). The matrices \(h_1, h_2\) are found by making the indicated substitution and applying the procedure of Theorem 1. This procedure reduces to determining the Hurwitz polynomial \(\psi(\lambda)\) from the factorization \(\varphi=\psi\psi^\nabla\) (evidently \(\psi\) is determined uniquely) and to solving the system of linear equations obtained from identity (4) or from the identity \(h^*q_0(\lambda)=\Omega_0(\lambda),\ \forall\lambda\).
* (Footnote to condition b) of Theorem 1.) A polynomial \(\psi(\lambda)\) is called Hurwitz if its roots are located in the open left half-plane. The matrix \(K\) is called Hurwitz if \(\det(\lambda I_n-K)\) is a Hurwitz polynomial.
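The factorization step can be sketched numerically: collect the roots of \(\varphi\) lying in the open left half-plane and form the monic polynomial having exactly these roots. In the Python sketch below the roots are located by the Durand-Kerner iteration, a generic root finder used here only for illustration, since the paper does not prescribe a numerical method. The names `poly_roots` and `hurwitz_factor`, the starting points and the iteration count are all assumptions.

```python
def poly_roots(coeffs, iterations=500):
    """Approximate all roots by the Durand-Kerner iteration (highest power first)."""
    lead = coeffs[0]
    monic = [c / lead for c in coeffs]
    n = len(monic) - 1
    roots = [(0.4 + 0.9j) ** k for k in range(n)]   # conventional starting points
    for _ in range(iterations):
        for i in range(n):
            value = 0
            for c in monic:
                value = value * roots[i] + c
            denom = 1
            for j in range(n):
                if j != i:
                    denom *= roots[i] - roots[j]
            roots[i] -= value / denom
    return roots

def hurwitz_factor(phi):
    """Monic psi with phi = psi * psi^nabla, built from the roots of phi with Re < 0."""
    left = [r for r in poly_roots(phi) if r.real < 0]
    if 2 * len(left) != len(phi) - 1:
        raise ValueError("phi does not have exactly half of its roots in Re < 0")
    psi = [1 + 0j]
    for r in left:
        # Multiply psi(lam) by (lam - r).
        psi = [p - r * q for p, q in zip(psi + [0], [0] + psi)]
    return psi

# Hypothetical example with n = 2: phi = lam^4 - 13 lam^2 + 36 = (4 - lam^2)(9 - lam^2),
# whose Hurwitz factor is (lam + 2)(lam + 3) = lam^2 + 5 lam + 6.
psi = hurwitz_factor([1, 0, -13, 0, 36])
print([round(c.real, 6) for c in psi])
```

For real \(\varphi\) the computed coefficients of \(\psi\) carry only rounding-level imaginary parts, which may be discarded. A root of \(\varphi\) on the imaginary axis, which is excluded by the condition \(\varphi(i\omega)\ne0\), would make the half-plane split unreliable.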
3. Let \(m_1=m_2=1\) and let \(\varphi(i\omega)\) change sign. It can be shown that among the admissible controls \(u_1=h_1^*z,\ u_2=h_2^*z\) with constant matrices \(h_1,h_2\) there are no optimal ones. Moreover, for any admissible \(u_1^0=h_1^*z,\ u_2^0=h_2^*z\) there exists either a sequence \(u_1^{(k)}\) such that the pairs \((u_1^{(k)},u_2^0)\) are admissible and \(J(u_1^{(k)},u_2^0)\to-\infty\), or a sequence \(u_2^{(k)}\) such that the pairs \((u_1^0,u_2^{(k)})\) are admissible and \(J(u_1^0,u_2^{(k)})\to+\infty\), and this is true for any \(z_0\).
2°. The proof of Theorem 1 is based on the following algebraic proposition. Consider system (1) with, generally speaking, complex \(A\) and \(b\), and the Hermitian form \(\mathfrak F(z,u)\). System (1) is still assumed controllable. Let \(\mathfrak F(0,u)=u^*\Gamma u\), where \(\Gamma=\Gamma^*,\ \det\Gamma\ne0\). Introduce the notation (3) (excluding the notation for \(\Gamma\)). From [1] it follows that the lemma remains valid in this more general case of complex coefficients.
Theorem 2. I. Suppose that there exist an \(n\times n\) matrix \(H=H^*\) and an \(n\times m\) matrix \(h\) such that the representation
\[ \mathfrak F(z,u)=d(z^*Hz)/dt+(u-h^*z)^*\Gamma(u-h^*z),\qquad \forall z,u, \tag{5} \]
is valid, where the derivative is taken along system (1), i.e. \(d(z^*Hz)/dt=2\operatorname{Re}[z^*H(Az+bu)]\), and that, moreover, the matrix \(K=A+bh^*\) is Hurwitz. Then: a) the polynomial \(\varphi(\lambda)\) admits the factorization \(\varphi(\lambda)=\psi(\lambda)\psi(\lambda)^\nabla\), where \(\psi(\lambda)=\det(\lambda I_n-K)\), and, consequently, \(\varphi(i\omega)\ne0,\ \forall\omega\); b) (4) is satisfied, where \(\psi(\lambda)=\det(\lambda I_n-K)\).
II. Let the conditions a), b), c) of Theorem 1 be satisfied and, in particular, let \(\psi(\lambda)\) and \(h\) be determined according to b) and c). Let \(F_0=F_0^*\) be the matrix of the Hermitian form \(\mathfrak F(z,0)\), and let \(H=H^*\) be the matrix determined from the linear system \(A^*H+HA=F_0-h\Gamma h^*\). Then identity (5) is satisfied and \(\psi(\lambda)=\det(\lambda I_n-K)\), where \(K=A+bh^*\).
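Identity (5) can be checked numerically on a small example. The Python sketch below uses hypothetical scalar data (\(n=1\), \(m_1=m_2=1\), \(b=\|1,\ 1\|\)) chosen by hand for this illustration; none of the numbers come from the paper. They satisfy \(A^*H+HA=F_0-h\Gamma h^*\), since \(2\cdot1\cdot(-6)=6-(36-18)\).

```python
import random

# Hypothetical data: dz/dt = a z + u1 + u2, F = g z^2 + gamma1 u1^2 - gamma2 u2^2.
a, g = 1.0, 6.0
gamma1, gamma2 = 1.0, 2.0
H = -6.0              # candidate matrix H (a scalar here)
h1, h2 = -6.0, 3.0    # candidate h = ||h1, h2||

def form(z, u1, u2):
    """Left-hand side of (5): the quadratic form F(z, u)."""
    return g * z * z + gamma1 * u1 * u1 - gamma2 * u2 * u2

def representation(z, u1, u2):
    """Right-hand side of (5): d(zHz)/dt along (1) plus the Gamma-weighted square."""
    derivative = 2.0 * z * H * (a * z + u1 + u2)
    d1, d2 = u1 - h1 * z, u2 - h2 * z
    return derivative + gamma1 * d1 * d1 - gamma2 * d2 * d2

def max_gap(samples=1000, seed=0):
    """Largest discrepancy between the two sides of (5) over random real z, u1, u2."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        z, u1, u2 = (rng.uniform(-1, 1) for _ in range(3))
        worst = max(worst, abs(form(z, u1, u2) - representation(z, u1, u2)))
    return worst

K = a + h1 + h2       # closed-loop coefficient A + b h^*
print(max_gap() < 1e-9, K)   # True -2.0
```

In this example \(K=-2\) is Hurwitz and \(\det(\lambda I_n-K)=\lambda+2\), which is the Hurwitz factor of \(\varphi(\lambda)=4-\lambda^2\) obtained for these data.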
Proof of Theorem 2. All assertions of Theorem 2 follow from [1], with the exception of the relation \(\psi(\lambda)=\det(\lambda I_n-K)\) in part II. Put \(\Psi=(I_m-h^*A_\lambda^{-1}b)\delta=\delta I_m-h^*Qb,\ \det(\lambda I_n-K)=\chi(\lambda)\), and let us show that \(\chi(\lambda)=\psi(\lambda)\). By Lemma 3 of [1], any minor of order \(m-1\) of the matrix polynomial \(\Psi\) is divisible by \(\delta^{m-2}\). Consequently, \(\Psi^{-1}\det\Psi=\delta^{m-2}Z\), where \(Z\) is a matrix polynomial. By the corollary to Lemma 2 of [1],
\[ \chi(\lambda)=\delta(\lambda)\det(I_m-h^*A_\lambda^{-1}b)=\delta\det(\Psi/\delta),\qquad \det\Psi=\delta^{m-1}\chi. \]
Therefore \(\Psi^{-1}=Z/(\delta\chi)\). By hypothesis, \((I_m-h^*A_\lambda^{-1}b)\Omega/\psi=\Psi\Omega/(\delta\psi)=Z_0\) is a polynomial. Consequently, \(\Omega=\Psi^{-1}Z_0\delta\psi=ZZ_0\psi/\chi\). Since \(\langle\varphi,\Omega\rangle=1\), we have \(\langle\psi,\Omega\rangle=1\), and hence \(\chi\) is divisible by \(\psi\). Since \(\varphi=\psi\psi^\nabla\) and \(\varphi=(-1)^n\lambda^{2n}[1+O(\lambda^{-1})]\), it follows that \(\psi=\lambda^n[1+O(\lambda^{-1})]\) as \(\lambda\to\infty\). From the coincidence of the leading terms of \(\chi\) and \(\psi\) it follows that \(\chi\equiv\psi\).
* (Footnote to Remark 2.) The proof of this proposition known to the author is very complicated; as a preliminary step, it establishes the conditions under which a matrix polynomial \(B(\lambda)=B(\lambda)^\nabla\) can be represented in the form \(B(\lambda)=X(\lambda)^\nabla C X(\lambda)\), where \(C=C^*=\mathrm{const}\) and \(X(\lambda)\) is a matrix polynomial.
3°. Proof of Theorem 1. From Theorem 2, II, it follows that representation (5) is valid. Let \(h=\|h_1,h_2\|\), where \(h_j\) are matrices of orders \(n\times m_j\). Put \(u_1^0=h_1^*z,\ u_2^0=h_2^*z\). The controls \(u_1^0,u_2^0\) are admissible, since \(K\) is a Hurwitz matrix. Let \(u_1=u_1(z,t)\), \(u_2=u_2(z,t)\) be admissible controls. Substituting these values into (5), using the special form of the matrix \(\Gamma\) (see (3)), and integrating from \(t=0\) to \(t=\infty\), we obtain
\[
J(u_1,u_2)=-z_0^{*}Hz_0+
\int_0^\infty (u_1-h_1^{*}z)^{*}\gamma_1(u_1-h_1^{*}z)\,dt
-\int_0^\infty (u_2-h_2^{*}z)^{*}\gamma_2(u_2-h_2^{*}z)\,dt.
\tag{6}
\]
From (6) and the assumption \(\gamma_1>0,\ \gamma_2>0\), it follows at once (the first integral vanishes for \(u_1=u_1^0\), the second for \(u_2=u_2^0\)) that
\[
J(u_1^0,u_2)\leq J(u_1^0,u_2^0)\leq J(u_1,u_2^0)
\]
for any \(u_1,u_2\) such that the pairs \((u_1^0,u_2)\), \((u_1,u_2^0)\) are admissible. The rule for determining
\[
J(u_1^0,u_2^0)=-z_0^{*}Hz_0
\]
follows from Theorem 2, II.
Leningrad State University named after A. A. Zhdanov
Received 9 IV 1970
REFERENCES
[1] V. A. Yakubovich, DAN, 193, No. 1 (1970).