Yu. P. KRIVENKOV
Submitted 1964-01-01 | RussiaRxiv: ru-196401.30399 | Translated from Russian

Yu. P. KRIVENKOV

SUFFICIENCY OF THE MAXIMUM PRINCIPLE FOR A LINEAR PROBLEM OF DYNAMIC PROGRAMMING

(Presented by Academician L. S. Pontryagin on 23 I 1964)

One of the branches of dynamic programming, namely the “bottleneck problem” \((^{1,2})\), reduces to problems for which the maximum principle of L. S. Pontryagin \((^{3,4})\) is valid. In the case under consideration, this principle must be applied in the presence of the following complicating circumstances: a) the control region is not constant, b) the domain of variation of the phase coordinates has a fixed boundary.

Taking into account that in \((^{3,4})\) only necessary conditions for optimality were studied, while in \((^5)\) a sufficient condition was proved without taking into account the circumstances indicated above, in the present paper we formulate and prove sufficient conditions for optimality for the cases noted.

1. Statement of the problem. Consider the system:

\[ dx/dt=f(x,u), \tag{1} \]

in which \(f(x,u)=Ax+Bu+A_0,\ x(t)=(x_1,x_2,\ldots,x_n)\) is the vector of phase coordinates; \(u(t)=(u_1,\ldots,u_r)\) is the control vector; \(A,\ B,\) and \(A_0\) are matrices and a vector of the corresponding orders.

As the class of admissible controls \(U\) we take the class of piecewise-continuous vector functions \(u(t)\), defined on the interval \(T_0\leq t\leq T\), whose values belong to the \(r\)-dimensional closed convex polyhedron \(U(x)\), determined by the inequalities:

\[ S(x,u)\leq 0, \tag{2} \]

\[ O(u)\leq 0, \tag{3} \]

in which \(S(x,u)=(s_1,\ldots,s_p)=Pu-Qx-R;\ O(u)=(o_1,\ldots,o_q)=Mu+N\), where \(P,\ Q,\ M,\ R,\ N\) are matrices and vectors of the corresponding orders. Moreover, the values of the vectors \(x(t)\) and \(u(t)\) are related by equation (1) on the whole interval \(T_0\leq t\leq T\). The domain of variation of the phase coordinates, i.e., of the coordinates of the vector \(x\), determined by the system of inequalities (2), will be denoted by \(X(u)\).
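Membership of a control value in the polyhedron \(U(x)\) can be checked numerically. The following is a minimal sketch, assuming small dense matrices stored as Python lists; the helper names `matvec` and `is_admissible` and the sample data are hypothetical and do not come from the paper.

```python
def matvec(mat, vec):
    """Product of a matrix (list of rows) with a vector (list)."""
    return [sum(a * v for a, v in zip(row, vec)) for row in mat]


def is_admissible(u, x, P, Q, R, M, N, tol=1e-9):
    """Check S(x,u) = Pu - Qx - R <= 0 and O(u) = Mu + N <= 0 componentwise."""
    S = [pu - qx - r for pu, qx, r in zip(matvec(P, u), matvec(Q, x), R)]
    O = [mu + n for mu, n in zip(matvec(M, u), N)]
    return all(s <= tol for s in S) and all(o <= tol for o in O)


# Hypothetical data (not from the paper): n = 1, r = 2.
# Inequality (2): u1 + u2 <= x1.  Inequality (3): u1 >= 0, u2 >= 0.
P, Q, R = [[1.0, 1.0]], [[1.0]], [0.0]
M, N = [[-1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]

print(is_admissible([0.5, 0.5], [1.0], P, Q, R, M, N))  # lies in U(x)
print(is_admissible([1.0, 1.0], [1.0], P, Q, R, M, N))  # violates (2)
```

In this example the control region shrinks or grows with \(x_1\), which illustrates complicating circumstance a) of the introduction.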

Under the condition that the vector function \(x(t)\) lies at the moment \(T_0\) on the linear manifold \(L_1\), i.e.

\[ x=x_1+\lambda_1^1 e_1^1+\ldots+\lambda_{k_1}^1 e_{k_1}^1, \tag{4} \]

and at the moment \(T\) lies on the linear manifold \(L_2\), i.e.

\[ x=x_2+\lambda_1^2 e_1^2+\ldots+\lambda_{k_2}^2 e_{k_2}^2, \tag{5} \]

where \(e_i^1\) and \(e_j^2\) are systems of linearly independent vectors \((0\leq i\leq k_1\leq n;\ 0\leq j\leq k_2\leq n)\), we shall seek an admissible control \(u(t)\) such that the functional

\[ I=\int_{T_0}^{T}(F(t),u(t))\,dt \tag{6} \]

(\(F(t)\) is a given \(r\)-dimensional vector function defined on \([T_0,T]\)) assumes its maximum value.

2. Some auxiliary constructions. Add to the vector function \(x(t)\) the coordinate \(x_0(t)\), defined by the equation: \(dx_0/dt=-(F(t),u(t))=f_0(u,t)\), and consider the system

\[ d\bar{x}/dt=\bar{f}(\bar{x},u,t), \tag{7} \]

in which \(\bar{x}(t)=(x_0,x_1,\ldots,x_n)\), \(\bar{f}(\bar{x},u,t)=(f_0,f_1,\ldots,f_n)=\bar{A}\bar{x}+\bar{B}u+\bar{A}_0\), and \(\bar{A}, \bar{B}, \bar{A}_0\) are the matrices \(A, B\) and the vector \(A_0\), extended in the corresponding manner to include the coordinate \(x_0\).
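One natural reading of this extension can be written out explicitly. The block partition below is an assumption made for illustration, since the paper does not display it:

\[ \bar{A}=\begin{pmatrix}0 & 0\\ 0 & A\end{pmatrix},\qquad \bar{B}(t)=\begin{pmatrix}-F(t)^{*}\\ B\end{pmatrix},\qquad \bar{A}_0=\begin{pmatrix}0\\ A_0\end{pmatrix}, \]

where \(F(t)^{*}\) denotes the row vector transposed to \(F(t)\). The zeroth row then reproduces \(dx_0/dt=-(F(t),u)\), and the remaining rows reproduce system (1). Under this reading \(\bar{B}\) depends on \(t\) through \(F(t)\), and \(\bar{f}\) does not depend on \(x_0\).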

Introduce a certain nonzero continuous piecewise-smooth vector function \(\psi(t)=(\psi_0,\psi_1,\ldots,\psi_n)\), defined on the interval \([T_0,T]\), with the aid of which we form the expression

\[ H(\bar{x},\psi,u)=(\psi,\bar{f})=(\psi,\bar{A}\bar{x}+\bar{A}_0)+(\psi,\bar{B}u) =(\psi,\bar{A}\bar{x}+\bar{A}_0)+(\varphi,u), \tag{8} \]

where \(\varphi(t)\) is the \(r\)-dimensional vector function, continuous and piecewise-smooth on the interval \([T_0,T]\), defined by the expression \(\varphi(t)=\bar{B}^*\psi(t)\), in which \(\bar{B}^*\) is the matrix transposed to \(\bar{B}\).

In what follows we shall call an \(n\)-dimensional vector \(x\) admissible if there exists for it an \(r\)-dimensional vector \(u\in U(x)\).

By the theory of linear programming (see \((^{6})\)), for any admissible vector \(x^0\) and any instant \(t\in[T_0,T]\), the value of the vector function \(\psi(t)\) determines a value \(u=u_m\) of the control vector, subject to conditions (2) and (3), at which the quantity \(\sum_{i=1}^{n}\psi_i f_i\) attains its maximum.

Moreover, the maximum \(\sum_{i=1}^{n}\psi_i f_i\) is attained at a vertex of the polyhedron \(U(x^0)\).
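The vertex property can be illustrated for \(r=2\) by enumerating the vertices of the polyhedron directly. The sketch below assumes the polyhedron is nonempty and bounded and is written as \(Gu\le h\), where \(G\) stacks the rows of \(P\) and \(M\) and \(h\) stacks \(Qx^0+R\) and \(-N\). The names `vertices_2d` and `argmax_vertex` and the data are hypothetical, and this brute-force enumeration is not the linear programming method of \((^{6})\).

```python
from itertools import combinations


def vertices_2d(G, h, tol=1e-9):
    """Vertices of {u in R^2 : G u <= h}, found by intersecting pairs of boundaries."""
    verts = []
    for i, j in combinations(range(len(G)), 2):
        (a, b), (c, d) = G[i], G[j]
        det = a * d - b * c
        if abs(det) < tol:
            continue  # parallel boundaries have no unique intersection
        # Cramer's rule for the 2x2 system formed by rows i and j
        u = ((h[i] * d - b * h[j]) / det, (a * h[j] - c * h[i]) / det)
        # keep the intersection point only if it satisfies every inequality
        if all(g[0] * u[0] + g[1] * u[1] <= hk + tol for g, hk in zip(G, h)):
            verts.append(u)
    return verts


def argmax_vertex(phi, G, h):
    """Vertex maximizing the linear form (phi, u) over the polyhedron."""
    return max(vertices_2d(G, h), key=lambda u: phi[0] * u[0] + phi[1] * u[1])


# Hypothetical data: u1 + u2 <= 1, u1 >= 0, u2 >= 0 (a triangle).
G = [[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
h = [1.0, 0.0, 0.0]

u_m = argmax_vertex((2.0, 1.0), G, h)
print(u_m)  # the vertex where u1 + u2 = 1 and u2 = 0 are both active
```

At the maximizing vertex exactly two inequalities become equalities, in agreement with the count \(p_1+q_1\ge r\) for \(r=2\).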

The latter means that \(p_1+q_1\) of the expressions \(S_{i_1},\ldots,S_{i_{p_1}}\) and \(O_{j_1},\ldots,O_{j_{q_1}}\), where \(p_1+q_1\ge r\), become equalities for the value \(u=u_m\).

If the rows of the matrices \(P\) and \(M\) are denoted by \(\bar{S}_i\) and \(\bar{O}_j\) \((1\le i\le p,\;1\le j\le q)\), then it can be shown that the expression \(\sum_{i=1}^{n}\psi_i f_i=\operatorname{const}+(\varphi,u)\) attains its maximum value at \(u_m\) only if there exist nonnegative numbers \(s_{i\alpha}\) and \(o_{j\beta}\) \((\alpha=1,2,\ldots,p_1;\;\beta=1,2,\ldots,q_1)\) such that the equality

\[ \varphi(t)=\sum_{\alpha=1}^{p_1}s_{i\alpha}\bar{S}_{i\alpha} +\sum_{\beta=1}^{q_1}o_{j\beta}\bar{O}_{j\beta} \tag{9} \]

holds.

From this equality, taking into account the nonnegativity of \(s_{i\alpha}\) and \(o_{j\beta}\), it is easy to conclude that

\[ (\varphi(t),\,u-u_m)\le 0, \tag{10} \]

where \(u\;(u\ne u_m)\) is an arbitrary point of the polyhedron \(U(x^0)\).
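The step from (9) to (10) can be spelled out as follows; this is a sketch of the omitted argument, in which \(S_{i\alpha}\) and \(O_{j\beta}\) denote the rows of (2) and (3) that become equalities at \(u_m\). Since \(S_{i\alpha}(x^0,u_m)=0\) and \(S_{i\alpha}(x^0,u)\le 0\) for every \(u\in U(x^0)\), and likewise for \(O_{j\beta}\),

\[ (\bar{S}_{i\alpha},\,u-u_m)=S_{i\alpha}(x^0,u)-S_{i\alpha}(x^0,u_m)\le 0,\qquad (\bar{O}_{j\beta},\,u-u_m)=O_{j\beta}(u)-O_{j\beta}(u_m)\le 0. \]

Multiplying by the nonnegative numbers \(s_{i\alpha}\), \(o_{j\beta}\) and summing gives

\[ (\varphi(t),\,u-u_m)=\sum_{\alpha=1}^{p_1}s_{i\alpha}(\bar{S}_{i\alpha},\,u-u_m)+\sum_{\beta=1}^{q_1}o_{j\beta}(\bar{O}_{j\beta},\,u-u_m)\le 0. \]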

If, in turn, the rows of the matrix \(Q\) are denoted by \(\bar{S}_i^*\), and with the aid of the coefficients \(s_{i\alpha}\) \((\alpha=1,2,\ldots,p_1)\) we form the \(n\)-dimensional vector

\[ \omega(t)=\sum_{\alpha=1}^{p_1}s_{i\alpha}\bar{S}_{i\alpha}^{*}, \quad \text{where }\; \bar{S}_{i\alpha}^{*}= \left( -\frac{\partial S_{i\alpha}}{\partial x_1}; -\frac{\partial S_{i\alpha}}{\partial x_2}; \ldots; -\frac{\partial S_{i\alpha}}{\partial x_n} \right), \tag{11} \]

with coordinates \(\omega_i(t)\), which, as is not difficult to see, belongs to the cone formed by the inward normals to the hyperplanes bounding the region \(X(u_m)\) in a neighborhood of \(x^0\), then, in view of the nonnegativity of \(s_{i\alpha}\), we obtain, for all admissible values \(x\in X(u_m)\), the inequality

\[ (\omega(t),\,x-x^0)\ge 0. \tag{12} \]

Form the system:

\[ \frac{d\psi_0}{dt}=0;\quad \frac{d\psi_i}{dt} = -\frac{\partial H}{\partial x_i} +\sum_{\alpha=1}^{p_1}s_{i\alpha}\frac{\partial S_{i\alpha}}{\partial x_i} = -\frac{\partial H}{\partial x_i} -\omega_i(t) \quad (i=1,2,\ldots,n) \tag{13} \]

and consider the boundary conditions for \(\psi(t)\) in the form of transversality conditions

\[ \begin{aligned} (\psi(T_0), e_i^1) &= 0 && \text{for all } i=1,2,\ldots,k_1,\\ (\psi(T), e_i^2) &= 0 && \text{for all } i=1,2,\ldots,k_2, \end{aligned} \tag{14} \]

supplemented by the condition on the coordinate \(\psi_0(t)\) in the form

\[ \psi_0(T_0)=-1. \tag{15} \]

3. Theorem. If, for the optimal problem under consideration, there exists a nonzero, continuous, piecewise-smooth vector function \(\psi(t)\), defined on \([T_0,T]\), satisfying conditions (14), (15) and system (13), in which the coefficients \(s_{i\alpha}\) \((\alpha=1,2,\ldots,p_1)\) are determined for any \(t\in[T_0,T]\) by expression (9), composed for the value \(u_m(t)\) corresponding to the maximum of the expression \(\sum_{i=1}^n \psi_i f_i\) at each instant \(t\in[T_0,T]\), then the resulting control \(u=u_m(t)\) is optimal on \([T_0,T]\) among all admissible controls of the problem under consideration.

Proof. We divide the entire set of admissible phase vectors \(x\) into two types. To the second type we assign any admissible vector \(x\) such that every vector \(u\) corresponding to \(x\) turns some group of rows \(O_j(u)\) of the matrix of inequalities \(O(u)\) from (3) into equalities: \(M_j u+N_j=0\). In this case there must exist a row \(P_i u-Q_i x-R_i\) of the matrix \(S(x,u)\) in which the expression \(P_i u\) is constant for any \(u\) corresponding to \(x\). To the first type we assign all other admissible vectors \(x\). We assign a domain of variation of the phase coordinates \(X(u)\) to the first type if it consists entirely of vectors of the first type. Otherwise we assign the domain to the second type.

In problems where the domain of variation of the phase coordinates belongs to the first type, only one of the complicating circumstances listed above is possible, namely the nonconstancy of the control domain \(U(x)\), whereas in domains of the second type, in addition, the domain of variation of the phase coordinates has a fixed boundary.

We shall first prove the theorem for the case in which the domain of possible values of the phase coordinates belongs to the first type.

Fulfillment of the conditions of the theorem means that the entire interval \([T_0,T]\) is divided into \(m\geqslant 1\) intervals in such a way that to each interval there corresponds one fully determined set of rows of the matrices of inequalities (2) and (3), which become equalities for the values \(u_m(t)\) and the corresponding values of the phase vector \(x_m(t)\) obtained from system (1) under conditions (4) and (5).

In other words, to each interval there corresponds some working vertex of the polyhedron, formed by a certain set of hyperplanes, and its own set of nonnegative numbers \(s_{i\alpha}\) and \(o_{j\beta}\) \((\alpha=1,2,\ldots,p_1;\ \beta=1,2,\ldots,q_1)\).

Let the division points of the interval \([T_0,T]\) be \(t_0=T_0,t_1,t_2,\ldots,t_m=T\). With the help of the function \(\psi(t)\), which exists by the condition of the theorem, an arbitrary admissible control \(u(t)\), and the corresponding value \(x(t)\), we form the expression:

\[ I(\bar x,\psi,u)= \sum_{k=1}^{m}\int_{t_{k-1}}^{t_k} \left[ \sum_{i=0}^{n}\psi_i\frac{d\bar x_i}{dt} - H(\bar x,\psi,u) \right]dt, \]

which, according to relations (7) and (8), is identically equal to zero, i.e. \(I(\bar x,\psi,u)\equiv 0\). The latter will also be true for the control \(u_m(t)\), which, as is not difficult to show, will also be admissible, and for the value \(x_m(t)\) obtained with its help.

We vary the control \(u_m(t)\). To this end, take an arbitrary admissible control \(u(t)\) such that the corresponding value \(x(t)\) not only satisfies system (1), but also satisfies the boundary conditions (4) and (5).

Denoting \(u(t)=u_m(t)+\delta u\), \(\bar{x}(t)=\bar{x}_m(t)+\delta \bar{x}\) and considering the difference \(\Delta=I(\bar{x}_m+\delta \bar{x},\psi,u_m+\delta u)-I(\bar{x}_m,\psi,u_m)\), which, by the identity established above, is identically equal to zero, we obtain

\[ \Delta \equiv \sum_{k=1}^{m}\left\{ \int_{t_{k-1}}^{t_k} \left(\psi(t), \frac{d}{dt}\,\delta\bar{x}\right)\,dt - \int_{t_{k-1}}^{t_k} \left[(\psi(t),\bar{A}\delta\bar{x})+(\varphi,\delta u)\right]\,dt \right\}\equiv 0. \]

Integrating the first term by parts and taking into account system (13) and conditions (14) and (15), we shall have

\[ \delta\int_{T_0}^{T}(F(t),u(t))\,dt = \sum_{k=1}^{m}\left[ \int_{t_{k-1}}^{t_k}(\delta\bar{x},-\omega(t))\,dt + \int_{t_{k-1}}^{t_k}(\varphi(t),\delta u)\,dt \right], \]

whence, taking into account inequalities (10) and (12), we obtain that, indeed, the functional (6) attains its maximum for the control \(u=u_m(t)\), i.e. the theorem in this case is proved.
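The passage from the identity \(\Delta\equiv 0\) to the expression for the variation of the functional can be written out in more detail. The following is a sketch under two assumptions not stated explicitly in the text: \(\delta x_0(T_0)=0\), and \(\omega\) is extended by a zero component \(\omega_0=0\). Since \(\psi\) and \(\delta\bar{x}\) are continuous, the terms at the interior points \(t_k\) cancel, and integration by parts gives

\[ \sum_{k=1}^{m}\int_{t_{k-1}}^{t_k}\left(\psi,\frac{d}{dt}\,\delta\bar{x}\right)dt =(\psi,\delta\bar{x})\Big|_{T_0}^{T}-\int_{T_0}^{T}\left(\frac{d\psi}{dt},\delta\bar{x}\right)dt. \]

By (14) the coordinates \(1,\ldots,n\) contribute nothing to the boundary term, because \(\delta x(T_0)\) and \(\delta x(T)\) lie in the spans of the vectors \(e_i^1\) and \(e_j^2\). By (13) and (15), \(\psi_0\equiv -1\), and since \(x_0(T)-x_0(T_0)=-\int_{T_0}^{T}(F,u)\,dt\), the boundary term equals \(\delta\int_{T_0}^{T}(F(t),u(t))\,dt\). Further, system (13) with \(H\) from (8) reads \(d\psi/dt=-\bar{A}^*\psi-\omega\), so that

\[ -\int_{T_0}^{T}\left(\frac{d\psi}{dt},\delta\bar{x}\right)dt =\int_{T_0}^{T}(\psi,\bar{A}\,\delta\bar{x})\,dt+\int_{T_0}^{T}(\omega,\delta\bar{x})\,dt. \]

Substituting into \(\Delta\equiv 0\), the terms \((\psi,\bar{A}\,\delta\bar{x})\) cancel and there remains

\[ \delta\int_{T_0}^{T}(F(t),u(t))\,dt+\int_{T_0}^{T}(\omega,\delta\bar{x})\,dt-\int_{T_0}^{T}(\varphi,\delta u)\,dt=0. \]

With \((\varphi,\delta u)\le 0\) from (10) and \((\omega,\delta\bar{x})\ge 0\) from (12), the variation of the functional is nonpositive.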

In the case where the domain of possible values for the phase coordinates belongs to the second type, we make a partition of the time interval \([T_0,T]\) similar to the preceding one. It is obvious from the meaning of the partition that the endpoints of any intervals on which, in at least one row, the condition \(P_i u=\mathrm{const}\) holds coincide with the endpoints of the partition intervals \(t=t_k\) \((k=0,1,2,\ldots,m)\). Then on these intervals those rows \(S_i(x,u)\) of the matrix \(S(x,u)\) in which \(P_i u=\mathrm{const}\) will have the form

\[ C_i-R_i-Q_i x(t)=0. \tag{16} \]

This form of restriction on the vector function of the control may be replaced by another form, preserving its content. Indeed, if in row \(i\) on the interval \([t_{k-1},t]\), where \(t\in[t_{k-1},t_k]\), \(P_i u=C_i=\mathrm{const}\) holds, then at the point \(t_1>t\), where \(t_1\in[t_{k-1},t_k]\), generally speaking, there will hold

\[ C_i-R_i-Q_i x(t_1)\leq 0. \tag{17} \]

Subtracting equality (16) from inequality (17) and dividing by \(t_1-t>0\), we obtain

\[ -Q_i\,\frac{x(t_1)-x(t)}{t_1-t}\leq 0, \]

whence, passing to the limit, we shall have

\[ -Q_i\,\frac{dx}{dt}\leq 0. \]

Taking into account system (1), we obtain the inequality

\[ S_i^{**}(x,u)=-Q_i f(x,u)\leq 0, \]

which ensures the fulfillment of equality (16) on this interval. Its form permits the problem to be reduced to the preceding case once each equality of type (16) in the rows of the matrix \(S(x,u)\) is replaced by the corresponding inequality \(S_i^{**}(x,u)\leq 0\). The theorem is proved.
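That the replacement inequality has the same linear form as the rows of (2) can be seen by expanding it with \(f(x,u)=Ax+Bu+A_0\). The primed symbols below are introduced here for illustration only and do not appear in the paper:

\[ S_i^{**}(x,u)=-Q_i(Ax+Bu+A_0)=P_i'u-Q_i'x-R_i',\qquad P_i'=-Q_iB,\quad Q_i'=Q_iA,\quad R_i'=Q_iA_0. \]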

Moscow Institute of Physics and Technology

Received 20 I 1964

References

  1. R. Bellman, Dynamic Programming, IL, 1960, pp. 219–261.
  2. Collection: Modern Mathematics for Engineers, ed. E. F. Beckenbach, IL, 1958, pp. 259–269.
  3. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, E. F. Mishchenko, Mathematical Theory of Optimal Processes, Moscow, 1961.
  4. R. V. Gamkrelidze, Izv. AN SSSR, Ser. Mat., 24, no. 3, 315 (1960).
  5. L. I. Rozonoer, DAN, 127, no. 3, 520 (1959).
  6. S. Gass, Linear Programming, Moscow, 1961.
