UDC 62-50
Submitted 1967-01-01 | RussiaRxiv: ru-196701.96354 | Translated from Russian

CYBERNETICS AND CONTROL THEORY

A. I. PROPOI, Ya. Z. TSYPKIN

ON THE SYNTHESIS OF OPTIMAL-ON-AVERAGE AUTOMATIC SYSTEMS

(Presented by Academician B. N. Petrov on 12 XI 1966)

The problem of synthesizing optimal systems \((^{1,2})\) (i.e., finding the optimal control as a function of the state coordinates) is one of the most important in automatic control. However, its solution encounters considerable difficulties. First, there is as yet no effective method for determining the synthesizing function itself. Second, even when this function is found, it is often so complicated that it is difficult to implement.

These difficulties arise because no restrictions are imposed on the control device and, consequently, on the form of the control law \((^2)\). In practice, however, a control device must meet requirements of simplicity, reliability, weight, size, and so on, which are dictated both by the purpose of the control system and by the means at our disposal. The present note proposes one possible approach to the synthesis of optimal systems in which the form of the control law is fixed in advance, up to a finite number of parameters.

For definiteness, let us consider the following optimal-control problem. The equations of motion are given and have the form

\[ \dot{x}(t)=f(x(t),u(t)), \tag{1} \]

where \(x=\{x_1,\ldots,x_n\}\) is the state of the system and \(u=\{u_1,\ldots,u_r\}\) is the control action.

The quality of control is evaluated by a functional of the form

\[ J=\int_0^T f_0(x(t))\,dt. \tag{2} \]

The problem of synthesizing an optimal control for the given problem consists in finding such a control law

\[ u(t)=g(x(t),t), \tag{3} \]

which minimizes the performance index (2) for any initial states of the system (1). At the same time, various restrictions may be imposed on the control actions and on the state coordinates of the system.

We shall not seek the function (3), but shall determine the control law in the form

\[ u(t)=\hat{g}(x(t),c,t), \tag{4} \]

where the function \(\hat{g}\) is specified in advance, and the vector \(c=\{c_1,\ldots,c_q\}\) is an unknown parameter.

The form of the function (4) is determined by the design possibilities. For example, it is often impossible to measure all the state coordinates.

Then

\[ \hat g=\hat g(y,c,t), \]

where \(y=\{y_1,\ldots,y_m\}\) is the vector of measurable state coordinates.

In a number of cases, (4) may be regarded as an approximation of the control law (3), i.e., the function (4) has the form

\[ u(t)=\sum_{i=1}^{q} c_i\varphi_i(x,t), \]

where \(\{\varphi_i\}\) is a system of linearly independent functions.
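As a minimal sketch of the linear-in-parameters form above, the following Python fragment builds \(\hat g\) from a list of basis functions. The particular basis (a linear and a cubic term in a scalar state) and the names `make_control_law` and `g_hat` are illustrative assumptions, not taken from the paper.

```python
def make_control_law(basis):
    """Return g_hat(x, c, t) = sum_i c[i] * basis[i](x, t)."""
    def g_hat(x, c, t):
        if len(c) != len(basis):
            raise ValueError("c must have one entry per basis function")
        return sum(ci * phi(x, t) for ci, phi in zip(c, basis))
    return g_hat

# Hypothetical basis for a scalar state: phi_1 = x, phi_2 = x**3.
basis = [lambda x, t: x, lambda x, t: x ** 3]
g_hat = make_control_law(basis)

# Control value at x = 2, t = 0 for c = (-1.0, 0.5): -1.0*2 + 0.5*8.
u = g_hat(2.0, [-1.0, 0.5], 0.0)
print(u)
```

Only the parameter vector `c` remains free once the basis is fixed, which is the situation the rest of the note assumes.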

In either case, the problem consists in finding a vector \(c\) that, by virtue of equations (1) and (4), minimizes the performance index (2). In this formulation of the problem, the choice of the vector \(c\) depends substantially on the initial state \(x_0\). Indeed, from (1) and (4) we obtain

\[ \dot{x}(t)=f(x(t),\hat g(x(t),c,t))=f_1(x(t),c,t). \tag{5} \]

Denote by

\[ x(t)=\psi(x_0,c,t) \tag{6} \]

the solution of equation (5) with initial state \(x_0\). Substituting (6) into (2), we obtain

\[ J=\int_{0}^{T} f_0(\psi(x_0,c,t))\,dt=J(x_0,c). \tag{7} \]

Thus the vector \(c\) that minimizes the function (7) depends on the initial state \(x_0\), so no single choice of \(c\) is optimal for all initial states. To remove this indeterminacy, it is natural to choose \(c\) so that the performance index (7) is minimal on the average over the existing distribution of initial states \(x_0\); that is, we seek a vector \(c=c^*\) that minimizes

\[ \bar J(c)=\int_{X}\int_{0}^{T} f_0(\psi(x_0,c,t))\,p(x_0)\,dt\,dx_0, \tag{8} \]

where \(p(x_0)\) is the probability density of the initial states, given on some fixed bounded set \(X\).
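The dependence of the best \(c\) on \(x_0\), and the effect of averaging, can be illustrated on a toy problem. The sketch below assumes a scalar plant \(\dot x=u\) with the control law \(u=-cx\), the integrand \(f_0(x)=(x-1)^2\), \(T=2\), and initial states uniformly distributed on \([1,3]\); all of these choices, the Euler integration, and the function names are assumptions made for illustration only.

```python
def cost(x0, c, T=2.0, steps=400):
    """J(x0, c): Euler simulation of dx/dt = -c*x with f0(x) = (x - 1)**2."""
    h = T / steps
    x, J = x0, 0.0
    for _ in range(steps):
        J += h * (x - 1.0) ** 2
        x += h * (-c * x)
    return J

def averaged_cost(c, lo=1.0, hi=3.0, m=40):
    """Midpoint-rule estimate of the averaged index for x0 uniform on [lo, hi]."""
    width = (hi - lo) / m
    samples = [lo + (i + 0.5) * width for i in range(m)]
    return sum(cost(x0, c) for x0 in samples) / m

# Candidate parameter values c = 0.0, 0.1, ..., 3.0 (a coarse grid search).
grid = [0.1 * k for k in range(31)]
best_for_1 = min(grid, key=lambda c: cost(1.0, c))
best_for_3 = min(grid, key=lambda c: cost(3.0, c))
best_on_average = min(grid, key=averaged_cost)
print(best_for_1, best_for_3, best_on_average)
```

In this toy setting the minimizer for \(x_0=1\) differs from the minimizer for \(x_0=3\), while the averaged index has a single compromise value between them.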

To find the optimal vector \(c=c^*\), the following procedure may be used. Introduce the adjoint system

\[ \dot p_i(t)=-\frac{\partial f_0(x)}{\partial x_i} -\sum_{j=1}^{n}p_j(t)\frac{\partial f_{1j}(x,c)}{\partial x_i} \qquad (i=1,\ldots,n) \tag{9} \]

with the boundary condition

\[ p_i(T)=0 \qquad (i=1,\ldots,n) \tag{10} \]

and the Hamiltonian function

\[ H(p,x,c)=f_0(x)+\sum_{i=1}^{n}p_i f_{1i}(x,c). \tag{11} \]

Using (9)–(11), it is not difficult to write the variation of the performance index (2) with respect to \(c\) in the form (cf. \((^3)\))

\[ \delta_c J(x_0,c)=\int_{0}^{T}\left(\sum_{i=1}^{q}\frac{\partial H}{\partial c_i}\,\delta c_i\right)\,dt. \tag{12} \]

It is clear from (12) that the vector \(Q(x_0,c)\), with coordinates

\[ \left\{\int_0^T \frac{\partial H}{\partial c_1}\,dt,\ldots,\int_0^T \frac{\partial H}{\partial c_q}\,dt\right\}, \]

plays the role of the gradient of the performance index (2) in the space of the parameters \(c\).
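A numerical check of this gradient formula can be sketched on a scalar example. Assume \(f_1(x,c)=-cx\) and \(f_0(x)=(x-1)^2\) (illustrative choices, not from the paper). Then \(H=(x-1)^2-pcx\), the adjoint equation (9) reads \(\dot p=-2(x-1)+cp\) with \(p(T)=0\), and \(\partial H/\partial c=-px\). The code integrates the state forward and the adjoint backward with the Euler method and compares the result with a central finite difference of the cost.

```python
def cost(x0, c, T=2.0, steps=4000):
    """J(x0, c) for dx/dt = -c*x and f0(x) = (x - 1)**2 (Euler scheme)."""
    h = T / steps
    x, J = x0, 0.0
    for _ in range(steps):
        J += h * (x - 1.0) ** 2
        x += h * (-c * x)
    return J

def adjoint_gradient(x0, c, T=2.0, steps=4000):
    """Q(x0, c): integral of dH/dc = -p*x over [0, T], via the adjoint system."""
    h = T / steps
    # Forward pass: store the state trajectory x(t_k).
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + h * (-c * xs[-1]))
    # Backward pass: dp/dt = -2(x - 1) + c*p, starting from p(T) = 0.
    p, grad = 0.0, 0.0
    for k in range(steps, 0, -1):
        x = xs[k]
        grad += h * (-p * x)              # accumulate dH/dc at t_k
        p_dot = -2.0 * (x - 1.0) + c * p
        p -= h * p_dot                    # one Euler step backward in time
    return grad

x0, c, eps = 2.0, 0.5, 1e-5
q_adjoint = adjoint_gradient(x0, c)
q_finite_diff = (cost(x0, c + eps) - cost(x0, c - eps)) / (2.0 * eps)
print(q_adjoint, q_finite_diff)
```

The two estimates agree up to the discretization error of the Euler scheme. The adjoint route needs one forward and one backward integration regardless of the number of parameters \(q\), whereas finite differences need two simulations per parameter.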

Now, using the ideas of the adaptive approach \((^4)\), one can propose an algorithm for determining the optimal value \(c^*\). Since \(Q\) is the gradient of the index being minimized, each step is taken against it:

\[ c[n]=c[n-1]-\gamma[n]Q(x_0[n-1],c[n-1])\quad (n=1,2,\ldots). \tag{13} \]

The algorithm works as follows. First consider the case where the distribution density \(p(x_0)\) is known. Then the optimal value \(c^*\) can be determined in advance. Choose an arbitrary value \(c[0]\) and draw, in accordance with the density \(p(x_0)\), an initial state \(x_0=x_0[0]\). For these values \(c[0]\) and \(x_0[0]\), compute \(Q(x_0[0],c[0])\) from (5) and (9)–(11). Then determine \(c[1]\) from (13), draw a new initial state \(x_0[1]\), and so on.
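The iteration just described can be sketched as a stochastic gradient procedure. The example below reuses an assumed scalar problem (\(f_1=-cx\), \(f_0=(x-1)^2\), \(T=2\), \(x_0\) uniform on \([1,3]\)) and an assumed step sequence \(\gamma[n]=1/(20+n)\); none of these specifics come from the paper. The update is written as a step against the gradient estimate \(Q\) with a positive \(\gamma[n]\), since the index is being minimized.

```python
import random

def adjoint_gradient(x0, c, T=2.0, steps=200):
    """Q(x0, c) for dx/dt = -c*x, f0 = (x - 1)**2, via the adjoint system."""
    h = T / steps
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + h * (-c * xs[-1]))
    p, grad = 0.0, 0.0
    for k in range(steps, 0, -1):
        x = xs[k]
        grad += h * (-p * x)                     # dH/dc = -p*x
        p -= h * (-2.0 * (x - 1.0) + c * p)      # backward Euler step, p(T) = 0
    return grad

def run_algorithm(c0=0.0, iterations=500, seed=0):
    """Iterate c[n] = c[n-1] - gamma[n] * Q(x0[n-1], c[n-1])."""
    rng = random.Random(seed)
    c = c0
    for n in range(1, iterations + 1):
        x0 = rng.uniform(1.0, 3.0)   # realization of x0 drawn from the assumed density
        gamma = 1.0 / (20.0 + n)     # assumed decreasing step size
        c -= gamma * adjoint_gradient(x0, c)
    return c

c_star = run_algorithm()
print(c_star)
```

For this toy problem the averaged index has its minimum near \(c\approx 0.7\), and the iterates settle in that neighbourhood. Each step uses a single realization of \(x_0\), so the density \(p(x_0)\) is never needed explicitly.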

If, however, the density \(p(x_0)\) is not known in advance, then realizations \(x_0[n]\) can be determined in the course of normal operation of the system. In this case, the current phase coordinates \(x(t_n)\) \((n=1,2,\ldots)\) may be taken as the initial states of the system (if it is stationary). The time between measurements of the coordinates \(x(t_{n-1})\) and \(x(t_n)\) must be chosen in such a way that the computing device has time to determine the values \(c[n]\) from (13).

Finally, when the equations of motion (1) are unknown, implementation of the algorithm (13) requires identification of the adjoint system (9). In this case, for each iteration it is necessary, over a time interval equal to \(T\), to measure the input and output of the system at the constant value \(c=c[n]\).

Under certain requirements on the conditions of the problem and with an appropriate choice of \(\gamma[n]\), one can ensure (probabilistic) convergence of the vector \(c[n]\) to one of the stationary points of the performance index (8).

We note that the choice of the coefficient \(\gamma[n]\) here is apparently the usual one for the adaptive approach \((^4)\): if the output \(x(t)\) is measured exactly, then \(\gamma[n]\) can be chosen constant; if \(x(t)\) is observed with noise, then \(\gamma[n]\) must decrease with increasing \(n\) according to a definite law.
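The note does not spell out this law. For orientation only, the conditions standard in the stochastic-approximation literature (stated here as an assumption about what is meant, not as the authors' formulation) are

\[ \gamma[n]>0,\qquad \sum_{n=1}^{\infty}\gamma[n]=\infty,\qquad \sum_{n=1}^{\infty}\gamma^{2}[n]<\infty, \]

which are satisfied, for example, by \(\gamma[n]=a/n\) with a constant \(a>0\).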

In conclusion, we emphasize that the questions of convergence of the algorithm (13) are complex (in the general case the problem is multiextremal) and are not considered here.

Institute of Automation and Telemechanics
Received 28 X 1966

REFERENCES

  1. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, E. F. Mishchenko, The Mathematical Theory of Optimal Processes, Moscow, 1961.
  2. A. A. Feldbaum, Foundations of the Theory of Optimal Automatic Systems, “Nauka,” 1966.
  3. L. I. Rozonoer, Automation and Telemechanics, No. 10, 1320 (1959); No. 11, 1441 (1959); No. 12, 1561 (1959).
  4. Ya. Z. Tsypkin, Automation and Telemechanics, No. 1, 23 (1966).
