N. Ya. Bagaeva, N. N. Moiseev
Mathematics
Submitted 1963-01-01 | RussiaRxiv: ru-196301.73083 | Translated from Russian

On One Method for the Numerical Solution of Optimal Control Problems

(Presented by Academician A. A. Dorodnitsyn, 5 IV 1963)

  1. Suppose that the motion of a system is described by the differential equation:

[
x' = X(x,u),
\tag{1}
]

where (x) is an (n)-dimensional vector, and (u(t)) is the control function. We shall consider the problem of finding a control, a function (u(t)), which transfers the system from the point (x^{(0)}) to the point (x^{(T)}), or to the surface (x_{j_k} = x_{j_k}^{(T)}), (k = 1, 2, \ldots, l < n), in minimum time.

Suppose that equation (1) has the form

[
\begin{aligned}
x_i' &= f_i(x_1,\ldots,x_n), && i = 1,2,\ldots,m;\\
x_j' &= f_j(x_1,\ldots,x_n,u), && j = m+1,\ldots,n.
\end{aligned}
\tag{2}
]

We shall call the first (m) equations, which do not contain the control, kinematic relations.

We shall denote the constraints on the phase coordinates and on the control as follows:

[
x \in M(t),
\tag{3}
]

[
u \in G(x),
\tag{4}
]

where (M) and (G) are certain closed sets, or, somewhat more generally:

[
(x,u) \in W.
]

The conditions at the right-hand end of the phase trajectory of system (1) will be called the control objective.

Let us also introduce a function (\Pi(x)), characterizing the distance of the representative point from the control objective. For example, if the control objective is the point (x^{(T)}), this function may be taken in the form

[
\Pi(x) = \frac{1}{2}\sum_{k=1}^{n} \mu_k^2 \bigl(x_k - x_k^{(T)}\bigr)^2,
\tag{5}
]

where (\mu_k) are weighting factors.
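As a small illustration, formula (5) can be evaluated directly. The sketch below is not part of the original note; the function name `distance_to_objective` and its argument names are hypothetical.

```python
def distance_to_objective(x, x_target, mu):
    """Evaluate Pi(x) = 1/2 * sum_k mu_k^2 * (x_k - x_k^(T))^2, formula (5).

    x        -- current phase point (sequence of n numbers)
    x_target -- control objective x^(T) (sequence of n numbers)
    mu       -- weighting factors mu_k (sequence of n numbers)
    """
    if not (len(x) == len(x_target) == len(mu)):
        raise ValueError("x, x_target and mu must have the same length")
    # Each term is (mu_k * (x_k - x_k^(T)))^2; the factor 1/2 is applied once.
    return 0.5 * sum((m * (xk - tk)) ** 2 for xk, tk, m in zip(x, x_target, mu))
```

The weights let coordinates measured in different units contribute comparably to the distance.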

  2. For an effective solution of the problem, some preliminary information on the behavior of the phase trajectories is necessary. This information makes it possible to specify an initial (zero) approximation. Since the choice of the initial approximation is not a universal operation, but depends on the particular conditions of the problem, we shall not discuss this question here. We shall only note that in problems where the “distance” to the control objective is small (problems of the correction type), an elementary operation (see item 3) or analogous considerations may be used to construct the zero approximation.

  3. The basic element in the construction of the algorithm is an elementary operation (a (B_\tau)-operation). This operation solves a local variational problem: to two nearby points of the phase space, (P_0) and (P_\tau), it assigns a control (and a trajectory of system (1)) transferring the representative point from position (P_0) to position (P_\tau) in time (\tau), this time being minimal or differing from it by a quantity (O(\tau^k)), where (k > 1).

To construct the elementary operation one may use the maximum principle. Suppose that the number (m) in system (2) is equal to zero. Replace system (2) by the following:

[
x_i'=\bar f_i(u),
\tag{6}
]

where

[
\bar f_i=f_i(\bar x_1,\ldots,\bar x_n,u),\qquad
\bar x_i=\frac{x_i^{(\tau)}+x_i^{(0)}}{2}.
]

The system adjoint to system (6) is in this case integrated explicitly in the form

[
p_i=a_i,\qquad i=1,2,\ldots,n,
]

where the (a_i) are arbitrary constants connected by one relation (for example, by the normalization condition (\sum a_i^2=1)).

From the maximum principle we find (u) as a function of the constants (a_i). Equation (6) can now be integrated. At the time (t=\tau) the phase trajectory must pass through the point (P_\tau). This gives (n) equations

[
x_i(\tau)=x_i^{(\tau)}
\tag{7}
]

for determining the constants (a_i) and (\tau). Thus, in the given problem the (B_\tau)-operation reduces to the solution of the system of transcendental equations (7).
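To make the case (m = 0) concrete, here is a minimal sketch for one illustrative system that is assumed here and does not come from the original note: (x' = g(x) + u) with the constraint (|u| \le 1) in the Euclidean norm. Freezing (g) at the midpoint (\bar x) gives a constant drift (c = g(\bar x)). The maximum principle with the normalization (\sum a_i^2 = 1) gives (u = a), and equations (7) become (\tau (c + a) = x^{(\tau)} - x^{(0)}). Eliminating (a) leaves a quadratic equation for (\tau). The names `b_tau_operation` and `drift` are hypothetical.

```python
import math

def b_tau_operation(p0, p_tau, drift, eps=1e-12):
    """Local minimum-time transfer for the ASSUMED system x' = g(x) + u, |u| <= 1.

    drift(x) returns g(x) as a list. Returns (tau, u) with a constant control u,
    (0.0, None) if the points coincide, or None if the transfer is impossible
    in the frozen model.
    """
    x_bar = [(a + b) / 2.0 for a, b in zip(p0, p_tau)]   # midpoint of P_0 and P_tau
    c = drift(x_bar)                                     # frozen right-hand side
    d = [b - a for a, b in zip(p0, p_tau)]               # required displacement
    dd = sum(v * v for v in d)
    if dd == 0.0:
        return 0.0, None
    dc = sum(v * w for v, w in zip(d, c))
    cc = sum(w * w for w in c)
    # |d - tau*c|^2 = tau^2  <=>  (cc - 1) tau^2 - 2 dc tau + dd = 0
    qa, qb, qc = cc - 1.0, -2.0 * dc, dd
    if abs(qa) < eps:
        roots = [qc / (2.0 * dc)] if dc > eps else []    # equation degenerates to linear
    else:
        disc = qb * qb - 4.0 * qa * qc
        if disc < 0.0:
            return None
        r = math.sqrt(disc)
        roots = [(-qb - r) / (2.0 * qa), (-qb + r) / (2.0 * qa)]
    positive = [t for t in roots if t > 0.0]
    if not positive:
        return None                                      # P_tau cannot be reached
    tau = min(positive)
    u = [v / tau - w for v, w in zip(d, c)]              # u = a, a unit vector
    return tau, u
```

For a general right-hand side, system (7) has no closed form of this kind and would have to be solved numerically. The sketch only shows the structure of the unknowns: (n-1) independent constants (a_i) and the time (\tau).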

If (m\ne 0), then the problem proves to be more complicated. In this case we replace system (2) by the following:

[
\begin{aligned}
x_i'&=\bar f_i(\delta x_{m+1},\ldots,\delta x_n),\qquad i=1,2,\ldots,m,\\
\delta x_j'&=\bar f_j(u),\qquad j=m+1,\ldots,n,
\end{aligned}
\tag{8}
]

where

[
\bar f_i(\delta x_{m+1},\ldots,\delta x_n)
=
f_i(\bar x_1,\bar x_2,\ldots,\bar x_n)
+
\sum_{k=m+1}^{n}
\left(\frac{\partial f_i}{\partial x_k}\right)_{x=\bar x}
\delta x_k,
]

[
\bar f_j(u)=f_j(\bar x_1,\ldots,\bar x_n,u).
]

The system adjoint to system (8) is also integrated explicitly:

[
p_i=a_i,\qquad i=1,2,\ldots,m,
]

[
p_j=a_j+\sum_{i=1}^{m} b_{ij}a_i t,\qquad j=m+1,\ldots,n,
]

where the (a_s) are arbitrary constants (among them (n-1) independent ones).

On the basis of the maximum principle we find

[
u=F(a_1,\ldots,a_n,t).
]

Let us note that in this case system (8) can no longer be integrated explicitly, which considerably complicates the solution of system (7).

Remarks. I. The practical implementation of the elementary operation can be considerably simplified if there is a good approximation. The latter is always available when one is dealing with a process of successive approximations (see item 4). In the latter case, the implementation of the elementary operation can always be reduced to solving a system of linear algebraic equations.

II. The maximum principle is a convenient, but not the only, means of implementing the elementary operation. Using the proximity of the points (P_0) and (P_\tau), it is sometimes possible to construct a direct solution of the local variational problem.

  4. Thus, suppose that at our disposal we have some zero approximation and a method for constructing a trajectory that, in an optimal way, transfers system (1) from one point of phase space to another point close to it. The task of the algorithm is to construct, relying on these two facts, an optimal solution. The algorithm set forth below has much in common with algorithms used in economic problems (¹).

On the trajectory of the zero approximation, mark the points (P_1, P_2, \ldots, P_N), and construct ((n-1))-dimensional hyperspheres (S_i) with centers at these points

[
x_1 = x_1^{(i)}, \qquad \sum_{k=2}^{n} \bigl(x_k - x_k^{(i)}\bigr)^2 = \delta^2,
\tag{9}
]

where (\delta) is a small number. On the spheres (S_i) place the sets (P_{ij}). The number of points in these sets depends on the memory of the machine. In solving concrete problems, the authors constructed sets (P_{ij}) consisting of (2(n-1)) points:

[
x_s = x_s^{(i)} \pm \delta \quad (s = 2, \ldots, n); \qquad x_k = x_k^{(i)} \quad (k \ne s).
]
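The set of (2(n-1)) points described above can be generated as follows. This is a sketch only; the function name `sample_points` is hypothetical, and the coordinate (x_1) is stored at index 0.

```python
def sample_points(center, delta):
    """Return the 2(n-1) points on the sphere S_i around `center`.

    Each point shifts exactly one coordinate x_s (s = 2, ..., n) by +delta or
    -delta and keeps all other coordinates, including x_1, equal to the center.
    """
    points = []
    for s in range(1, len(center)):      # index 0 is x_1, which stays fixed
        for sign in (1.0, -1.0):
            p = list(center)
            p[s] += sign * delta
            points.append(tuple(p))
    return points
```

For example, `sample_points((0.0, 1.0, 2.0), 0.1)` yields four points, each at distance 0.1 from the center in the hyperplane of constant (x_1).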

Remark. The indicated method of constructing the set (P_{ij}) is convenient if the coordinate (x_1) changes monotonically. In some cases it is expedient to replace the first of the conditions by the following:

[
\Pi = \Pi_i,
]

where (\Pi) is determined by formula (5), and the index (i) corresponds to the value of (\Pi) at the point (P_i). Then in the second condition the summation must be carried out from (k = 1) to (n).

With the aid of the (B_\tau)-operation, connect the point (P_0) with each of the points (P_{1j}). This means that to each point (P_{1j}) we have put in correspondence a control and a transition time (\tau_0^{1j}). Next take any one of the points of the second set, for example the point (P_{2k}), and connect it with each of the points of the first set. Let (\tau_{1j}^{2k}) denote the transition time from the point (P_{1j}) to the point (P_{2k}). The transition time of the system from the initial position to the point (P_{2k}) will be

[
\tau_0^{1j} + \tau_{1j}^{2k}.
]

This quantity is a function of the intermediate point (index (j)). Define

[
\tau_0^{2k} = \min_j \left(\tau_0^{1j} + \tau_{1j}^{2k}\right);
]

(\tau_0^{2k}) is the optimal transition time of the system (among the selected bundle of trajectories) from the initial position to the point (P_{2k}). We remember only the control that realizes this transition and the time (\tau_0^{2k}).

In this way we construct (2(n-1)) trajectories connecting the point (P_0) with each of the points of the set ({P_{N-1,s}}). The last step depends on the nature of the control objective. If the control objective is the point (x^{(T)}), then as the optimal control we take the one that realizes

[
\min_s \left(\tau_0^{N-1,s} + \tau_{N-1,s}^{T}\right) = T,
]

where (\tau_{N-1,s}^{T}) is the transition time of the system from the point (P_{N-1,s}) to the point (P_N). If the objective of control is some manifold, then around the point (P_N) we construct the set (P_{N,j}) and choose the control that realizes (\min_j \tau_0^{Nj}).
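The layer-by-layer minimization described above can be sketched as a forward sweep for the case where the control objective is a point. The code is an illustration under stated assumptions, not the authors' program: `transition_time(p, q)` is a hypothetical stand-in for the (B_\tau)-operation that returns the transition time or `None` when the transition is impossible. The sketch stores predecessor indices rather than the controls themselves.

```python
import math

def forward_sweep(p0, layers, target, transition_time):
    """Minimize total time from p0 through one point of each layer to target.

    layers -- list of point sets {P_1j}, {P_2j}, ..., in order along the trajectory.
    Returns (T, path), or (inf, None) if no admissible chain of transitions exists.
    """
    stages = [list(layer) for layer in layers] + [[target]]
    prev_points, prev_cost = [p0], [0.0]
    predecessors = []
    for stage in stages:
        cost, pred = [], []
        for q in stage:
            best_t, best_j = math.inf, None
            for j, p in enumerate(prev_points):
                if math.isinf(prev_cost[j]):
                    continue                      # p itself is unreachable
                t = transition_time(p, q)
                if t is None:
                    continue                      # transition p -> q impossible
                if prev_cost[j] + t < best_t:     # tau_0^{q} = min_j (tau_0^{p_j} + tau_{p_j}^{q})
                    best_t, best_j = prev_cost[j] + t, j
            cost.append(best_t)
            pred.append(best_j)
        predecessors.append(pred)
        prev_points, prev_cost = stage, cost
    total = prev_cost[0]
    if math.isinf(total):
        return total, None
    # Recover the chain of points by following the stored predecessors backwards.
    path, idx = [target], 0
    for i in range(len(stages) - 1, 0, -1):
        idx = predecessors[i][idx]
        path.append(stages[i - 1][idx])
    path.append(p0)
    path.reverse()
    return total, path
```

Only the best time and one predecessor are kept per point, so the work per layer grows with the square of the number of points in a set rather than exponentially with the number of layers.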

The constructed control is called the first approximation (u^{(1)}). To construct the second approximation, we take the trajectory realized when (u = u^{(1)}), mark (N_1) points on it, and repeat the process described above.

  5. Constraints on the phase coordinates are taken into account when constructing the points (P_{ij}). If these points do not satisfy constraints of type (3), then the corresponding trajectories are not computed. It may turn out that, because of constraints (4), the transition from the point (P_{s,k}) to the point (P_{s+1,j}) is impossible. We discard such trajectories. The constraints may also have a more complicated structure: for example, suppose the trajectory must satisfy the condition

[
I=\int J(x_1,\ldots,x_n)\,dt
]
