CYBERNETICS AND CONTROL THEORY
I. V. ROMANOVSKII
ASYMPTOTICS OF RECURRENCE RELATIONS OF DYNAMIC PROGRAMMING AND OPTIMAL STATIONARY CONTROL
(Presented by Academician A. N. Kolmogorov, 27 III 1964)
In this note we study the limiting behavior of recurrence relations of dynamic programming in connection with problems of optimal stationary control. Similar questions have been considered repeatedly in the literature. The results presented here develop those of R. Bellman, R. Howard, and D. White \((^{1-4})\). Introducing a duration for each transition from state to state makes it possible to broaden considerably the range of application of this model.
1. Consider the following problem.
There is a process that may be in one of \(N\) states. For each state \(i\) a finite nonempty set of controls \(Q_i\) is given. Each control \(q_i\) transfers the process from state \(i\) to some other state \(j(q_i)\). The time \(t_i(q_i)>0\) of transferring the process from \(i\) to \(j(q_i)\) is given, as well as the income \(c_i(q_i)>0\) obtained in such a transfer. We shall assume that, with a proper choice of controls, it is possible in a finite number of steps to get from any state \(i\) to any other state \(j\).
Let a time \(T\) be given, limiting the duration of the process, and let the initial state of the process be \(i_0\). Denote by \(f_t(i)\) the maximal income obtained from the process when \(T=t\) and \(i_0=i\). Then, using the principle of optimality \((^5)\), we obtain
\[ f_t(i)=\max_{Q_i(t)}\left[c_i(q_i)+f_{t-t_i(q_i)}\bigl(j(q_i)\bigr)\right], \]
\[ f_t(i)=0 \quad \text{when } Q_i(t)=\Lambda, \tag{1} \]
where
\[ Q_i(t)=\{q_i\mid q_i\in Q_i,\ t_i(q_i)\leqslant t\}. \]
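Recurrence (1) can be tabulated directly when all durations are integers. The following is a minimal sketch under that assumption; the three-state graph `ARCS` and the function name `max_income` are hypothetical and serve only as an illustration. A control is admitted at time \(t\) only if its duration fits within the remaining time.

```python
# Hypothetical example data: ARCS[i] lists the controls of state i as
# (next_state, duration, income) triples. Durations are assumed to be
# positive integers so that f_t(i) can be tabulated for t = 0, 1, 2, ...
ARCS = {
    0: [(1, 1, 2)],
    1: [(0, 2, 4), (2, 1, 1)],
    2: [(0, 1, 6)],
}

def max_income(arcs, horizon):
    """Return a list f with f[t][i] = maximal income from state i within time t."""
    states = sorted(arcs)
    f = [{i: 0 for i in states}]  # no control fits into zero time
    for t in range(1, horizon + 1):
        row = {}
        for i in states:
            # Only controls whose duration does not exceed t are admissible.
            candidates = [c + f[t - dur][j] for (j, dur, c) in arcs[i] if dur <= t]
            row[i] = max(candidates, default=0)  # 0 when no control is admissible
        f.append(row)
    return f

f = max_income(ARCS, 4)
print(f[3])  # {0: 9, 1: 9, 2: 9}
print(f[4])  # {0: 11, 1: 10, 2: 15}
```

In this example the table repeats with period 3 up to an added income of 9, which is the behavior that the cycle characteristics introduced next make precise.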
Introduce into consideration a directed graph \(\Gamma\) (we use the terminology of C. Berge \((^6)\)), whose vertices are the states of the process, and whose arcs, emanating from vertices, are the corresponding controls. Naturally, the end of an arc is taken to be the resulting state of the control to which this arc corresponds. To each arc \(u=q_i\) there correspond a length \(t(u)=t_i(q_i)\) and an income \(c(u)=c_i(q_i)\).
Since all the sets \(Q_i\) are nonempty, the graph \(\Gamma\) has cycles. To each cycle \(C\) we assign the characteristic of the cycle \(d_C\)
\[ d_C=\frac{\sum_{u\in C} c(u)}{\sum_{u\in C} t(u)}, \tag{2} \]
equal to the average income on the cycle per unit length (or time). The characteristic of an arc will mean the ratio of the income on the arc to the length of the arc. Denote by \(L\) the length of the longest path having no self-intersections, by \(d\) the maximal characteristic of cycles,
\[ d=\max_C d_C, \tag{3} \]
and by \(\bar d\) and \(\underline d\), respectively, the maximum and minimum characteristics of arcs.
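For a small graph the quantities in (2) and (3) can be found by brute force. The sketch below enumerates elementary cycles only, on the assumption that this suffices because the characteristic of a composite cycle is a weighted average of the characteristics of its elementary parts. The graph `ARCS` and all function names are hypothetical; exact rational arithmetic avoids rounding.

```python
from fractions import Fraction

# Hypothetical example: ARCS[i] lists (next_state, duration, income) triples.
ARCS = {
    0: [(1, 1, 2)],
    1: [(0, 2, 4), (2, 1, 1)],
    2: [(0, 1, 6)],
}

def elementary_cycles(arcs):
    """Yield each elementary cycle once, as a list of (i, j, duration, income)."""
    def extend(root, node, path, visited):
        for (j, dur, c) in arcs[node]:
            arc = (node, j, dur, c)
            if j == root:
                yield path + [arc]
            elif j > root and j not in visited:
                # Restricting to states above the root reports each cycle
                # only from its smallest state.
                yield from extend(root, j, path + [arc], visited | {j})
    for root in sorted(arcs):
        yield from extend(root, root, [], {root})

def characteristic(cycle):
    """Income per unit length of a cycle, as in formula (2)."""
    return Fraction(sum(a[3] for a in cycle), sum(a[2] for a in cycle))

d = max(characteristic(cyc) for cyc in elementary_cycles(ARCS))
arc_ratios = [Fraction(c, dur) for i in ARCS for (_, dur, c) in ARCS[i]]
arc_max, arc_min = max(arc_ratios), min(arc_ratios)
print(d, arc_max, arc_min)  # 3 6 1
```

Here the cycle 0, 1, 2, 0 has characteristic 9/3 = 3 and the cycle 0, 1, 0 has characteristic 6/3 = 2, so the maximal characteristic lies between the smallest and largest arc characteristics, as it must.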
Theorem 1.
\[ -2L(d-\underline d)\leq f_T(i)-Td\leq L(\bar d-d). \tag{4} \]
Thus, as \(T\to\infty\), the maximum income differs from the income obtained under stationary motion along an optimal cycle (a cycle with characteristic \(d\)) only by a bounded quantity.
Let us now denote by \(d'\) the characteristic of the cycle next in magnitude after the optimal one (only elementary cycles are meant), and by \(k\) the length of some optimal cycle.
Theorem 2. Under optimal control, the length of a path not belonging to full circuits along optimal cycles is bounded above by the quantity
\[ L+\frac{(\bar d-d)L+(d-\underline d)k}{d-d'}, \]
which does not depend on the duration \(T\) of the process.
2. The search in \(\Gamma\) for an optimal cycle and for its characteristic \(d\) can be carried out by means of linear programming. Consider the following linear programming problem:
Find a vector \(Y=\{y_q\}\) satisfying the conditions:
a)
\[
y_q\geq 0,\quad q\in Q_i,\quad i=1,2,\ldots,N;
\tag{5}
\]
b)
\[
\sum_{Q_i} y_q=\sum_k\sum_{q_k:\,j(q_k)=i} y_{q_k},\quad i=1,2,\ldots,N;
\tag{6}
\]
c)
\[
\sum_i\sum_{Q_i} t_i(q)y_q=\tau(Y)=1;
\tag{7}
\]
d) the quantity
\[
\gamma(Y)=\sum_i\sum_{Q_i} c_i(q)y_q
\tag{8}
\]
attains its maximum value.
The dual problem to it is:
Find a vector \(Z=(z_1,\ldots,z_N,u)\) satisfying the conditions:
\[ \alpha)\quad z_i-z_{j(q_i)}+t_i(q_i)u\geq c_i(q_i)\quad \text{for all } i \text{ and } q_i\in Q_i; \tag{9} \]
\(\beta)\) the quantity \(u\) attains its minimum.
The duality theorem for linear programming problems gives the following optimality criterion for the vector \(Y\):
Theorem 3. In order that a vector \(Y\) satisfying conditions (5)–(7) be optimal, it is necessary and sufficient that there exist a vector \(Z\) satisfying condition (9) for which
\[ z_i-z_{j(q_i)}+t_i(q_i)u=c_i(q_i)\quad \text{when } y_{q_i}>0; \tag{10} \]
\[ y_{q_i}=0\quad \text{when } z_i-z_{j(q_i)}+t_i(q_i)u>c_i(q_i). \tag{11} \]
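The criterion of Theorem 3 can be checked mechanically for a candidate pair. The sketch below does so on a hypothetical four-arc graph, reading the equality condition as "constraint (9) holds with equality on every arc carrying positive weight". The vectors `Y`, `z` and the number `u` are illustrative guesses for this example, not output of any solver.

```python
from fractions import Fraction

# Hypothetical example: arcs as (i, j, duration, income).
ARCS = [(0, 1, 1, 2), (1, 0, 2, 4), (1, 2, 1, 1), (2, 0, 1, 6)]

# Candidate primal vector: weight 1/3 on each arc of the cycle 0 -> 1 -> 2 -> 0
# (total duration 3) and zero on the remaining arc.
Y = {
    (0, 1): Fraction(1, 3),
    (1, 0): Fraction(0),
    (1, 2): Fraction(1, 3),
    (2, 0): Fraction(1, 3),
}
# Candidate dual vector: potentials z_i and the scalar u.
z = {0: 0, 1: 1, 2: 3}
u = 3

def slack(arc):
    """Left side minus right side of the dual constraint (9) for one arc."""
    i, j, dur, c = arc
    return z[i] - z[j] + dur * u - c

dual_feasible = all(slack(a) >= 0 for a in ARCS)
# Complementary slackness: an arc with positive slack must carry zero weight.
complementary = all(slack(a) == 0 or Y[(a[0], a[1])] == 0 for a in ARCS)
primal_value = sum(a[3] * Y[(a[0], a[1])] for a in ARCS)

print(dual_feasible, complementary, primal_value == u)  # True True True
```

Only the arc from state 1 to state 0 has positive slack (it equals 3), and it carries zero weight, so both vectors are optimal and the common value is 3.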
The connection between the problem of an optimal cycle of a graph and problem (5)–(8) is determined by the fact that to each cycle \(C\) one can assign a vector \(Y_C\), setting \(y_{q_i}=\left(\sum_{q\in C} t(q)\right)^{-1}\) for the arcs of \(C\) and \(y_{q_i}=0\) for the remaining arcs. In this case the value of the linear form (8) equals the characteristic of the corresponding cycle. Moreover, the vectors corresponding to cycles (we shall call them cyclic) are extreme points of the set of admissible vectors \(Y\). Namely, the following theorem holds:
Theorem 4. Any vector \(Y\) satisfying conditions (5)–(7) is representable as a convex linear combination of cyclic vectors.
This theorem has some similarity with the Birkhoff-von Neumann theorem on the decomposition of a doubly stochastic matrix into permutation matrices \((^6)\). A special case of it is Berge’s theorem \((^7)\) on the decomposition of flows in a network. An analogous theorem for the case of nondeterministic transitions and fixed transition time is given in \((^8)\).
It follows from Theorems 3 and 4, in particular, that the value of problem (5)–(8) is equal to \(d\).
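The correspondence between cycles and admissible vectors can be illustrated on a small case. The sketch below builds the cyclic vectors of two cycles in a hypothetical graph, checks nonnegativity, the balance between arcs leaving and entering each state, and the normalization \(\tau(Y)=1\), and evaluates the linear form \(\gamma\). All names are illustrative.

```python
from fractions import Fraction

# Hypothetical example: arcs as (i, j, duration, income).
ARCS = [(0, 1, 1, 2), (1, 0, 2, 4), (1, 2, 1, 1), (2, 0, 1, 6)]
STATES = {a[0] for a in ARCS} | {a[1] for a in ARCS}

def cyclic_vector(cycle_arcs):
    """Weight 1/(total duration of the cycle) on its arcs, zero elsewhere."""
    total = sum(dur for (_, _, dur, _) in cycle_arcs)
    return {a: (Fraction(1, total) if a in cycle_arcs else Fraction(0)) for a in ARCS}

def is_admissible(y):
    """Nonnegativity, flow balance at every state, and total duration equal to 1."""
    nonneg = all(v >= 0 for v in y.values())
    balanced = all(
        sum(y[a] for a in ARCS if a[0] == i) == sum(y[a] for a in ARCS if a[1] == i)
        for i in STATES
    )
    normalized = sum(a[2] * y[a] for a in ARCS) == 1
    return nonneg and balanced and normalized

def gamma(y):
    """Value of the linear form: total income weighted by y."""
    return sum(a[3] * y[a] for a in ARCS)

short = cyclic_vector([ARCS[0], ARCS[1]])            # cycle 0 -> 1 -> 0
long_ = cyclic_vector([ARCS[0], ARCS[2], ARCS[3]])   # cycle 0 -> 1 -> 2 -> 0
mix = {a: (short[a] + long_[a]) / 2 for a in ARCS}   # a convex combination

print(gamma(short), gamma(long_), gamma(mix))  # 2 3 5/2
```

Both cyclic vectors are admissible and their values coincide with the cycle characteristics 2 and 3. The mixture is admissible as well, and its value 5/2 lies between them, so the maximum is attained at a cyclic vector.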
Remark 1. If the order in which the vertices are traversed in the optimal cycle is known (as happens in many real problems), then to find the optimal cycle one can use the Bellman-Karush method \((^9,{}^{10})\) (see also \((^{11})\)). This is especially convenient if the set \(Q_i\) is not finite (for example, if the control is given by a function \(c_i(t)\), \(i=1,2,\ldots,N\)).
Remark 2. If by \(M_{ij}\) we denote the set of pairs \((c,t)\), where \(c=c_i(q_i)\), \(t=t_i(q_i)\) and \(j(q_i)=j\), and by \(\overline{M}_{ij}\) the convex hull of \(M_{ij}\), then in the optimal cycle only those \(q_i\) can participate which correspond to boundary points of \(\overline{M}_{i\,j(q_i)}\).
Remark 3. Instead of the problem set forth here, one may consider another, in a certain sense dual to the first:
Let an income \(c\), which is required to be obtained from the process, and an initial state of the process \(i_0\) be given. It is necessary to find the minimum time in which an income not less than \(c\) can be obtained. Denote this time by \(\varphi_c(i_0)\). Then, from the principle of optimality,
\[ \varphi_c(i)=\min_{Q_i}\,[t_i(q_i)+\varphi_{c-c_i(q_i)}(j(q_i))], \]
\[ \varphi_c(i)=0 \quad \text{for } c\leqslant 0. \]
It is easy to see that
\[ \varphi_c(i)=\frac{c}{d}+O(1). \]
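The recurrence for \(\varphi_c(i)\) can be evaluated by memoized recursion when all incomes are positive integers, which guarantees termination. The following sketch assumes this and uses a hypothetical graph whose maximal cycle characteristic is \(d=3\); the name `min_time` is illustrative.

```python
from functools import lru_cache

# Hypothetical example: ARCS[i] lists (next_state, duration, income) triples.
# All incomes are positive integers, so the required income strictly
# decreases along every step of the recursion.
ARCS = {
    0: [(1, 1, 2)],
    1: [(0, 2, 4), (2, 1, 1)],
    2: [(0, 1, 6)],
}
D = 3  # maximal cycle characteristic of this example (cycle 0 -> 1 -> 2 -> 0)

@lru_cache(maxsize=None)
def min_time(c, i):
    """Minimal time in which an income of at least c is obtained from state i."""
    if c <= 0:
        return 0
    return min(dur + min_time(c - inc, j) for (j, dur, inc) in ARCS[i])

print(min_time(9, 0))  # 3, one full circuit of the optimal cycle
# Deviation from c/d for a few required incomes, starting from state 0.
print([min_time(c, 0) - c / D for c in (30, 60, 90)])
```

In this example the deviation of `min_time(c, i)` from `c / D` never exceeds 3 in absolute value, in agreement with the estimate \(\varphi_c(i)=c/d+O(1)\).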
The corresponding linear programming problem is:
Find a vector \(Y\) satisfying conditions (5), (6) and
\(c')\) \(\gamma(Y)=1\);
\(d')\) the quantity \(\tau(Y)\) attains a minimum.
It has optimal solutions differing from the solutions of problem (5)–(8) only by the factor \(\dfrac{1}{d}\).
3. In the case when \(t_i(q_i)\equiv 1\) (the Bellman-Howard case), Theorem 1 refines Bellman’s result \((^2)\) for the equation
\[ f_n(i)=\max_{Q_i}[c_i(q_i)+f_{n-1}(j(q_i))]. \tag{12} \]
Theorem 2 retains its formulation.
For this case it is natural to consider the equation
\[ g(i)=\max_{Q_i}[c_i(q_i)-d+g(j(q_i))]. \tag{13} \]
This equation always has a solution. It is easy to see that each of its solutions is determined up to an additive constant. Studying the entire set of solutions requires the introduction of the graph \(\overline{\Gamma}\), consisting of cycles with maximal characteristic.
Theorem 5. The solutions of equation (13) form a convex set whose dimension is equal to the number of connected components of the graph \(\overline{\Gamma}\). In particular, if the graph \(\overline{\Gamma}\) is connected, then the solution of (13) is unique up to an additive constant.
This theorem is also valid for the problem considered in Sec. 1. It is only necessary, instead of (13), to consider the equation
\[ g(i)=\max_{Q_i}\,[c_i(q_i)-t_i(q_i)d+g(j(q_i))]. \tag{13'} \]
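One way to obtain a solution of (13') numerically, once \(d\) is known, is to treat \(c_i(q_i)-t_i(q_i)d\) as reduced arc weights and compute longest paths to a fixed state. This is a sketch of that idea and not necessarily the author's procedure. It assumes that the chosen state `ROOT` lies on an optimal cycle and that `D` is the maximal cycle characteristic, so that no cycle has positive reduced weight. The graph and all names are hypothetical.

```python
# Hypothetical example: ARCS[i] lists (next_state, duration, income) triples.
ARCS = {
    0: [(1, 1, 2)],
    1: [(0, 2, 4), (2, 1, 1)],
    2: [(0, 1, 6)],
}
D = 3     # maximal cycle characteristic of this example
ROOT = 0  # assumed to lie on an optimal cycle

def bellman_operator(g):
    """Right-hand side of equation (13') applied to the function g."""
    return {i: max(c - dur * D + g[j] for (j, dur, c) in ARCS[i]) for i in ARCS}

def solve(root=ROOT):
    """Longest reduced-weight path from each state to root (Bellman-Ford style)."""
    g = {i: float("-inf") for i in ARCS}
    g[root] = 0
    for _ in range(len(ARCS)):
        step = bellman_operator(g)
        # The root is pinned at 0, which fixes the additive constant.
        g = {i: (0 if i == root else max(g[i], step[i])) for i in ARCS}
    return g

g = solve()
print(g)                         # {0: 0, 1: 1, 2: 3}
print(bellman_operator(g) == g)  # True
```

Adding the same constant to every value of `g` gives another solution, which reflects the statement that solutions are determined up to an additive constant.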
The following theorem applies only to the case \(t_i\equiv 1\).
Theorem 6. If the graph \(\overline{\Gamma}\) is connected and the greatest common divisor of the numbers of arcs in its cycles is equal to 1, then, for sufficiently large \(n\), the optimal path consists of motion along optimal cycles and of a path with maximum income containing at least one vertex of \(\overline{\Gamma}\).
In this case, for sufficiently large \(n\),
\[ f_n(i)=v(i)+c+nd, \tag{14} \]
where
\[ c=\max_{\overline{\Gamma}} g(i)=g(i_0), \]
\(g(i)\) is the limit of the sequence of functions
\[ g_0(i)\equiv 0, \]
\[ g_n(i)=\max\{g_{n-1}(i),\max_{Q_i}[c_i(q_i)-d+g_{n-1}(j(q_i))]\}, \]
and \(v\) is the limit of the sequence of functions
\[ v_0(i)= \begin{cases} 0, & i=i_0,\\ -N(\overline{d}-d), & i\ne i_0, \end{cases} \]
\[ v_n(i)=\max\{v_{n-1}(i),\max_{Q_i}[c_i(q_i)-d+v_{n-1}(j(q_i))]\}. \]
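The stabilization asserted in (14) can be observed numerically by iterating recurrence (12) and subtracting \(nd\). The sketch below uses a hypothetical three-state graph with unit transition times in which both elementary cycles (of 2 and 3 arcs) have characteristic 3, so the graph of optimal cycles is connected and the greatest common divisor of the cycle sizes is 1. It only tabulates \(f_n(i)-nd\) and does not compute \(v\) and \(c\) separately.

```python
# Hypothetical example with unit transition times:
# ARCS[i] lists (next_state, income) pairs.
ARCS = {
    0: [(1, 4)],
    1: [(0, 2), (2, 1)],
    2: [(0, 4)],
}
D = 3  # both cycles 0 -> 1 -> 0 and 0 -> 1 -> 2 -> 0 have characteristic 3

def gaps(n_max):
    """Return the list of functions f_n(i) - n*D for n = 1, ..., n_max."""
    f = {i: 0 for i in ARCS}  # f_0 is identically zero
    out = []
    for n in range(1, n_max + 1):
        f = {i: max(c + f[j] for (j, c) in ARCS[i]) for i in ARCS}
        out.append({i: f[i] - n * D for i in ARCS})
    return out

for n, gap in enumerate(gaps(6), start=1):
    print(n, gap)
# From n = 4 onward the printed gap is {0: 1, 1: 0, 2: 2}.
```

In this example the difference \(f_n(i)-nd\) stops changing at \(n=4\), so from then on \(f_n(i)\) grows by exactly \(d\) per step.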
Leningrad State University named after A. A. Zhdanov
Received 27 III 1964
REFERENCES
1. R. Bellman, J. Math. Mech., 6, No. 5, 679 (1957).
2. R. Bellman, Rend. Circ. Mat. Palermo, 8, No. 3, 343 (1959).
3. R. Howard, Dynamic Programming and Markov Processes, N. Y., 1960.
4. D. J. White, J. Math. Analysis and Appl., 4, No. 3, 353 (1962).
5. R. Bellman, Dynamic Programming, Moscow, 1960.
6. C. Berge, Theory of Graphs and Its Applications, Moscow, 1962.
7. C. Berge, A. Ghouila-Houri, Programmes, Jeux et Reseaux de Transport, Paris, 1962.
8. P. Wolfe, G. Dantzig, Operations Res., 10, No. 5, 702 (1962).
9. R. Bellman, W. Karush, Bull. Am. Math. Soc., 67, 5 (1961).
10. R. Bellman, W. Karush, J. Soc. Industr. Appl. Math., 10, 3 (1962).
11. I. V. Romanovskii, Vestn. LGU, No. 13, 148 (1962).