UDC 518.9 + 519.95
CYBERNETICS AND CONTROL THEORY
V. A. YAKUBOVICH
ADAPTIVE SYSTEMS WITH MULTISTEP TARGET CONDITIONS
(Presented by Academician L. S. Pontryagin on 8 IV 1968)
1°. Let us repeat the definition \((^1)\) of the simplest robot, with the changes needed for what follows. We shall assume that the time \(t\) takes the values \(t = 0, 1, 2, \ldots\). A given set of elements \(z\) will be denoted by \(\{z\}\). Elements \(z\) that change in time (in accordance with some mapping of \(\{t\}\) into \(\{z\}\)) will be called variables, and the value of \(z\) at the moment \(t\) will be denoted by \(z_t\). Quantities that do not change in time will be called parameters (p.). We shall regard as given the sets \(\{x\}\), \(\{s\}\), \(\{\sigma\}\), \(\{u\}\), \(\{\xi\}=M\), and a set \(\{\tau\}\) that is to be determined (in accordance with the conditions formulated below). Their elements are called as follows: \(x\), the external coordinates of the robot; \(s\), the environment; \(\sigma\), the sensors; \(u\), the controls; \(\xi\), the external parameters (v.p.); \(\tau\), the tactics. Suppose that, by some rule, for each \(t\) a number \(\mu_t=0\) or \(\mu_t=1\), called the game signal, is determined (depending on \(x_0, s_0, \ldots, x_t, s_t, \xi\)). (The game continues for \(\mu_t=1\) and terminates for \(\mu_t=0\).) Let \(\Delta_j=[t'_j,t''_j]\) be the intervals (called the time of the \(j\)-th game) on which \(\mu_t=1\), with \(\mu_t=0\) outside them, and
\[
0\le t'_1\le t''_1<t'_2\le t''_2<\ldots .
\]
Let real functions \(F_j(x,s,t,\xi)\), \(j=1,2,\ldots\), be given. By a multistep target condition (TC) we shall mean one of the following two conditions: (A) \(F_{j,t+1}=F_j(x_{t+1},s_{t+1},t+1,\xi)>0\) for all \(t\in\Delta_j\); (B) there exists \(t_{0j}\in\Delta_j\) such that \(F_{j,t}>0\) for \(t\in[t_{0j},t''_j]\) (here \(j=1,2,\ldots\)). The one-step TC of \((^1)\) coincides with (A) for \(t''_j=t'_j\).
We shall regard as given: (I) the sensor equation
\[
\sigma_t=\sigma(x_t,s_t,t,\xi),
\]
which determines what the robot “sees” at the moment \(t\). (II) The motor equation
\[
x_{t+1}=X(x_t,u_t,t,\xi),
\]
which determines the motion of the robot. (III) The equation of change of the environment
\[
s_{t+1}=S(s_t,x_t,t,\xi).
\]
The “brain equations” of the robot are to be determined (their right-hand sides, we emphasize, do not depend on the v.p. \(\xi\)). (IV)
\[
u_t=u(\sigma_t,\tau_t).
\]
(V)
\[
\tau_{t+1}=A(\sigma_t,\sigma_{t+1},\tau_t),\quad \text{if } \mu_t=1;\qquad
\tau_{t+1}=\tau_t,\quad \text{if } \mu_t=0.
\]
For any \(x_0,s_0,\tau_0\) (which, without loss of generality, we shall regard as depending on \(\xi\)), equations (I)—(V) make it possible to find successively the values \(x_t,s_t,u_t,\sigma_t,\tau_t\) for all \(t\).
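The successive computation described here can be sketched as a simulation loop. This is only an illustration under stated assumptions, not the author's construction: the function and argument names (`simulate`, `sensor`, `motor`, `env_step`, `control`, `adapt`, `game_signal`) are hypothetical, and the game signal is simplified to depend on the current \(x_t, s_t\) only rather than on the whole history.

```python
def simulate(x0, s0, tau0, xi, steps,
             sensor, motor, env_step, control, adapt, game_signal):
    """Iterate equations (I)-(V) for t = 0, ..., steps - 1.

    Returns a list of tuples (t, x_t, s_t, u_t, tau_t, mu_t).
    Note that `control` and `adapt` (the "brain") never receive xi.
    """
    x, s, tau = x0, s0, tau0
    sigma = sensor(x, s, 0, xi)                      # (I)   sigma_t
    history = []
    for t in range(steps):
        u = control(sigma, tau)                      # (IV)  u_t = u(sigma_t, tau_t)
        x_next = motor(x, u, t, xi)                  # (II)  x_{t+1}
        s_next = env_step(s, x, t, xi)               # (III) s_{t+1}
        sigma_next = sensor(x_next, s_next, t + 1, xi)
        mu = game_signal(x, s, t, xi)                # simplified: current state only
        # (V): the tactic changes only while a game is in progress (mu_t = 1).
        tau_next = adapt(sigma, sigma_next, tau) if mu == 1 else tau
        history.append((t, x, s, u, tau, mu))
        x, s, sigma, tau = x_next, s_next, sigma_next, tau_next
    return history
```

Passing the brain functions without \(\xi\) mirrors the requirement that the right-hand sides of (IV), (V) do not depend on the unknown parameters.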
If all that has been specified above is defined, then we shall say that a simplest robot is given.* If, for some \(j\), the TC is fulfilled, then we shall say that the corresponding game has been won by the robot. For each \(\xi\in M\) we have an infinite (generally speaking) sequence of games. The robot is called reasonable in the class of problems \(M\) if, for any \(\xi\in M\), there is a number \(j_0\) such that all games with number \(j\ge j_0\) are won by the robot and \(\tau_t=\mathrm{const}\) for \(t\ge t'_{j_0}\). The brain equations (IV), (V) must be constructed so that the robot becomes reasonable in the class of problems \(M\).
In the two simple examples of simplest robots given below, secondary details have been omitted (so as not to encumber the exposition), and the brain equations are left undetermined. In §§ 4 and 5 it will be shown how to construct their brain equations so that these robots become reasonable in the indicated classes of problems.
* The dependence on \(\xi\) of the functions \(\sigma, X, S, \mu, F_j\) reflects the circumstance that certain characteristics of the environment, of the system, and of the TC may be unknown to the designer and may change in the course of the “life” of the robot. The robot must solve the posed problem under any conditions determined by all possible \(\xi\in M\).
2°. The “hawk” robot (Я). The target and Я are described by points in a rectangle \(D\) in the plane, which move simultaneously at the instants \(t=0,1,\ldots\). Their velocities of motion are bounded by the quantities \(v_{\mathrm{ц}}\) and \(v_{\mathrm{я}}\). The task of the target is to fly through the “dangerous” zone \(D\); the task of Я is to catch the target, i.e., to enter the \(\varepsilon\)-neighborhood of the point where the target will be at the next instant. The target reacts to Я, i.e., the displacement of the target depends on the relative position of the target and Я. The law of motion of the target (determined by the values of certain parameters) is unknown to Я, and Я must find it in the course of pursuit, “studying” the target’s reactions. (More precisely, the brain of Я must find the needed controls as functions of the sensors.) Let us pass to a more exact exposition.
Below \(z,z',c\) are complex variables; \(\varphi,f,r,\zeta,\psi,\lambda\) are real variables; \(v,\delta\) are parameters; \(L,H,v_{\mathrm{я}},v_{\mathrm{ц}},\xi=\|\xi_j\|\) are external parameters. The external coordinates of Я are \(x=\|z,\varphi\|\), where \(z\) gives the Cartesian coordinates and \(\varphi\) is the course angle. Here \(z\in D=\{z:\ |\operatorname{Re} z|\le L,\;0\le \operatorname{Im} z\le H\}\). The environment is \(c\in D\) (the coordinates of the target). Я “sees” the target, the “ground,” and the “horizon.” Namely, the sensors are
\[
\sigma=\|\varphi,\operatorname{Im} z,\zeta,\psi,\lambda\|,
\]
where \(\zeta,\psi\) are expressed through the coordinates of the target in the coordinate system of Я:
\[
\zeta=(|z-c|+v)^{-1}\delta,\qquad \psi=\arg(c-z)-\varphi,
\]
and \(\lambda\) is a certain characteristic of the target, specified below. Я turns through the angle \(f\) and moves a distance \(r\le v_{\mathrm{я}}\) (\(u=\|r,f\|\) are the controls). The motor equations are:
\[
\varphi_{t+1}=\varphi_t+f_t,\qquad z_{t+1}=z'_{t+1},
\]
where
\[
z'_{t+1}=z_t+r_t\exp(i\varphi_{t+1}),
\]
provided that \(z'_{t+1}\in D\). If \(z'_{t+1}\notin D\) (Я “wants” to fly out of \(D\)), then \(z_{t+1}\) is determined otherwise and in such a way that \(z_{t+1}\in D\).
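The sensor and motor equations of Я can be sketched with complex arithmetic as follows. The function names are hypothetical. Since the text does not specify how \(z_{t+1}\) is chosen when \(z'_{t+1}\notin D\), the clamping to the rectangle used here is purely an assumption for illustration; the caller is also assumed to respect \(r\le v_{\mathrm{я}}\).

```python
import cmath


def hawk_sensors(z, phi, c, lam, v, delta):
    """Return sigma = (phi, Im z, zeta, psi, lambda) for hawk at z, target at c."""
    zeta = delta / (abs(z - c) + v)        # zeta = (|z - c| + v)^(-1) * delta
    psi = cmath.phase(c - z) - phi         # bearing of the target relative to the course
    return (phi, z.imag, zeta, psi, lam)


def in_region(z, L, H):
    """True if z lies in D = {|Re z| <= L, 0 <= Im z <= H}."""
    return abs(z.real) <= L and 0.0 <= z.imag <= H


def hawk_motor(z, phi, r, f, L, H):
    """Turn by f, then move a distance r along the new course angle."""
    phi_next = phi + f
    z_trial = z + r * cmath.exp(1j * phi_next)
    if in_region(z_trial, L, H):
        return z_trial, phi_next
    # ASSUMPTION: the source only requires z_{t+1} in D; clamping is one possible choice.
    z_clamped = complex(min(max(z_trial.real, -L), L),
                        min(max(z_trial.imag, 0.0), H))
    return z_clamped, phi_next
```

The angle \(\psi\) is left unwrapped, exactly as in the formula above; a practical implementation might reduce it to \((-\pi,\pi]\).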
Let us describe the equation of change of the environment. At \(t=0\) the first target \(c_0\) appears quasi-randomly on one of the boundaries \(\operatorname{Re} z=\pm L\) (with \(\lambda=\pm1\)) and then moves toward the opposite boundary according to the law
\[
c_{t+1}=S(c_t,z_t,\lambda,\xi),
\]
where \(|c_{t+1}-c_t|\le v_{\mathrm{ц}}\) (the first game, \(j=1\)). If at time \(t\) the target has been caught (\(|c_t-z_t|<\varepsilon\)) or has flown out of \(D\) (\(c_t\notin D\)), then at time \(t+1\) on one of the boundaries \(\operatorname{Re} z=\pm L\) a second target \(c_{t+1}\) appears quasi-randomly, which then moves toward the opposite boundary according to the same law, i.e., with the same value \(\xi\in M\) (the second game, \(j=2\)), and so on. It is assumed that for any \(\xi\in M\) the function \(S\) and its derivatives with respect to \(\operatorname{Re} c,\operatorname{Im} c,\operatorname{Re} z,\operatorname{Im} z\) are bounded for \(c\in D,\; z\in D,\; \lambda=\pm1\). The game signal is:
\[
\mu_t=1
\]
if \(c_t\in D\) and \(|c_t-z_t|\ge \varepsilon\). The target condition is the requirement to catch the appearing target. Namely, the target condition is of type (B): if \(\mu_{t'}=1\), then \(|z_{t_0}-c_{t_0}|<\varepsilon\) for some \(t_0>t'\) and \(\mu_t=1\) for \(t'\le t<t_0\).
A reasonable (in the sense of the definition introduced above) Я catches every target starting from some number \(j_0\), and this holds for any law of target motion \(\xi\in M\). Under certain conditions (in particular, when \(v_{\mathrm{я}}\ge \tfrac{3}{2}v_{\mathrm{ц}}\) and \(L/v_{\mathrm{ц}}\) is sufficiently large) the brain equations can be constructed so that Я catches all targets, beginning with the first (\(j_0=1\)).
3°. The “bicyclist” robot (B)*. Let \(\chi\) be the angle between the plane of the bicycle frame and the vertical plane, and \(\psi\) the turning angle of the handlebars. Under a number of assumptions, and after replacing derivatives by difference relations, the equations of motion of the bicycle \((^2,^3)\) are written in the form: (a) \(\chi_{t+1}=\xi_1\chi_t-\xi_2\chi_{t-1}+\xi_3\psi_t+\varphi_t\), (b) \(\psi_t=-\gamma_1\chi_t-\gamma_2\chi_{t-1}\). Here \(\varphi_t\) is an unknown external action, \(|\varphi_t|\leq \Phi\); the \(\xi_j\) are external parameters depending on the velocity, the structural parameters, and \(\Delta t\), with \(0<\xi_j\leq \varkappa_j\), where the \(\varkappa_j\) are known. Suppose that at first the bicycle is set vertically (i.e., random values \(\chi_0,\chi_1\), \(|\chi_0|<\delta\), \(|\chi_1|<\delta\), are assigned); it then moves under the control of the robot, and this motion continues while \(|\chi_t|<r\) and while the duration of the motion does not exceed \(T_0\) (the first game, \(t\in\Delta_1\), \(\mu_t=1\)); then it is again set vertically and again moves under the same conditions (the second game, \(t\in\Delta_2\), \(\mu_t=1\)), and so on. Let \(\beta<1\), \(\varepsilon>0\), \(V_t=\chi_t^2+\beta\chi_{t-1}^2\). For sufficiently small \(\Phi\), in the plane \((\gamma_1,\gamma_2)\) there exists (depending on \(\xi_j\)) a bounded stability region \(E\) such that, for \((\gamma_1,\gamma_2)\in E\) and all \(t\geq 0\), the following holds: (c) \(V_{t+1}<V_t\) if \(V_{t+1}\geq \varepsilon^2\beta\). In the plane \(\{\chi_t,\chi_{t+1}\}\) the region \(V_{t+1}<\varepsilon^2\beta\) is then an invariant region of attraction. We shall assume that \(\Phi\) satisfies the indicated condition, and we shall take (c) as the TC. The problem fits into the general scheme if we put \(\sigma_t=x_t=\|\chi_t,\chi_{t-1}\|\), \(u_t=\psi_t\). Let \((1+\beta)\delta^2<\beta\varepsilon^2\), \(r>\varepsilon\). If B is reasonable, then for any values of the external parameters the number of “falls” (\(|\chi_t|>r\)) will be finite and the robot will learn in a finite time \(t_0\), i.e., for \(t>t_0\) we shall have \(V_t<\varepsilon^2\beta\), and hence also \(|\chi_t|<\varepsilon\) (for an infinite number of games).
* The problem described in this section is the simplest, idealized problem of constructing a robot that would itself learn to ride any bicycle from a certain class. It is assumed here that the bicycle moves at a constant speed (which varies from case to case), while the robot, by turning the handlebars, must learn to maintain balance. We note that the problem of constructing a control that stabilizes a bicycle moving at a constant speed has been solved for various idealizations \((^2,^3)\). However, to implement this control one must know the parameters of the bicycle and its speed, on which the control naturally depends. In the general formulation considered here these are external parameters, unknown to the designer, and the robot itself must find the needed control, using only the reactions of the system to the supplied controls. The author takes this opportunity to thank A. Kh. Gelig, who pointed out that the problem described in this section fits into the general scheme.
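The final implication for robot B, and the role of the two conditions \((1+\beta)\delta^2<\beta\varepsilon^2\) and \(r>\varepsilon\), can be checked directly from the definition of \(V_t\). The short verification below is a worked step added for clarity; it assumes that the random initial values are the tilt angles \(\chi_0,\chi_1\) with \(|\chi_0|<\delta\), \(|\chi_1|<\delta\), and that \(0<\beta<1\).

\[
V_t<\varepsilon^2\beta
\;\Longrightarrow\;
\chi_t^2\le \chi_t^2+\beta\chi_{t-1}^2=V_t<\varepsilon^2\beta<\varepsilon^2
\;\Longrightarrow\;
|\chi_t|<\varepsilon<r ,
\]
\[
|\chi_0|<\delta,\ |\chi_1|<\delta
\;\Longrightarrow\;
V_1=\chi_1^2+\beta\chi_0^2<(1+\beta)\delta^2<\beta\varepsilon^2 .
\]

Thus every game starts inside the region \(V<\varepsilon^2\beta\), and while the trajectory stays in that region no fall can occur.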
4°. Let us return to the general case. We shall denote by \(R_n\) the Euclidean space of dimension \(n\). Suppose that \(\{\sigma\}\) is a compact set in \(R_n\), \(\sigma=\|\sigma_j\|_{j=1}^n\). We shall assume that the following four conditions are satisfied:
\((\Pi_1)\) There exist new controls \(v\), where \(\{v\}\subset R_q\), connected with the old ones by the formula \(u=u(v)\) (where \(u(v)\) is some function), such that if in the course of the \(j\)-th game, for every \(t\) (while \(\mu_t=1\)), \(k\) inequalities \(|\gamma_t^{(h)}|<\varepsilon_t^{(h)}\) are satisfied, where \(\gamma_t^{(h)}=(c_h,v_t)\alpha_t^{(h)}+\beta_t^{(h)}\), \(h=1,\ldots,k\) (the corresponding \(v_t\) are called correct controls), then the robot wins the \(j\)-th game. Here \(\varepsilon'\geq \varepsilon_t^{(h)}\geq \varepsilon''>0\), \(0<\alpha_t^{(h)}\leq \varkappa^{(h)}\), where \(c_h\in R_q\), \(\varepsilon'\), \(\varepsilon''\), \(\varkappa^{(h)}\) are known constants, while the numbers \(\alpha_t^{(h)}, \beta_t^{(h)}\), generally speaking, are “unknown to the brain,” i.e., are expressed in terms of \(\xi,x_t,s_t,x_{t+1},s_{t+1}\).
\((\Pi_2)\) There exists a function \(V^{\mathrm{и}}(\sigma,\xi)\), called the ideal control, such that \(v_t=V^{\mathrm{и}}(\sigma_t,\xi)\) is, for any \(\xi\in M\), a correct control and, moreover, for it \(|\gamma_t^{(h)}|\leq \rho\varepsilon_t^{(h)}\), where \(\rho<1\). (Since \(v_t=V^{\mathrm{и}}(\sigma_t,\xi)\) depends on the unknown \(\xi\), this control, obviously, cannot be used.)
\((\Pi_3)\) For any \(v_t\), the values \(\gamma_t^{(h)}\), \(\varepsilon_t^{(h)}\) can be expressed through data “known to the brain” at the instants \(t\) and \(t+1\), namely through \(v_t,\sigma_t,\sigma_{t+1}\).
\((\Pi_4)\) For all \(\xi\in M\) there exist \(\partial V^{\mathrm{и}}/\partial\sigma_j\), and \(|V^{\mathrm{и}}|\leq \mathrm{const}\), \(|\partial V^{\mathrm{и}}/\partial\sigma_j|\leq \mathrm{const}\).
Instead of \((\Pi_2)\) we shall also consider the assumption \((\Pi_2')\), consisting in the fact that, for \(v_t=V^{\mathrm{и}}(\sigma_t,\xi)\), \(|\gamma_t^{(h)}|\leq \rho\varepsilon_t^{(h)}\), \(h=1,\ldots,k\), where \(\rho<1/2\).
Theorem 1. If \((\Pi_1)\)–\((\Pi_4)\) are satisfied, then brain equations can be constructed so that the resulting robot is reasonable in the class of problems \(M\).
Theorem 2. Suppose that conditions \((\Pi_1)\) with \(k=1\), \((\Pi_2')\), \((\Pi_3)\), \((\Pi_4)\) are satisfied. Suppose that the number \(N\) and the “neuron functions” \(v_j(\sigma)\), \(j=1,\ldots,N\), are determined from \(V^{\mathrm{и}}(\sigma,\xi)\) as indicated below in the proof of the theorem. Put \(\{\tau\}=R_N\), \(\tau=\|\tau^{(j)}\|_{j=1}^N\). Define equation (IV) by the relations \(u_t=u(v_t)\), \(v_t=\tau_t^{(1)}v_1(\sigma_t)+\ldots+\tau_t^{(N)}v_N(\sigma_t)\), and equation (V) by the relations: \(\tau_{t+1}=\tau_t\) if \(|\gamma_t^{(1)}|<\varepsilon_t^{(1)}\) or \(\mu_t=0\); \(\tau_{t+1}=\tau_t-\zeta_t a_t\), \(\zeta_t=\gamma_t^{(1)}/(\varkappa^{(1)}|a_t|^2)\), \(a_t=\|(c_1,v_j(\sigma_t))\|_{j=1}^N\), if \(|\gamma_t^{(1)}|\ge\varepsilon_t^{(1)}\) and \(\mu_t=1\). Then the robot will be reasonable in the class of problems \(M\).
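The update rule (V) of Theorem 2 can be sketched as a single correction step. This is a minimal illustration under stated assumptions: the names `update_tactic` and `learn` are hypothetical, the correction is applied in every case not covered by the "no change" branch, and the toy driver generates \(\gamma_t\) from a known solution `tau_star` (with \(\beta_t=-\alpha(\tau_*,a_t)\)), whereas in the theorem \(\gamma_t^{(1)}\) is obtained from \(v_t,\sigma_t,\sigma_{t+1}\) via \((\Pi_3)\).

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def update_tactic(tau, a, gamma, eps, kappa, mu):
    """One step of (V): keep tau if |gamma| < eps or mu == 0, else correct it.

    Correction: tau_{t+1} = tau_t - zeta * a_t, zeta = gamma / (kappa * |a_t|^2).
    """
    if mu == 0 or abs(gamma) < eps:
        return list(tau)
    norm_sq = dot(a, a)
    if norm_sq == 0.0:
        # The proof assumes |a_t| != 0; leaving tau unchanged here is an assumption.
        return list(tau)
    zeta = gamma / (kappa * norm_sq)
    return [t_j - zeta * a_j for t_j, a_j in zip(tau, a)]


def learn(tau0, a_seq, tau_star, eps, alpha=1.0, kappa=1.0):
    """Toy driver: gamma_t = alpha * (tau_t - tau_star, a_t), mu_t = 1 throughout."""
    tau, updates = list(tau0), 0
    for a in a_seq:
        gamma = alpha * (dot(tau, a) - dot(tau_star, a))
        new_tau = update_tactic(tau, a, gamma, eps, kappa, mu=1)
        updates += new_tau != tau
        tau = new_tau
    return tau, updates
```

With \(\alpha=\varkappa^{(1)}\) each correction is an orthogonal projection of \(\tau_t\) onto the hyperplane \(\gamma=0\); with \(\alpha<\varkappa^{(1)}\) it is a shortened step in the same direction.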
Proof of Theorem 2. Let \(\rho_0>0\) be such that \(\rho'=\rho+\varkappa^{(1)}|c_1|\rho_0/\varepsilon''<1/2\). Using \((\Pi_4)\), find an integer \(N\) such that, for any \(\sigma \in \{\sigma\}\), \(\xi \in M\), the inequality
\[
\left|V^{\mathrm{и}}(\sigma,\xi)-\{\tau^{(1)}(\xi)v_1(\sigma)+\ldots+\tau^{(N)}(\xi)v_N(\sigma)\}\right|<\rho_0
\]
is satisfied, where \(v_j(\sigma)\in R_q\) are certain continuous vector functions and \(\tau^{(j)}(\xi)\) are scalar functions. We define equation (IV) as indicated in the theorem. For any \(\xi\in M\) and for any equation (V), all variables are successively determined for all \(t=0,1,\ldots\). The inequalities \(|\gamma_t^{(1)}|<\varepsilon_t^{(1)}\) for \(\tau_t=\|\tau^{(j)}\|\) will be written in the form
\[
\left|(\tau_t,a_t)\alpha_t^{(1)}+\beta_t^{(1)}\right|<\varepsilon_t^{(1)},
\tag{1}
\]
where the \(a_t\) have the form indicated in Theorem 2. (Here \(t=0,1,\ldots\), and the inequalities (1) are written down for those \(t\) for which \(\mu_t=1\).) Increasing, if necessary, the number \(N\) by one and setting \(v_N(\sigma)\equiv c_1\), \(\tau^{(N)}(\xi)\equiv 0\), we obtain \(|a_t|\ne 0\), which we shall assume.* If (V) defines a finitely convergent algorithm \((^4)\) for solving the countable system (1), then for any \(\xi\in M\) there exists \(t_0\) such that \(\tau_t=\mathrm{const}\) for \(t\ge t_0\), and all inequalities (1) with \(t\ge t_0\) are satisfied. According to \((\Pi_1)\), in this case the robot is reasonable. If \(\alpha_t^{(1)}\equiv 1\), \(\varepsilon_t^{(1)}\equiv\varepsilon\), then Theorem 3 of \((^4)\) can be used to define (V). Indeed, it follows from \((\Pi_2')\), \((\Pi_4)\) that, for the indicated choice of \(\rho_0\) and for \(\tau_t=\|\tau^{(j)}(\xi)\|\), all inequalities (1) are satisfied with \(\varepsilon_t^{(1)}\) replaced by \(\rho'\varepsilon_t^{(1)}\). Applying Theorem 3 of \((^4)\), we obtain the algorithm indicated in Theorem 2. In the general case Theorem 3 of \((^4)\) cannot be used, since the value \(\tau_{t+1}\) supplied by it may depend on \(\xi, x_{t+1}, s_{t+1},\ldots\). However, analogously to \((^4)\), one can easily prove the following assertion, the application of which yields, for equation (V), the relations indicated in Theorem 2:
Theorem 3. The formulas given in Theorem 2 for \(\tau_{t+1}\), in which \(\gamma_t^{(1)}\) is equal to the expression under the modulus sign in (1), define a finitely convergent \((^4)\) algorithm for solving the countable system of inequalities (1) for arbitrary vectors \(a_t\) (in general, depending on \(\tau_1,\ldots,\tau_t\)), provided only that \(|a_t^{(1)}|\le \mathrm{const}\) and that it is known that the inequalities (1), in which \(\varepsilon_t^{(1)}\) is replaced by \(\rho'\varepsilon_t^{(1)}\) with \(\rho'<1/2\), have the solution \(\tau_t\equiv\tau_*\).
Theorem 1 is proved in a more complicated way. In the case where \(\alpha_t^{(h)}\equiv 1\), \(\varepsilon_t^{(h)}\equiv\varepsilon^{(h)}\), the proof follows the scheme indicated above, with a superposition of \(k\) algorithms of Theorem 5 of \((^4)\), the proof of which is given in \((^5)\). In this case the brain equations are constructed effectively.
Theorems 1 and 2 remain valid if, in condition \((\Pi_1)\), the inequalities \(|\gamma_t^{(h)}|<\varepsilon_t^{(h)}\), \(h=1,\ldots,k\), are replaced by the inequalities \(|\gamma_t^{(h)}|<\varepsilon_t^{(h)}\) for \(h=1,\ldots,k_1\) and \(\gamma_t^{(h)}>0\) for \(h=k_1+1,\ldots,k_1+k_2=k\). If, in addition, \(\alpha_t^{(h)}\equiv 1\), \(\varepsilon_t^{(h)}\equiv\varepsilon\), then the algorithms of \((^4)\) are used.
5°. For the robots \(\mathbf{Я}\), \(\mathbf{В}\) the assumptions \((\Pi_1)\)–\((\Pi_4)\) are satisfied. The ideal control for \(\mathbf{В}\) is defined by equation (b) with \((\gamma_1,\gamma_2)\in E\) (whence \(N=2\), \(v_1(\sigma_t)=\chi_t\), \(v_2(\sigma_t)=\chi_{t-1}\)). Under certain simple assumptions (when Theorem 2 can be applied), the brain equations of the reasonable \(\mathbf{В}\) have the form
\[
\psi_t=\tau_t^{(1)}\chi_t+\tau_t^{(2)}\chi_{t-1},
\]
\[
\tau_{t+1}^{(j)}=\tau_t^{(j)},\quad
\text{if either } V_{t+1}<\varepsilon^2\beta,\ \text{or } V_{t+1}\ge \varepsilon^2\beta \text{ and } V_{t+1}<V_t,\ \text{or } \mu_t=0,\ \text{or } \chi_t=\chi_{t-1}=0;
\]
in the remaining cases
\[
\tau_{t+1}^{(1)}=\tau_t^{(1)}-\zeta_t\chi_t,\quad
\tau_{t+1}^{(2)}=\tau_t^{(2)}-\zeta_t\chi_{t-1},\quad
\zeta_t=\frac{\chi_{t+1}}{(\chi_t^2+\chi_{t-1}^2)\,\varkappa_3}.
\]
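The brain equations of the bicyclist can be tried out on a toy model. The sketch below is an illustration under stated assumptions, not the author's program: the function names, the numerical values of \(\xi_j\), \(\beta\), \(\varepsilon\), \(\delta\), \(r\), \(\varkappa_3\) in the defaults, and the absence of disturbance (\(\varphi_t\equiv 0\)) are all hypothetical choices. The tactic is kept unchanged when \(V_{t+1}<\varepsilon^2\beta\) or \(V_{t+1}<V_t\) (which covers the two target-condition cases above), when \(\mu_t=0\), or when \(\chi_t=\chi_{t-1}=0\).

```python
import random


def lyapunov(chi, chi_prev, beta):
    """V_t = chi_t^2 + beta * chi_{t-1}^2."""
    return chi * chi + beta * chi_prev * chi_prev


def brain_step(tau, chi_prev, chi, chi_next, mu, beta, eps, kappa3):
    """Return tau_{t+1} from the observed transition (chi_{t-1}, chi_t) -> chi_{t+1}."""
    v_now = lyapunov(chi, chi_prev, beta)
    v_next = lyapunov(chi_next, chi, beta)
    norm_sq = chi * chi + chi_prev * chi_prev
    target_met = v_next < eps * eps * beta or v_next < v_now
    if target_met or mu == 0 or norm_sq == 0.0:
        return tau
    zeta = chi_next / (norm_sq * kappa3)
    return (tau[0] - zeta * chi, tau[1] - zeta * chi_prev)


def ride(xi, tau0, games, steps_per_game,
         beta=0.5, eps=0.1, delta=0.02, r=1.0, kappa3=1.0, seed=0):
    """Play several games; return the final tactic, the number of falls, and all tactics."""
    rng = random.Random(seed)
    tau, falls, trace = tau0, 0, [tau0]
    for _ in range(games):
        # Each game starts near the vertical: |chi_0|, |chi_1| < delta.
        chi_prev, chi = rng.uniform(-delta, delta), rng.uniform(-delta, delta)
        for _ in range(steps_per_game):            # game length bounded by T_0
            psi = tau[0] * chi + tau[1] * chi_prev
            # Equation (a) with phi_t = 0 (assumption of this sketch).
            chi_next = xi[0] * chi - xi[1] * chi_prev + xi[2] * psi
            tau = brain_step(tau, chi_prev, chi, chi_next, 1, beta, eps, kappa3)
            trace.append(tau)
            chi_prev, chi = chi, chi_next
            if abs(chi) >= r:                      # a "fall" ends the game
                falls += 1
                break
    return tau, falls, trace
```

In this disturbance-free setting each correction moves \(\tau\) no farther from the gains \((-\xi_1/\xi_3,\ \xi_2/\xi_3)\), for which \(\chi_{t+1}=0\), provided \(0<\xi_3\le\varkappa_3\).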
Leningrad State University
named after A. A. Zhdanov
Received
2 IV 1968
REFERENCES
1. V. A. Yakubovich, DAN, 182, No. 3 (1968).
2. L. G. Loitsyanskii, A. I. Lur’e, Course of Theoretical Mechanics, 3, 1934.
3. Yu. I. Neimark, N. A. Fufaev, Mechanics of a Rigid Body, No. 2, 12 (1967).
4. V. A. Yakubovich, DAN, 166, No. 6 (1966).
5. V. A. Yakubovich, in: Self-Adjusting Systems. Pattern Recognition, Finite Automata, and Relay Devices, “Nauka,” 1967, p. 183.
* This procedure is not needed if the following condition is satisfied: for \(a_t=0\), \(t\in\Delta_j\), the robot wins the \(j\)-th game (the latter takes place for robot \(\mathbf{B}\)). We note that the proof \((^1)\) must be supplemented by an analogous argument. In particular, in Theorem 2 \((^1)\) one must introduce the obvious assumption that
\[
v_1(\sigma_t)^2+\ldots+v_n(\sigma_t)^2\ne 0.
\]