CYBERNETICS AND THE THEORY OF REGULATION
V. I. BRYZGALOV, I. I. PYATETSKII-SHAPIRO, M. L. SHIK
ON A TWO-LEVEL MODEL OF INTERACTION OF AUTOMATA
(Presented by Academician M. V. Keldysh on 5 IX 1964)
This work considers a system of automata whose game interaction is corrected by a certain device. The device changes the system of interaction of the automata so that they exhibit a prescribed behavior in the game.
The principal difficulty that arises here is the following. For any specified and fixed interaction, the time needed for the average fraction of automata performing a given action (behavior) to be reliably established is very large. Thus, for \(N = 50\) this time is of the order of \(10^5\) to \(10^6\) periods of the game (see \((^{8-10})\)). In the present work it is shown that the system of interaction can be corrected every several tens of periods. Of course, at each such moment the behavior of the aggregate of \(N\) automata corresponds only very inaccurately to the system of interaction then in force, but on the average the correspondence proves to be sufficiently exact. We note that only thanks to such a procedure is it possible to solve the problem in an acceptable time (\(10^3\) to \(10^4\) periods).
The method of controlling interaction investigated on the model may be useful in considering the regulation of the activity of homogeneous neuronal systems, in particular, the motor units of a muscle \((^4)\).
In the present work the results of modeling on a computer are presented. It would be of considerable interest to construct a general mathematical theory of correction of a system of interaction and, in particular, to treat the model described below analytically.
Let us pass to a detailed description of the model. Let \(A_1,\ldots,A_N\) be automata having two actions, 0 and 1. Two signals arrive at the input of each, which we shall conventionally call a win and a loss. These automata participate in a game \((^1)\). At each period of the game the automaton \(A_l\) \((1 \le l \le N)\) wins with probability \(p(\varepsilon_l,k)\), where \(\varepsilon_l\) is the action that the automaton made in the preceding period, and \(k\) is the number of automata that made action 1 in the preceding period \((^{1-3})\). The numbers \(p(\varepsilon,k)\), \(\varepsilon = 0,1;\ 0 \le k \le N\), determine the game interaction of the automata. Such a game is called symmetric. The work \((^8)\) is devoted to modeling such a game on a computer.
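One period of the symmetric game can be sketched as follows. This is a minimal illustration and not the authors' program: the function name `play_period`, the table layout `p[eps][k]`, and the use of Python's `random` module are assumptions made for the example.

```python
import random

def play_period(actions, p, rng):
    """Draw win/loss signals for one period of the symmetric game.

    actions -- list of 0/1 actions made by the automata in the preceding period
    p       -- table with p[eps][k] = probability of a win for action eps
               when k automata made action 1 (hypothetical layout)
    rng     -- a random.Random instance supplying the randomness
    Returns a list of booleans, True meaning "win".
    """
    k = sum(actions)  # number of automata that made action 1
    return [rng.random() < p[eps][k] for eps in actions]

# Usage: N = 4 automata and the constant interaction p(0,k) = p(1,k) = 0.5.
N = 4
p = [[0.5] * (N + 1), [0.5] * (N + 1)]
wins = play_period([0, 1, 1, 0], p, random.Random(0))
```

Every automaton sees the same \(k\), and its win probability depends only on its own action and on \(k\), which is what makes the game symmetric.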
In this work we consider the behavior of a system consisting of \(N\) automata with linear tactics participating in the described game, and of a device B that changes the game interaction, i.e. the set of numbers \(p(\varepsilon,k)\), depending on the behavior of the automata in the game. Only sets satisfying \(p(0,k) + p(1,k) = 1\) for all \(k\) and \(0.2 \le p(0,k) \le 0.8\) were considered. The task of device B is to produce an interaction under which the average fraction of automata performing action 1 is equal to a previously specified number \(\theta_0\). The operation of device B proceeds as follows. At the initial moment \(p(0,k) = 0.5\) for \(0 \le k \le N\). Device B simultaneously changed by \(h\) the \(\nu\) adjacent numbers \(p(0,k)\), \(k_0 \le k < k_0+\nu\). After this the automata were given the opportunity to play \(\tau\) periods, and the average fraction \(\theta_\tau\) of automata performing action 1 during these \(\tau\) steps was computed.
According to the value of $\theta_\tau$, the change in the interaction was classified as favorable, unfavorable, or indifferent. A change was called favorable if it brought $\theta_\tau$ closer to the prescribed $\theta_0$ (the final effect) than the preceding $\theta_\tau$ had been. If $\theta_\tau$ differed little from the preceding value, the change was called indifferent. Otherwise it was called unfavorable. A favorable change of the numbers $p(0,k)$ was retained and, if possible, an additional change in the same direction was made. If a change in the same direction was no longer possible while the desired interaction had not yet been found, device B changed $p(0,k)$ for $k_0+\nu \leq k < k_0+2\nu$. Device B rejected an indifferent change and passed to the next value of $k$. In the case of an unfavorable change, device B rejected it and changed $p(0,k)$ $(k_0 \leq k < k_0+\nu)$ in the other direction.
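The search procedure described above can be sketched in code. The sketch below is an illustration under stated assumptions, not a reconstruction of the authors' program. The automaton with linear tactics is taken in the standard form (a win deepens the current action up to depth \(n\), a loss moves toward the boundary, a loss at depth 1 switches the action). The names `LinearTacticAutomaton`, `play`, `device_b`, the indifference threshold `delta`, the initial direction of change, the comparison with the most recent measured value, the cyclic passage over \(k\), and the rule of moving on when both directions prove unfavorable are all assumptions that the text does not fix.

```python
import random

class LinearTacticAutomaton:
    """Two-action automaton with linear tactics and memory n (depth 1..n)."""
    def __init__(self, n, rng):
        self.n = n
        self.action = rng.randrange(2)
        self.depth = 1

    def update(self, win):
        if win:
            self.depth = min(self.depth + 1, self.n)  # a win deepens the state
        elif self.depth > 1:
            self.depth -= 1                            # a loss moves toward the boundary
        else:
            self.action = 1 - self.action              # a loss at depth 1 switches action

def play(automata, p0, tau, rng):
    """Play tau periods; return the mean fraction performing action 1."""
    total = 0
    for _ in range(tau):
        k = sum(a.action for a in automata)
        total += k
        for a in automata:
            p_win = p0[k] if a.action == 0 else 1.0 - p0[k]  # p(0,k) + p(1,k) = 1
            a.update(rng.random() < p_win)
    return total / (tau * len(automata))

def device_b(N=50, n=5, nu=2, tau=30, h=0.15, theta0=0.2,
             delta=0.01, r_max=200, seed=0):
    rng = random.Random(seed)
    automata = [LinearTacticAutomaton(n, rng) for _ in range(N)]
    p0 = [0.5] * (N + 1)                     # initial interaction p(0,k) = 0.5
    prev = play(automata, p0, tau, rng)      # baseline value of theta_tau
    k0, sign, flipped = 0, 1, False          # initial direction +h is an assumption
    for _ in range(r_max):
        block = list(range(k0, min(k0 + nu, N + 1)))
        old = [p0[k] for k in block]
        new = [min(0.8, max(0.2, v + sign * h)) for v in old]
        verdict = "blocked"                  # no further change possible this way
        if new != old:
            for k, v in zip(block, new):
                p0[k] = v
            theta = play(automata, p0, tau, rng)
            if abs(theta - prev) < delta:    # delta is a hypothetical threshold
                verdict = "indifferent"
            elif abs(theta - theta0) < abs(prev - theta0):
                verdict = "favorable"
            else:
                verdict = "unfavorable"
            prev = theta                     # compare with the latest measurement
        if verdict == "favorable":
            continue                         # keep it, try the same direction again
        if verdict in ("indifferent", "unfavorable"):
            for k, v in zip(block, old):
                p0[k] = v                    # reject the change
        if verdict == "unfavorable" and not flipped:
            sign, flipped = -sign, True      # same block, other direction
        else:
            k0 = (k0 + nu) % (N + 1)         # pass to the next block of k
            sign, flipped = 1, False
    return p0, prev

p0, theta = device_b(N=10, n=3, nu=2, tau=10, r_max=30, seed=1)
```

The sketch runs for a fixed number `r_max` of changes instead of testing for the required final effect; a stopping rule based on \(\theta_\tau\) could be added in the same loop.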
Experience in modeling the game of automata on a digital computer showed that stable average characteristics are obtained only for a sufficiently large number of steps of the game (of the order of $10^4$–$10^5$ steps when the number of automata is $\sim 50$). On the other hand, the number of changes of the game interaction must be, on average, no less than several tens. Thus, with such a method of searching for a solution, $\sim 10^6$ steps would be required.
In the present work it is shown that the game interaction can be changed after $\tau = 10$–$100$ steps of the game. In this case, for 50 automata one can select optimal parameters (memory $n=5$, $\nu=2$, $\tau = 10$–$50$), for which the solution was found with satisfactory accuracy in 70% of the cases (with other parameter values, less often).
The system was required to find an interaction for which the average fraction of automata performing action 1 lay between 0.15 and 0.25, and for which, in the follow-up calculation, it lay between 0.12 and 0.28. Here and below, the follow-up calculation means a game with a fixed interaction lasting 10,000 steps.
The efficiency of the search could have been substantially increased by means of simple additional techniques (for example, by increasing $\tau$ as the required value $\theta_0$ is approached, etc.), but in this work we did not do this.
The least time $T=\tau r$, where $r$ is the number of changes of interaction needed to achieve the required final effect, is obtained for a system of 50 automata with $\nu=2$, $n=5$, $\tau=10$–$50$. In this case $T$ is of the order of 1000–3000 and $r$ ranges from 150 down to 70. The choice $\nu=2$ is explained by the fact that for larger $\nu$ the learning system has fewer degrees of freedom, so the answer can be found only roughly. For $\nu=1$ the majority of changes turn out to be indifferent, and the prescribed final effect cannot be achieved. At each change the probability $p(0,k)$ was changed by 0.15. With smaller changes, a larger share of the changes of interaction turned out to be indifferent.
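As a rough check of these figures, one may assume (the text does not state this explicitly) that the endpoints of the ranges quoted for $\tau$ and $r$ correspond to each other in order. The relation $T=\tau r$ then gives

$$
T = 10 \cdot 150 = 1500, \qquad T = 50 \cdot 70 = 3500,
$$

which agrees in order of magnitude with the stated 1000–3000 and is far below the $\sim 10^6$ steps estimated above for a search that waits for stable averages after every change.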
In the process of the search, the number of changes of actions per step, for all 50 automata, with small memory $(n=3)$ was 5–8; with large memory $(n=8)$ only 1–2. This is probably connected with the impossibility of solving the problem for large $\tau$ by automata with large memory. During the follow-up calculation, the number of changes of actions by all automata decreased with the growth of memory (106,000 for $n=2$, 61,000 for $n=3$, 35,000 for $n=5$, 17,000 for $n=8$). The behavior of the system during the follow-up calculation characterizes the stability of the interaction that has been worked out.
With optimal memory $(n=5)$, the number $r$ increases with increasing $\tau$. The number $r$ also increases when the memory of the automata is increased to $n=7$–$8$. Apparently, both phenomena are explained by the fact that many automata pass into deep states, which makes it difficult for them to readjust when the interaction is changed. The experiment suggests that, with sufficiently large memory of the automata, the problem cannot be solved at all.
For small \(\tau\) and memory \(n = 2\)–\(4\), there are few indifferent changes of interaction. Therefore, if all the changes of \(p(0,k)\) have been tried once and the required interaction has not been found, an interaction found later is unlikely to turn out to be correct. Accordingly, when the number of changes of interaction for small \(\tau\) and \(n = 2\)–\(4\) was large, the interaction found was often unsatisfactory.
In the search process \(\theta_\tau\) does not change uniformly but in steps. The duration of a step is usually 20–30 cycles. This apparently means that only a small part of the numbers \(p(0,k)\) is of substantial significance.
We did not set ourselves the goal of tying the model investigated to any specific physiological problem. Nevertheless, certain features of it should be noted that recall known characteristics of the process of learning the simplest motor tasks.
It has been noted more than once \((^{5,6})\) that sensory corrections are necessary in the course of performing (learning) a movement, without waiting for the final effect. In the model described, the system of interaction was likewise replaced long before the moment at which it could be guaranteed that the mean number of automata performing action 1 under the given game interaction had become established. It is precisely this that makes the solution of the problem possible within an acceptable time.
We hope that the method of control investigated on this model, in which the correction of the game interaction is carried out long before the reliable establishment of the corresponding behavior of the interacting elements, may be useful in considering the control of motor units of a muscle in the problem of finding and maintaining a posture (a definite joint angle).
The model considered has the property that the accuracy of solving the problem is bounded from below. This is connected with the fact that changes in the system of interaction must be sufficiently abrupt. The point is that, as was indicated above, the time between two successive changes of the interaction system must be small. But then the effect of a small change in the interaction system will in most cases be evaluated incorrectly by the system. When learning the task of maintaining a posture under conditions not previously encountered in the subject’s life (in a “tilted field”), the amplitude of tremor increases severalfold in comparison with the initial one \((^7)\). It may be supposed that this is connected with the need for sufficiently abrupt corrections in the learning task. From this point of view, an increase of tremor when learning a new motor task is inevitable.
We express our gratitude to I. M. Gel'fand and M. L. Tsetlin, whose idea of control by changing the system of interaction among controlled elements served as the starting point for this investigation.
Received 18 VIII 1964
CITED LITERATURE
\(^1\) M. L. Tsetlin, UMN, 18 (112) (1963).
\(^2\) I. M. Gel'fand, V. S. Gurfinkel, M. L. Tsetlin, Biological Aspects of Cybernetics, Publishing House of the Academy of Sciences of the USSR, 1962.
\(^3\) I. M. Gel'fand, M. L. Tsetlin, I. I. Pyatetskii-Shapiro, DAN, 152, No. 4 (1963).
\(^4\) I. I. Pyatetskii-Shapiro, M. L. Shik, Biophysics, 9, 494 (1964).
\(^5\) N. A. Bernstein, On the Construction of Movements, 1947.
\(^6\) F. Szentagothai, Gy. Székely, Acta Physiol. Acad. sci. Hung., 10, 43 (1956).
\(^7\) V. I. Krinskii, M. L. Shik, Biophysics, 8, 513 (1963).
\(^8\) V. I. Bryzgalov, V. A. Borovkov, Automation and Remote Control, 26, issue 3 (1965).
\(^9\) M. L. Tsetlin, S. L. Ginzburg, V. Yu. Krylov, Automation and Remote Control, 25, issue 5 (1964).
\(^{10}\) V. I. Bryzgalov, I. M. Gel'fand, I. I. Pyatetskii-Shapiro, M. L. Tsetlin, Automation and Remote Control, 25, issue 11 (1964).