UDC 519.95
Yu. I. SHMUKLER
Submitted 1968-01-01 | RussiaRxiv: ru-196801.51177 | Translated from Russian


CYBERNETICS AND CONTROL THEORY

A THERMODYNAMIC MODEL OF ADAPTATION

(Presented by Academician V. A. Trapeznikov on 29 II 1968)

One of the fundamental problems of cybernetics is the study of systems that exhibit “purposeful behavior” (in a sense that must, to some extent, be specified separately for each problem). The presence of “purposefulness” is usually associated with processes of learning, adaptation, etc.

As the present note shows, a broad class of processes described by the terms learning, adaptation, and the like can be interpreted as relaxation processes (transitions to thermodynamic equilibrium) in a physical system with a definite spectrum of energy levels, in contact with a thermostat. Just as a physical system passing to equilibrium attains an extremum of the corresponding thermodynamic potential, a learning process minimizes a certain functional. In general this functional does not coincide with the “payoff” functional, whose value determines the degree of purposefulness of the system’s behavior.

Let us consider this approach using as an example the problem of the purposeful behavior of automata (1).

Let an automaton with a linear tactic \(L_{2n,2}\) be given, whose functioning is described by a stationary Markov chain with matrix \(P\) of probabilities \(p_{ij}\) of transitions from the \(i\)-th state to the \(j\)-th,* with

\[ \sum_j p_{ij}=1. \]

We shall say that a force field acts on the automaton if, for the matrix \(P\),

\[ \sum_i p_{ij}\ne 1 \tag{1} \]

(the matrix is not doubly stochastic). Otherwise the final probabilities of the automaton states are all equal (2), and no purposeful behavior (in Tsetlin’s sense) is observed.

There is a striking similarity between the transitions of the automaton from state to state under random “rewards” and “punishments” (the automaton is in a “field”) and the Brownian motion of particles in a force field under random thermal perturbations.

We shall interpret the automaton (under the action of the field) as a physical system with a discrete spectrum of energy levels (each state of the automaton is characterized by an energy \(\varepsilon_i\)). (In the absence of a “field,” the spectrum of the automaton consists of a single level, \(2n\)-fold degenerate.)

In accordance with the construction of the automaton \(L_{2n,2}\), the system passes from each state only to two neighboring states, with probabilities \(p_{i,i-1}\) and \(p_{i,i+1}\). We shall interpret the environment as a thermostat with temperature \(T\) (\(T=\mathrm{const}\)). Then the transition of the probability distribution of the states of the system to the final distribution can be interpreted as the process of transition of a physical system to a state of thermodynamic equilibrium, characterized by the Gibbs distribution

\[ w_i=\frac{1}{Z}e^{-\varepsilon_i/T}, \tag{2} \]

where

\[ Z=\sum_i e^{-\varepsilon_i/T} \]

is the so-called partition function.
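The correspondence between the final distribution and the Gibbs form (2) can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: the function names and the sample transition probabilities are hypothetical, the chain is assumed to move only between neighboring states (with any remaining probability spent on staying in place), and the lowest level is assigned \(\varepsilon/T = 0\).

```python
import math

def stationary_nearest_neighbor(up, down):
    """Final probabilities of a chain with nearest-neighbor transitions.

    up[i]   = probability of the step  i -> i+1
    down[i] = probability of the step  i+1 -> i
    For such a chain the balance w[i] * up[i] = w[i+1] * down[i] holds.
    """
    weights = [1.0]
    for u, d in zip(up, down):
        weights.append(weights[-1] * u / d)
    norm = sum(weights)
    return [x / norm for x in weights]

def reduced_energies(w):
    """Ratios eps_i / T read off from (2), with the first level set to 0."""
    return [-math.log(x / w[0]) for x in w]

def partition_function(eps_over_t):
    """Z = sum_i exp(-eps_i / T)."""
    return sum(math.exp(-e) for e in eps_over_t)

# Hypothetical transition probabilities, chosen only for illustration.
up = [0.3, 0.2, 0.1]
down = [0.6, 0.5, 0.4]
w = stationary_nearest_neighbor(up, down)
eps = reduced_energies(w)
print("final probabilities:", w)
print("eps_i / T:", eps)
print("Z:", partition_function(eps))
```

Only the ratios \(\varepsilon_i/T\) come out of such a calculation; rescaling all \(\varepsilon_i\) and \(T\) by a common factor leaves the final probabilities unchanged.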

* Here and below the states are renumbered in the order of possible transitions.

In statistical physics the energies $\varepsilon_i$ and the temperature $T$ are given, and the main difficulty lies in computing the partition function $Z$. In our case, by contrast, $Z$ is known (it is easily computed from the final probability distribution), but $T$ and $\varepsilon_i$ cannot be determined separately, only through the ratio $\varepsilon_i/T$.

As is known (3), when such a system passes to thermodynamic equilibrium, it attains the minimum of the free energy

\[ F = E - TS, \tag{3} \]

where $S$ is the entropy and $E$ is the mean energy of the system.

The free energy (defined in the present case also up to a multiplicative constant) is a functional of the probability distribution over energy levels:

\[ \frac{F}{T} = \sum_i \frac{\varepsilon_i}{T} w\left(\frac{\varepsilon_i}{T}\right) + \sum_i w\left(\frac{\varepsilon_i}{T}\right) \ln w\left(\frac{\varepsilon_i}{T}\right). \tag{4} \]
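That (4) is minimized by the Gibbs distribution can be checked in one step. The following is a standard worked argument added for illustration, not part of the original derivation; $w^{G}_i$ denotes the Gibbs weights (2) and $w_i$ an arbitrary normalized distribution. Substituting $\varepsilon_i/T = -\ln w^{G}_i - \ln Z$ into (4) gives

\[ \frac{F[w]}{T} = -\ln Z + \sum_i w_i \ln\frac{w_i}{w^{G}_i} \ge -\ln Z. \]

The sum is the relative entropy of $w$ with respect to $w^{G}$; it is nonnegative and vanishes only for $w = w^{G}$. Hence the minimum value $F/T = -\ln Z$ is attained exactly at the Gibbs distribution.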

The energy spectrum of the system has the form

\[ \begin{aligned} \varepsilon_i / T &= (i-1)\ln(q_1/p_1) \qquad (1 \le i \le n),\\ \varepsilon_{n+i} / T &= (n-1)\ln(q_1/p_1) + \ln(p_2/p_1) \\ &\quad + (i-1)\ln(p_2/q_2) \qquad (1 \le i \le n). \end{aligned} \tag{5} \]

The system has an equidistant spectrum when $\ln(q_1/p_1)=\ln(p_2/q_2)$, i.e., when $p_1 = 1 - p_2$.
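The spectrum (5) is easy to tabulate. The Python sketch below is illustrative only; the function name is hypothetical, and it assumes $q_1 = 1 - p_1$, $q_2 = 1 - p_2$ and that the index $i$ in each branch of (5) runs from $1$ to $n$.

```python
import math

def reduced_spectrum(n, p1, p2):
    """Levels eps_1/T, ..., eps_2n/T according to (5).

    Assumes q1 = 1 - p1, q2 = 1 - p2, and i = 1..n in both branches.
    """
    q1, q2 = 1.0 - p1, 1.0 - p2
    first = [(i - 1) * math.log(q1 / p1) for i in range(1, n + 1)]
    base = (n - 1) * math.log(q1 / p1) + math.log(p2 / p1)
    second = [base + (i - 1) * math.log(p2 / q2) for i in range(1, n + 1)]
    return first + second

# With p1 = 1 - p2 every gap between adjacent levels equals ln(q1/p1).
levels = reduced_spectrum(3, 0.2, 0.8)
gaps = [b - a for a, b in zip(levels, levels[1:])]
print(levels)
print(gaps)
```

For $p_1 \ne 1 - p_2$ the same call returns a spectrum with three different spacings: one within each branch and one at the junction between them.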

It is easy to see that the free-energy functional does not coincide with the payoff functional: $F$ attains its minimum at the Gibbs distribution, whereas the mathematical expectation of the payoff $M$ (for $p_2 > p_1$) attains its extremum on a step function such that

\[ \sum_{i=n+1}^{2n} w_i = 0. \]

The approach under consideration makes it possible to solve complex problems concerning the interaction of a collective of automata.

Let there be $N$ automata with a linear tactic with an infinite number of states. We specify the following rules of interaction:

1) At each cycle of operation of the system only two automata interact; every pair is equally likely to be chosen:

\[ P_{\text{inter.}} = 1/C_N^2. \]

2) Suppose that, before the interaction, a pair of automata $i$ and $j$ are in states $k_i$ and $k_j$. The conditional transition probabilities are equal to:

\[ \begin{aligned} P(k_i - 1,\, k_j + 1 \mid k_i,\, k_j) &= 0.5,\\ P(k_i + 1,\, k_j - 1 \mid k_i,\, k_j) &= 0.5,\\ P(1,\, k_j - 1 \mid 0,\, k_j) &= 0.5,\\ P(0,\, k_j \mid 0,\, k_j) &= 0.5,\\ P(0,\, 0 \mid 0,\, 0) &= 1. \end{aligned} \tag{6} \]

The state of an ensemble of $N$ automata is described by an $N$-dimensional vector whose components are the numbers of the states of the automata $(k_1, k_2, \ldots, k_i, \ldots, k_j, \ldots, k_N)$. Each such state can pass into no more than $2C_N^2$ states (the presence of zero components reduces the number of possibilities).

When automata $i$ and $j$ interact and $k_i, k_j > 0$, the state $\alpha = (k_1, k_2, \ldots, k_i, \ldots, k_j, \ldots, k_N)$ passes with probability $0.5$ into the state $\beta_1 = (k_1, k_2, \ldots, k_i + 1, \ldots, k_j - 1, \ldots, k_N)$ and with probability $0.5$ into the state $\beta_2 = (k_1, k_2, \ldots, k_i - 1, \ldots, k_j + 1, \ldots, k_N)$.
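The interaction rules 1) and 2) can be simulated directly. The Python sketch below is an illustration, not the author's procedure; the names `interact` and `simulate`, the parameter values, and the seed are assumptions. It draws an ordered pair (donor, receiver) uniformly at random, which gives every unordered pair probability $1/C_N^2$ and each direction of transfer probability $0.5$; a donor in state $0$ leaves the pair unchanged, as in (6).

```python
import random
from collections import Counter

def interact(states, rng):
    """One cycle of the system: a random ordered pair (donor, receiver)."""
    n = len(states)
    donor = rng.randrange(n)
    receiver = rng.randrange(n - 1)
    if receiver >= donor:
        receiver += 1              # uniform over the other n - 1 automata
    if states[donor] > 0:          # a donor in state 0 changes nothing
        states[donor] -= 1
        states[receiver] += 1

def simulate(n_automata, start_state, steps, burn_in, seed=0):
    """Run the ensemble and pool the state frequencies over all automata."""
    rng = random.Random(seed)
    states = [start_state] * n_automata
    counts = Counter()
    for t in range(burn_in + steps):
        interact(states, rng)
        if t >= burn_in and t % n_automata == 0:
            counts.update(states)  # snapshot of the whole ensemble
    total = sum(counts.values())
    freq = {k: c / total for k, c in sorted(counts.items())}
    return states, freq

if __name__ == "__main__":
    final_states, freq = simulate(20, 2, 200_000, 20_000, seed=1)
    print("sum of state numbers:", sum(final_states))
    for k, f in freq.items():
        print(k, round(f, 4))
```

The sum of the state numbers is the same before and after every call to `interact`, and the pooled frequencies give an empirical estimate of the final distribution of a single automaton.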

Let us fix the quantity \(K\), the sum of the state numbers of all automata (\(K\) is of order \(N\)).

Since \(K\) and \(N\) are large numbers, the matrix of transition probabilities \(A_{\alpha\beta}\) is practically impossible to examine directly. We would nevertheless like to answer the question: what is the final probability distribution of the states of an individual automaton after sufficiently long interaction with the other \(N-1\) automata, which play for it the role of an environment? Will its behavior be purposeful?

Let us give a statistical interpretation of the problem. We shall regard the ensemble of \(N\) automata as an ideal gas of \(N\) molecules, the states of the ensemble as the microstates of the system, and the \(k\)-th state of an automaton as a discrete energy level \(\varepsilon_k=k\). As the interaction law shows, the quantity \(K\) is conserved in every transition of the system. Therefore \(K\) may be identified with the energy of the whole gas: it determines the constant-energy surface in the phase space of states along which the system moves.

It can be shown that the coefficients of the matrix \(A_{\alpha\beta}\) are symmetric: \(A_{\alpha\beta}=A_{\beta\alpha}\). Their symmetry is an expression of the principle of microscopic reversibility (also called the hypothesis of molecular chaos) and ensures equality of the final probabilities of the microstates (by virtue of the double stochasticity of the matrix \(A\)). Thus we obtain the microcanonical distribution corresponding to the state of thermodynamic equilibrium of the system with exactly specified energy \(K\).

As is well known, the statistical distribution for a small subsystem that is part of a large closed system in equilibrium is the Gibbs distribution corresponding to some fixed temperature \(T\) (the role of the thermostat is played by the large closed system).

Taking one of the automata as the subsystem and taking into account that, by symmetry considerations, the mean energy of one automaton is equal to \(\bar{k}=K/N\), we can immediately write for it the distribution of the final probabilities in the form:

\[ w_k=e^{-k/T}/\sum_k e^{-k/T}=(1-e^{-1/T})e^{-k/T}, \tag{7} \]

where the temperature \(T\) is determined from the condition

\[ \bar{k}=\sum_{k=0}^{\infty} k w_k=\frac{1}{e^{1/T}-1}. \tag{8} \]

From this we find

\[ w_k=\frac{N}{N+K}\left(\frac{K}{N+K}\right)^k =\frac{N}{N+K}\,\exp\left(-k\ln\frac{N+K}{K}\right). \tag{9} \]

The quantity \(\ln\bigl((N+K)/K\bigr)\) plays the role of \(1/T\).
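The step from (7) and (8) to (9) can be written out explicitly; the following lines only spell out the algebra. Setting \(\bar{k}=K/N\) in (8) and solving for the temperature gives

\[ e^{1/T}=1+\frac{N}{K}=\frac{N+K}{K}, \qquad e^{-1/T}=\frac{K}{N+K}, \qquad 1-e^{-1/T}=\frac{N}{N+K}, \]

and substituting the last two expressions into (7) yields (9). For example, for \(K=2N\) one obtains \(\bar{k}=2\), \(1/T=\ln(3/2)\) and \(w_0=1/3\).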

In the present case formula (9) can be checked on the basis of combinatorial considerations. The number of microstates \(\Omega\) corresponding to a given \(K\) is determined as the number of ways in which the sum \(K\) can be formed from \(N\) summands (3):

\[ \Omega=C_{K+N-1}^{N-1}. \]

(Each summand may take any value from 0 to \(K\), which corresponds to Bose–Einstein statistics.)

If an automaton is in the \(k\)-th state, then the sum of the state numbers of the remaining \(N-1\) automata must be \((K-k)\), which can be realized in the number of ways

\[ \Omega_k=C_{N+K-k-2}^{N-2}. \]

Hence, by virtue of the equiprobability of microstates:

\[ w_k=\Omega_k/\Omega=C_{N+K-k-2}^{N-2}/C_{N+K-1}^{N-1}. \]

Using Stirling’s formula for \(N, K \gg 1\) and \(k \ll K\), we obtain (9).
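The agreement between the combinatorial ratio \(\Omega_k/\Omega\) and the geometric form (9) can also be checked numerically for finite \(N\) and \(K\). The Python sketch below is illustrative; the function names and the values of \(N\) and \(K\) are assumptions, and \(\Omega\) is taken as \(C_{K+N-1}^{N-1}\), as defined above.

```python
from math import comb

def exact_w(k, n, total):
    """Omega_k / Omega for one automaton in state k (n automata, sum = total)."""
    return comb(n + total - k - 2, n - 2) / comb(n + total - 1, n - 1)

def geometric_w(k, n, total):
    """Formula (9): the Gibbs (geometric) distribution with mean total / n."""
    return n / (n + total) * (total / (n + total)) ** k

if __name__ == "__main__":
    n, total = 1000, 2000          # hypothetical ensemble size and energy
    for k in range(6):
        print(k, exact_w(k, n, total), geometric_w(k, n, total))
```

The exact probabilities sum to one over \(k=0,\ldots,K\), and for large \(N\) and \(K\) they differ from (9) only by corrections of relative order \(1/N\).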

For each of the automata, interaction with the environment, issuing “rewards” and “punishments,” is equivalent to interaction with the remaining \(N-1\) automata.

It is easy to show that the transition probabilities established in the automaton as a result of the interaction are equal to:

\[ p_{k,k-1}=\sum_{j=0}^{\infty} \frac{1}{2} w_j=\frac{1}{2}; \]

\[ p_{k-1,k}=\sum_{j=1}^{\infty} \frac{1}{2} w_j=\frac{1}{2}(1-w_0). \]

The logarithm of their ratio has the meaning of a quantity reciprocal to the temperature. The final probabilities depend only on the temperature, whereas the relaxation time is determined by the values of the transition probabilities themselves. In this case each automaton is asymptotically optimal.
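The relation between these transition probabilities and the temperature can be made explicit. The following short check is added for illustration and assumes the balance condition for a chain with transitions only between neighboring states, \(w_{k-1}p_{k-1,k}=w_k p_{k,k-1}\). It gives

\[ \frac{w_k}{w_{k-1}}=\frac{p_{k-1,k}}{p_{k,k-1}}=1-w_0=e^{-1/T}, \qquad \ln\frac{p_{k,k-1}}{p_{k-1,k}}=\frac{1}{T}, \]

where \(1-w_0=e^{-1/T}\) follows from (7). Multiplying both transition probabilities by a common factor changes the relaxation time but not this ratio, and hence not the final distribution.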

The interpretation introduced here, which treats the purposeful behavior of a system at the macrolevel as the evolution of a relaxing physical system at the microlevel, leads to the conclusion that ordinary mixing processes, which establish thermodynamic equilibrium, may possess, from the point of view of cybernetics, features of purposefulness, a property usually attributed to systems that are essentially nonequilibrium and ordered.

There is exactly as much purposefulness in the behavior of an automaton as there is in the motion of a Brownian particle in a gravitational field.

The author expresses deep gratitude to A. Ya. Lerner for useful discussions and support in the work.

Institute of Automation and Telemechanics
(Technical Cybernetics)

Received
19 II 1968

CITED LITERATURE

  1. M. L. Tsetlin, UMN, 18, no. 4 (112), 7 (1963).
  2. V. M. Romanovskii, Discrete Markov Chains, Moscow, 1949, p. 65.
  3. L. D. Landau, E. M. Lifshitz, Statistical Physics, “Nauka,” Moscow, 1964, p. 183.
