CYBERNETICS AND CONTROL THEORY
A. S. KHOLEVO
AUTOMATA THAT PREDICT A RANDOM PROCESS
(Presented by Academician A. A. Dorodnitsyn, 26 II 1965)
The theory of prediction of random processes makes it possible to construct the best prediction from a known spectral or correlation function (see, for example, (¹)). Often, however, the correlation function is not known in advance and has to be computed after observing a sufficiently long segment of the process. It is natural to pose the problem of finding the best prediction directly from observation of the course of the process, bypassing the stage of computing the correlation function. In this note it is shown that such a problem can be solved by a certain asymptotically optimal sequence of automata (the terminology will be specified below).
We consider processes with discrete time, taking two different values (0 and 1). In what follows they are called binary. Binary processes, in the approximation adopted, are a model of processes occurring along nerve fibers, and the corresponding automata have the character of “neural networks.”
The following restriction is imposed on the random process to be predicted: the process must be Markovian, homogeneous, \(N\)-dependent, and must have a final probability distribution (¹). Let \(\{\xi_t,\ t=0,1,\ldots\}\) be a process possessing the listed properties. Introduce notation for the transition probabilities
\[ p(\xi_t=\alpha_0 \mid \xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N) = p(\alpha_0 \mid \alpha_1,\ldots,\alpha_N) \qquad (\alpha_i=0,1). \]
In accordance with the assumption of homogeneity, these probabilities do not depend on \(t\), and there exist limits
\[ \lim_{t\to\infty} p(\xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N) = \pi_{\alpha_1\ldots\alpha_N}, \]
which form the final probability distribution.
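As an illustration (not part of the original note), consider the simplest case \(N=1\). The symbols \(a=p(1\mid 0)\) and \(b=p(0\mid 1)\) are introduced here only for this example, and it is assumed that \(0<a+b<2\), so that the limits exist. The final distribution then satisfies the balance condition \(\pi_0 a=\pi_1 b\), which gives
\[ \pi_0=\frac{b}{a+b}, \qquad \pi_1=\frac{a}{a+b}. \]
For instance, \(a=0.2\) and \(b=0.4\) give \(\pi_0=2/3\) and \(\pi_1=1/3\).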
It is natural to choose the predicting function \(H\) from the class \(B_N\) of Boolean functions of \(N\) variables. Denoting by \(\Delta_t\) the prediction error, we have
\[ \Delta_t=\xi_t \dot{+} \tilde{\xi}_t = \xi_t \dot{+} H(\xi_{t-1},\ldots,\xi_{t-N}), \]
where the operation \(\dot{+}\) is addition modulo 2.
The best prediction is understood in the sense that the mathematical expectation of the error, under the condition that the prediction is made by the function \(H\), in the limit (as \(t\to\infty\)) must be the smallest possible in the class \(B_N\). Marking by the subscript \(H\) conditional expectations and probabilities under the condition that the prediction is made by the function \(H\), we have
\[ \lim_{t\to\infty} E_H\Delta_t = \]
\[ = \lim_{t\to\infty} \sum_{\alpha_1\ldots\alpha_N} E_H(\Delta_t \mid \xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N)\, p(\xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N)= \]
\[ = \lim_{t\to\infty} \sum_{\alpha_1\ldots\alpha_N} p_H(\Delta_t=1 \mid \xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N)\, p(\xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N). \]
Denoting by \(\bar{x}\) the negation of the quantity \(x\) and using the notation for the final distribution, we further obtain
\[ \lim_{t\to\infty} E_H \Delta_t = \]
\[ = \sum_{\alpha_1\ldots \alpha_N} \lim_{t\to\infty} p\{\xi_t=\overline{H(\xi_{t-1},\ldots,\xi_{t-N})}\mid \xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N\}\pi_{\alpha_1\ldots\alpha_N}. \]
Let
\[
p_{\alpha_1\ldots\alpha_N}
=
\min\{p(1\mid \alpha_1,\ldots,\alpha_N),\,
p(0\mid \alpha_1,\ldots,\alpha_N)\}.
\]
From the equality obtained it follows that for any function \(H\in B_N\)
\[ \lim_{t\to\infty} E_H\Delta_t \geq \sum_{\alpha_1\ldots\alpha_N} p_{\alpha_1\ldots\alpha_N}\pi_{\alpha_1\ldots\alpha_N}. \tag{1} \]
Therefore, denoting
\[
\rho := \min_{H\in B_N}\lim_{t\to\infty}E_H\Delta_t,
\]
we have
\[ \rho \geq \sum_{\alpha_1\ldots\alpha_N} p_{\alpha_1\ldots\alpha_N}\pi_{\alpha_1\ldots\alpha_N}. \]
On the other hand, choose the function \(H\) as follows:
\[ H(\alpha_1,\ldots,\alpha_N)= \begin{cases} 1, & \text{if } p(1\mid \alpha_1,\ldots,\alpha_N)\geq p(0\mid \alpha_1,\ldots,\alpha_N),\\ 0, & \text{if } p(1\mid \alpha_1,\ldots,\alpha_N)< p(0\mid \alpha_1,\ldots,\alpha_N). \end{cases} \]
For this function
\[ \lim_{t\to\infty} E_H\Delta_t = \sum_{\alpha_1\ldots\alpha_N} p_{\alpha_1\ldots\alpha_N}\pi_{\alpha_1\ldots\alpha_N}. \tag{2} \]
From (1) and (2) it follows that
\[ \rho = \sum_{\alpha_1\ldots\alpha_N} p_{\alpha_1\ldots\alpha_N}\pi_{\alpha_1\ldots\alpha_N}. \tag{3} \]
Thus the minimum mean prediction error is given by expression (3).
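Formula (3) can be evaluated numerically once the transition probabilities are given. The following is a minimal sketch, not taken from the original note: the function names, the dictionary `p1` (mapping a context \((\alpha_1,\ldots,\alpha_N)\) to an assumed value of \(p(1\mid\alpha_1,\ldots,\alpha_N)\)), and the use of simple iteration to approximate the final distribution are all illustrative assumptions.

```python
from itertools import product


def stationary_distribution(p1, N, iters=10_000, tol=1e-13):
    """Approximate the final distribution pi over contexts (a_1, ..., a_N).

    p1[ctx] is the assumed transition probability p(1 | ctx); a_1 is the
    most recent symbol. Convergence assumes the final distribution exists.
    """
    contexts = list(product((0, 1), repeat=N))
    pi = {c: 1.0 / len(contexts) for c in contexts}
    for _ in range(iters):
        new = dict.fromkeys(contexts, 0.0)
        for c, mass in pi.items():
            for x in (0, 1):
                prob = p1[c] if x == 1 else 1.0 - p1[c]
                # The new symbol x becomes a_1; the oldest symbol drops out.
                new[(x,) + c[:-1]] += mass * prob
        diff = max(abs(new[c] - pi[c]) for c in contexts)
        pi = new
        if diff < tol:
            break
    return pi


def best_predictor(p1):
    """Truth table of H: predict 1 when p(1 | ctx) >= p(0 | ctx)."""
    return {c: 1 if q >= 1.0 - q else 0 for c, q in p1.items()}


def minimum_error(p1, N):
    """Right-hand side of (3): sum of min{p(1|ctx), p(0|ctx)} * pi[ctx]."""
    pi = stationary_distribution(p1, N)
    return sum(min(q, 1.0 - q) * pi[c] for c, q in p1.items())


# Hypothetical example with N = 1: p(1|0) = 0.2, p(1|1) = 0.6.
p1 = {(0,): 0.2, (1,): 0.6}
H = best_predictor(p1)
rho = minimum_error(p1, 1)
print(H, round(rho, 4))
```

For these assumed probabilities the final distribution is \(\pi_0=2/3\), \(\pi_1=1/3\), so (3) gives \(\rho=0.2\cdot 2/3+0.4\cdot 1/3=4/15\), and the best predictor simply repeats the last observed value.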
We shall now describe a sequence of automata \(\Pi_{n,N}\) \((n=1,2,\ldots)\) for which the asymptotic equality
\[ \lim_{n\to\infty} p\lim_{t\to\infty} E_{\Pi_{n,N}}\Delta_t=\rho \tag{4} \]
holds.
Here \(E_{\Pi_{n,N}}\Delta_t\) is the mathematical expectation of the error of the automaton \(\Pi_{n,N}\) at time \(t\). It is natural to call the property of the automata \(\Pi_{n,N}\), expressed by equality (4), asymptotic trainability to the best prediction. In this sense the sequence of automata \(\Pi_{n,N}\) \((n=1,2,\ldots)\) is asymptotically optimal.
The automata \(\Pi_{n,N}\) are constructed as follows. An arbitrary function \(H\in B_N\) is uniquely represented in the form
\[ H(\xi_{t-1},\ldots,\xi_{t-N}) = \bigcup_{\alpha_1\ldots\alpha_N} c_{\alpha_1\ldots\alpha_N}\xi_{t-1}^{\alpha_1}\cdots \xi_{t-N}^{\alpha_N} \quad (c_{\alpha_1\ldots\alpha_N}=0,1), \]
where \(\xi^0=\bar{\xi}\), \(\xi^1=\xi\) (see, for example, (²)). For convenience denote
\[
\zeta_t^{\alpha_1\ldots\alpha_N}
=
\xi_{t-1}^{\alpha_1}\cdots \xi_{t-N}^{\alpha_N}.
\]
The quantity \(\zeta_t^{\alpha_1\ldots\alpha_N}\) is equal to one if and only if \(\xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N\). The choice of the function \(H\) is equivalent to the choice of a specific set of constants \(c_{\alpha_1\ldots\alpha_N}\), so the learning process reduces to selecting suitable constants \(c_{\alpha_1\ldots\alpha_N}\). This selection is carried out by automata \(A_{\alpha_1\ldots\alpha_N}^n\) (there are \(2^N\) automata in all), each of which is isomorphic to an automaton \(A^n\) with two actions, where the sequence \(A^n\) \((n=1,2,\ldots)\) is asymptotically optimal in any stationary random environment as \(n\to\infty\) (³). The first action of the automaton \(A_{\alpha_1\ldots\alpha_N}^n\) consists in assigning the value 0 to the constant \(c_{\alpha_1\ldots\alpha_N}\), the second in assigning the value 1. The input signal is the error
\[
\Delta_t=\xi_t \dot{+} H(\xi_{t-1},\ldots,\xi_{t-N}).
\]
\(\Delta_t=1\) is perceived as a penalty, \(\Delta_t=0\) as a non-penalty. The automaton \(A_{\alpha_1\ldots\alpha_N}^n\) must operate, i.e., make the next transition, only in the case when
\[
\zeta_t^{\alpha_1\ldots\alpha_N}=1.
\]
It is important to note that at each moment of time only one of the quantities \(\zeta_t^{\alpha_1\ldots \alpha_N}\) is equal to 1. Therefore, at each moment of time only one of the automata makes the next transition. The described collection of automata \(A_{\alpha_1\ldots \alpha_N}^n\), corresponding to all possible sets \((\alpha_1,\ldots,\alpha_N)\), for fixed \(n\), forms the automaton \(\Pi_{n,N}\). The scheme of the automaton \(\Pi_{n,N}\) is shown in Fig. 1.
Fig. 1. Scheme of the automaton \(\Pi_{n,N}\). Notation: \(\vee\), disjunction; \(\wedge\), conjunction; \(+\), addition modulo 2; \(\bigcirc\), unit-delay element. The automaton \(A_{\alpha_1\ldots \alpha_N}^n\) makes the next transition only under the condition \(\zeta_t^{\alpha_1\ldots \alpha_N}=1\).
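The construction of \(\Pi_{n,N}\) can be sketched as a small simulation. This is an illustration under stated assumptions, not the author's implementation: the note only requires \(A^n\) to be asymptotically optimal (³), and here \(A^n\) is assumed to be an automaton with linear tactics, with \(n\) states per action. The class and function names, the dictionary `p1` of assumed values \(p(1\mid\alpha_1,\ldots,\alpha_N)\), and the choice to average the error over the second half of the run are all hypothetical.

```python
import random
from itertools import product


class LinearTacticAutomaton:
    """Two-action automaton with n states per action (assumed form of A^n)."""

    def __init__(self, n):
        self.n = n
        self.action = 0  # action 0 sets c = 0, action 1 sets c = 1
        self.depth = 1   # 1 = about to switch, n = deepest state

    def update(self, penalty):
        if not penalty:
            self.depth = min(self.depth + 1, self.n)
        elif self.depth > 1:
            self.depth -= 1
        else:
            self.action = 1 - self.action  # switch action at the boundary


def simulate(p1, N, n, steps, seed=0):
    """Run Pi_{n,N} on a simulated binary process; return the mean error."""
    rng = random.Random(seed)
    automata = {c: LinearTacticAutomaton(n)
                for c in product((0, 1), repeat=N)}
    ctx = (0,) * N  # arbitrary initial context (xi_{t-1}, ..., xi_{t-N})
    errors = 0
    counted = 0
    for t in range(steps):
        automaton = automata[ctx]    # the only automaton with zeta_t = 1
        prediction = automaton.action
        xi = 1 if rng.random() < p1[ctx] else 0
        delta = xi ^ prediction      # error: addition modulo 2
        automaton.update(delta == 1)  # only this automaton makes a transition
        if t >= steps // 2:          # skip the initial transient
            errors += delta
            counted += 1
        ctx = (xi,) + ctx[:-1]
    return errors / counted


# Hypothetical process with N = 1: p(1|0) = 0.2, p(1|1) = 0.6.
p1 = {(0,): 0.2, (1,): 0.6}
err_small = simulate(p1, N=1, n=1, steps=200_000)
err_large = simulate(p1, N=1, n=8, steps=200_000)
print(err_small, err_large)
```

For this assumed process the minimum error is \(4/15\approx 0.267\); the empirical error for the larger memory depth should come out close to that value, and noticeably below the error for \(n=1\).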
Theorem. The sequence of automata \(\Pi_{n,N}\) \((n=1,2,\ldots)\) is asymptotically trained to the best prediction.
The assertion of the theorem is contained in equality (4). For the proof, fix some set \((\alpha_1,\ldots,\alpha_N)\) and consider the automaton \(A_{\alpha_1\ldots \alpha_N}^n\). If \(\zeta_t^{\alpha_1\ldots \alpha_N}=1\), then
\[ \Delta_t=\xi_t \dot{+} c_{\alpha_1\ldots \alpha_N}^t \]
(\(c_{\alpha_1\ldots \alpha_N}^t\) is the value of the constant \(c_{\alpha_1\ldots \alpha_N}\) at time \(t\)). Therefore, if the automaton \(A_{\alpha_1\ldots \alpha_N}^n\) makes the first action at time \(t\) \((c_{\alpha_1\ldots \alpha_N}^t=0)\), it is penalized with probability \(p(\xi_t=1\mid \xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N,\ldots,\xi_0=\alpha_t)\) (\(\alpha_k\) is the value, already known by time \(t\), that the process took at time \(t-k\)). By virtue of \(N\)-dependence, this probability is equal to \(p(\xi_t=1\mid \xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N)=p(1\mid \alpha_1,\ldots,\alpha_N)\). Analogously, if the automaton makes the second action, it is penalized with probability \(p(0\mid \alpha_1,\ldots,\alpha_N)\).
Thus, during the operation of the automaton, i.e., under the condition \(\zeta_t^{\alpha_1\ldots \alpha_N}=1\), the automaton \(A_{\alpha_1\ldots \alpha_N}^n\) is in a stationary random environment. Define the proper time \(t_{\alpha_1\ldots \alpha_N}\) of the automaton \(A_{\alpha_1\ldots \alpha_N}^n\) as the total duration of operation of the automaton during time \(t\). Obviously, \(t_{\alpha_1\ldots \alpha_N}\) is equal to the number of moments \(\tau\leq t\) at which \(\zeta_\tau^{\alpha_1\ldots \alpha_N}=1\). It follows from this that, for those sets \((\alpha_1,\ldots,\alpha_N)\) for which
\[ \lim_{t\to\infty} p(\zeta_t^{\alpha_1\ldots \alpha_N}=1)=\pi_{\alpha_1\ldots \alpha_N}>0, \]
we have
\[ p\lim_{t\to\infty} t_{\alpha_1\ldots \alpha_N}=\infty. \]
If we denote by \(\rho_{\alpha_1\ldots \alpha_N}^{t,n}\) the mathematical expectation of the error of the automaton \(A_{\alpha_1\ldots \alpha_N}^n\) at time \(t\) under the condition that \(\zeta_t^{\alpha_1\ldots\alpha_N}=1\), then, by virtue of the asymptotic optimality of \(A_{\alpha_1\ldots\alpha_N}^n\) in a stationary random environment, we obtain
\[ \lim_{n\to\infty} p\lim_{t\to\infty}\rho_{\alpha_1\ldots\alpha_N}^{t,n} = \min\{p(1\mid \alpha_1,\ldots,\alpha_N),\,p(0\mid \alpha_1,\ldots,\alpha_N)\} = p_{\alpha_1\ldots\alpha_N}. \tag{5} \]
The mathematical expectation of the error of the entire automaton \(\Pi_{n,N}\) can be written as
\[ E_{\Pi_{n,N}}\Delta_t = \sum_{\alpha_1\ldots\alpha_N} E_{\Pi_{n,N}}(\Delta_t\mid \zeta_t^{\alpha_1\ldots\alpha_N}=1)\, p(\zeta_t^{\alpha_1\ldots\alpha_N}=1) = \]
\[ = \sum_{\alpha_1\ldots\alpha_N} E_{\Pi_{n,N}}(\Delta_t\mid \zeta_t^{\alpha_1\ldots\alpha_N}=1)\, p(\xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N). \tag{6} \]
But since at time \(t\) only one of the automata \(A_{\alpha_1\ldots\alpha_N}^n\) is operating (the one for which \(\zeta_t^{\alpha_1\ldots\alpha_N}=1\)), we have
\[ E_{\Pi_{n,N}}(\Delta_t\mid \zeta_t^{\alpha_1\ldots\alpha_N}=1) = \rho_{\alpha_1\ldots\alpha_N}^{t,n}. \tag{7} \]
Taking into account (5), (6), (7), and also the fact that \(p(\xi_{t-1}=\alpha_1,\ldots,\xi_{t-N}=\alpha_N)\) tends to the final distribution \(\pi_{\alpha_1\ldots\alpha_N}\), we obtain:
\[ \lim_{n\to\infty} p\lim_{t\to\infty} E_{\Pi_{n,N}}\Delta_t = \sum_{\alpha_1\ldots\alpha_N} p_{\alpha_1\ldots\alpha_N}\pi_{\alpha_1\ldots\alpha_N}. \]
Hence, using (3), we obtain (4), as was required.
I express my gratitude to V. G. Sragovich for posing the problem and for valuable advice.
Computing Center, Academy of Sciences of the USSR
Received 22 II 1965
REFERENCES
1. J. L. Doob, Stochastic Processes, Moscow, 1956.
2. V. M. Glushkov, Synthesis of Digital Automata, Moscow, 1962.
3. M. L. Tsetlin, UMN, 18, no. 4 (1963).