CYBERNETICS AND CONTROL THEORY
Submitted 1964-01-01 | RussiaRxiv: ru-196401.49015 | Translated from Russian


E. S. USACHEV

ON LIMITING DISTRIBUTIONS IN A STOCHASTIC MODEL OF LEARNING

(Presented by Academician A. A. Dorodnitsyn, 29 IV 1964)

For the mathematical description of certain experiments on learning, the following scheme was proposed in (1).

The learner (hereinafter called the subject), at time \(\tau\) (in the \(\tau\)-th experiment), produces a response \(R_j \in R = \{R_1, \ldots, R_r\}\) with probability \(p_\tau(R_j)\). In reply, the environment (or the experimenter) carries out an action \(S_k \in S = \{S_1, \ldots, S_s\}\) with probability \(\pi_{kj} = p(S_k/R_j)\), the conditional probability of the environmental action \(S_k\) given that the subject has made the response \(R_j\). The entire past “experience” of the subject, all the “knowledge” about the environment accumulated by him up to time \(\tau\), is contained in the mode of his behavior, that is, in the probability distribution of his responses
\[ \mathbf p_\tau = (p_\tau(R_1), \ldots, p_\tau(R_r)). \]
Additional “knowledge” about the environment, obtained by the subject as a result of the \(\tau\)-th experiment, serves to change the probabilities of his responses and is completely contained in the event \((R_j, S_k)\), that is, in the subject’s response in this experiment and the environment’s reply to it. Mathematically this is expressed by specifying functions \(\varphi_{kj}\) mapping the set of probability measures on \(R = \{R_1, \ldots, R_r\}\) into itself. Moreover, if \(\mathbf p_{\tau'} = \mathbf p_{\tau''}\) and both experiments (the \(\tau'\)-th and the \(\tau''\)-th) ended with one and the same event \((R_j, S_k)\), then
\[ \mathbf p_{\tau'+1} = \mathbf p_{\tau''+1} = \varphi_{kj}(\mathbf p_{\tau'}) = \varphi_{kj}(\mathbf p_{\tau''}). \]
The response of the environment in the \(\tau\)-th experiment is random, and the probability that this experiment ends with the event \((R_j, S_k)\) depends only on \(\mathbf p_\tau\); therefore the sequence \(\mathbf p_\tau\) is a homogeneous Markov chain.
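The scheme just described can be sketched as a simulation. This is a minimal illustration, not the construction of (1): the names `sample_index`, `step` and `toward`, the operator used for \(\varphi_{kj}\), and all numerical values are assumptions introduced here.

```python
import random
from typing import Callable, List, Sequence

Dist = List[float]                    # response probabilities (p(R_1), ..., p(R_r))
Operator = Callable[[Dist], Dist]     # one learning operator phi_kj


def sample_index(weights: Sequence[float], rng: random.Random) -> int:
    """Draw index i with probability weights[i] (weights sum to 1)."""
    u, acc = rng.random(), 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1  # guard against floating-point rounding


def step(p: Dist, pi: Sequence[Sequence[float]],
         phi: Sequence[Sequence[Operator]], rng: random.Random) -> Dist:
    """One experiment: draw R_j from p, then S_k from pi[k][j], then apply phi[k][j]."""
    j = sample_index(p, rng)
    k = sample_index([row[j] for row in pi], rng)
    return phi[k][j](p)


def toward(target: Dist, alpha: float) -> Operator:
    """Hypothetical operator: move p a fraction (1 - alpha) of the way to target."""
    return lambda p: [alpha * x + (1.0 - alpha) * t for x, t in zip(p, target)]


# Illustrative setup: r = 2 responses, s = 2 actions, all numbers invented.
pi = [[0.8, 0.3],    # pi[0][j] = p(S_1 / R_j)
      [0.2, 0.7]]    # pi[1][j] = p(S_2 / R_j)
phi = [[toward([1.0, 0.0], 0.9), toward([1.0, 0.0], 0.9)],
       [toward([0.0, 1.0], 0.9), toward([0.0, 1.0], 0.9)]]

rng = random.Random(0)
p = [0.5, 0.5]
for _ in range(200):
    p = step(p, pi, phi, rng)   # the next state depends only on the current p
```

Because `step` reads nothing but the current `p` and fresh random draws, the simulated sequence has the Markov property stated above.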

Under what conditions will the subject eventually learn something? What happens to the subject under an unbounded increase of learning time? To what extent does the model agree with the experimental data?

To answer these questions it is necessary to study the asymptotic properties of the chain \(\mathbf p_\tau\).

In (1) it is assumed that the functions \(\varphi_{kj}\) are linear:
\[ \varphi_{kj}=\alpha_{kj}E+(1-\alpha_{kj})Q_{kj}, \]
where \(0 \le \alpha_{kj} \le 1\), \(E\) is the identity matrix, and \(Q_{kj}\) is a stochastic matrix each of whose rows is the vector
\[ \mathbf q_{kj}=(q_{kj1}, \ldots, q_{kjr}). \]
Suppose that the set of the subject’s responses consists of two elements, \(R=\{R_1,R_2\}\), and denote \(p_\tau(R_1)=p_\tau\), \(q_{kj1}=q_{kj}\), \(a_{kj}=(1-\alpha_{kj})q_{kj}\); let \(M_{m,\tau}\) be the moment of order \(m\) of the random variable \(p_\tau\).
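As a short worked step, assuming the row-vector convention \(\mathbf p \mapsto \mathbf p\,\varphi_{kj}\) (the text does not state the convention explicitly): every row of \(Q_{kj}\) equals \(\mathbf q_{kj}\) and the components of \(\mathbf p_\tau\) sum to one, so \(\mathbf p_\tau Q_{kj}=\mathbf q_{kj}\) and
\[ \varphi_{kj}(\mathbf p_\tau)=\alpha_{kj}\,\mathbf p_\tau+(1-\alpha_{kj})\,\mathbf q_{kj}. \]
The operator thus moves \(\mathbf p_\tau\) toward the fixed point \(\mathbf q_{kj}\). With two responses this reduces to the scalar update
\[ p_{\tau+1}=\alpha_{kj}\,p_\tau+(1-\alpha_{kj})\,q_{kj}. \]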

Consider three types of environmental responses:

a) the environmental responses do not depend on the subject’s responses
\[ (\pi_{kj}=\pi_k,\quad a_{kj}=a_k,\quad q_{kj}=q_k); \]

b) the environmental responses are uniquely determined by the subject’s responses
\[ (S=\{S_1,S_2\},\quad \pi_{kj}=\delta_{kj},\quad a_{kj}=a_k,\quad q_{kj}=q_k); \]

c) the environmental responses depend stationarily and stochastically on the subject’s responses
\[ (\pi_{kj}=p(S_k/R_j)). \]

For each of these three types of environmental responses, formulas are easily derived (see (1), §§ 4.3 to 4.6) expressing \(M_{m,\tau+1}\) in terms of \(M_{n,\tau}\), \(n=1,\ldots,m+1\). These formulas have, in all three cases, the form
\[ \mathbf M_{\tau+1}=C+A\mathbf M_\tau=\left(\sum_{k=0}^{\tau-1}A^k\right)C+A^\tau\mathbf M_1, \tag{1} \]
where
\[ \mathbf M_\tau=(M_{1,\tau},M_{2,\tau}\ldots,M_{m,\tau},\ldots); \quad C=(C_1,\ldots,C_m,\ldots), \quad A=\|C_{kj}\|_{k,j=1}^{\infty}; \]
\(C_m\), \(C_{kj}\) are real numbers uniquely determined through \(\pi_{kj}\), \(a_{kj}\), \(q_{kj}\).
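For case a) the recursion can be written out with the binomial theorem. The sketch below assumes the scalar update \(p_{\tau+1}=\alpha_k p_\tau+(1-\alpha_k)q_k\) with probability \(\pi_k\); the function name `next_moments`, the starting value and the parameter values are illustrative choices, not taken from (1). In this case \(M_{m,\tau+1}\) involves only moments of order at most \(m\), so truncating the vector loses nothing.

```python
from math import comb
from typing import List, Sequence


def next_moments(M: Sequence[float], pi: Sequence[float],
                 alpha: Sequence[float], q: Sequence[float]) -> List[float]:
    """Given M[m] = E p^m for m = 0..len(M)-1 (M[0] = 1), return the moments
    after one more trial of case a): p -> alpha_k * p + (1 - alpha_k) * q_k
    with probability pi_k, independently of the subject's response."""
    out = []
    for m in range(len(M)):
        total = 0.0
        for pk, ak, qk in zip(pi, alpha, q):
            b = (1.0 - ak) * qk  # constant term of the affine update
            # E (ak * p + b)^m expanded by the binomial theorem
            total += pk * sum(comb(m, n) * ak**n * b**(m - n) * M[n]
                              for n in range(m + 1))
        out.append(total)
    return out


# Illustrative parameters: two equiprobable actions, alpha = 1/3, q = (0, 1).
pi, alpha, q = [0.5, 0.5], [1 / 3, 1 / 3], [0.0, 1.0]
p1 = 0.2                                  # assumed deterministic starting value
M = [p1**m for m in range(4)]             # moments of order 0..3
for _ in range(60):
    M = next_moments(M, pi, alpha, q)     # iterate the linear recursion
```

With these parameters the first two moments settle at 1/2 and 3/8, whatever the starting value.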

Let \(M\) be the set of all vectors
\[ \mathbf M=(M_1,\ldots,M_m,\ldots) \]
such that
\[ M_m=\int_0^1 x^m\, dF(x), \]

where \(F(x)\) ranges over the distribution functions of random variables \(\xi\) taking values in \([0,1]\). \(M\) is a closed subset of the Banach space (\(B\)-space) of bounded sequences. It is easy to show that \(A\) is a contraction operator on \(M\) always in case a), and under certain restrictions on \(a_{kj}, \alpha_{kj}\) (and perhaps always) in cases b) and c). Passing to the limit \((\tau \to \infty)\), where possible, we obtain equations for the moments of the limiting distribution in the form

\[ \mathbf{M}=\mathbf{C}+A\mathbf{M}. \tag{2} \]

Hence, using elementary formulas of combinatorial analysis (see (2), Ch. 2, § 4), we obtain equations for the characteristic functions \(f(t)\) of the limiting distributions:

\[ f(t)=\sum_{k=1}^{s}\pi_k e^{ia_k t} f(\alpha_k t), \tag{3} \]

\[ f(t)=e^{ia_2 t}f(\alpha_2 t)-\frac{ie^{ia_1 t}}{\alpha_1}f'(\alpha_1 t)+\frac{ie^{ia_2 t}}{\alpha_2}f'(\alpha_2 t), \tag{4} \]

\[ f(t)=\sum_{k=1}^{s}\left\{\pi_{k,2}e^{ia_{k,2}t}\left[f(\alpha_{k,2}t)+\frac{i}{\alpha_{k,2}}f'(\alpha_{k,2}t)\right]-\pi_{k,1}\frac{ie^{ia_{k,1}t}}{\alpha_{k,1}}f'(\alpha_{k,1}t)\right\}. \tag{5} \]
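Equation (3) can also be checked directly, without passing through the moments. Assume, as the form of (3) suggests, that in case a) the action \(S_k\) occurs with probability \(\pi_k\) independently of the response and sends \(p_\tau\) to \(\alpha_k p_\tau+a_k\). Writing \(f_\tau\) for the characteristic function of \(p_\tau\) and \(\mathsf E\) for expectation (notation introduced here),
\[ f_{\tau+1}(t)=\mathsf E\,e^{itp_{\tau+1}}=\sum_{k=1}^{s}\pi_k\,\mathsf E\,e^{it(\alpha_k p_\tau+a_k)}=\sum_{k=1}^{s}\pi_k e^{ia_k t}f_\tau(\alpha_k t). \]
If \(f_\tau\to f\) as \(\tau\to\infty\), the limit satisfies (3).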

Examples.

1) Let \(q_{kj}=q\) for all \(k, j\); then for all three types of environmental responses there exists a unique limiting distribution. Equations (3) to (5) have the solution \(f(t)=e^{iqt}\), i.e. the distribution is concentrated at the point \(q\).

2) Consider the first type of environmental responses. In (3) put \(\alpha_k=\alpha\). Then

\[ f(t)=f(\alpha t)\left(\sum_{k=1}^{s}\pi_k e^{ia_k t}\right). \]

Iterating this relation, we obtain

\[ f(t)=f(\alpha^{n+1} t)\prod_{m=0}^{n}\left(\sum_{k=1}^{s}\pi_k e^{ia_k\alpha^m t}\right) =\prod_{m=0}^{\infty}\left(\sum_{k=1}^{s}\pi_k e^{ia_k\alpha^m t}\right). \]

This is the characteristic function of the sum \(\sum_{m=0}^{\infty}\xi_m\), where \(\xi_m\) are independent random variables, and \(\xi_m\) takes the values \(a_k\alpha^m\), \(k=1,\ldots,s\), with probabilities \(\pi_k\) \((0<\alpha<1)\).
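The identification of the product with a sum of independent variables can be checked numerically on a finite truncation. The sketch below compares the truncated product with the characteristic function of \(\xi_0+\ldots+\xi_n\) obtained by enumerating all \(s^{n+1}\) outcomes; the function names and the parameter values are illustrative assumptions.

```python
import cmath
from itertools import product
from typing import Sequence


def cf_product(t: float, pi: Sequence[float], a: Sequence[float],
               alpha: float, n: int) -> complex:
    """Truncated product: prod_{m=0}^{n} sum_k pi_k * exp(i * a_k * alpha^m * t)."""
    val = 1.0 + 0.0j
    for m in range(n + 1):
        val *= sum(p * cmath.exp(1j * ak * alpha**m * t) for p, ak in zip(pi, a))
    return val


def cf_enumerated(t: float, pi: Sequence[float], a: Sequence[float],
                  alpha: float, n: int) -> complex:
    """E exp(i t (xi_0 + ... + xi_n)), summing over every joint outcome,
    where xi_m independently takes the value a_k * alpha^m with probability pi_k."""
    total = 0.0 + 0.0j
    for ks in product(range(len(pi)), repeat=n + 1):
        prob, x = 1.0, 0.0
        for m, k in enumerate(ks):
            prob *= pi[k]
            x += a[k] * alpha**m
        total += prob * cmath.exp(1j * t * x)
    return total


# Illustrative values only: s = 2, n = 7 gives 2^8 = 256 outcomes.
pi, a, alpha, n, t = [0.3, 0.7], [0.1, 0.5], 0.4, 7, 2.5
lhs = cf_product(t, pi, a, alpha, n)
rhs = cf_enumerated(t, pi, a, alpha, n)
```

The two values agree up to rounding, which is the finite-\(n\) form of the statement above; the enumeration grows as \(s^{n+1}\), so it is only practical for small \(n\).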

For particular values of the parameters this expression occurs in the problem of uniqueness of Fourier series (3). For example, for \(S=\{S_1,S_2\}\), \(\alpha=1/3\), \(\pi_1=\pi_2=1/2\), \(a_1=0\), \(a_2=2/3\) (i.e. \(q_1=0,\ q_2=1\)),

\[ f(t)=\lim_{n\to\infty} \frac{1}{2^{n+1}}\prod_{m=0}^{n}\left(1+e^{\frac{2it}{3^{m+1}}}\right) =e^{\frac{it}{2}}\prod_{m=0}^{\infty}\cos\left(\frac{t}{3^{m+1}}\right) =\int_0^1 e^{itx}\,dF(x). \]

Hence ((3), p. 828) \(F(x)\) is the Cantor singular function on \([0,1]\), and \(p_\infty\), the random variable with distribution function \(F(x)\), takes values in the Cantor perfect set. It is shown in (4) that this may also occur in case b). There is reason to suppose that, in the general case, when \(R=\{R_1, \ldots, R_r\}\), \(S=\{S_1, \ldots, S_s\}\), \(\alpha_{kj}<1-\alpha_{k'j'}\), \(p_\infty\) is also concentrated on a perfect, nowhere dense set.
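The Cantor example can be illustrated with exact arithmetic. The sketch below assumes the case a) update \(p\mapsto p/3+\tfrac23 q_k\) with \(q_k\in\{0,1\}\) chosen with probability 1/2 each, and an assumed starting value \(p_1=0\); the function names are hypothetical. It checks only that every visited value has a ternary expansion without the digit 1, i.e. that the chain stays in the Cantor set; it says nothing by itself about the limiting distribution.

```python
import random
from fractions import Fraction
from typing import List


def simulate(steps: int, rng: random.Random,
             p: Fraction = Fraction(0)) -> List[Fraction]:
    """Run the chain p -> p/3 + (2/3) * q with q drawn uniformly from {0, 1}."""
    path = [p]
    for _ in range(steps):
        q = rng.choice((0, 1))
        p = p / 3 + Fraction(2, 3) * q
        path.append(p)
    return path


def ternary_digits(x: Fraction, n: int) -> List[int]:
    """First n ternary digits of x in [0, 1), computed exactly."""
    digits = []
    for _ in range(n):
        x *= 3
        d = int(x)        # integer part; x is non-negative here
        digits.append(d)
        x -= d
    return digits


path = simulate(30, random.Random(0))
# A value lies in the Cantor set if it has a ternary expansion without the digit 1.
stays_in_cantor_set = all(1 not in ternary_digits(x, 40) for x in path)
```

Each step shifts the ternary digits of \(p\) one place to the right and prepends a 0 or a 2, which is why no digit 1 can ever appear when the starting value has none.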

I express my sincere gratitude to V. G. Sragovich for constant assistance in the work and to Yu. A. Shreider for valuable advice.

Computing Center
Academy of Sciences of the USSR

Received
25 IV 1964

References

  1. R. Bush, F. Mosteller, Stochastic Models for Learning (Russian translation), Moscow, 1962.
  2. J. Riordan, An Introduction to Combinatorial Analysis (Russian translation), IL, 1963.
  3. N. K. Bari, Trigonometric Series, Moscow, 1961.
  4. S. Karlin, Pacific J. Math., 3, 725 (1953).
