CYBERNETICS AND THE THEORY OF REGULATION
Yu. V. GLEBSKII
CODING BY MEANS OF FINITE AUTOMATA
(Presented by Academician A. I. Berg on 24 XI 1960)
1. The usual method of coding (see \((^1)\) and the bibliography cited there, and also \((^2)\)) assigns to each letter of a message (a word over a finite alphabet) a fixed code word (a word over another alphabet), so that the coding can be realized by devices without memory. (In our terminology, the operators of the usual method of coding have weight 1; see below.) On the other hand, works on the theory of finite automata \((^{3,4})\) study operators whose values on individual input letters change from cycle to cycle.
However, the requirement that input and output be synchronized, adopted in these works, is unnatural for coding problems. This note considers the mutual uniqueness (the one-to-one property) of coding by discrete devices that have memory but are not synchronized. The sets of messages to be coded can be described by sources with a finite number of states \((^5)\). The criterion of mutual uniqueness given here generalizes the analogous criterion of \((^2)\).
2. Let \(\widetilde A,\ \widetilde D\) denote the sets of all words over the alphabets \(A=\{a_1,a_2,\ldots,a_m\}\) and \(D=\{d_1,d_2,\ldots,d_{m_1}\}\), respectively. An operator \(P\) mapping \(\widetilde A\) into \(\widetilde D\) will be called finitely determined (abbreviated f.d.) if the following conditions are satisfied: 1) \(P(\Lambda)=\Lambda\) (\(\Lambda\) denotes the empty word); 2) for every \(\alpha\in\widetilde A\) there exists an operator \(P_\alpha\) such that \(P(\alpha\beta)=P(\alpha)P_\alpha(\beta)\) for any \(\beta\in\widetilde A\); 3) the set of distinct \(P_\alpha,\ \alpha\in\widetilde A\), is finite. The number \(h\) of elements of the latter set will be called the weight of the operator (cf. \((^4)\)).
Each operator \(P_\alpha\) is also f.d., with weight \(\le h\). Denote by \(P^i,\ i=1,2,\ldots,h\), the distinct operators \(P_\alpha\), with the convention \(P^1=P_\Lambda=P\). Then there exist a function \(i'=f(i,j)\) and words \(\delta_j^i\in\widetilde D\) such that
\[
P^i_{a_j}=P^{i'},\quad P^i(a_j)=\delta_j^i,\quad i=1,2,\ldots,h;\quad j=1,2,\ldots,m;\quad 1\le i'\le h,
\]
and for every nonempty word \(\alpha=a_{j_1}a_{j_2}\ldots a_{j_l}\) the following relation holds:
\[
P^i(\alpha)=\delta_{j_1}^i\delta_{j_2}^{i_2}\ldots\delta_{j_l}^{i_l},
\tag{1}
\]
where \(i_1=i,\ i_{\nu+1}=f(i_\nu,j_\nu),\ \nu=1,2,\ldots,l-1\). Conversely, if a function \(f\) and words \(\delta_j^i\in\widetilde D\) are given, then relation (1) (together with the requirement \(P^i(\Lambda)=\Lambda\)) determines a system of f.d. operators \(P^i\).
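Relation (1) says that an f.d. operator is determined by a finite transition table. The following Python sketch is our own illustration, not part of the note: it evaluates \(P^i\) from a table `f` of next states and a table `delta` of output words. States and letters are 0-indexed, so state 0 plays the role of \(P^1\), and input words are lists of letter indices; the function names and the sample tables are hypothetical.

```python
from typing import Callable, Sequence

Table = Sequence[Sequence[int]]   # f[i][j] = next state f(i, j)
Words = Sequence[Sequence[str]]   # delta[i][j] = output word delta_j^i


def make_fd_operator(f: Table, delta: Words, start: int) -> Callable[[Sequence[int]], str]:
    """Return P^start as a function on input words (lists of letter indices)."""
    def apply(word: Sequence[int]) -> str:
        state, out = start, []           # empty input gives empty output: P(Lambda) = Lambda
        for j in word:
            out.append(delta[state][j])  # emit delta_{j_nu}^{i_nu}
            state = f[state][j]          # i_{nu+1} = f(i_nu, j_nu)
        return "".join(out)
    return apply


def state_after(f: Table, start: int, word: Sequence[int]) -> int:
    """Index i' with P^start_word = P^{i'} (assumes distinct states are distinct operators)."""
    state = start
    for j in word:
        state = f[state][j]
    return state


# Hypothetical weight-2 example: the state records the last letter read.
f = [[0, 1], [0, 1]]
delta = [["0", ""], ["1", "11"]]
P = make_fd_operator(f, delta, 0)
alpha, beta = [1, 0], [0, 1]
P_alpha = make_fd_operator(f, delta, state_after(f, 0, alpha))
print(P(alpha + beta), P(alpha) + P_alpha(beta))   # prints: 10 10
```

The last line checks condition 2) of the definition, \(P(\alpha\beta)=P(\alpha)P_\alpha(\beta)\), on one pair of words.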
The product (superposition) of f.d. operators is again an f.d. operator. The connection between f.d. operators and the operators realized by synchronous finite automata is that every f.d. operator \(P\) can be represented in the form \(P=TSR\), where \(R\) is an operator of weight 1 mapping \(\widetilde A\) into \(\widetilde A\) such that \(R(a_j)=a_ja_j\ldots a_j\) (\(n\) times, for a suitable fixed \(n\)); \(S\) is a boundedly determined operator in the sense of \((^4)\), mapping \(\widetilde A\) into \(\widetilde D'\), where \(D' = D \cup \{d_0\}\); and \(T\) is an operator of weight 1 mapping \(\widetilde D'\) into \(\widetilde D\) such that \(T(d_j)=d_j\) for \(j\ne 0\) and \(T(d_0)=\Lambda\).
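The note does not spell out how \(S\) is built. One construction that appears to fit (our assumption, sketched in Python) takes \(n\) at least the maximal length of the words \(\delta_j^i\). On the \(r\)-th copy of \(a_j\) read in state \(i\), \(S\) emits the \(r\)-th letter of \(\delta_j^i\), or \(d_0\) once that word is exhausted, and it moves to state \(f(i,j)\) after the \(n\)-th copy. Its memory is the pair \((i,r)\), so it has finitely many states and emits exactly one letter per input letter. The padding symbol `#` standing for \(d_0\), the 0-indexed tables, and all function names are hypothetical.

```python
from itertools import product

PAD = "#"  # plays the role of the extra letter d_0


def P(f, delta, word):
    """P = P^1 (state 0 here), evaluated directly by relation (1)."""
    state, out = 0, []
    for j in word:
        out.append(delta[state][j])
        state = f[state][j]
    return "".join(out)


def R(word, n):
    """Weight-1 operator: each letter a_j becomes n copies of a_j."""
    return [j for j in word for _ in range(n)]


def S(f, delta, word, n):
    """Synchronous operator: exactly one output letter per input letter.

    Memory is (i, r): the state i of P and the position r inside the current
    block of n copies.  Assumes n >= len(delta[i][j]) for all i, j.
    """
    state, r, out = 0, 0, []
    for j in word:
        d = delta[state][j]
        out.append(d[r] if r < len(d) else PAD)
        r += 1
        if r == n:                       # block finished: advance P's state
            state, r = f[state][j], 0
    return "".join(out)


def T(word):
    """Weight-1 operator: d_0 -> Lambda, every other letter unchanged."""
    return word.replace(PAD, "")


f = [[0, 1], [0, 1]]
delta = [["0", ""], ["1", "11"]]
n = max(len(w) for row in delta for w in row)
for length in range(5):
    for word in product(range(2), repeat=length):
        assert T(S(f, delta, R(list(word), n), n)) == P(f, delta, word)
print("P = TSR on all inputs of length <= 4")
```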
3. A set \(\theta \subset \widetilde A\) will be called finitely enumerable (abbreviated f.e.) if there exist a finite alphabet \(B\) and an f.d. operator \(Q\), mapping \(\widetilde B\) into \(\widetilde A\), such that \(Q(\widetilde B)=\theta\). Every finite set of words containing \(\Lambda\) is finitely enumerable, and so is \(\widetilde A\). If the definition of a source in \((^5)\) is somewhat narrowed, then the sets generated by such sources are finitely enumerable. Let \(\Phi \subset \widetilde A\) be such that \(\Lambda \in \Phi\). Then every \(\alpha \in \widetilde A\) can be represented in the form \(\alpha=\varphi\beta\) with \(\varphi\in\Phi\) (for instance, with \(\varphi=\Lambda\)). Among all such representations choose the one, \(\alpha=\varphi_\alpha \beta_\alpha\), in which the word \(\varphi_\alpha\) has maximal length; this representation is unique. Denote by \(U^\Phi\) the operator mapping \(\widetilde A\) into \(\widetilde A\) defined by \(U^\Phi(\alpha)=\varphi_\alpha\).
Lemma. In order that \(\Phi\) be f.e., it is necessary and sufficient that \(U^\Phi\) be f.d.
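For a finite set \(\Phi\) containing \(\Lambda\) (which is f.e., so that by the lemma \(U^\Phi\) is f.d.), \(U^\Phi(\alpha)\) is simply the longest prefix of \(\alpha\) lying in \(\Phi\). A minimal Python sketch, with words written as strings; the function name and the sample set are hypothetical:

```python
def longest_prefix_in(phi: set, alpha: str) -> str:
    """U^Phi(alpha): the longest prefix of alpha belonging to Phi.

    Phi must contain the empty word "" (Lambda), so such a prefix always exists.
    """
    if "" not in phi:
        raise ValueError("Phi must contain the empty word")
    for end in range(len(alpha), -1, -1):   # longest candidates first
        if alpha[:end] in phi:
            return alpha[:end]
    raise AssertionError("unreachable: the empty prefix is in Phi")


phi = {"", "ab", "abc"}
for alpha in ["abca", "abd", "ba", "a"]:
    print(repr(alpha), "->", repr(longest_prefix_in(phi, alpha)))
```

Since the longest prefix of \(\alpha\beta\) in \(\Phi\) is at least as long as that of \(\alpha\), \(U^\Phi(\alpha)\) is always a prefix of \(U^\Phi(\alpha\beta)\), in line with condition 2) of the definition of an f.d. operator.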
Let \(P\) be an f.d. operator mapping \(\widetilde A\) into \(\widetilde D\). Denote by \(\Gamma_P\) the set of words \(\alpha\in\widetilde A\) such that either \(\alpha=\Lambda\) or \(P_{\alpha'}(a_j)\ne\Lambda\), where \(\alpha=\alpha'a_j,\ a_j\in A\); in other words, a nonempty \(\alpha\) lies in \(\Gamma_P\) exactly when its last letter produces a nonempty output. Let \(\theta\subset\widetilde A\) be such that \(\Lambda\in\theta\). Since \(U^\theta(\alpha)=\alpha\) for \(\alpha\in\theta\), the operators \(P\) and \(PU^\theta\) coincide on \(\theta\); moreover, if \(P\) is one-to-one on \(\theta\), then \(\Gamma_{PU^\theta}=\theta\). Hence:
Theorem 1. In order that the f.d. operator \(P\) be one-to-one on \(\theta\), it is necessary and sufficient that \(\Gamma_{PU^\theta}=\theta\) and the operator \(PU^\theta\) be one-to-one on \(\Gamma_{PU^\theta}\).
From the lemma and Theorem 1 it follows (since the product of f.d. operators is f.d., \(PU^\theta\) is f.d. whenever \(\theta\) is f.e.) that the study of one-to-one coding of an f.e. set of words by an f.d. operator reduces to the study of the coding of the set \(\Gamma_P\) by some f.d. operator \(P\). The latter problem is also of independent interest. Namely, if \(P\) is chosen so that \(\Gamma_P\) is the set of completed messages of interest, then for \(\alpha\in\Gamma_P\) the word \(P(\alpha)\) encodes a completed message. If new letters are now appended to \(\alpha\) one after another (the coding is assumed to be performed by some device operating in time), then as long as the extended word is not a completed message, its image does not change. As soon as \(\alpha\beta\in\Gamma_P\), however, the coding device signals the arrival of a new completed message \(\alpha\beta\) by the fact that \(P(\alpha\beta)\ne P(\alpha)\).
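The behaviour of the coding device just described can be simulated letter by letter. In the Python sketch below (hypothetical names, 0-indexed tables, state 0 standing for \(P=P^1\)), each step reports the newly emitted output; the prefix read so far lies in \(\Gamma_P\) exactly when that output is nonempty, which is the moment the device signals a completed message. The sample tables, for which the completed messages are the words of even length over a one-letter alphabet, are our own example.

```python
from typing import Iterator, Sequence, Tuple


def stream_encode(f: Sequence[Sequence[int]],
                  delta: Sequence[Sequence[str]],
                  letters: Sequence[int]) -> Iterator[Tuple[int, str, bool]]:
    """Yield (length of prefix read, newly emitted word, prefix in Gamma_P?)."""
    state = 0
    for length, j in enumerate(letters, start=1):
        emitted = delta[state][j]   # P_{alpha'}(a_j) for the prefix alpha' read so far
        state = f[state][j]
        yield length, emitted, emitted != ""


f = [[1], [0]]
delta = [[""], ["x"]]
for length, emitted, completed in stream_encode(f, delta, [0] * 5):
    if completed:
        print(f"completed message of length {length}; new output {emitted!r}")
```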
4. We shall call an operator \(P\) an \(N\)-operator if \(P\) is f.d. and, for every \(\alpha\in\widetilde A\), one of the following holds: either \(P_\alpha(\beta)=\Lambda\) for all \(\beta\in\widetilde A\), or, for every nonempty word \(\gamma\in\widetilde A\), the equality \(P_\alpha(\gamma)=\Lambda\) implies \(P_{\alpha\gamma}\ne P_\alpha\).
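For an operator given by tables \(f\) and \(\delta\), the \(N\)-operator condition can be checked on the transition graph, assuming (as for \(P^1,\ldots,P^h\)) that every state is reachable from the initial one and that distinct states represent distinct operators. For each state \(i\), either every transition reachable from \(i\) emits \(\Lambda\), or \(i\) cannot be re-entered from itself along a nonempty path of \(\Lambda\)-emitting transitions. The Python sketch below implements this reading with 0-indexed tables; the function names and the examples are hypothetical.

```python
from typing import Sequence, Set


def _reach(f: Sequence[Sequence[int]], delta: Sequence[Sequence[str]],
           start: int, lambda_only: bool) -> Set[int]:
    """States reachable from `start` by one or more transitions
    (only transitions with empty output if `lambda_only`)."""
    seen: Set[int] = set()
    stack = [start]
    while stack:
        i = stack.pop()
        for j, nxt in enumerate(f[i]):
            if (not lambda_only or delta[i][j] == "") and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_n_operator(f: Sequence[Sequence[int]], delta: Sequence[Sequence[str]]) -> bool:
    for i in range(len(f)):
        closure = _reach(f, delta, i, lambda_only=False) | {i}
        # First alternative: P^i(beta) = Lambda for every beta.
        if all(w == "" for s in closure for w in delta[s]):
            continue
        # Second alternative: no nonempty gamma with output Lambda leads back to i.
        if i in _reach(f, delta, i, lambda_only=True):
            return False
    return True


print(is_n_operator([[1], [0]], [[""], ["x"]]))   # True
print(is_n_operator([[0, 0]], [["", "x"]]))       # False: a_1 loops on state 0 with output Lambda
```

In the second example \(P(a_2)=P(a_1a_2)=x\) and both words lie in \(\Gamma_P\), so \(P\) is not one-to-one on \(\Gamma_P\), in agreement with part a) of Theorem 2.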
Theorem 2. a) In order that the f.d. operator \(P\) be one-to-one on \(\Gamma_P\), it is necessary that it be an \(N\)-operator; b) if \(P\) is an \(N\)-operator, then \(\Gamma_P\) is f.e.
It follows from the theorem that if an f.d. operator \(P\) is one-to-one on \(\Gamma_P\), then \(\Gamma_P\) is f.e.
Furthermore, by the theorem, when studying the conditions under which \(P\) is one-to-one on \(\Gamma_P\), one may restrict attention to \(N\)-operators.
5. Let the \(N\)-operator \(P=P^1\) be given by a function \(f\) and words \(\delta_j^i\in\widetilde D\) as in item 2 (so that the operators \(P^i,\ i=2,\ldots,h\), are also given). By a \(k\)-decoding of a word \(\delta\in\widetilde D\) we shall mean a (nonempty) sequence of pairs of indices
\[ \begin{pmatrix} i_1 & i_2 & \ldots & i_l\\ j_1 & j_2 & \ldots & j_l \end{pmatrix} \]
such that \(1\le i_\nu\le h\) and \(1\le j_\nu\le m\) for \(\nu=1,2,\ldots,l\); \(i_1=k\) and \(i_{\nu+1}=f(i_\nu,j_\nu)\) for \(\nu=1,2,\ldots,l-1\); \(\delta=\delta_{j_1}^{i_1}\delta_{j_2}^{i_2}\ldots\delta_{j_l}^{i_l}\); and \(\delta_{j_l}^{i_l}\ne\Lambda\). A \(k\)-decoding whose last pair is \(\begin{pmatrix} i \\ j \end{pmatrix}\) (that is, \(i_l=i,\ j_l=j\)) will be called a \(\begin{pmatrix} k, i \\ j \end{pmatrix}\)-decoding. Obviously, if we put \(\alpha = a_{j_1}a_{j_2}\ldots a_{j_l}\), then \(P^k(\alpha)=\delta\) and \(P^k_{\alpha}=P^{i'}\), where \(i'=f(i_l,j_l)\); if \(k=1\), then moreover \(\alpha\in\Gamma_P\). Any word \(\delta\in\widetilde D\) has only finitely many decodings, and an effective procedure for finding them is easy to give. The operator \(P=P^1\) is one-to-one on \(\Gamma_P\) if and only if, for every \(i=1,2,\ldots,h\), no word \(\delta\in\widetilde D\) has more than one \(i\)-decoding.
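The effective procedure for finding decodings can be sketched as a depth-first search over the tables (Python, 0-indexed; the function name and the sample code are our own). It returns every \(k\)-decoding of a word as a list of (state, letter) pairs. A run of \(\Lambda\)-emitting transitions that revisits a state is pruned. Assuming every state is reachable, such a run cannot occur in a decoding of an \(N\)-operator, because the repeated state would be re-entered through a nonempty word with output \(\Lambda\) although a nonempty output still follows from it. So nothing is lost, and the search always terminates.

```python
from typing import List, Sequence, Set, Tuple

Pair = Tuple[int, int]   # (state i_nu, letter j_nu), both 0-indexed


def decodings(f: Sequence[Sequence[int]], delta: Sequence[Sequence[str]],
              k: int, word: str) -> List[List[Pair]]:
    """All k-decodings of `word`: pair sequences that start in state k, follow f,
    whose outputs concatenate to `word`, and whose last output is nonempty."""
    found: List[List[Pair]] = []

    def search(state: int, pos: int, path: List[Pair], lam_seen: Set[int]) -> None:
        for j, out in enumerate(delta[state]):
            nxt, step = f[state][j], path + [(state, j)]
            if out == "":
                # Lambda-output: stay at the same position; prune repeated states.
                if nxt not in lam_seen:
                    search(nxt, pos, step, lam_seen | {nxt})
            elif word.startswith(out, pos):
                if pos + len(out) == len(word):
                    found.append(step)           # whole word covered, last output nonempty
                else:
                    search(nxt, pos + len(out), step, {nxt})

    search(k, 0, [], {k})
    return found


# Weight-1 example: letters a_1, a_2, a_3 coded as 0, 01, 10.
f, delta = [[0, 0, 0]], [["0", "01", "10"]]
for d in decodings(f, delta, 0, "010"):
    print(d)
```

Here the word \(010\) has two 0-decodings, corresponding to \(a_1a_3\) and \(a_2a_1\), so this weight-1 operator is not one-to-one.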
As in \((^2)\), to formulate a necessary and sufficient criterion for the one-to-one property we construct a directed graph \(G\) whose “admissible” paths correspond to words having at least two \(i\)-decodings. Notation: \(\mathfrak A=\{1,2,\ldots,h\}\); \(\mathfrak M\) is the set of quadruples of the form \(\begin{pmatrix} k,\varepsilon, i \\ j \end{pmatrix}\), where \(1\le k\le h\) and the word \(\varepsilon\) is a nonempty proper suffix of the word \(\delta^i_j\). As the set of vertices of \(G\) we take \(\mathfrak A\cup\mathfrak M\cup\{Z\}\), where \(Z\) is an additional “terminal” vertex.
The vertices of the graph are joined by directed edges according to the following rules: 1) no edge ends at an element of \(\mathfrak A\), and no edge starts at \(Z\); 2) a vertex \(i\in\mathfrak A\) is joined to \(\begin{pmatrix} k,\varepsilon, i' \\ j' \end{pmatrix}\in\mathfrak M\) if and only if \(\delta^{i'}_{j'}\) has an \(\begin{pmatrix} i, i' \\ j' \end{pmatrix}\)-decoding and the word \(\eta\) defined by \(\delta^{i'}_{j'}=\eta\varepsilon\) \((\eta\ne\Lambda)\) has, for some \(i'',j''\), an \(\begin{pmatrix} i, i'' \\ j'' \end{pmatrix}\)-decoding with \(k=f(i'',j'')\); 3) a vertex \(i\in\mathfrak A\) is joined to \(Z\) if and only if there exist pairs of indices \(\begin{pmatrix} i' \\ j' \end{pmatrix}\) and \(\begin{pmatrix} i'' \\ j'' \end{pmatrix}\) (not necessarily distinct) with \(\delta^{i'}_{j'}=\delta^{i''}_{j''}\) such that the word \(\delta^{i'}_{j'}\) has two distinct decodings, one an \(\begin{pmatrix} i, i' \\ j' \end{pmatrix}\)-decoding and the other an \(\begin{pmatrix} i, i'' \\ j'' \end{pmatrix}\)-decoding; 4) a vertex \(\begin{pmatrix} k,\varepsilon, i \\ j \end{pmatrix}\in\mathfrak M\) is joined to \(\begin{pmatrix} k',\varepsilon', i' \\ j' \end{pmatrix}\in\mathfrak M\) if and only if \(k'=f(i,j)\) and the word \(\varepsilon\varepsilon'\) has a \(\begin{pmatrix} k, i' \\ j' \end{pmatrix}\)-decoding; 5) a vertex \(\begin{pmatrix} k,\varepsilon, i \\ j \end{pmatrix}\in\mathfrak M\) is joined to \(Z\) if and only if the word \(\varepsilon\) has a \(k\)-decoding.
All these edges can be found effectively from the decodings of finitely many words.
Theorem 3. In order that the \(N\)-operator \(P\) be one-to-one on \(\Gamma_P\), it is necessary and sufficient that the graph \(G\) contain no paths joining one of the vertices of \(\mathfrak A\) with the vertex \(Z\).
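Rules 1) to 5) and Theorem 3 can be transcribed directly, if inefficiently, into Python. The sketch below is our own illustration, not the author's procedure. It assumes an \(N\)-operator given by 0-indexed tables `f` and `delta` whose states are all reachable from state 0 and represent distinct operators. Vertices of \(\mathfrak A\) are encoded as `("A", i)`, vertices of \(\mathfrak M\) as tuples `(k, eps, i, j)`, and \(Z\) as the string `"Z"`. All names are hypothetical, and every edge is tested by brute-force decoding, so the sketch suits only small examples.

```python
from collections import deque


def decodings(f, delta, k, word):
    """All k-decodings of `word` as lists of (state, letter) pairs; the last
    output is nonempty.  Lambda-runs revisiting a state are pruned, which loses
    nothing for an N-operator."""
    found = []

    def search(state, pos, path, lam_seen):
        for j, out in enumerate(delta[state]):
            nxt, step = f[state][j], path + [(state, j)]
            if out == "":
                if nxt not in lam_seen:
                    search(nxt, pos, step, lam_seen | {nxt})
            elif word.startswith(out, pos):
                if pos + len(out) == len(word):
                    found.append(step)
                else:
                    search(nxt, pos + len(out), step, {nxt})

    search(k, 0, [], {k})
    return found


def ends_in(decs, pair):
    """True if some decoding in `decs` ends with the pair (i, j)."""
    return any(d[-1] == pair for d in decs)


def is_one_to_one(f, delta):
    """Theorem 3: P is one-to-one on Gamma_P iff no path in G leads from A-frak to Z."""
    h = len(f)
    # M-frak: (k, eps, i, j) with eps a nonempty proper suffix of delta_j^i.
    quads = [(k, w[s:], i, j) for i in range(h) for j, w in enumerate(delta[i])
             for s in range(1, len(w)) for k in range(h)]
    words = {w for row in delta for w in row if w}
    edges = {v: set() for v in [("A", i) for i in range(h)] + quads}
    for i in range(h):
        for w in words:                                    # rule 3
            whole = [d for d in decodings(f, delta, i, w)
                     if delta[d[-1][0]][d[-1][1]] == w]
            if len(whole) >= 2:
                edges[("A", i)].add("Z")
        for q in quads:                                    # rule 2
            k, eps, i1, j1 = q
            w = delta[i1][j1]
            eta = w[: len(w) - len(eps)]
            if ends_in(decodings(f, delta, i, w), (i1, j1)) and any(
                    f[d[-1][0]][d[-1][1]] == k for d in decodings(f, delta, i, eta)):
                edges[("A", i)].add(q)
    for q in quads:
        k, eps, i, j = q
        if decodings(f, delta, k, eps):                    # rule 5
            edges[q].add("Z")
        for r in quads:                                    # rule 4
            k2, eps2, i2, j2 = r
            if k2 == f[i][j] and ends_in(decodings(f, delta, k, eps + eps2), (i2, j2)):
                edges[q].add(r)
    # Rule 1 holds by construction; search from every vertex of A-frak.
    queue, seen = deque(("A", i) for i in range(h)), set()
    while queue:
        v = queue.popleft()
        if v == "Z":
            return False
        for u in edges[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return True


print(is_one_to_one([[0, 0]], [["0", "01"]]))            # True
print(is_one_to_one([[0, 0, 0]], [["0", "01", "10"]]))   # False
```

For the code \(0, 01, 10\) the search reaches \(Z\) along the path that corresponds to the two factorizations \(01\cdot 0 = 0\cdot 10\).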
Introduce the notation: \(H\) is the maximum of the lengths of the \(\begin{pmatrix} k, i \\ j \end{pmatrix}\)-decodings of the words \(\delta^i_j\) over all \(i,j,k\); \(M\) is the maximum of the lengths of the words \(\delta^i_j\) over all \(i,j\); \(N\) is the number of elements of \(\mathfrak M\). If the graph contains a path joining two vertices, then it also contains a path without self-intersections joining them. As in \((^2)\), to each admissible path (i.e., one leading from \(\mathfrak A\) to \(Z\)) one can assign a pair of distinct preimages in \(\Gamma_P\) having the same image in \(\widetilde D\). Estimating the lengths of these words leads to the following theorem.
Theorem 4. The \(N\)-operator \(P\) is one-to-one on \(\Gamma_P\) if \(P\) is one-to-one on the set of words from \(\Gamma_P\) whose lengths do not exceed
\[
h-1+2([N/2](M-1)+1)\cdot H
\]
(where \([x]\) denotes the integer part of the number \(x\)).
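Theorem 4 turns the question into a finite check. Since the converse of the theorem is trivial, \(P\) is one-to-one on \(\Gamma_P\) exactly when it is one-to-one on the words of \(\Gamma_P\) of length at most the bound. The Python sketch below evaluates the bound and then tests injectivity by brute force. \(M\) and \(N\) are computed from the tables, with \(N\) counted as \(h\) choices of \(k\) times the number of nonempty proper suffixes of all the \(\delta^i_j\). \(H\) is passed in, having been found by hand for the sample tables: there every \(\delta^i_j\) has only its trivial one-pair decoding ending in its own pair, so \(H=1\). Names and sample tables are hypothetical, and the enumeration is exponential in the bound.

```python
from itertools import product


def theorem4_bound(h: int, M: int, N: int, H: int) -> int:
    """h - 1 + 2([N/2](M - 1) + 1) * H, where [x] is the integer part."""
    return h - 1 + 2 * ((N // 2) * (M - 1) + 1) * H


def table_sizes(delta):
    """Return (M, N): the maximal length of the delta_j^i and the size of M-frak."""
    h = len(delta)
    M = max(len(w) for row in delta for w in row)
    N = h * sum(max(len(w) - 1, 0) for row in delta for w in row)
    return M, N


def injective_up_to(f, delta, max_len: int) -> bool:
    """Brute force: is P = P^1 (state 0) one-to-one on words of Gamma_P of length <= max_len?"""
    images = {""}                              # image of Lambda, which lies in Gamma_P
    for length in range(1, max_len + 1):
        for word in product(range(len(delta[0])), repeat=length):
            state, out = 0, []
            for j in word:
                out.append(delta[state][j])
                state = f[state][j]
            if out[-1] == "":                  # last letter emitted nothing: not in Gamma_P
                continue
            image = "".join(out)
            if image in images:
                return False
            images.add(image)
    return True


for f, delta in ([[0, 0]], [["0", "01"]]), ([[0, 0, 0]], [["0", "01", "10"]]):
    M, N = table_sizes(delta)
    bound = theorem4_bound(len(f), M, N, H=1)
    print(delta[0], "bound", bound, "one-to-one:", injective_up_to(f, delta, bound))
```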
This estimate can be sharpened at the cost of complicating it. The case of one-to-one coding of the whole set \(\widetilde A\) is a special case admitting certain simplifications.
Note that the constructed graph also makes it possible to determine whether there exist (infinite) messages with “infinite delay” \((^1)\); this depends on whether the graph contains cycles.
As a possible application of these methods of coding with memory to statistical information theory, one may point to the case of a nonstationary message source.
Research Physicotechnical Institute
of Gorky State University
named after N. I. Lobachevsky
Received 18 XI 1960
REFERENCES
1. E. N. Gilbert, E. F. Moore, Bell Syst. Techn. J., 38, No. 4, 933 (1959).
2. Al. A. Markov, DAN, 132, No. 3 (1960).
3. Automata, IL, 1956.
4. N. E. Kobrinskii, B. A. Trakhtenbrot, Logical Investigations, Publishing House of the Academy of Sciences of the USSR, 1959, p. 352.
5. N. Chomsky, G. A. Miller, Inf. and Control, 1, No. 2, 91 (1958).