Mathematics
Yu. Ofman
ON THE APPROXIMATE REALIZATION OF CONTINUOUS FUNCTIONS BY AUTOMATA
(Presented by Academician A. N. Kolmogorov on 29 VI 1963)
This work is a continuation of two lines of investigation: a) Shannon’s work on the algorithmic complexity of arbitrary functions of the algebra of logic \((^1)\); b) work on \(\varepsilon\)-entropy in functional spaces \((^2)\). The formulation of the problems and a number of the basic ideas of the present work belong to A. N. Kolmogorov.
1. I shall set forth Shannon’s results as applied to the automata of paper \((^3)\). The number of different discrete functions mapping \(D^k\) into \(D^n\) (the set of binary sequences of length \(k\) into the set of binary sequences of length \(n\)) is equal to \((2^n)^{2^k}\). We shall call these functions the functions of the class \(M_k^n\). We shall assume that the order of growth of \(n\) does not exceed the order of growth of \(k\)*:
\[ k \succcurlyeq n . \]
The number of different automata consisting of \(N\) elements, as is not hard to verify, does not exceed \(N^{CN}\), where \(C\) is some constant. Suppose that every function of the class \(M_k^n\) can be realized by an automaton consisting of \(N\) elements. Then it is obvious that
\[ N^{CN} \geqslant (2^n)^{2^k}, \]
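whence, taking logarithms, \(CN\log N \geqslant n\cdot 2^k\). We may assume that \(N \leqslant n\cdot 2^k\), since otherwise the bound below holds trivially; then \(\log N \leqslant k+\log n \preccurlyeq k\), and therefore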
\[ N \succcurlyeq \frac{n \cdot 2^k}{k}. \]
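At the same time, the operating time of the automaton satisfies
\[ T \succcurlyeq k, \]
since, if the automaton operated for a shorter time, not all of its elements could be used: each element has a bounded number of inputs, so in time \(T\) each of the \(n\) outputs can depend on at most \(c^T\) elements (\(c\) a constant), while \(n\,c^T\) must be at least of the order of \(N \succcurlyeq n\cdot 2^k/k\).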
Thus we have obtained the following result. The class \(M_k^n\) contains functions (in fact, they form the majority of the class) such that any automaton realizing them has parameters
\[ N \succcurlyeq \frac{n \cdot 2^k}{k}, \qquad T \succcurlyeq k. \]
We shall call assertions of this type lower entropy estimates.
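The step from \(N^{CN} \geqslant (2^n)^{2^k}\) to \(N \succcurlyeq n\cdot 2^k/k\) can be checked numerically. The Python sketch below finds the smallest \(N\) permitted by the counting inequality and compares it with \(n\cdot 2^k/k\). The constant \(C\) is not specified in the text; the value \(C=1\) and the choice \(n=k/2\) are hypothetical and serve only as an illustration. The sketch covers the counting step only, not Shannon's construction.

```python
import math


def min_elements(k: int, n: int, C: float = 1.0) -> int:
    """Smallest N with N**(C*N) >= (2**n)**(2**k), i.e. C * N * log2(N) >= n * 2**k.

    C = 1 is a hypothetical value of the unspecified constant from the text.
    """
    target = n * 2**k

    def enough(N: int) -> bool:
        return C * N * math.log2(N) >= target

    hi = 2
    while not enough(hi):          # exponential search for an upper end
        hi *= 2
    lo = 1                         # enough(1) is False because log2(1) = 0
    while lo < hi:                 # binary search: enough() is monotone in N
        mid = (lo + hi) // 2
        if enough(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


results = []
for k in (8, 12, 16, 20):
    n = k // 2                     # hypothetical choice satisfying k >= n
    N = min_elements(k, n)
    bound = n * 2**k / k           # the order n * 2^k / k from the text
    results.append((k, n, N, bound))
    print(k, n, N, round(N / bound, 3))
```

The printed ratio stays slightly above 1, because \(\log N\) is a little smaller than \(k\); this is the order-of-magnitude statement \(N \succcurlyeq n\cdot 2^k/k\).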
Shannon constructed an automaton realizing any function from \(M_k^n\) with parameters
\[ N \asymp \frac{n \cdot 2^k}{k}, \qquad T \asymp k . \]
Thus, the upper and lower estimates coincide in order.**
* Notation of the type \(f(n) \succcurlyeq g(n)\), \(f(n) \asymp g(n)\) is introduced in paper \((^2)\).
** In paper \((^1)\) this theorem is proved for contact circuits. The result is easily carried over to automata and to superpositions as defined in paper \((^3)\). Superpositions will be considered here.
2. Let a continuous function \(f(x)\) be given on the interval \([0,1]\), such that
\[ \max_{x\in[0,1]} |f(x)| \leq 1 . \]
Let \(\varepsilon>0\) be given. Since \(f\) is uniformly continuous on \([0,1]\), there exists \(\delta>0\) such that \(|f(x^1)-f(x^2)|<\varepsilon/2\) for every pair of points \(x^1,x^2\) of the interval with \(|x^1-x^2|\leq\delta\).
On the interval \([0,1]\) consider the collection of points (a grid) that divide this interval into \(2^{k-1}\) equal parts, where \(k\) is chosen so that
\[ \frac{1}{2^{k-1}}=\delta_1 \leq 2\delta . \]
We consider a grid of the same kind, but with step
\[ \frac{1}{2^{n-1}}=\varepsilon_1 \leq \frac{\varepsilon}{2} \]
on the interval \([-1,1]\), the range of values of \(f(x)\). To the points of the two grids we assign binary sequences of lengths \(k\) and \(n\), respectively, namely the binary coordinates of these points. We denote the grids by the same symbols \(D^k\) and \(D^n\) as the sets of binary sequences that number their points. Let \(x\) be an arbitrary point of the interval \([0,1]\). By \(x_0\) we denote a point such that
\[ x_0 \in D^k,\qquad |x-x_0|\leq \frac{\delta_1}{2} \leq \delta , \]
that is, a grid point nearest to \(x\).
Definition. A discrete function \(d\in M_k^n\) is said to be \(\varepsilon\)-equal to the continuous function \(f(x)\) if
\[ |f(x)-d(x_0)|<\varepsilon \]
for every \(x\in[0,1]\); here the value \(d(x_0)\in D^n\) is identified with the corresponding grid point of \([-1,1]\).
Let us note that if \(f(x)\) satisfies a Hölder condition with exponent \(\alpha\):
\[ |f(x^1)-f(x^2)|\leq |x^1-x^2|^\alpha \qquad (0<\alpha\leq 1), \]
then it is natural to choose
\[ \alpha k=n\sim \log\frac{1}{\varepsilon}, \]
since on an interval of length \(\delta\) the increment of the function does not exceed \(\varepsilon=\delta^\alpha\), so that \(\log\frac1\delta=\frac1\alpha\log\frac1\varepsilon\).
In particular, if \(f\) satisfies the Lipschitz condition \((\alpha=1)\), then one may put \(k=n\). In this case, instead of \(k\) and \(n\), we shall use a single letter \(n\).
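The grid construction of this section can be sketched in Python for a Hölder function. The sketch picks \(n\) and \(k\) from the constraints \(\varepsilon_1 \leq \varepsilon/2\) and \(\delta_1 \leq 2\delta\), builds a table \(d\) by rounding \(f\) at each point of \(D^k\) to the nearest point of \(D^n\), and checks \(|f(x)-d(x_0)|<\varepsilon\) on a fine sample, taking \(x_0\) to be the grid point nearest to \(x\). The Hölder constant 1, the choice \(\delta=(\varepsilon/4)^{1/\alpha}\), the test function \(f(x)=\sqrt{x}/2\) (exponent \(\alpha=1/2\)) and all function names are hypothetical; grid points are stored by their indices rather than as binary sequences.

```python
import math


def choose_k_n(eps: float, alpha: float) -> tuple[int, int]:
    """Grid sizes for f with |f| <= 1 and |f(x1) - f(x2)| <= |x1 - x2|**alpha.

    n: smallest n with range step eps1 = 2**-(n-1) <= eps/2.
    k: smallest k with domain step delta1 = 2**-(k-1) <= 2*delta, where
       delta = (eps/4)**(1/alpha) gives |f(x1) - f(x2)| <= eps/4 < eps/2
       whenever |x1 - x2| <= delta (hypothetical choice of delta).
    """
    n = 1
    while 2.0 ** -(n - 1) > eps / 2:
        n += 1
    delta = (eps / 4) ** (1 / alpha)
    k = 1
    while 2.0 ** -(k - 1) > 2 * delta:
        k += 1
    return k, n


def build_table(f, k: int, n: int) -> list[int]:
    """d as a table: index j of the point j*delta1 of D^k -> index m of -1 + m*eps1 in D^n."""
    delta1, eps1 = 2.0 ** -(k - 1), 2.0 ** -(n - 1)
    top = 2 ** n                                   # range grid indices are 0..2**n
    return [min(top, max(0, round((f(j * delta1) + 1) / eps1)))
            for j in range(2 ** (k - 1) + 1)]


def max_error(f, table: list[int], k: int, n: int, samples: int = 100_000) -> float:
    """Largest |f(x) - d(x0)| over sample points x, with x0 the grid point nearest to x."""
    delta1, eps1 = 2.0 ** -(k - 1), 2.0 ** -(n - 1)
    worst = 0.0
    for s in range(samples + 1):
        x = s / samples
        j = round(x / delta1)                      # index of the nearest point of D^k
        worst = max(worst, abs(f(x) - (-1 + table[j] * eps1)))
    return worst


def f(x: float) -> float:
    """Hypothetical test function: Hölder with alpha = 1/2 and constant 1/2 <= 1."""
    return math.sqrt(x) / 2


eps, alpha = 0.01, 0.5
k, n = choose_k_n(eps, alpha)
table = build_table(f, k, n)
err = max_error(f, table, k, n)
print(k, n, err)
```

With \(\varepsilon=0.01\) and \(\alpha=1/2\) the chosen sizes satisfy \(\alpha k = n\), in line with the rule \(\alpha k = n \sim \log\frac1\varepsilon\) stated above.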
3. Consider the class \(F_r\subset C[0,1]\) of continuous functions on \([0,1]\) having \(r\) derivatives. The functions \(f\in F_r\) and their derivatives satisfy the inequalities
\[ \max_{x\in[0,1]} |f^{(i)}(x)|\leq 1,\qquad i=0,1,\ldots,r, \]
and are subject to the conditions
\[ f^{(i)}(0)=0,\qquad i=0,1,\ldots,r-1. \]
It follows from this that \(f^{(i)}(x)\), \(i=0,1,\ldots,r-1\), satisfy a Lipschitz condition with constant 1.
To the class \(F_r\) there corresponds a class of discrete functions \(M^{r,\varepsilon}\), \(\varepsilon\)-equal to functions from \(F_r\): every function \(d\in M^{r,\varepsilon}\) is \(\varepsilon\)-equal to some \(f\in F_r\), and for every \(f\in F_r\) there exists \(d\in M^{r,\varepsilon}\) that is \(\varepsilon\)-equal to \(f\).
Let us consider one more class of continuous functions \(F_A \subset C[0,1]\) and the corresponding class of discrete functions \(M^{A,\varepsilon}\). A function \(f\) belongs to \(F_A\) if it is analytic and its derivatives satisfy the inequalities
\[ \max_{x\in[0,1]} \left|\frac{f^{(i)}(x)}{i!}\right| \leq \left(\frac12\right)^i,\qquad i=0,1,\ldots \]
The class \(M^{A,\varepsilon}\) is defined analogously to \(M^{r,\varepsilon}\). We shall study \(M^{r,\varepsilon}\) and \(M^{A,\varepsilon}\) in the same way as \(M_k^n\) (see § 1). To obtain lower entropy estimates for these classes, one must estimate the number of discrete functions they contain. This can be done on the basis of the results of paper \((^2)\) on \(\varepsilon\)-entropy. There it is proved that the compact set \(F_r\) contains a \(2\varepsilon\)-distinguishable set (the distance between any two functions of this set is greater than \(2\varepsilon\)) consisting of
\[ s_1 = 2^{c_1\left(\frac1\varepsilon\right)^{1/r}} \]
functions. In the class \(F_A\) there exists a \(2\varepsilon\)-distinguishable set containing
\[ s_2 = 2^{c_2 \log^2 \frac1\varepsilon} \]
functions.*
One discrete function cannot be \(\varepsilon\)-equal to two continuous functions whose (uniform) distance is greater than \(2\varepsilon\): if \(d\) were \(\varepsilon\)-equal to both \(f_1\) and \(f_2\), then \(|f_1(x)-f_2(x)| \leq |f_1(x)-d(x_0)| + |d(x_0)-f_2(x)| < 2\varepsilon\) for every \(x\). Therefore the classes \(M^{r,\varepsilon}\) and \(M^{A,\varepsilon}\) contain at least \(s_1\) and \(s_2\) discrete functions, respectively. Hence, by exactly the same argument as was used to obtain the lower entropy estimate for \(M_k^n\), we obtain the following results.
Among the functions of the class \(M^{r,\varepsilon}\) there are some such that every automaton realizing them has parameters
\[ N \succcurlyeq \frac{\left(\frac1\varepsilon\right)^{1/r}}{\log \frac1\varepsilon}, \qquad T \succcurlyeq \log \frac1\varepsilon , \]
or, in terms of \(n\),**
\[ N \succcurlyeq \frac{2^{n/r}}{n},\qquad T \succcurlyeq n . \]
Among the functions of the class \(M^{A,\varepsilon}\) there are some such that every automaton realizing them has parameters
\[ N \succcurlyeq \frac{\log^2 \frac1\varepsilon}{\log \log \frac1\varepsilon}, \qquad T \succcurlyeq \log \log \frac1\varepsilon , \]
or, in terms of \(n\),**
\[ N \succcurlyeq \frac{n^2}{\log n},\qquad T \succcurlyeq \log n . \]
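These lower estimates come from the counting inequality of § 1 with \(s_1\) or \(s_2\) in place of \((2^n)^{2^k}\). A short worked derivation follows, under two stated assumptions: we may restrict attention to automata with \(\log N \preccurlyeq n\) (respectively \(\log N \preccurlyeq \log n\)), since for larger \(N\) the bound on \(N\) holds trivially; and, for the time bounds, each element has at most \(c\) inputs (\(c\) a hypothetical constant), so that the \(n\) outputs can depend on at most \(n\,c^T\) elements.

```latex
\begin{aligned}
&M^{r,\varepsilon}:\quad C N \log N \geqslant \log s_1 = c_1 \Bigl(\frac1\varepsilon\Bigr)^{1/r} \asymp 2^{n/r}
  \;\Longrightarrow\; N \succcurlyeq \frac{2^{n/r}}{\log N} \succcurlyeq \frac{2^{n/r}}{n},
  \qquad n\,c^{T} \succcurlyeq \frac{2^{n/r}}{n} \;\Longrightarrow\; T \succcurlyeq n;\\[4pt]
&M^{A,\varepsilon}:\quad C N \log N \geqslant \log s_2 = c_2 \log^2\frac1\varepsilon \asymp n^2
  \;\Longrightarrow\; N \succcurlyeq \frac{n^2}{\log N} \succcurlyeq \frac{n^2}{\log n},
  \qquad n\,c^{T} \succcurlyeq \frac{n^2}{\log n} \;\Longrightarrow\; T \succcurlyeq \log n .
\end{aligned}
```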
4. In this section we deal with upper estimates for the classes \(M^{r,\varepsilon}\) and \(M^{A,\varepsilon}\). For \(M^{r,\varepsilon}\) one can obtain upper estimates that coincide in order with the lower ones.
Theorem 1. Every function of the class \(M^{r,\varepsilon}\) can be realized by an automaton with parameters
\[ N \preccurlyeq \frac{2^{n/r}}{n},\qquad T \preccurlyeq n. \]
The proof of this theorem is cumbersome and is not given here. A somewhat weaker statement than Theorem 1 is easily proved: every function of the class \(M^{r,\varepsilon}\) can be realized by an automaton with parameters
\[ N \preccurlyeq 2^{n/r},\qquad T \preccurlyeq n. \]
* The logarithm is everywhere taken to base 2.
** Since the functions satisfy the Lipschitz condition, \(k=n \sim \log \frac1\varepsilon\).
Theorem 2. Every function of the class \(M^{A,\varepsilon}\) can be realized by an automaton with parameters
\[ N \preccurlyeq n^{2+\gamma}, \qquad T \preccurlyeq \log^3 n, \]
where \(\gamma > 0\) may be chosen arbitrarily small.
A function of the class \(F_A\) is approximated to within \(\varepsilon/4\) by the partial sum of its Taylor series consisting of \(n+2\) terms: by the Lagrange form of the remainder, the error does not exceed \(2^{-(n+2)}=\varepsilon_1/8<\varepsilon/4\). Using \(2n\)-digit numbers for the Taylor coefficients and for the results of intermediate computations gives more than sufficient accuracy: in each arithmetic operation the rounding error does not exceed \(\varepsilon^2\), and since only of the order of \(n\) operations are performed, the total rounding error is less than \(\varepsilon/4\). To compute the values of the polynomial we shall need the addition and multiplication automata described in papers \((^3,\ ^4)\).
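The rounding claim can be tried out with exact rational arithmetic. The sketch below evaluates a Taylor partial sum with \(n+2\) terms using fixed-point numbers with \(2n\) binary fractional digits, and measures the rounding error against the exact value of the same partial sum. Round-to-nearest, the sequential power-by-power evaluation order, the point \(x\) and the test function \(\sin(x/2)\) (which belongs to \(F_A\), since its \(i\)-th derivative is bounded by \(2^{-i}\)) are all hypothetical choices, not taken from the text.

```python
from fractions import Fraction
from math import factorial


def sin_half_coeffs(m: int) -> list[Fraction]:
    """Exact Taylor coefficients a_0..a_{m-1} of sin(x/2) at x = 0 (|a_i| <= 2**-i)."""
    coeffs = []
    for i in range(m):
        if i % 2 == 0:
            coeffs.append(Fraction(0))
        else:
            sign = -1 if (i // 2) % 2 else 1       # signs +, -, +, ... for i = 1, 3, 5, ...
            coeffs.append(Fraction(sign, 2**i * factorial(i)))
    return coeffs


def to_fixed(q: Fraction, bits: int) -> int:
    """Round q to the nearest multiple of 2**-bits, stored as an integer numerator."""
    return round(q * 2**bits)


def fixed_mul(a: int, b: int, bits: int) -> int:
    """Multiply two fixed-point numbers and round the product back to `bits` digits."""
    return round(Fraction(a * b, 2**bits))


def eval_fixed(coeffs: list[Fraction], x: Fraction, bits: int) -> Fraction:
    """Evaluate sum a_i x**i with every coefficient and intermediate result rounded."""
    xf = to_fixed(x, bits)
    power = to_fixed(Fraction(1), bits)             # fixed-point x**0
    total = 0
    for a in coeffs:
        total += fixed_mul(to_fixed(a, bits), power, bits)   # integer additions are exact
        power = fixed_mul(power, xf, bits)
    return Fraction(total, 2**bits)


n = 16
m = n + 2                                           # n + 2 Taylor terms, as in the text
bits = 2 * n                                        # 2n-digit numbers, as in the text
x = Fraction(45_000, 2**16)                         # hypothetical point of [0, 1]
coeffs = sin_half_coeffs(m)
exact = sum(a * x**i for i, a in enumerate(coeffs))
rounding_error = abs(eval_fixed(coeffs, x, bits) - exact)
print(float(rounding_error), 2.0 ** -n / 4)
```

The measured rounding error is well below \(2^{-n}/4\), consistent with the remark that \(2n\) digits are more than sufficient.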
In \((^3)\) it is proved that there exists an automaton adding two \(n\)-digit numbers with parameters
\[ N \preccurlyeq n, \qquad T \preccurlyeq \log n. \]
In \((^4)\) it is proved that there exists an automaton multiplying two \(n\)-digit numbers with parameters
\[ N \preccurlyeq n\, C^{\sqrt{\log n}}, \qquad T \preccurlyeq C^{\sqrt{\log n}} \]
for sufficiently large \(C\), for example for \(C=2^5\). That paper does not mention another estimate for the multiplication automaton, also obtained by its author: for every \(\gamma>0\) there exists an automaton multiplying two \(n\)-digit numbers with parameters
\[ N \preccurlyeq n^{1+\gamma}, \qquad T \preccurlyeq \log^2 n. \]
These estimates are derived in the same way as those in \((^4)\), the only difference being that the parameter \(r\) of \((^4)\) (unrelated to the smoothness index \(r\) of § 3) must be taken sufficiently large, but constant as \(n \to \infty\), depending only on \(\gamma\). We shall use these latter estimates.
Using the addition and multiplication automata for \(n\)-digit numbers, it is not difficult to construct an automaton for computing the polynomial with the parameters indicated in the theorem.
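The text leaves the evaluation scheme to the reader. One possible scheme consistent with the stated parameters (our assumption, not necessarily the author's) computes all powers \(x^0,\dots,x^{n+1}\) by doubling, so that about \(\log n\) rounds of independent multiplications suffice, then forms the products \(a_i x^i\) in one more round and adds them in a balanced tree of depth about \(\log n\). With the multiplication automaton (\(T \preccurlyeq \log^2 n\)) and the adder (\(T \preccurlyeq \log n\)) this gives \(T \preccurlyeq \log^3 n\), and about \(2n\) multipliers of size \(n^{1+\gamma}\) each give \(N \preccurlyeq n^{2+\gamma}\). The Python sketch below uses exact fractions in place of \(2n\)-digit automata and only counts rounds and multiplications; the coefficients and the point are hypothetical.

```python
from fractions import Fraction


def parallel_powers(x: Fraction, m: int) -> tuple[list[Fraction], int, int]:
    """Compute x**0, ..., x**(m-1) by doubling.

    If x**0..x**L are known, one round multiplies x**L by x**1..x**L to obtain
    x**(L+1)..x**(2L). The products of one round are independent of each other,
    so separate multiplication automata could work on them simultaneously.
    Returns (powers, number of rounds, number of multiplications).
    """
    powers = [Fraction(1), x]
    rounds = mults = 0
    while len(powers) < m:
        last = powers[-1]                      # x**L, where L = len(powers) - 1
        need = min(len(powers) - 1, m - len(powers))
        new = [last * p for p in powers[1:need + 1]]
        powers.extend(new)
        rounds += 1
        mults += len(new)
    return powers[:m], rounds, mults


def tree_sum(terms: list[Fraction]) -> tuple[Fraction, int]:
    """Add terms pairwise in a balanced tree; returns (sum, number of addition rounds)."""
    rounds = 0
    while len(terms) > 1:
        paired = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:                     # an unpaired last term is carried over
            paired.append(terms[-1])
        terms = paired
        rounds += 1
    return terms[0], rounds


def evaluate(coeffs: list[Fraction], x: Fraction) -> tuple[Fraction, int, int, int]:
    """Value of sum a_i x**i, multiplication rounds, multiplications, addition rounds."""
    powers, power_rounds, power_mults = parallel_powers(x, len(coeffs))
    terms = [a * p for a, p in zip(coeffs, powers)]   # one more round of products
    value, add_rounds = tree_sum(terms)
    return value, power_rounds + 1, power_mults + len(coeffs), add_rounds


n = 64
coeffs = [Fraction(1, 2**i) for i in range(n + 2)]    # hypothetical, |a_i| <= 2**-i
x = Fraction(3, 4)                                    # hypothetical point of [0, 1]
value, mul_rounds, mults, add_rounds = evaluate(coeffs, x)
print(mul_rounds, add_rounds, mults)
```

For \(n=64\) (66 terms) the sketch performs 8 rounds of multiplications and 7 rounds of additions; both grow like \(\log n\).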
Table 1
Summary of results
| | Upper estimate \(N\) | Upper estimate \(T\) | Lower estimate \(N\) | Lower estimate \(T\) |
|---|---|---|---|---|
| Addition of two \(n\)-digit numbers | \(n\) | \(\log n\) | \(n\) | \(\log n\) |
| Multiplication of two \(n\)-digit numbers | \(n^{1+\gamma}\) | \(\log^2 n\) | \(n\) | \(\log n\) |
| Class \(M_k^n\) | \(\dfrac{n\cdot 2^k}{k}\) | \(k\) | \(\dfrac{n\cdot 2^k}{k}\) | \(k\) |
| Class \(M^{r,\varepsilon}\) | \(\dfrac{2^{n/r}}{n}\) | \(n\) | \(\dfrac{2^{n/r}}{n}\) | \(n\) |
| Class \(M^{A,\varepsilon}\) | \(n^{2+\gamma}\) | \(\log^3 n\) | \(\dfrac{n^2}{\log n}\) | \(\log n\) |
Remark. The upper estimates in each row are attained simultaneously by a single automaton. The lower estimates are independent of one another, and each holds for every automaton.
Received 25 VI 1963
References
1. C. E. Shannon, Bell Syst. Techn. J., 28, No. 1, 59 (1949).
2. A. N. Kolmogorov, V. M. Tikhomirov, UMN, 14, No. 2 (86) (1959).
3. Yu. Ofman, DAN, 145, No. 1, 48 (1962).
4. A. Toom, DAN, 150, No. 3, 496 (1963).