Reports of the Academy of Sciences of the USSR
Academician A. N. KOLMOGOROV
Submitted 1958-01-01 | RussiaRxiv: ru-195801.96025 | Translated from Russian


1958. Volume 119, No. 5

MATHEMATICS


A NEW METRIC INVARIANT OF TRANSITIVE DYNAMICAL SYSTEMS AND AUTOMORPHISMS OF LEBESGUE SPACES

It is well known that a substantial part of the metric theory of dynamical systems can be presented as an abstract theory of “flows” \(\{S_t\}\) on “Lebesgue spaces” \(M\) with measure \(\mu\), in terms invariant under “isomorphisms modulo zero” (see the survey article by V. A. Rokhlin \((^1)\), whose definitions and notation the following exposition follows). We shall assume the measure on \(M\) to be normalized by the condition

\[ \mu(M)=1 \tag{1} \]

and nontrivial (i.e., we assume that there exists a set \(A \subseteq M\) with \(0<\mu(A)<1\)). Many examples are known of transitive automorphisms and transitive flows with so-called “countable Lebesgue spectrum” (for automorphisms see \((^1)\), § 4; for flows, \((^{2-5})\)). From the spectral point of view there is only one type of automorphism here, \(\Omega_0^\omega\), and one type of flow, \(\Omega^\omega\). Whether all automorphisms of type \(\Omega_0^\omega\) (respectively, all flows of type \(\Omega^\omega\)) are isomorphic to one another \(\bmod 0\) has until now remained an open question. We show in §§ 3–4 that the answer is negative both for automorphisms and for flows. The new invariant that splits the class of automorphisms \(\Omega_0^\omega\) and the class of flows \(\Omega^\omega\) into a continuum of invariant subclasses is entropy per unit time. § 1 presents the necessary facts from information theory (the notions of conditional entropy and conditional information introduced there, and their properties, are probably of broader interest, although the whole exposition closely follows the definition of the amount of information in \((^7)\) and the numerous works developing that definition). In § 2 the characteristic \(h\) is defined and its invariance is proved. In §§ 3–4 we give examples of automorphisms and flows with arbitrary values of \(h\) in the range \(0<h\leq\infty\). For automorphisms these examples have long been known; for flows, constructing examples with finite \(h\) is a more delicate problem, connected with certain interesting questions in the theory of Markov processes.

§ 1. Properties of conditional entropy and of the conditional amount of information. In accordance with (1), denote by \(\mathfrak S\) the Boolean algebra of measurable sets of the space \(M\), considered \(\bmod 0\). Let \(\mathfrak C\) be a subalgebra of \(\mathfrak S\) that is closed in the metric \(\rho(A,B)=\mu((A-B)\cup(B-A))\). It generates a partition \(\xi_{\mathfrak C}\) of \(M\), defined \(\bmod 0\) by the condition that \(A\in\mathfrak C\) if and only if, \(\bmod 0\), \(A\) is a union of entire elements of \(\xi_{\mathfrak C}\). On the elements \(C\) of \(\xi_{\mathfrak C}\) the “canonical system of measures \(\mu_C\)” \((^1)\) is defined. For any \(x\in C\) we put

\[ \mu_x(A\mid\mathfrak C)=\mu_C(A\cap C). \tag{2} \]

From the point of view of probability theory (where any measurable function of an element \(x\in M\) is called a “random variable”), the random variable \(\mu_x(A\mid \mathfrak C)\) is the “conditional probability” of the event \(A\) given the outcome of the “experiment” \(\mathfrak C\) (\(^{6}\), Ch. I, § 7).
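As a minimal finite sketch, assume that \(M\) is a finite set of points with positive masses and that the closed subalgebra \(\mathfrak C\) is given by its generating partition \(\xi_{\mathfrak C}\) as a list of blocks. Under that assumption (2) reduces to \(\mu(A\cap C)/\mu(C)\) on the block \(C\) containing \(x\). The names `mu`, `xi_C`, `measure` and `cond_prob` are hypothetical.

```python
from fractions import Fraction

# Hypothetical finite model of a Lebesgue space: point -> mass, total mass 1 as in (1).
mu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(3, 8)}

# Partition xi_C generating the subalgebra C: each block is a set of points.
xi_C = [{"a", "b"}, {"c", "d"}]

def measure(A):
    """mu(A) for a set of points A."""
    return sum((mu[p] for p in A), Fraction(0))

def cond_prob(A, partition, x):
    """mu_x(A | C): the canonical measure of A on the block C containing x, as in (2)."""
    C = next(block for block in partition if x in block)
    return measure(A & C) / measure(C)

A = {"a", "c"}
for x in sorted(mu):
    print(x, cond_prob(A, xi_C, x))   # a, b -> 1/2;  c, d -> 1/4
```

Integrating the conditional probability over \(M\) recovers \(\mu(A)\): here \(\tfrac12\cdot\tfrac12+\tfrac12\cdot\tfrac14=\tfrac38=\mu(A)\).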

For three subalgebras \(\mathfrak A,\mathfrak B,\mathfrak C\) of \(\mathfrak S\) and for \(C\in\xi_{\mathfrak C}\), set

\[ I_C(\mathfrak A,\mathfrak B\mid \mathfrak C)= \sup \sum_{i,j}\mu_x(A_i\cap B_j\mid\mathfrak C)\log \frac{\mu_x(A_i\cap B_j\mid\mathfrak C)}{\mu_x(A_i\mid\mathfrak C)\,\mu_x(B_j\mid\mathfrak C)},\qquad x\in C, \tag{3} \]

where the supremum is taken over all finite decompositions
\(M=A_1\cup A_2\cup\cdots\cup A_n,\quad M=B_1\cup B_2\cup\cdots\cup B_n\)
with \(A_i\cap A_j=N,\ B_i\cap B_j=N\) for \(i\ne j\), and \(A_i\in\mathfrak A,\ B_j\in\mathfrak B\) (\(N\) is the empty set). If \(\mathfrak C\) is the trivial algebra \(\mathfrak N=\{N,M\}\), then (3) becomes the definition of the unconditional information \(I(\mathfrak A,\mathfrak B)\) from Appendix 7 to (\(^{7}\))*. The quantity (3) itself is interpreted as the “amount of information contained in the outcome of the experiment \(\mathfrak A\) about the experiment \(\mathfrak B\), given the outcome \(C\) of the experiment \(\mathfrak C\).” If \(C\in\xi_{\mathfrak C}\) is not fixed, it is natural to consider the random variable \(I(\mathfrak A,\mathfrak B\mid \mathfrak C)\), which for \(x\in C\) is equal to
\(I_x(\mathfrak A,\mathfrak B\mid \mathfrak C)=I_C(\mathfrak A,\mathfrak B\mid \mathfrak C)\).
In what follows we shall deal with its mathematical expectation

\[ \mathbf M I(\mathfrak A,\mathfrak B\mid \mathfrak C) = \int_M I_x(\mathfrak A,\mathfrak B\mid \mathfrak C)\,\mu(dx). \tag{4} \]
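For finite algebras, the supremum in (3) is attained at the generating partitions themselves, because refining a decomposition never decreases the sum. That makes (3) and (4) directly computable. The sketch below assumes a finite space, algebras given by their atoms as lists of blocks, and logarithms to base 2. The base is an assumption, since the text leaves "log" unspecified. All names are hypothetical.

```python
from math import log2

# Hypothetical finite space: point -> mass, normalized as in (1).
mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

def measure(S):
    return sum(mu[p] for p in S)

def info_at(x, xi_A, xi_B, xi_C):
    """I_x(A, B | C) from (3), evaluated on the atoms of the finite algebras A and B."""
    C = next(block for block in xi_C if x in block)   # element of xi_C containing x
    m_C = measure(C)
    total = 0.0
    for A_i in xi_A:
        p_a = measure(A_i & C) / m_C                  # mu_x(A_i | C)
        for B_j in xi_B:
            p_b = measure(B_j & C) / m_C              # mu_x(B_j | C)
            p_ab = measure(A_i & B_j & C) / m_C       # mu_x(A_i & B_j | C)
            if p_ab > 0:                              # convention 0 log 0 = 0
                total += p_ab * log2(p_ab / (p_a * p_b))
    return total

def mean_info(xi_A, xi_B, xi_C):
    """M I(A, B | C) from (4): the integral of I_x against mu (a finite sum here)."""
    return sum(mu[x] * info_at(x, xi_A, xi_B, xi_C) for x in mu)

xi_A = [{0, 1}, {2, 3}]
xi_B = [{0, 2}, {1, 3}]
trivial = [{0, 1, 2, 3}]
print(mean_info(xi_A, xi_A, trivial))   # the entropy of (0.3, 0.7) in bits
print(mean_info(xi_A, xi_B, xi_A))      # 0: given A's outcome, A tells nothing more about B
```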

The definitions of conditional entropy and of mean conditional entropy require no special explanation:
\(H(\mathfrak A\mid\mathfrak C)=I(\mathfrak A,\mathfrak A\mid\mathfrak C)\),
\(\mathbf M H(\mathfrak A\mid\mathfrak C)=\int_M H_x(\mathfrak A\mid\mathfrak C)\,\mu(dx)\).
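Writing out (3) with \(\mathfrak B=\mathfrak A\) shows what this definition amounts to. One may take \(B_j=A_j\), because passing to a common refinement of the two decompositions does not decrease the sum in (3). The off-diagonal terms then vanish, since \(\mu_x(A_i\cap A_j\mid\mathfrak C)=0\) for \(i\ne j\). Each diagonal term becomes \(-\mu_x(A_i\mid\mathfrak C)\log\mu_x(A_i\mid\mathfrak C)\). The supremum is over finite decompositions of \(M\) into pairwise disjoint \(A_i\in\mathfrak A\), with \(0\log 0=0\):

\[
\begin{aligned}
H_x(\mathfrak A\mid\mathfrak C)
&= \sup\sum_{i}\mu_x(A_i\mid\mathfrak C)\log\frac{\mu_x(A_i\mid\mathfrak C)}{\mu_x(A_i\mid\mathfrak C)^2}\\
&= \sup\Bigl(-\sum_{i}\mu_x(A_i\mid\mathfrak C)\log\mu_x(A_i\mid\mathfrak C)\Bigr).
\end{aligned}
\]

So \(H_x(\mathfrak A\mid\mathfrak C)\) is the Shannon entropy of the conditional distribution of the outcome of \(\mathfrak A\) on the element of \(\xi_{\mathfrak C}\) containing \(x\). It may be infinite when \(\mathfrak A\) is not finite.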

We note those properties of the conditional amount of information and of conditional entropy that we shall need below. Properties \((\alpha)\) and \((\delta)\) are well known for the unconditional amount of information and entropy; property \((\varepsilon)\) for the unconditional amount of information is the content of Theorem 2 of note (\(^{8}\)). Properties \((\beta)\) and \((\gamma)\) are proved without difficulty. Concerning property \((\beta)\), we note only that the analogous assertion for the amount of information (that \(\mathfrak C\supseteq\mathfrak C'\) implies
\(\mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C)\leq \mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C')\))
would already be false. This is why property \((\zeta)\) involves the lower limit and the sign \(\geq\): the corresponding limit may fail to exist, and the lower limit may in some cases be strictly greater than \(\mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C)\).
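The failure of the information analogue of \((\beta)\) can be seen on a standard four-point example, used here only as an illustration. Take \(M=\{0,1\}^2\) with uniform measure, let the experiments \(X\) and \(Y\) be the two coordinates, and let \(Z=X\oplus Y\). Conditioning on the finer algebra generated by \(Z\) raises the mean information between \(X\) and \(Y\) from 0 to 1 bit. Logarithms are taken to base 2 (an assumption), and the function names are hypothetical.

```python
from itertools import product
from math import log2

# Hypothetical example: M = {0,1}^2 with uniform measure; experiments are functions of a point.
points = list(product([0, 1], repeat=2))
mass = {p: 0.25 for p in points}

X = lambda p: p[0]
Y = lambda p: p[1]
Z = lambda p: p[0] ^ p[1]        # X xor Y
TRIVIAL = lambda p: 0            # the trivial algebra N = {N, M}

def mean_cond_info(f, g, c):
    """M I(f, g | c) for experiments given as functions on a finite space (log base 2)."""
    total = 0.0
    for cv in {c(p) for p in points}:
        block = [p for p in points if c(p) == cv]        # element of the partition of c
        m_block = sum(mass[p] for p in block)
        joint = {}
        for p in block:
            key = (f(p), g(p))
            joint[key] = joint.get(key, 0.0) + mass[p] / m_block
        pf, pg = {}, {}
        for (a, b), q in joint.items():
            pf[a] = pf.get(a, 0.0) + q
            pg[b] = pg.get(b, 0.0) + q
        info = sum(q * log2(q / (pf[a] * pg[b])) for (a, b), q in joint.items() if q > 0)
        total += m_block * info
    return total

print(mean_cond_info(X, Y, TRIVIAL))   # → 0.0 (unconditional information)
print(mean_cond_info(X, Y, Z))         # → 1.0 (conditioned on the finer algebra of Z)
```

By contrast, the entropy of \(X\) behaves as \((\beta)\) requires: it is 1 bit both unconditionally and given \(Z\).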

\((\alpha)\) \(I(\mathfrak A,\mathfrak B\mid\mathfrak C)\leq H(\mathfrak A\mid\mathfrak C)\), and equality is certainly attained when \(\mathfrak B\supseteq\mathfrak A\).

\((\beta)\) If \(\mathfrak C\supseteq\mathfrak C'\), then \(\mathbf M H(\mathfrak A\mid\mathfrak C)\leq \mathbf M H(\mathfrak A\mid\mathfrak C')\).

\((\gamma)\) If \(\mathfrak B\supseteq\mathfrak B'\), then
\[ \mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C) = \mathbf M I(\mathfrak A,\mathfrak B'\mid\mathfrak C) + \mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C\vee\mathfrak B'), \]
where \(\mathfrak C\vee\mathfrak B'\) is the minimal closed algebra containing \(\mathfrak C\) and \(\mathfrak B'\).

\((\delta)\) If \(\mathfrak B\supseteq\mathfrak B'\), then
\[ \mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C) \geq \mathbf M I(\mathfrak A,\mathfrak B'\mid\mathfrak C). \]

\((\varepsilon)\) If
\[ \mathfrak A_1\subseteq\cdots\subseteq\mathfrak A_n\subseteq\cdots,\quad \bigcup \mathfrak A_n=\mathfrak A, \]
then
\[ \lim_{n\to\infty}\mathbf M I(\mathfrak A_n,\mathfrak B\mid\mathfrak C) = \mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C). \]

\((\zeta)\) If
\[ \mathfrak C_1\supseteq\mathfrak C_2\supseteq\cdots\supseteq\mathfrak C_n\supseteq\cdots,\quad \bigcap_n\mathfrak C_n=\mathfrak C, \]
then
\[ \liminf_{n\to\infty}\mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C_n) \geq \mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C). \]
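Properties \((\alpha)\)–\((\delta)\) can be sanity-checked numerically on a finite example. For finite algebras, the mean quantities reduce to unconditional entropies: \(\mathbf M H(\mathfrak A\mid\mathfrak C)=H(\mathfrak A\vee\mathfrak C)-H(\mathfrak C)\) and \(\mathbf M I(\mathfrak A,\mathfrak B\mid\mathfrak C)=H(\mathfrak A\vee\mathfrak C)+H(\mathfrak B\vee\mathfrak C)-H(\mathfrak A\vee\mathfrak B\vee\mathfrak C)-H(\mathfrak C)\). The sketch assumes a hypothetical space \(\{0,1,2\}^4\) with random masses, where each algebra is generated by a set of coordinates, so that \(\vee\) becomes set union. This is a check on one example, not a proof.

```python
import random
from itertools import product
from math import log2

# Hypothetical finite model: M = {0,1,2}^4 with random positive masses (normalized to 1).
rng = random.Random(1958)
points = list(product(range(3), repeat=4))
w = [rng.random() + 0.01 for _ in points]
mass = {p: wi / sum(w) for p, wi in zip(points, w)}

def H(S):
    """Entropy (base 2) of the algebra generated by the coordinates in the set S."""
    dist = {}
    for p, m in mass.items():
        key = tuple(p[i] for i in sorted(S))
        dist[key] = dist.get(key, 0.0) + m
    return -sum(q * log2(q) for q in dist.values() if q > 0)

def MH(A, C):
    """M H(A | C) = H(A v C) - H(C) for finite algebras."""
    return H(A | C) - H(C)

def MI(A, B, C):
    """M I(A, B | C) = H(A v C) + H(B v C) - H(A v B v C) - H(C) for finite algebras."""
    return H(A | C) + H(B | C) - H(A | B | C) - H(C)

# B1 (a subset of B) plays the role of B'; C1 (a subset of C) plays C'.
A, B, B1, C, C1 = {0}, {1, 2}, {1}, {3}, set()
tol = 1e-12
assert MI(A, B, C) <= MH(A, C) + tol                                 # (alpha)
assert abs(MI(A, A | B, C) - MH(A, C)) < tol                         # (alpha), B contains A
assert MH(A, C) <= MH(A, C1) + tol                                   # (beta)
assert abs(MI(A, B, C) - (MI(A, B1, C) + MI(A, B, C | B1))) < tol    # (gamma)
assert MI(A, B, C) + tol >= MI(A, B1, C)                             # (delta)
print("(alpha)-(delta) hold on this example")
```

The limit properties \((\varepsilon)\) and \((\zeta)\) concern infinite chains of algebras and are not captured by a finite check like this.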

§ 2. Definition of the invariant \(h\). We shall say that a flow \(\{S_t\}\) is quasiregular (has type \(\mathfrak R\)) if** there exists a closed subalgebra \(\mathfrak S_0\) of the algebra \(\mathfrak S\) whose shifts \(\mathfrak S_t=S_t\mathfrak S_0\) have the following properties: (I) \(\mathfrak S_t\subseteq\mathfrak S_{t'}\) if \(t\leq t'\); (II) \(\bigcup_t\mathfrak S_t=\mathfrak S\); (III) \(\bigcap_t\mathfrak S_t=\mathfrak N\).

* The authors of note (\(^{8}\)) overlooked, at the time, Appendix 7 to (\(^{7}\)), which is included in the Russian translation (\(^{9}\)). Note (\(^{8}\)) should have begun with a reference to this appendix.

** This condition is considerably weaker than the condition of “regularity” usually used in the theory of random processes. On this point, see the end of § 4.

When the flow is interpreted as a stationary random process, \(\mathfrak S_t\) may be regarded as the algebra of events “depending only on the course of the process up to the moment of time \(t\).” It is easily proved that flows of type \(\mathfrak R\) are transitive, and from Plesner’s results \((^{10},\,^{11})\) one can infer that they have homogeneous Lebesgue spectrum. If the multiplicity of the spectrum equals \(\nu\) \((\nu=1,2,\ldots,\omega)\), we assign the flow to the type \(\mathfrak R^\nu\). Clearly, \(\mathfrak R^\nu \subset \Omega^\nu\), where \(\Omega^\nu\) is the class of flows with Lebesgue spectrum of homogeneous multiplicity \(\nu\). It is possible, however, that all \(\Omega^\nu\) (and consequently all \(\mathfrak R^\nu\)) except \(\Omega^\omega\) (respectively, \(\mathfrak R^\omega\)) are empty, and that \(\mathfrak R^\omega=\Omega^\omega\).

Theorem 1. If for the flow \(\{S_t\}\) there exists an \(\mathfrak S_0\) satisfying conditions (I)–(III), then for \(\Delta>0\)
\[ \mathbf M H(\mathfrak S_{t+\Delta}\mid \mathfrak S_t)=h\Delta, \]
where \(h\) is a constant lying in the range \(0<h\leqslant\infty\).

Theorem 2. The constant \(h\) for a given flow \(\{S_t\}\) does not depend on the choice of \(\mathfrak S_0\) satisfying conditions (I)–(III).

We outline the proof of Theorem 2. Let \(\mathfrak S_0\) and \(\mathfrak S'_0\), both satisfying (I)–(III), yield the constants \(h<\infty\) and \(h'\) respectively. By Theorem 1 and lemmas \((\alpha)\) and \((\varepsilon)\), for every \(\varepsilon>0\) one can find \(k\) such that
\[ h=\mathbf M H(\mathfrak S_{t+1}\mid \mathfrak S_t)=\mathbf M I(\mathfrak S_{t+1},\mathfrak S\mid \mathfrak S_t)\leqslant \mathbf M I(\mathfrak S_{t+1},\mathfrak S'_{t+k}\mid \mathfrak S_t)+\varepsilon. \tag{5} \]

From (5), by lemma \((\zeta)\), there exists \(m\) such that
\[ h\leqslant \mathbf M I(\mathfrak S_{t+1},\mathfrak S'_{t+k}\mid \mathfrak S_t\vee \mathfrak S'_s)+2\varepsilon \quad \text{for } t-s>m. \tag{6} \]

From (6), applying lemmas \((\delta),(\gamma),(\alpha),(\beta)\) in exactly the indicated order, and finally Theorem 1, we obtain
\[
\begin{aligned}
nh &\leqslant \sum_{t=0}^{n-1}\mathbf M I(\mathfrak S_{t+1},\mathfrak S'_{t+k}\mid \mathfrak S_t\vee \mathfrak S'_{-m})+2n\varepsilon\\
&\leqslant \sum_{t=0}^{n-1}\mathbf M I(\mathfrak S_{t+1},\mathfrak S'_{n+k}\mid \mathfrak S_t\vee \mathfrak S'_{-m})+2n\varepsilon\\
&= \mathbf M I(\mathfrak S_n,\mathfrak S'_{n+k}\mid \mathfrak S_0\vee \mathfrak S'_{-m})+2n\varepsilon\\
&\leqslant \mathbf M H(\mathfrak S'_{n+k}\mid \mathfrak S_0\vee \mathfrak S'_{-m})+2n\varepsilon\\
&\leqslant \mathbf M H(\mathfrak S'_{n+k}\mid \mathfrak S'_{-m})+2n\varepsilon\\
&= (n+k+m)h'+2n\varepsilon,
\end{aligned}
\]
whence
\[ h\leqslant \frac{n+k+m}{n}h'+2\varepsilon. \tag{7} \]

Letting \(n\to\infty\) with \(k\) and \(m\) fixed, and then \(\varepsilon\to0\), we obtain from (7) the inequality \(h\leqslant h'\). The case \(h=\infty\) is treated quite analogously. The reverse inequality \(h'\leqslant h\) follows by interchanging the roles of \(\mathfrak S_0\) and \(\mathfrak S'_0\), which completes the proof of Theorem 2.

§ 3. Invariants of automorphisms. If in § 2 we let \(t\) take only integer values, then \(\{S_t\}\) is uniquely determined by the automorphism \(T=S_1\). By Theorems 1 and 2, every quasiregular automorphism \(T\) has an invariant \(0<h(T)\leqslant\infty\).

It is easily proved that every automorphism of type \(\mathfrak R_0\) (the subscript is used to distinguish it from the case of flows with continuous time) has countably multiple Lebesgue spectrum, i.e. among the classes \(\mathfrak R_0^\nu\) only the class \(\mathfrak R_0^\omega \subset \Omega_0^\omega\) is nonempty. It decomposes according to the values of \(h(T)\) into classes \(\mathfrak R_0^\omega(h)\).

Theorem 3. For every \(h\), \(0<h\leqslant\infty\), there exists an automorphism belonging to \(\mathfrak R_0^\omega(h)\).

The corresponding examples are well known. They are obtained, for example, from the scheme of independent random trials \(\ldots,\Omega_{-1}, \Omega_0, \Omega_1,\ldots,\Omega_t,\ldots\), in which the outcome \(\xi_t\) of the trial \(\Omega_t\) has the probability distribution

\[ \mathbf P\{\xi_t=a_i\}=p_i,\qquad -\sum_{i=1}^{\infty}p_i\log p_i=h. \tag{8} \]

The space \(M\) consists of sequences \(x=(\ldots,x_{-1},x_0,x_1,\ldots,x_t,\ldots)\) with \(x_t\in\{a_1,a_2,\ldots\}\), and the shift \(Tx=x'\) is defined by the formula \(x'_t=x_{t-1}\). The measure \(\mu\) on \(M\) is the direct product of the probability measures (8).
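Because the trials are independent, the outcomes of the next \(n\) trials are independent of everything before them. The conditional entropy in Theorem 1 therefore reduces to the unconditional entropy of \(n\) trials, which is \(nh\). A minimal sketch under hypothetical assumptions: the distribution (8) is taken to be finite (the values `p` below), and logarithms are taken to base 2. The helper names are also hypothetical.

```python
from itertools import product
from math import log2, prod

def entropy(p):
    """Shannon entropy -sum p_i log2 p_i of a finite distribution, as in (8)."""
    return -sum(q * log2(q) for q in p if q > 0)

def block_entropy(p, n):
    """Entropy of n independent trials with distribution p (enumerates all outcomes)."""
    return entropy([prod(word) for word in product(p, repeat=n)])

p = [0.5, 0.25, 0.125, 0.125]   # hypothetical finite choice of the p_i
h = entropy(p)                  # 1.75 bits
for n in range(1, 5):
    print(n, block_entropy(p, n), n * h)   # the two columns agree up to rounding
```

The value \(h=\infty\) in Theorem 3 requires a distribution with infinitely many atoms whose series \(-\sum p_i\log p_i\) diverges, which no finite enumeration can show.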

§ 4. Invariants of flows.

Theorem 4. For any \(h\), \(0<h<\infty\), there exists a flow of class \(\mathfrak R^\omega(h)\), i.e. a flow with countably multiple Lebesgue spectrum and with the prescribed value of the constant \(h\).

By analogy with § 3, one naturally thinks of proving Theorem 4 by using, instead of the scheme of discrete independent trials, the scheme of “processes with independent increments” or of generalized processes “with independent values” \((^{12,13})\). However, this path leads only to flows of class \(\mathfrak R^\omega(\infty)\) \((^5)\). To obtain finite values of \(h\), one has to use a more artificial construction. In this note we can only describe one such construction.

Let us define mutually independent random variables \(\xi_n\), one for each integer \(n\), with the distributions \(\mathbf P\{\xi_0=k\}=3/4^k\), \(k=1,2,\ldots\), and, for \(n\ne0\), \(\mathbf P\{\xi_n=k\}=1/2^k\), \(k=1,2,\ldots\). Fix a parameter \(u>0\). Given \(\xi_0=k\), we place the point \(\tau_0\) on the \(t\)-axis uniformly at random on the interval \(-u/2^k\le \tau_0\le0\), and define the remaining points \(\tau_n\), \(n\ne0\), by the relation

\[ \tau_{n+1}=\tau_n+u/2^{\xi_n}. \]
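The construction can be simulated directly. Both laws are geometric. \(\mathbf P\{\xi_0=k\}=(1/4)^{k-1}\cdot 3/4=3/4^k\) is the geometric law with success probability \(3/4\), and \(\mathbf P\{\xi_n=k\}=1/2^k\) is the geometric law with success probability \(1/2\). The sketch samples a finite window \(-n_{\max}\le n\le n_{\max}\). It runs the recurrence forward from \(\tau_0\) and backward as \(\tau_n=\tau_{n+1}-u/2^{\xi_n}\). The window, the seed and all function names are hypothetical.

```python
import random

def sample_geometric(rng, success):
    """Trials up to and including the first success: P(k) = (1-success)^(k-1) * success."""
    k = 1
    while rng.random() >= success:
        k += 1
    return k

def sample_points(u, n_max, seed=0):
    """Sample xi_n and tau_n for -n_max <= n <= n_max following the construction above."""
    rng = random.Random(seed)
    # xi_0 ~ 3/4^k (success 3/4); xi_n ~ 1/2^k (success 1/2) for n != 0
    xi = {n: sample_geometric(rng, 0.75 if n == 0 else 0.5) for n in range(-n_max, n_max + 1)}
    tau = {0: -rng.uniform(0, u / 2 ** xi[0])}     # tau_0 uniform on [-u/2^xi_0, 0]
    for n in range(0, n_max):                       # forward: tau_{n+1} = tau_n + u/2^xi_n
        tau[n + 1] = tau[n] + u / 2 ** xi[n]
    for n in range(-1, -n_max - 1, -1):             # backward: tau_n = tau_{n+1} - u/2^xi_n
        tau[n] = tau[n + 1] - u / 2 ** xi[n]
    return xi, tau

def phi(t, xi, tau):
    """phi(t) = xi_n for tau_n <= t < tau_{n+1}; None outside the sampled window."""
    for n in sorted(tau)[:-1]:
        if tau[n] <= t < tau[n + 1]:
            return xi[n]
    return None

xi, tau = sample_points(u=1.0, n_max=20, seed=1)
print(phi(0.0, xi, tau) == xi[0])   # → True: the interval covering t = 0 carries xi_0
```

Giving \(\xi_0\) the law \(3/4^k\) rather than \(1/2^k\) reflects that the interval covering \(t=0\) is selected with probability proportional to its length \(u/2^k\). Its law is therefore proportional to \(2^{-k}\cdot 2^{-k}\), which is the source of the factor \(4^{-k}\).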

Put \(\varphi(t)=\xi_n\) for \(\tau_n\le t<\tau_{n+1}\). It is easy to verify that the distribution of the random function \(\varphi(t)\) is invariant under the shifts \(S_t\varphi(t_0)=\varphi(t_0-t)\). A simple calculation gives \(h\{S_t\}=6/u\), with logarithms to base 2 (per unit of time there fall on average \(3/u\) points \(\tau_n\), and each \(\xi_n\) contributes the entropy \(\sum_{k=1}^{\infty}k/2^k=2\)).
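The arithmetic behind \(h=6/u\) can be checked with exact partial sums, assuming logarithms to base 2. That is the base for which the entropy of \(\xi_n\) equals \(\sum_k k/2^k\), since \(-\log_2 2^{-k}=k\). For \(n\ne0\), the mean gap between consecutive points is \(u\sum_k 2^{-k}\cdot 2^{-k}=u/3\). By the usual renewal argument, this gives \(3/u\) points per unit time. The truncation `K` and the function names are hypothetical.

```python
from fractions import Fraction

K = 200  # number of series terms kept; the neglected tails are far below float precision

def series(term):
    """Exact partial sum of term(k) for k = 1..K."""
    return sum((term(k) for k in range(1, K + 1)), Fraction(0))

total_xi0 = series(lambda k: Fraction(3, 4 ** k))          # sum of P{xi_0 = k}: should be 1
mean_gap = series(lambda k: Fraction(1, 2 ** k) / 2 ** k)  # E[u/2^xi_n] / u for n != 0: 1/3
entropy_xi = series(lambda k: Fraction(k, 2 ** k))         # -sum p_k log2 p_k, p_k = 1/2^k: 2

def entropy_per_unit_time(u):
    points_per_unit_time = 1 / (float(mean_gap) * u)       # 3/u
    return points_per_unit_time * float(entropy_xi)        # 6/u bits

print(float(total_xi0), float(mean_gap), float(entropy_xi))
print(entropy_per_unit_time(2.0))   # close to 6/2 = 3
```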

A more intuitive picture of this random process is obtained if the description of its state \(\omega(t)\) at time \(t\) includes, in addition to the value \(\varphi(t)\), the quantity \(\delta(t)=t-\tau^*(t)\). This is the distance from \(t\) to the nearest point \(\tau_n\) to the left of \(t\), i.e. \(\tau^*(t)=\tau_n\) for \(\tau_n\le t<\tau_{n+1}\). With this description the process becomes a stationary Markov process. It deserves only the name “quasiregular”, however. Although the corresponding dynamical system is transitive, the quantity \(f(\omega(t),t)=t-\delta(t)=\tau^*(t)\) is determined, up to an additive term of the form \(ru\) with \(r\) dyadic rational, by the behavior of a realization of the process in the arbitrarily distant past.
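The reason for the dyadic-rational ambiguity is that every gap \(\tau_{n+1}-\tau_n=u/2^{\xi_n}\) is \(u\) times a dyadic rational. Hence every difference \(\tau_n-\tau_m\) is too, and the class of \(\tau^*(t)\) modulo such terms is the same for all \(t\), no matter how remote. The sketch checks this with exact fractions on a hypothetical finite window, taking \(u=1\). The random \(\tau_0\) cancels in the differences, so it is not sampled. The seed, window and names are hypothetical.

```python
import random
from fractions import Fraction

def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0

def sample_xi(rng, success):
    """Geometric law P{xi = k} = (1 - success)^(k-1) * success, k = 1, 2, ..."""
    k = 1
    while rng.random() >= success:
        k += 1
    return k

rng = random.Random(7)
xi = {n: sample_xi(rng, 0.75 if n == 0 else 0.5) for n in range(-30, 30)}

u = Fraction(1)               # hypothetical choice u = 1 keeps every gap u/2^xi_n exact
offset = {0: Fraction(0)}     # offset[n] = tau_n - tau_0; the random tau_0 cancels out
for n in range(0, 30):
    offset[n + 1] = offset[n] + u / 2 ** xi[n]
for n in range(-1, -31, -1):
    offset[n] = offset[n + 1] - u / 2 ** xi[n]

# Every difference tau_n - tau_m is u times a dyadic rational (power-of-two denominator).
print(all(is_power_of_two((offset[n] - offset[m]).denominator)
          for n in offset for m in offset))   # → True
```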

Received
21 I 1958

REFERENCES

\({}^1\) V. A. Rokhlin, Uspekhi Mat. Nauk, 4, 2 (30) (1949).
\({}^2\) I. M. Gel'fand, S. V. Fomin, Uspekhi Mat. Nauk, 7, 1 (47) (1952).
\({}^3\) S. V. Fomin, Ukr. Mat. Zhurn., 2, No. 2 (1950).
\({}^4\) K. Itô, Japan J. Math., 22, 63 (1952).
\({}^5\) K. Itô, Trans. Am. Math. Soc., 81, 253 (1956).
\({}^6\) J. L. Doob, Stochastic Processes, IL, 1956.
\({}^7\) C. E. Shannon, W. Weaver, The Mathematical Theory of Communication, 1949.
\({}^8\) I. M. Gel'fand, A. N. Kolmogorov, A. M. Yaglom, DAN, 111, No. 4 (1956).
\({}^9\) C. Shannon, Theory of Transmission of Electrical Signals in the Presence of Noise (collected papers), IL, 1953.
\({}^{10}\) A. I. Plesner, DAN, 23, No. 4 (1939).
\({}^{11}\) A. I. Plesner, DAN, 25, No. 9 (1939).
\({}^{12}\) K. Itô, Mem. Coll. Sci. Univ. Kyoto, 18, No. 3 (1954).
\({}^{13}\) I. M. Gel'fand, DAN, 100, No. 5 (1955).
