UDC 519.21
MATHEMATICS
V. M. ZOLOTAREV
SOME INEQUALITIES OF PROBABILITY THEORY AND THEIR APPLICATION TO A REFINEMENT OF A. M. LYAPUNOV’S THEOREM
(Presented by Academician A. N. Kolmogorov on 14 II 1967)
1. Many limit theorems in the uniform metric, and their refinements, are obtained by means of the well-known inequality of C. Esseen. Experience with this inequality has shown, however, that it is somewhat less accurate than an analogous inequality of A. Berry, which, to be sure, applies to a narrower class of cases. Below we give an improved version of the Berry–Esseen inequality which surpasses A. Berry’s inequality in accuracy and at the same time is no less convenient to use than C. Esseen’s inequality and applies to a broader class of cases. This version, together with some new inequalities for characteristic functions of sums of independent random variables, has made it possible to lower substantially the estimates of the absolute constant appearing in A. M. Lyapunov’s theorem.
2. Let \(L(x)\) and \(H(x)\) be left-continuous functions of bounded variation, and let \(l(t)\), \(h(t)\) be their corresponding Fourier–Stieltjes transforms. Choose some absolutely continuous, symmetric distribution with density \(p(x)\) and absolutely integrable characteristic function \(\omega(t)\). Introduce the notation:
\[ \Delta=\sup |L(x)-H(x)|;\qquad m(t)=l(t)-h(t); \]
\[ V(x)=x\int_{|u|<x} p(u)\,du,\quad x>0;\qquad Q(y)=\frac{y}{2\pi}\int\left|\omega\left(\frac{t}{y}\right)\frac{m(t)}{t}\right|dt,\quad y>0, \]
and let \(\alpha\) be the unique positive root of the function \(2V(x)-x\). Let \(\mathfrak{E}_1,\mathfrak{E}_2\) be arbitrary sets on the real line. Put
\[ \beta(\mathfrak{E}_1,\mathfrak{E}_2)=\inf(a''-a'), \tag{1} \]
where the lower bound is taken over all possible pairs of points \(a'<a''\), belonging to the set \(\mathfrak{E}_1\cup\mathfrak{E}_2\), supplemented by the improper points of the line \(-\infty\) and \(\infty\). In Theorem 1 some additional assumptions concerning the properties of the functions \(L,H\) will be used. These assumptions will be chosen from among the following three.
Condition \(A_1\). Denote by \(\mathfrak{A}_1\) the set of discontinuity points of the function \(H\), and by \(\mathfrak{B}_1\) the complement of the set \(\mathfrak{A}_1\). Then the function \(H\) has a derivative at all points of the set \(\mathfrak{B}_1\), and
\[ q_H=\sup_{\mathfrak{B}_1}|H'(x)|<\infty. \]
Condition \(A_2\). Denote by \(\mathfrak{A}_2\) the set of discontinuity points of the function \(L\), and by \(\mathfrak{B}_2\) the complement of the set \(\mathfrak{A}_2\). Then the function \(L\) has a derivative at all points of the set \(\mathfrak{B}_2\), and
\[ q_L=\sup_{\mathfrak{B}_2}|L'(x)|<\infty. \]
Condition \(A_3\). The function \(L\) is monotone. In this case we formally set \(q_L=0\) and take the empty set as \(\mathfrak{A}_3\).
Theorem 1. Let condition \(A_1\) and one of the conditions \(A_2, A_3\) be satisfied. From the sets \(\mathfrak A_1, \mathfrak A_k\) corresponding to the pair of satisfied conditions, form, according to (1), the quantity \(\beta=\beta(\mathfrak A_1,\mathfrak A_k)\). Suppose that \(\beta>0\). Then for all positive \(x,y\) satisfying the requirements \(x\ge \alpha\),
\[ y\ge \frac{4}{\beta}x, \]
the following inequality holds \((q=q_L+q_H)\):
\[ \Delta \le \frac{x\,[qV(x)+Q(y)]}{y\,[2V(x)-x]}. \]
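To make the quantities in Theorem 1 concrete, the following minimal sketch evaluates \(V(x)\), the root \(\alpha\), and the right-hand side of the inequality. It assumes, purely for illustration, that the smoothing density \(p\) is the standard normal density; the paper does not say which \(p\) was actually used. The function names and the argument `Q_y` (a value of \(Q(y)\) supplied by the caller) are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def V(x: float) -> float:
    """V(x) = x * integral of p over |u| < x, with p assumed standard normal."""
    return x * (2.0 * normal_cdf(x) - 1.0)


def alpha(lo: float = 1e-6, hi: float = 10.0, tol: float = 1e-12) -> float:
    """Positive root of 2V(x) - x, found by bisection."""
    def f(x: float) -> float:
        return 2.0 * V(x) - x

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theorem1_bound(x: float, y: float, q: float, Q_y: float) -> float:
    """Right-hand side x[qV(x) + Q(y)] / (y[2V(x) - x]) of Theorem 1."""
    denom = y * (2.0 * V(x) - x)
    if denom <= 0.0:
        raise ValueError("2V(x) - x must be positive, i.e. x must exceed alpha")
    return x * (q * V(x) + Q_y) / denom
```

Under this assumed choice of \(p\), the root \(\alpha\) is the upper quartile of the standard normal law, since \(2V(x)-x=0\) reduces to \(\Phi(x)=3/4\). The sketch does not check the requirement \(y\ge 4x/\beta\); the caller is expected to do so.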
3. Let \(\xi_1,\ldots,\xi_n\) be independent random variables such that \(M\xi_j=0\) and \(\beta_j=M|\xi_j|^3<\infty\). Denote by \(\mathfrak S_n\) the set of \(n\)-tuples of distribution functions of random variables of the type under consideration, and let \(\mathfrak S=\bigcup_n \mathfrak S_n\).
Put \(\zeta=\xi_1+\cdots+\xi_n\), \(\sigma^2=M\zeta^2\), and \(\varepsilon=(\sum\beta_j)/\sigma^3\). Also denote by \(F\) the distribution function of the sum \(\zeta/\sigma\), and by \(\Phi\) the distribution function of the standard normal law. Consider the quantity
\[ D_n(r)=\frac{1}{r}\sup_{\varepsilon=r}\sup_x |F(x)-\Phi(x)|,\qquad r>0, \]
where the outer supremum is taken over all possible \(n\)-tuples from \(\mathfrak S_n\) for which the Lyapunov ratio \(\varepsilon\) has the same value \(r\). It is well known (this fact was established by H. Cramér) that
\[ C=\sup_n\sup_r D_n(r)<\infty . \]
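As a small illustration of the definitions above, the following sketch computes the Lyapunov ratio \(\varepsilon=(\sum\beta_j)/\sigma^3\) for independent summands with finitely many values. The representation of a distribution as a list of `(value, probability)` pairs and the function name are assumptions made for this example only.

```python
import math


def lyapunov_ratio(summands):
    """Return eps = (sum of third absolute moments) / sigma**3.

    `summands` is a list of discrete distributions, each given as a list
    of (value, probability) pairs with mean zero.
    """
    variance_sum = 0.0
    beta_sum = 0.0
    for dist in summands:
        mean = sum(v * p for v, p in dist)
        if abs(mean) > 1e-12:
            raise ValueError("each summand must have mean zero")
        variance_sum += sum(v * v * p for v, p in dist)
        beta_sum += sum(abs(v) ** 3 * p for v, p in dist)
    return beta_sum / variance_sum ** 1.5


# Four symmetric +-1 summands: sigma^2 = 4, sum of beta_j = 4, eps = 4 / 8.
coin = [(-1.0, 0.5), (1.0, 0.5)]
eps = lyapunov_ratio([coin] * 4)
```

For \(n\) identically distributed summands the ratio decreases like \(n^{-1/2}\), which is why small values of \(r\) correspond to the classical limiting regime.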
Until recently, the efforts of specialists were directed not toward finding the true value of \(C\), but toward constructing estimates for this constant.
Thus, C. Esseen \((^1)\) showed that, within the scheme of increasing sums of identically distributed random summands with \(M\xi_1=0\) and \(M|\xi_1|^3<\infty\) (the class of distributions of such summands will be denoted by \(\mathfrak F\)), the asymptotically correct constant is
\[ C^*=\sup_{\mathfrak F}\lim_{n\to\infty}\frac{1}{\varepsilon}\sup_x |F-\Phi| =\frac{\sqrt{10}+3}{6\sqrt{2\pi}}=0.40974\ldots . \tag{2} \]
Since \(C^*\le C\), C. Esseen’s result gives a lower estimate for \(C\).
For a long time the best upper estimate was \(C<4.8\), due to H. Bergström \((^2)\), and, in the special case of identically distributed summands, \(C<2.031\), obtained by K. Takano \((^3)\) (A. Berry’s own estimate in this case, \(C<1.88\), turned out to rest on erroneous reasoning). Some time ago, by combining theoretical estimates with computations on an electronic computer, the author succeeded in proving that \(C<1.322\) in the general case and \(C<1.301\) in the special case mentioned above (see \((^4)\)). A further lowering of the upper estimates of the constant \(C\) has now been achieved by refining Lemma 4 of \((^5)\), which is very important for the chosen method.
Lemma. Let \(f\) be the characteristic function of the sum \(\zeta/\sigma\). Then for all real \(t\) one may assert that
\[ |f(t)|\le \exp\left\{-\frac{t^2}{2}(1-k\varepsilon |t|)\right\}, \tag{3} \]
where
\[ k=4\sup_{x>0}(\cos x-1+x^2/2)x^{-3}=0.396647\ldots, \]
and the constant \(k\) in inequality (3) cannot be decreased.
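The constant \(k\) and inequality (3) can be checked numerically. The sketch below approximates the supremum by a grid search (outside the searched interval the function is bounded by \(x/24\) for small \(x\) and by \(1/(2x)\) for large \(x\), so the maximum lies inside it) and then tests (3) for sums of \(n\) symmetric \(\pm 1\) summands, for which \(\varepsilon=n^{-1/2}\) and \(f(t)=\cos^n(t/\sqrt n)\). The grid sizes and function names are illustrative choices, not part of the paper.

```python
import math


def k_constant(step: float = 1e-3, x_min: float = 0.1, x_max: float = 20.0) -> float:
    """Approximate k = 4 * sup_{x>0} (cos x - 1 + x^2/2) / x^3 on a grid."""
    best = 0.0
    first = int(round(x_min / step))
    last = int(round(x_max / step))
    for i in range(first, last + 1):
        x = i * step
        best = max(best, (math.cos(x) - 1.0 + 0.5 * x * x) / x ** 3)
    return 4.0 * best


def lemma_holds(n: int, k: float, t_max: float = 10.0, steps: int = 2000) -> bool:
    """Check |f(t)| <= exp(-t^2/2 * (1 - k*eps*|t|)) for n symmetric +-1 summands."""
    eps = 1.0 / math.sqrt(n)
    for i in range(1, steps + 1):
        t = t_max * i / steps
        lhs = abs(math.cos(t / math.sqrt(n))) ** n
        rhs = math.exp(-0.5 * t * t * (1.0 - k * eps * t))
        if lhs > rhs + 1e-12:
            return False
    return True


k = k_constant()
checks = [lemma_holds(n, k) for n in (1, 2, 5, 50)]
```

A check on one family of distributions is of course only a sanity test of the statement, not a substitute for its proof.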
Theorem 2. One may assert that in the general case
\[ C<0.9051, \]
and in the case of identically distributed summands,
\[ C<0.82. \]
The proof of this theorem, like the proof of the earlier estimates \(C<1.322\) and \(C<1.301\) cited above, is based on upper estimates \(D^*(r)\) of the quantity \(D(r)=\sup_n D_n(r)\), obtained with the aid of Theorem 1 and the lemma. Table 2 gives some values of \(D^*(r)\) and \(I^*(r)=rD^*(r)\) (rounded upward), pertaining to the general case.
For the function \(D^*(r)\) an asymptotic representation has been constructed as \(r\to 0\). Namely, uniformly over sets of distributions from \(\mathfrak S\), in the general case
\[ D^*(r)=0.81967+0.05894\,r^{1/3}+O(r^{2/3}), \]
and in the case of identically distributed summands
\[ D^*(r)=0.81967-0.99951\,r+O(r^2) \]
(all numbers are given with accuracy to the unit of the last digit).
Table 1
| \(n\) | \(C_n\) | \(n\) | \(C_n\) | \(n\) | \(C_n\) |
|---|---|---|---|---|---|
| 1 | 0.3703 | 8 | 0.4060 | 15 | 0.4078 |
| 2 | 0.3559 | 9 | 0.4037 | 16 | 0.4073 |
| 3 | 0.3981 | 10 | 0.4065 | 17 | 0.4079 |
| 4 | 0.3951 | 11 | 0.4063 | 18 | 0.4080 |
| 5 | 0.4010 | 12 | 0.4064 | 19 | 0.4077 |
| 6 | 0.4037 | 13 | 0.4074 | 20 | 0.4083 |
| 7 | 0.4015 | 14 | 0.4061 | 21 | 0.4078 |
We note that asymptotic estimates of the form
\[ \sup_x |F(x)-\Phi(x)|\le \varepsilon D^*(\varepsilon)\le 0.81968\,\varepsilon+\cdots \]
are of independent interest, since in the general case the problem of finding an asymptotically correct constant has not been solved at all, while in the special case noted above C. Esseen’s result does not guarantee that the remainder term decreases uniformly over the class \(\mathfrak S\).
Table 2
| \(r\) | \(D^*(r)\) | \(I^*(r)\) | \(r\) | \(D^*(r)\) | \(I^*(r)\) |
|---|---|---|---|---|---|
| 0.001 | 0.8337 | 0.00084 | 0.13 | 0.8975 | 0.11667 |
| 0.002 | 0.8362 | 0.00168 | 0.14 | 0.8994 | 0.12592 |
| 0.003 | 0.8382 | 0.00252 | 0.15 | 0.9010 | 0.13515 |
| 0.004 | 0.8400 | 0.00336 | 0.16 | 0.9023 | 0.14437 |
| 0.005 | 0.8412 | 0.00421 | 0.17 | 0.9033 | 0.15356 |
| 0.006 | 0.8424 | 0.00506 | 0.18 | 0.9040 | 0.16272 |
| 0.007 | 0.8434 | 0.00591 | 0.19 | 0.9045 | 0.17186 |
| 0.008 | 0.8443 | 0.00676 | 0.20 | 0.9049 | 0.18097 |
| 0.009 | 0.8452 | 0.00761 | 0.21 | 0.9050 | 0.19005 |
| 0.010 | 0.8459 | 0.00846 | 0.22 | 0.9051 | 0.19911 |
| 0.02 | 0.8517 | 0.01704 | 0.23 | 0.9050 | 0.20815 |
| 0.03 | 0.8565 | 0.02570 | 0.24 | 0.9049 | 0.21717 |
| 0.04 | 0.8613 | 0.03446 | 0.25 | 0.9048 | 0.22618 |
| 0.05 | 0.8664 | 0.04332 | 0.26 | 0.9046 | 0.23519 |
| 0.06 | 0.8715 | 0.05229 | 0.27 | 0.9044 | 0.24419 |
| 0.07 | 0.8765 | 0.06135 | 0.28 | 0.9043 | 0.25320 |
| 0.08 | 0.8811 | 0.07049 | 0.29 | 0.9042 | 0.26220 |
| 0.09 | 0.8853 | 0.07967 | 0.30 | 0.9041 | 0.27122 |
| 0.10 | 0.8890 | 0.08890 | 0.31 | 0.9040 | 0.28024 |
| 0.11 | 0.8923 | 0.09815 | 0.32 | 0.9040 | 0.28926 |
| 0.12 | 0.8951 | 0.10741 | 0.33 | 0.9039 | 0.29829 |
4. A special position in problems of absolute and asymptotic estimation of the remainder term in the central limit theorem and its analogues in the mean metric is occupied by the case of summands \(\xi_j\) having one and the same Bernoulli distribution. As all investigations of this question show (including \((^1)\)), it is precisely here that we encounter the distribution \(F\) with the greatest deviation (relative to the value of \(\varepsilon\)) from the distribution \(\Phi\).
For this special case of summands \(\xi_j\), the quantities
\[ C_n=\sup_r D_n(r) \]
can be computed directly for the first several values of \(n\) with the aid of computers. Table 1 gives the results of such computations for the first 21 values of \(n\) (accurate to the unit of the fourth decimal place).
From these computations it is seen that the values \(C_n\), computed for what is presumably the worst case for the approximation, do not exceed \(C^*=0.40974\ldots\). This naturally suggests that the same holds for the remaining values of \(n\).
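A computation of this kind can be sketched as follows. The sketch assumes that the summands are centered two-point variables taking the values \(1-p\) and \(-p\) with probabilities \(p\) and \(1-p\), so that \(\varepsilon=(p^2+q^2)/\sqrt{npq}\) with \(q=1-p\), and that the supremum over \(r\) amounts to a supremum over \(p\), approximated here on a grid. These are our reading of the setup rather than a description of the author’s program; all names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bernoulli_ratio(n: int, p: float) -> float:
    """sup_x |F(x) - Phi(x)| / eps for n centered Bernoulli(p) summands."""
    q = 1.0 - p
    sigma = math.sqrt(n * p * q)
    eps = (p * p + q * q) / sigma
    cumulative = 0.0
    worst = 0.0
    for j in range(n + 1):
        x = (j - n * p) / sigma  # atom of the normalized sum
        g = normal_cdf(x)
        worst = max(worst, abs(cumulative - g))  # value of F just before the jump
        cumulative += math.comb(n, j) * p ** j * q ** (n - j)
        worst = max(worst, abs(cumulative - g))  # value of F just after the jump
    return worst / eps


def c_n(n: int, grid: int = 2000) -> float:
    """Grid approximation of C_n = sup over p of bernoulli_ratio(n, p)."""
    return max(bernoulli_ratio(n, j / grid) for j in range(1, grid))


# Closed form of the asymptotically correct constant from (2).
C_STAR = (math.sqrt(10.0) + 3.0) / (6.0 * math.sqrt(2.0 * math.pi))

values = {n: c_n(n) for n in (1, 2, 3)}
```

Because \(F\) is a step function and \(\Phi\) is continuous and increasing, the supremum over \(x\) is attained at an atom of \(F\), on one side of the jump or the other, which is why only the atoms are examined. Under the stated assumptions the results for the first values of \(n\) should be close to the corresponding entries of Table 1.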
Altogether, these considerations speak quite convincingly in favor of the equality
\[ C=C^*=\frac{\sqrt{10}+3}{6\sqrt{2\pi}}. \]
In conclusion, we note that the proof of Theorem 2 makes essential use of the monotonicity of the function \(I^*(r)\). This property naturally enhances the usefulness of the values of \(I^*(r)\) given in Table 2.
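One way the monotonicity can be exploited is sketched below: if \(I^*(r)\) is nondecreasing, then for a Lyapunov ratio \(\varepsilon\) lying between two tabulated points the estimate \(\sup_x|F(x)-\Phi(x)|\le I^*(\varepsilon)\) may be replaced by the value of \(I^*\) at the next tabulated \(r\ge\varepsilon\). This reading of the remark is our assumption, and the code uses only a few rows of Table 2 (general case); the function name is hypothetical.

```python
from bisect import bisect_left

# A few (r, I*(r)) pairs copied from Table 2, in increasing order of r.
TABLE_2_ROWS = [
    (0.01, 0.00846),
    (0.02, 0.01704),
    (0.05, 0.04332),
    (0.10, 0.08890),
    (0.20, 0.18097),
    (0.33, 0.29829),
]


def uniform_distance_bound(eps: float) -> float:
    """Upper bound for sup_x |F(x) - Phi(x)| from the next tabulated r >= eps.

    Relies on I*(r) being nondecreasing, so that I*(eps) <= I*(r) for r >= eps.
    """
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    nodes = [r for r, _ in TABLE_2_ROWS]
    idx = bisect_left(nodes, eps)
    if idx == len(nodes):
        raise ValueError("eps lies beyond the tabulated range")
    return TABLE_2_ROWS[idx][1]


bound = uniform_distance_bound(0.015)  # uses the row r = 0.02
```

With the full table in place of the excerpt, the same lookup gives a bound for any \(\varepsilon\le 0.33\).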
V. A. Steklov Mathematical Institute,
Academy of Sciences of the USSR
Received 22 VIII 1966
References
\(^{1}\) C. G. Esseen, Skand. Aktuar., 3–4, 160 (1956).
\(^{2}\) H. Bergström, Skand. Aktuar., 33, 37 (1949).
\(^{3}\) K. Takano, Res. Mem. Inst. Stat. Math., 9, 6 (1951).
\(^{4}\) V. M. Zolotarev, Theory of Probability and Its Applications, 10, 1, 108 (1966).
\(^{5}\) V. M. Zolotarev, Theory of Probability and Its Applications, 10, 3, 519 (1965).