UDC 519.281
MATHEMATICS
Submitted 1968-01-01 | RussiaRxiv: ru-196801.43920 | Translated from Russian

N. A. BODIN

ON THE THEORY OF GROUPED SAMPLES

(Presented by Academician Yu. V. Linnik on 4 III 1967)

This note refines some results of Kulldorff \((^1)\) on the theory of grouped samples and generalizes them to the multidimensional case. For the history of the question see \((^1)\).

Let there be given a random variable with a continuous \(n\)-dimensional distribution function

\[ F(\mathbf{x}) = F(\mathbf{x}, \vec{\theta}) = F(x_1,\ldots,x_n;\theta_1,\ldots,\theta_s), \tag{1} \]

depending on an unknown vector parameter \(\vec{\theta}=(\theta_1,\ldots,\theta_s)\) with \(s\)-dimensional range of variation \(\Omega\). The problem is to estimate the parameter \(\vec{\theta}\) on the basis of \(N\) observations of this random variable. For this purpose the following procedure is proposed (a method of grouping observations). Let \(\mathfrak{X}\) be the range of variation of the variable \(\mathbf{x}=(x_1,\ldots,x_n)\). This region may be infinite and, in particular, may coincide with the whole \(n\)-dimensional space. Partition the region \(\mathfrak{X}\) into pairwise nonintersecting subregions \(\mathfrak{X}_1,\ldots,\mathfrak{X}_k\). Usually \(\mathfrak{X}\) is a finite or infinite rectangular parallelepiped; the regions \(\mathfrak{X}_i\) are also parallelepipeds formed by planes parallel to the faces of the parallelepiped \(\mathfrak{X}\).

The function \(F(\mathbf{x},\vec{\theta})\) and this partition define a finite probability scheme with \(k\) outcomes having probabilities

\[ p_i = p_i(\vec{\theta}) = p_i(\mathfrak{X}_i,\vec{\theta}) \quad (i=1,\ldots,k), \tag{2} \]

where \(p_i(\mathfrak{X}_i,\vec{\theta})\) is the probability of falling in the region \(\mathfrak{X}_i\). The method of grouping a sample for estimating the parameter \(\vec{\theta}\) consists in reducing this problem to estimating \(\vec{\theta}\) from observations in a finite probability scheme. In what follows we shall estimate the parameter \(\vec{\theta}\) from observations in the finite probability scheme (2).
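The reduction to the finite scheme (2) can be illustrated in code. The following is only a minimal sketch under assumptions that are not in the note: the one-dimensional case \(n=1\), a normal \(F\) with unknown mean and known standard deviation, and cells \((e_{i-1}, e_i]\) given by a sorted list of finite cut points. The function names `cell_probabilities` and `cell_counts` are hypothetical.

```python
from bisect import bisect_left
from math import inf
from statistics import NormalDist


def cell_probabilities(edges, mu, sigma=1.0):
    """Return p_i(mu) for the cells (-inf, e_1], (e_1, e_2], ..., (e_m, +inf).

    Assumption for illustration: F is the normal law N(mu, sigma^2).
    """
    cdf = NormalDist(mu, sigma).cdf
    bounds = [-inf, *edges, inf]
    # Distribution function at every cell boundary, with the infinite ends set exactly.
    values = [0.0 if b == -inf else 1.0 if b == inf else cdf(b) for b in bounds]
    return [hi - lo for lo, hi in zip(values, values[1:])]


def cell_counts(sample, edges):
    """Return N_i, the number of observations in each right-closed cell."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        # bisect_left places x == e_i into the cell that ends at e_i.
        counts[bisect_left(edges, x)] += 1
    return counts
```

After this step only the counts \(N_i\) and the functions \(p_i(\vec{\theta})\) are needed; the individual observations are no longer used.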

Consider a sample \(\mathfrak{M}_N\) consisting of \(N\) observations in our finite scheme. Let \(N_i\) be the number of observations of the \(i\)-th outcome of the scheme. In the case where the \(p_i\) are determined by formulas (2), \(N_i\) is the number of observations falling in the region \(\mathfrak{X}_i\) \((i=1,\ldots,k)\). To estimate the parameter \(\vec{\theta}\) we use the maximum likelihood method. To the sample \(\mathfrak{M}_N\) there corresponds the likelihood function

\[ L = L_{\mathfrak{M}_N}(\vec{\theta}) = \prod_{i=1}^{k} p_i^{N_i}. \tag{3} \]
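In computations it is usually more convenient to work with \(\log L=\sum_i N_i\log p_i\) than with the product (3). A minimal sketch follows; the function name is hypothetical, and the convention that cells with \(N_i=0\) contribute nothing is an assumption made here for illustration.

```python
from math import log


def grouped_log_likelihood(counts, probs):
    """Return log L = sum_i N_i * log p_i for the finite scheme.

    Cells with N_i == 0 are skipped (they contribute the factor p_i**0 = 1).
    If an observed cell has probability 0, then L = 0 and log L = -inf.
    """
    total = 0.0
    for n_i, p_i in zip(counts, probs):
        if n_i == 0:
            continue
        if p_i <= 0.0:
            return float("-inf")
        total += n_i * log(p_i)
    return total
```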

By a maximum likelihood estimate (m.l.e.) in the strict sense we mean a value \(\vec{\theta}\in\Omega\) at which the likelihood function \(L=L(\vec{\theta})\) attains its global maximum over \(\Omega\).

Let the functions \(p_i=p_i(\vec{\theta})\) have partial derivatives with respect to all arguments. Any root of the system of likelihood equations

\[ \frac{\partial}{\partial \theta_t}\log L = 0 \quad (t=1,\ldots,s), \tag{4} \]

lying in \(\Omega\), will be called an m.l.e. in the broad sense. An m.l.e. may fail to exist, but then it can be replaced by a certain random variable (see \((^1)\)), for which the results given below remain valid.
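For a single parameter, a root of the likelihood equation (4) can be located numerically. The sketch below is an illustration under assumptions not made in the note: \(n=s=1\), a normal law with unknown mean \(\mu\) and known \(\sigma\), cells \((e_{i-1},e_i]\), and bisection as the root finder (chosen here for simplicity; the note does not prescribe a numerical method). The names `score` and `solve_likelihood_equation` are hypothetical.

```python
from math import erfc, exp, inf, pi, sqrt


def _upper_tail(z):
    """P(Z > z) for a standard normal Z, accurate in the far tails."""
    return 0.5 * erfc(z / sqrt(2.0))


def _phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def score(mu, counts, edges, sigma=1.0):
    """Left-hand side of (4): sum_i N_i * d(log p_i)/d(mu)."""
    bounds = [-inf, *edges, inf]
    total = 0.0
    for n_i, a, b in zip(counts, bounds, bounds[1:]):
        if n_i == 0:
            continue
        za, zb = (a - mu) / sigma, (b - mu) / sigma
        if za >= 0.0:  # cell lies in the upper tail: subtract tail probabilities
            p_i = _upper_tail(za) - _upper_tail(zb)
        else:          # otherwise use Phi(zb) - Phi(za)
            p_i = _upper_tail(-zb) - _upper_tail(-za)
        dp_i = (_phi(za) - _phi(zb)) / sigma  # d p_i / d mu
        total += n_i * dp_i / p_i
    return total


def solve_likelihood_equation(counts, edges, sigma=1.0):
    """Find a root of the score by bracketing and bisection."""
    n = sum(counts)
    if counts[0] == n or counts[-1] == n:
        raise ValueError("all observations lie in one unbounded end cell")
    lo, hi = edges[0], edges[-1]
    step = sigma
    while score(lo, counts, edges, sigma) <= 0.0:  # move left until score > 0
        lo, step = lo - step, 2.0 * step
    step = sigma
    while score(hi, counts, edges, sigma) >= 0.0:  # move right until score < 0
        hi, step = hi + step, 2.0 * step
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, counts, edges, sigma) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `solve_likelihood_equation([10, 20, 20, 10], [-1.0, 0.0, 1.0])` returns a value close to zero, as symmetry suggests. For cells extremely far from \(\mu\) the probabilities may underflow; the sketch does not guard against that.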

Theorem 1. Let a sample \(\mathfrak{M}_N\) be fixed, and suppose that, for every \(i\) \((i=1,\ldots,k)\) for which \(N_i>0\), the functions \(p_i=p_i(\vec{\theta})=p_i(\theta_1,\ldots,\theta_s)\) have continuous first partial derivatives in the open parallelepiped \(\vec{\Omega}\) defined by the inequalities

\[ \omega_i' < \theta_i < \omega_i'' \qquad (i=1,\ldots,s) \tag{5} \]

(possibly \(\omega_i'=-\infty,\ \omega_i''=+\infty\)). Suppose that there exist points \(\bar{\vec{\theta}}=(\bar{\theta}_1,\ldots,\bar{\theta}_s)\) and \(\underline{\vec{\theta}}=(\underline{\theta}_1,\ldots,\underline{\theta}_s)\) of the parallelepiped \(\vec{\Omega}\) satisfying the following conditions: for each index \(t=1,\ldots,s\),

\[ \sum_{i=1}^{k} N_i \left.\frac{\partial \log p_i}{\partial \theta_t}\right|_{\theta_t=\bar{\theta}_t} > 0, \tag{6a} \]

\[ \sum_{i=1}^{k} N_i \left.\frac{\partial \log p_i}{\partial \theta_t}\right|_{\theta_t=\underline{\theta}_t} < 0 \tag{6b} \]

for all \(\theta_1,\ldots,\theta_{t-1},\theta_{t+1},\ldots,\theta_s\) satisfying inequalities (5). Then an m.l.e. exists in the broad sense. If, moreover, for any vector \(\vec{\theta}=\vec{\theta}'\) satisfying the system of likelihood equations (4), the matrix

\[ \left\| \left. \sum_{i=1}^{k} N_i \frac{\partial^2 \log p_i}{\partial \theta_t \partial \theta_u} \right|_{\vec{\theta}=\vec{\theta}'} \right\| \qquad (t,u=1,\ldots,s) \tag{7} \]

is the matrix of a negative definite quadratic form, then there exists a unique m.l.e. in the strict sense.
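As an illustration of the conditions of Theorem 1 (an example constructed here, not taken from the note), assume \(s=1\), \(k=2\), a normal law with unknown mean \(\theta\) and unit variance, the two cells \((-\infty,c]\) and \((c,+\infty)\), and \(0<N_1<N\). Writing \(\Phi\) and \(\varphi\) for the standard normal distribution function and density, we have

\[ p_1=\Phi(c-\theta),\qquad p_2=1-\Phi(c-\theta),\qquad \frac{d}{d\theta}\log L=\varphi(c-\theta)\,\frac{N\Phi(c-\theta)-N_1}{\Phi(c-\theta)\,[1-\Phi(c-\theta)]}. \]

This derivative is positive for \(\theta<\hat{\theta}\) and negative for \(\theta>\hat{\theta}\), where

\[ \hat{\theta}=c-\Phi^{-1}\!\left(\frac{N_1}{N}\right), \]

so points satisfying (6a) and (6b) exist. At the root, the second derivative equals

\[ \left.\frac{d^2}{d\theta^2}\log L\right|_{\theta=\hat{\theta}} =-\frac{N\,\varphi^2(c-\hat{\theta})}{\Phi(c-\hat{\theta})\,[1-\Phi(c-\hat{\theta})]}<0, \]

so the \(1\times 1\) matrix (7) is negative definite and \(\hat{\theta}\) is the unique m.l.e. in the strict sense. If \(N_1=0\) or \(N_1=N\), the derivative keeps a constant sign and no root exists.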

Let \(\vec{\theta}^{(0)}\) be the true value of the parameter \(\vec{\theta}\). An m.l.e. (in the strict sense or, respectively, in the broad sense) \(\vec{\theta}^{(\mathfrak{M}_N)}\) is called consistent if, as \(N\to\infty\), the vector \(\vec{\theta}^{(\mathfrak{M}_N)}\) converges in probability to \(\vec{\theta}^{(0)}\). The m.l.e. \(\vec{\theta}^{(\mathfrak{M}_N)}\) is called asymptotically efficient (in Wald’s sense) if

\[ \text{(a)}\quad \lim_{N\to\infty} E\left[\sqrt{N}\left(\vec{\theta}^{(\mathfrak{M}_N)}-\vec{\theta}^{(0)}\right)\right]=0; \tag{8} \]

\[ \text{(b)}\quad \lim_{N\to\infty} \left\| E\left\{\left[\sqrt{N}\left(\theta_t^{(\mathfrak{M}_N)}-\theta_t^{(0)}\right)\right] \left[\sqrt{N}\left(\theta_u^{(\mathfrak{M}_N)}-\theta_u^{(0)}\right)\right]\right\} \right\| = \left\| \sum_{i=1}^{k} \left( p_i \frac{\partial \log p_i}{\partial \theta_t} \cdot \frac{\partial \log p_i}{\partial \theta_u} \right) \right\|_{\vec{\theta}=\vec{\theta}^{(0)}}^{-1}; \tag{9} \]

(c) the vector random variable

\[ \sqrt{N}\left(\vec{\theta}^{(\mathfrak{M}_N)}-\vec{\theta}^{(0)}\right) \]

is asymptotically normal as \(N\to\infty\) (the parameters of the limiting distribution are determined by the right-hand sides of (8) and (9)).
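The matrix inverted in (9) is the information matrix of the finite scheme. For one parameter it reduces to \(\sum_i p_i(\partial\log p_i/\partial\theta)^2=\sum_i(\partial p_i/\partial\theta)^2/p_i\). The sketch below evaluates this quantity under assumptions introduced only for illustration: a one-dimensional normal law with unknown mean, known \(\sigma\), and cells \((e_{i-1},e_i]\). The function name is hypothetical.

```python
from math import erfc, exp, inf, pi, sqrt


def grouped_information(mu, edges, sigma=1.0):
    """Return sum_i (d p_i / d mu)^2 / p_i for a normal mean with known sigma."""
    bounds = [-inf, *edges, inf]
    total = 0.0
    for a, b in zip(bounds, bounds[1:]):
        za, zb = (a - mu) / sigma, (b - mu) / sigma
        # p_i = Phi(zb) - Phi(za), with Phi(z) = 0.5 * erfc(-z / sqrt(2)).
        p_i = 0.5 * erfc(-zb / sqrt(2.0)) - 0.5 * erfc(-za / sqrt(2.0))
        # d p_i / d mu = (phi(za) - phi(zb)) / sigma.
        dp_i = (exp(-0.5 * za * za) - exp(-0.5 * zb * zb)) / (sigma * sqrt(2.0 * pi))
        if p_i > 0.0:
            total += dp_i * dp_i / p_i
    return total
```

With a single cut point at the true mean and \(\sigma=1\), the value is \(2/\pi\approx 0.64\), compared with the value 1 for the ungrouped sample; the limiting variance in (9) is the reciprocal of this quantity, and a finer partition brings it closer to 1.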

Theorem 2. Suppose that, for sufficiently large \(N\), the system of maximum-likelihood equations (4) has a unique\(^*\) solution \(\vec{\theta}^{(\mathfrak{M}_N)}\), and suppose that, in some neighborhood \(\vec{\Omega}_0 \subset \vec{\Omega}\) of the point \(\vec{\theta}^{(0)}\), there exist second partial derivatives of the functions \(p_i=p_i(\vec{\theta})=p_i(\theta_1,\ldots,\theta_s)\) \((i=1,\ldots,k)\), and, moreover,

\[ \sum_{i=1}^{k} \left( \left. \frac{\partial p_i}{\partial \theta_t} \right|_{\theta_t=\theta_t^{(0)}} \right)^2 \ne 0 \qquad (t=1,\ldots,s); \tag{10} \]

suppose, moreover, that there exists a positive differentiable function \(g(\vec{\theta})=g(\theta_1,\ldots,\theta_s)\) such that the functions

\[ \frac{\partial}{\partial \theta_t}\left(g(\vec{\theta})\frac{\partial \log p_i}{\partial \theta_t}\right) \quad (i=1,\ldots,k;\ t=1,\ldots,s) \tag{11} \]

are continuous. Then, as \(N\to\infty\), the estimate \(\vec{\theta}^{(\mathfrak{M}_N)}\) is consistent and asymptotically efficient.

Let \(p_N\) be the probability that, for a given \(N\), the m.l.e. exists in the strict sense. Then \(p_N\to 1\) as \(N\to\infty\).

\(^*\) If, as mentioned above, we generalize the concept of an m.l.e., then it is sufficient to require that the system of equations (4) have no more than one root.

As an application of the results obtained, let us consider the m.l.e. for the mathematical expectation \(\vec{\mu}\) of an \(n\)-dimensional normal distribution with known covariance matrix \(\vec{\Sigma}\), on the basis of a grouping of the observation results. Let \(\mathfrak{X}\) coincide with \(n\)-dimensional space. Partition it into \(k=k_1\cdots k_n\) pairwise disjoint parallelepipeds \((k_i\ge 2\ (i=1,\ldots,n))\)

\[ \mathfrak{X}_{\nu_1,\ldots,\nu_n}\quad (\nu_1=1,\ldots,k_1;\ldots;\nu_n=1,\ldots,k_n), \tag{12} \]

defined by the inequalities

\[ \begin{aligned} x_1^{(\nu_1-1)} &< x_1 \le x_1^{(\nu_1)}, \\ &\;\;\vdots \\ x_n^{(\nu_n-1)} &< x_n \le x_n^{(\nu_n)}, \end{aligned} \tag{13} \]

where

\[ -\infty=x_i^{(0)}<x_i^{(1)}<\ldots<x_i^{(k_i)}=+\infty \quad (i=1,\ldots,n) \tag{14} \]

are certain fixed numbers. Then system (4) takes the form

\[ \frac{\partial}{\partial \mu_t}\log L=0 \quad (t=1,\ldots,n), \tag{15} \]

where

\[ L=L_{\mathfrak{M}_N}(\vec{\mu}) = \prod_{\nu_1=1}^{k_1}\cdots\prod_{\nu_n=1}^{k_n} p_{\nu_1,\ldots,\nu_n}^{N_{\nu_1,\ldots,\nu_n}} = \prod_{\nu_1=1}^{k_1}\cdots\prod_{\nu_n=1}^{k_n} \left( \int_{\mathfrak{X}_{\nu_1,\ldots,\nu_n}} f(\mathbf{x},\vec{\mu})\,d\mathbf{x} \right)^{N_{\nu_1,\ldots,\nu_n}}, \tag{16} \]

and

\[ f(\mathbf{x},\vec{\mu}) = (2\pi)^{-n/2}|\vec{\Sigma}|^{-1/2} \exp\left[-\frac12(\mathbf{x}-\vec{\mu})^T\vec{\Sigma}^{-1}(\mathbf{x}-\vec{\mu})\right]; \tag{17} \]

here \(N_{\nu_1,\ldots,\nu_n}\) is the number of observations in the sample \(\mathfrak{M}_N\) falling into the region \(\mathfrak{X}_{\nu_1,\ldots,\nu_n}\).
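The cell probabilities entering (16) are integrals of the density (17) over parallelepipeds. For a general \(\vec{\Sigma}\) they require numerical integration; the sketch below covers only the special case of a diagonal \(\vec{\Sigma}\) (an assumption made here for illustration), where each probability factorizes into one-dimensional terms. Indices are 0-based in the code, whereas \(\nu_i\) runs from 1 in the text; the function names are hypothetical.

```python
from itertools import product
from math import erfc, inf, prod, sqrt


def marginal_probabilities(edges, mu, sigma):
    """Probabilities of the cells (e_{j-1}, e_j] along one coordinate axis."""
    bounds = [-inf, *edges, inf]
    # Phi((b - mu) / sigma) written through erfc; the infinite ends give 0 and 1.
    cdf = [0.5 * erfc(-(b - mu) / (sigma * sqrt(2.0))) for b in bounds]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


def cell_probabilities(edges_per_axis, mu, sigmas):
    """Return {(nu_1, ..., nu_n): p_nu} for a normal law with diagonal covariance.

    edges_per_axis[t] lists the finite cut points of axis t, and
    sigmas[t] is the standard deviation of coordinate t.
    """
    marginals = [
        marginal_probabilities(e, m, s)
        for e, m, s in zip(edges_per_axis, mu, sigmas)
    ]
    index_ranges = [range(len(m)) for m in marginals]
    return {
        nu: prod(marginals[t][j] for t, j in enumerate(nu))
        for nu in product(*index_ranges)
    }
```

In this diagonal case the likelihood (16) splits into a product over coordinates, so each component of \(\vec{\mu}\) can be estimated from the marginal counts along its own axis.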

Theorem 3. Suppose that the sample \(\mathfrak{M}_N\) satisfies the conditions

\[ \begin{aligned} \sum_{\nu_1=1}^{k_1}\cdots \sum_{\nu_{i-1}=1}^{k_{i-1}} \sum_{\nu_{i+1}=1}^{k_{i+1}}\cdots \sum_{\nu_n=1}^{k_n} N_{\nu_1,\ldots,\nu_{i-1},1,\nu_{i+1},\ldots,\nu_n} &\ne N, \\ \sum_{\nu_1=1}^{k_1}\cdots \sum_{\nu_{i-1}=1}^{k_{i-1}} \sum_{\nu_{i+1}=1}^{k_{i+1}}\cdots \sum_{\nu_n=1}^{k_n} N_{\nu_1,\ldots,\nu_{i-1},k_i,\nu_{i+1},\ldots,\nu_n} &\ne N \end{aligned} \qquad (i=1,\ldots,n), \tag{18} \]

that is, for no coordinate \(i\) do all \(N\) observations fall into the extreme layer \(\nu_i=1\) or into the extreme layer \(\nu_i=k_i\). Then there exists a unique m.l.e. in the strict sense; it is the unique root of system (15). As \(N\to\infty\), this estimate is consistent and asymptotically efficient.
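Condition (18) is easy to check for a given table of counts. The following sketch is illustrative only: the counts are assumed to be stored in a dictionary keyed by 0-based index tuples, and both function names are hypothetical. The second function gives the probability that (18) holds in the one-dimensional case, using the fact that the events "all observations in the first cell" and "all observations in the last cell" are disjoint when \(k\ge 2\).

```python
def satisfies_condition_18(counts, shape):
    """Check (18): along no axis do all observations lie in the first or last layer.

    counts maps 0-based tuples (nu_1, ..., nu_n) to N_nu; shape = (k_1, ..., k_n).
    """
    total = sum(counts.values())
    for axis, k in enumerate(shape):
        first = sum(c for nu, c in counts.items() if nu[axis] == 0)
        last = sum(c for nu, c in counts.items() if nu[axis] == k - 1)
        if first == total or last == total:
            return False
    return True


def probability_of_condition_18_1d(p_first, p_last, n_obs):
    """P{(18) holds} for n = 1: one minus the chance that every observation
    falls into the first cell or every observation falls into the last cell."""
    return 1.0 - p_first ** n_obs - p_last ** n_obs
```

Since both end-cell probabilities are less than 1, the second quantity tends to 1 as the number of observations grows, in agreement with the statement \(p_N\to 1\).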

Theorem 3 follows directly from Theorems 1 and 2 if we use the following integral inequality.

Lemma. Let \(A(\mathbf{x}) = A(x_1,\ldots,x_n)\) be a negative definite quadratic form, and let \(D\) be an \(n\)-dimensional parallelepiped in the space \((x_1,\ldots,x_n)\), defined by the inequalities (13); let \(D(\vec{\mu})\) be the result of a parallel translation of the parallelepiped \(D\) by the vector \(\vec{\mu}=(\mu_1,\ldots,\mu_n)\). Denote

\[ p(\vec{\mu})=\int_{D(\vec{\mu})} e^{A(\mathbf{x})}\,d\mathbf{x}. \]

Then the matrix

\[ \left\| \frac{\partial^2}{\partial \mu_i \partial \mu_j}\log p(\vec{\mu}) \right\| \qquad (i,j=1,\ldots,n) \]

is the matrix of a negative definite quadratic form.
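The statement of the Lemma can be checked numerically in the simplest case. The sketch below assumes \(n=1\) and \(A(x)=-x^2/2\) (choices made here for illustration), so that \(p(\mu)\) is proportional to \(\Phi(b+\mu)-\Phi(a+\mu)\) for \(D=(a,b]\), and approximates the second derivative of \(\log p\) by a central difference. This is a numerical illustration, not a proof; the function names are hypothetical.

```python
from math import erfc, inf, log, sqrt


def log_p(mu, a, b):
    """log of Phi(b + mu) - Phi(a + mu); the constant factor sqrt(2*pi) is
    omitted because it does not affect derivatives of log p."""
    def cdf(z):
        return 0.5 * erfc(-z / sqrt(2.0))
    return log(cdf(b + mu) - cdf(a + mu))


def second_difference(mu, a, b, h=1e-3):
    """Central-difference approximation to the second derivative of log p."""
    return (log_p(mu + h, a, b) - 2.0 * log_p(mu, a, b) + log_p(mu - h, a, b)) / (h * h)
```

For instance, `second_difference(mu, -0.5, 1.0)` is negative at every `mu` tried in the range from -2 to 2, and the same holds for half-infinite intervals such as `a = -inf`.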

In the case \(n=2\), this proposition was proved by the author and V. A. Zalgaller \((^2)\). In the general case it was proved geometrically by V. A. Zalgaller.

The results of this study can be generalized to the case of partial grouping of samples (see \((^1)\)). In addition to the example of the maximum-likelihood estimate of \(\vec{\mu}\) under the normal distribution, one may consider problems of estimating the parameters of other specific multivariate distributions.

I express my gratitude to Yu. V. Linnik for his attention to this work.

Leningrad Branch
of the V. A. Steklov Mathematical Institute
Academy of Sciences of the USSR

Received 28 II 1967

REFERENCES

  1. G. Kulldorff, Introduction to the theory of estimation from grouped and partially grouped samples, Moscow, 1966.
  2. N. A. Bodin, V. A. Zalgaller, Lithuanian Mathematical Collection, 7, No. 3 (1967).
