UDC 519.281.2
MATHEMATICS
Submitted 1970-01-01 | RussiaRxiv: ru-197001.76060 | Translated from Russian

Yu. S. TUMASHEV

ON THE APPLICATION OF THE MAXIMUM LIKELIHOOD METHOD FOR PROCESSING RANDOM GAUSSIAN PROCESSES

(Presented by Academician B. N. Petrov, October 18, 1968)

Finite sets of measurements with random errors are usually processed by the maximum likelihood method \((^{1,2})\). When the measured quantities can be regarded as a random vector with a Gaussian (normal) probability density, applying this method generally requires the inverse of the error correlation matrix. In passing from discrete to continuous measurements, the random vector of measurements becomes a random Gaussian process, and the correlation matrix of the measurement errors becomes a correlation function, whose inverse is in general not defined. Therefore the maximum likelihood method cannot be generalized directly to the processing of random Gaussian processes.

The results of this paper make it possible, when processing random Gaussian processes, to estimate the unknown parameters from equations analogous to the likelihood equations used for discrete measurements.

§ 1. Let the measured quantity \(y(t)\) be related to the unknown parameters \(c_1,\ldots,c_n\) by the equation

\[ y(t)=\varphi(c,t). \tag{1} \]

The function \(\varphi(c,t)\) is continuous together with its first derivatives with respect to \(c_1,\ldots,c_n\).

Measurements of \(y(t)\) are made at instants \(t_i\) \((i=1,\ldots,N)\) on the interval \([0,T]\). The sample of measurements \(y_{\text{meas}}\) is a random \(N\)-dimensional vector, and the correlation matrix \(K(\delta y)\) of the measurement errors \(\delta y\) is assumed to be known. Applying the maximum likelihood method to \(y_{\text{meas}}\) leads to a nonlinear system of equations for the estimates of the parameters \(c\):

\[ \frac{\partial \varphi^{T}}{\partial c}K^{-1}(\delta y)(y_{\text{meas}}-\varphi)=0. \tag{2} \]

Here

\[ \frac{\partial \varphi^{T}}{\partial c} = \begin{bmatrix} \dfrac{\partial \varphi}{\partial c_1}(t_1) & \ldots & \dfrac{\partial \varphi}{\partial c_1}(t_N)\\ \vdots & & \vdots\\ \dfrac{\partial \varphi}{\partial c_n}(t_1) & \ldots & \dfrac{\partial \varphi}{\partial c_n}(t_N) \end{bmatrix}, \qquad \varphi= \begin{bmatrix} \varphi(c,t_1)\\ \vdots\\ \varphi(c,t_N) \end{bmatrix}, \qquad y_{\text{meas}}= \begin{bmatrix} y_{\text{meas}}(t_1)\\ \vdots\\ y_{\text{meas}}(t_N) \end{bmatrix}; \]

\(\partial \varphi^{T}/\partial c\) is an \(n\times N\) matrix, while \(\varphi\) and \(y_{\text{meas}}\) are \(N\)-dimensional vectors.
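As an illustration of how system (2) can be solved in the discrete case, here is a minimal Gauss–Newton sketch. The model \(\varphi(c,t)=c_1e^{-c_2t}\), the exponential error correlation matrix and the function names are hypothetical choices made for the example; the paper itself does not prescribe a solver.

```python
import numpy as np

def phi(c, t):
    # Hypothetical model phi(c, t) = c1 * exp(-c2 * t).
    return c[0] * np.exp(-c[1] * t)

def dphi_dc(c, t):
    # N x n matrix with columns d(phi)/dc_k at the instants t_i;
    # its transpose is the n x N matrix d(phi^T)/dc of (2).
    e = np.exp(-c[1] * t)
    return np.column_stack([e, -c[0] * t * e])

def solve_likelihood_eqs(y_meas, t, K, c0, iters=50, tol=1e-12):
    """Gauss-Newton iteration for d(phi^T)/dc K^{-1} (y_meas - phi) = 0."""
    c = np.asarray(c0, dtype=float)
    for _ in range(iters):
        J = dphi_dc(c, t)
        r = y_meas - phi(c, t)
        Kinv_J = np.linalg.solve(K, J)               # K^{-1} J without an explicit inverse
        step = np.linalg.solve(J.T @ Kinv_J, Kinv_J.T @ r)
        c = c + step
        if np.linalg.norm(step) < tol:
            break
    return c

# Usage with simulated data (all values are illustrative).
t = np.linspace(0.0, 1.0, 40)
K = 0.01 * np.exp(-np.abs(t[:, None] - t[None, :]) / 0.1)   # assumed error correlation
rng = np.random.default_rng(0)
c_true = np.array([2.0, 1.5])
y_meas = phi(c_true, t) + rng.multivariate_normal(np.zeros_like(t), K)
c_hat = solve_likelihood_eqs(y_meas, t, K, c0=[1.5, 1.2])
```

Each step needs \(K^{-1}(\delta y)\) only through linear solves, which is exactly what becomes impossible once \(K(\delta y)\) is replaced by a singular approximation.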

In the theory of random functions, all statistical properties of a zero-mean Gaussian process are completely determined by its correlation function \(R(t,\tau)\). On the measurement interval this function can always be approximated, with any prescribed accuracy, by an expansion in linearly independent functions \(a_i(t)\) \((i=1,\ldots,m)\):

\[ R(t,\tau)=\mathbf{a}^{T}(t)H\mathbf{a}(\tau). \tag{3} \]

Here \(\mathbf{a}^T(t)=[a_1(t),\ldots,a_m(t)]\); \(\mathbf{a}(\tau)=\begin{bmatrix} a_1(\tau)\\ \vdots\\ a_m(\tau)\end{bmatrix}\), and \(H\) is a nonsingular constant \(m\times m\) matrix determined by the given correlation function and the chosen basis functions \(a_i(t)\).
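The paper does not say how \(H\) is obtained for a given \(R(t,\tau)\) and basis. One possibility, sketched here purely as an assumption, is a least-squares fit on a fine grid: if \(K\) holds the sampled values \(R(t_i,t_j)\) and \(A\) the sampled basis functions, the minimizer of \(\|K-A^THA\|_F\) is \(H=(AA^T)^{-1}AKA^T(AA^T)^{-1}\). The correlation function \(e^{-|t-\tau|}\), the polynomial basis and the function names are hypothetical.

```python
import numpy as np

def fit_H(R, basis, t):
    """Least-squares H with R(t, tau) ~ a(t)^T H a(tau) on the grid t (an assumed method)."""
    A = basis(t)                              # m x N matrix of a_i(t_j)
    K = R(t[:, None], t[None, :])             # N x N sampled correlation function
    G = np.linalg.inv(A @ A.T)                # (A A^T)^{-1}
    H = G @ A @ K @ A.T @ G                   # minimizer of ||K - A^T H A||_F
    return H, A, K

def poly_basis(m):
    # Hypothetical basis a_i(t) = (2t - 1)^(i-1), i = 1..m, on [0, 1].
    return lambda t: np.vstack([(2 * t - 1) ** i for i in range(m)])

def exp_corr(t, tau):
    # Illustrative correlation function R(t, tau) = exp(-|t - tau|).
    return np.exp(-np.abs(t - tau))

t = np.linspace(0.0, 1.0, 200)
errors = {}
for m in (2, 4, 6):
    H, A, K = fit_H(exp_corr, poly_basis(m), t)
    errors[m] = np.linalg.norm(K - A.T @ H @ A) / np.linalg.norm(K)
```

Because the polynomial spaces are nested, the relative approximation error cannot grow as \(m\) increases.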

Evaluating (3) at the discrete instants \(t_i\) \((i=1,\ldots,N)\) gives an approximate correlation matrix of the random vector \(\mathbf{y}_{\text{meas}}\) formed by the values of \(y_{\text{meas}}(t)\). We write it in the form

\[ K_0(\delta y)=A^T H A, \tag{4} \]

where

\[ A= \begin{bmatrix} a_1(t_1) & \ldots & a_1(t_N)\\ \vdots & & \vdots\\ a_m(t_1) & \ldots & a_m(t_N) \end{bmatrix}. \]

The rank of the \(m\times N\) matrix \(A\) equals \(m\); this follows from the linear independence of the functions \(a_i(t)\) and from the fact that \(t_i\ne t_j\) for \(i\ne j\). Hence, by (4), the rank of the \(N\times N\) matrix \(K_0(\delta y)\) cannot exceed \(m\) \((^{3})\). Since \(m<N\), the matrix (4) is singular, and it cannot be used directly in system (2), which requires its inverse.
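A small numerical illustration of this rank argument, with a hypothetical polynomial basis \(a_i(t)=(2t-1)^{i-1}\) and a randomly generated symmetric positive-definite \(H\):

```python
import numpy as np

m, N = 4, 30
t = np.linspace(0.0, 1.0, N)                         # distinct instants t_i
A = np.vstack([(2 * t - 1) ** i for i in range(m)])  # hypothetical basis, m x N
rng = np.random.default_rng(1)
B = rng.standard_normal((m, m))
H = B @ B.T + m * np.eye(m)                          # hypothetical symmetric positive-definite H
K0 = A.T @ H @ A                                     # approximate correlation matrix (4), N x N

rank_A = np.linalg.matrix_rank(A)
# A relative tolerance separates the m nonzero singular values from rounding noise.
rank_K0 = np.linalg.matrix_rank(K0, tol=1e-10 * np.linalg.norm(K0, 2))
```

Both ranks come out equal to \(m=4\), so the \(30\times 30\) matrix \(K_0\) has no inverse.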

We transform the vectors \(\mathbf{y}_{\text{meas}}\) and \(\boldsymbol{\varphi}\) by a nonsingular \(N\times N\) matrix \(M\):

\[ \mathbf{z}_{\text{meas}}=M\mathbf{y}_{\text{meas}}, \qquad \boldsymbol{\varphi}_1(c)=M\boldsymbol{\varphi}(c). \tag{5} \]

The correlation matrices of the random vectors \(\mathbf{y}_{\text{meas}}\) and \(\mathbf{z}_{\text{meas}}\) are related by

\[ K(\delta z)=M K(\delta y)M^T \tag{6} \]

and, consequently,

\[ K_0(\delta z)=M A^T H A M^T. \tag{6'} \]

Consider the random vector \(\mathbf{z}_{\text{meas}}\) endowed with fictitious statistical characteristics, defined by the pseudocorrelation matrix

\[ K(\delta z)=K_0(\delta z)+\lambda \varepsilon, \tag{7} \]

where \(K_0(\delta z)\) is given by (6′) and \(\varepsilon\) is a nonsingular \(N\times N\) matrix.

For the matrix pencil (7) we state the following lemma without proof.

Lemma. Let \(K_0\) be a singular matrix and let \(\varepsilon\) be nonsingular. Then there exists a \(\delta\)-neighborhood of the point \(\lambda=0\) in which all matrices of the pencil (7) are nonsingular, except at the point \(\lambda=0\) itself.

Without loss of generality one may assume \(\varepsilon\equiv E\), where \(E\) is the identity matrix. Consequently, for \(0<|\lambda|<\delta\) the pseudocorrelation matrix (7) is invertible and can be used in the likelihood equations.
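For the special case of a symmetric positive semidefinite \(K_0\) and \(\varepsilon=E\) the lemma is easy to see: the eigenvalues of \(K_0+\lambda E\) are \(\mu_i+\lambda\), so the pencil is singular only at \(\lambda=-\mu_i\), and \(\delta\) can be taken equal to the smallest nonzero \(\mu_i\). The sketch below (hypothetical basis and \(H\)) checks this numerically; it is not the general proof, which the paper omits.

```python
import numpy as np

m, N = 3, 12
t = np.linspace(0.0, 1.0, N)
A = np.vstack([(2 * t - 1) ** i for i in range(m)])  # hypothetical basis, m x N
H = np.diag([2.0, 1.0, 0.5])                         # hypothetical nonsingular H
K0 = A.T @ H @ A                                     # singular N x N matrix of rank m

mu = np.linalg.eigvalsh(K0)                          # eigenvalues mu_i >= 0 (up to rounding)
tol = 1e-9 * mu.max()
delta = mu[mu > tol].min()                           # smallest nonzero eigenvalue

def pencil_is_nonsingular(lam):
    # Eigenvalues of K0 + lam*E are mu_i + lam; nonsingular iff none of them is zero.
    return np.abs(np.linalg.eigvalsh(K0 + lam * np.eye(N))).min() > tol
```

Here \(N-m\) eigenvalues of \(K_0\) vanish, so \(\lambda=0\) is a singular point of the pencil, while every \(\lambda\) with \(0<|\lambda|<\delta\) gives an invertible pseudocorrelation matrix.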

§ 2. We substitute (7) into the likelihood equations, transform them, and expand them in a series in \(\lambda\); in the resulting expansions we then pass to the limit \(\lambda\to 0\).

Introduce the matrix \(X\) by the formula

\[ \frac{\partial \boldsymbol{\varphi}_1^T}{\partial c} K^{-1}(\delta z)=X. \tag{8} \]

Substituting the expression for \(K(\delta z)\) from (7) and (6′) into (8), we obtain

\[ X M A^T H A M^T+\lambda X=\frac{\partial \boldsymbol{\varphi}_1^T}{\partial c}. \tag{9} \]

System (9) is linear in \(X\); it is solved by a method analogous to the one used for Fredholm integral equations of the second kind with degenerate kernels. The result is

\[ X=\frac{1}{\lambda}\frac{\partial \boldsymbol{\varphi}_1^T}{\partial c} -\frac{1}{\lambda}\frac{\partial \boldsymbol{\varphi}_1^T}{\partial c} M A^T\left(H A M^T M A^T+\lambda E\right)^{-1}H A M^T. \tag{10} \]
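The degenerate-kernel step can be made explicit. Writing \(G=\partial\boldsymbol{\varphi}_1^T/\partial c\), \(U=MA^T\) and \(V=AM^T\), system (9) reads \(X=\lambda^{-1}(G-XUHV)\); multiplying on the right by \(U\) gives \(XU=GU(\lambda E+HVU)^{-1}\), hence \(X=\lambda^{-1}G-\lambda^{-1}GU(\lambda E+HVU)^{-1}HV\) with \(VU=AM^TMA^T\). The sketch compares this closed form with a direct solution of (9); all matrices are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, N, lam = 2, 4, 25, 1e-2
t = np.linspace(0.0, 1.0, N)
A = np.vstack([(2 * t - 1) ** i for i in range(m)])          # hypothetical basis, m x N
B = rng.standard_normal((m, m))
H = B @ B.T + np.eye(m)                                       # hypothetical SPD H
M = A.T @ np.linalg.solve(A @ A.T, A) + lam * np.eye(N)       # e.g. the M of (12)
G = rng.standard_normal((n, N))                               # stand-in for d(phi_1^T)/dc

U, V = M @ A.T, A @ M.T                                       # N x m and m x N
K_dz = U @ H @ V + lam * np.eye(N)                            # pseudocorrelation (7), eps = E

# Direct solution of X K(dz) = G, i.e. system (9); K(dz) is symmetric.
X_direct = np.linalg.solve(K_dz, G.T).T

# Degenerate-kernel form: only an m x m system has to be solved.
core = np.linalg.solve(lam * np.eye(m) + H @ V @ U, H @ V)    # (lam E + H V U)^{-1} H V
X_closed = (G - G @ U @ core) / lam
```

The design point is that the \(N\times N\) system collapses to an \(m\times m\) one, just as a degenerate kernel reduces a Fredholm equation to a finite linear system.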

Using (5), we substitute (10) into the likelihood equations \(X(\mathbf{z}_{\text{meas}}-\boldsymbol{\varphi}_1)=0\) and expand in a series in \(\lambda\). The likelihood equations then take the form

\[ \begin{aligned} \frac{1}{\lambda}\frac{\partial \varphi^T}{\partial c} \Bigl[ & M^T M - M^T M A^T(AM^TMA^T)^{-1}AM^TM + \lambda M^T M A^T(AM^TMA^T)^{-1} \\ & \times H^{-1}(AM^TMA^T)^{-1}AM^TM + \lambda^2(\ldots) \Bigr](\mathbf{y}_{\mathrm{meas}}-\varphi)=0. \end{aligned} \tag{11} \]

Consider a specific form of the nonsingular transformation \(M\)

\[ M=A^T(AA^T)^{-1}A+\lambda E. \tag{12} \]

Here \(\lambda\) is the same parameter as in (11). Substituting (12) into (11) and keeping only terms up to first order in \(\lambda\), we write the likelihood equations as

\[ \begin{aligned} \frac{\partial \varphi^T}{\partial c} \Bigl\{ & A^T(AA^T)^{-1}H^{-1}(AA^T)^{-1}A +\lambda\bigl[ E-A^T(AA^T)^{-1}A \\ & - A^T(AA^T)^{-1}H^{-1}(AA^T)^{-1}H^{-1}(AA^T)^{-1}A \bigr]\Bigr\}(\mathbf{y}_{\mathrm{meas}}-\varphi)=0. \end{aligned} \tag{13} \]
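Since \(\partial\boldsymbol{\varphi}_1^T/\partial c=(\partial\boldsymbol{\varphi}^T/\partial c)M^T\), the likelihood equations for \(\mathbf{z}_{\text{meas}}\) apply the \(N\times N\) weight matrix \(W(\lambda)=M^TK^{-1}(\delta z)M\) to \(\mathbf{y}_{\text{meas}}-\boldsymbol{\varphi}\). With \(M\) from (12), the expression in braces in (13) should approximate \(W(\lambda)\) up to \(O(\lambda^2)\). The following numerical check uses a hypothetical basis and \(H\); reducing \(\lambda\) tenfold should reduce the error roughly a hundredfold.

```python
import numpy as np

m, N = 3, 20
t = np.linspace(0.0, 1.0, N)
A = np.vstack([(2 * t - 1) ** i for i in range(m)])   # hypothetical basis, m x N
H = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.1],
              [0.0, 0.1, 0.5]])                        # hypothetical SPD H
E = np.eye(N)
S0inv = np.linalg.inv(A @ A.T)                         # (A A^T)^{-1}
Hinv = np.linalg.inv(H)
P = A.T @ S0inv @ A                                    # projector onto the row space of A

Q = A.T @ S0inv @ Hinv @ S0inv @ A                     # lambda^0 term in the braces of (13)
D = E - P - A.T @ S0inv @ Hinv @ S0inv @ Hinv @ S0inv @ A   # lambda^1 term

def exact_weight(lam):
    M = P + lam * E                                    # transformation (12)
    K_dz = M @ A.T @ H @ A @ M.T + lam * E             # pseudocorrelation (7) with eps = E
    return M.T @ np.linalg.solve(K_dz, M)              # M^T K^{-1}(dz) M

def expansion_error(lam):
    return np.linalg.norm(exact_weight(lam) - (Q + lam * D))

errors = {lam: expansion_error(lam) for lam in (1e-2, 1e-3)}
```

The \(\lambda^0\) term \(Q\) is the Moore–Penrose pseudoinverse of \(K_0(\delta y)=A^THA\), which is why only \(AA^T\) and \(H\) need to be inverted.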

Depending on how the number \(m\) of terms in the approximation (3) of the correlation function compares with the number \(n\) of parameters to be determined, three cases are possible: \(m>n\), \(m<n\), and \(m=n\).

§ 3. First case \(m>n\).

Since \(N\gg m\) and the functions \(a_i(t)\) are linearly independent, the rank of the \(n\times m\) matrix

\[ \frac{\partial \varphi^T}{\partial c}A^T(AA^T)^{-1} \]

equals the number \(n\) of parameters to be determined. Therefore the Jacobian with respect to \(c\) of system (13) at \(\lambda=0\) is nonzero, i.e.,

\[ \det \frac{\partial \varphi^T}{\partial c} A^T(AA^T)^{-1}H^{-1}(AA^T)^{-1}A \frac{\partial \varphi}{\partial c}\ne 0. \]

Consequently, in the limit \(\lambda\to 0\) the likelihood equations for this case take the form

\[ \frac{\partial \varphi^T}{\partial c} A^T(AA^T)^{-1}H^{-1}(AA^T)^{-1}A (\mathbf{y}_{\mathrm{meas}}-\varphi)=0. \tag{14} \]

The correlation matrix of the parameter estimates obtained from (14) is

\[ K(\delta c)= \left[ \frac{\partial \varphi^T}{\partial c} A^T(AA^T)^{-1}H^{-1}(AA^T)^{-1}A \frac{\partial \varphi}{\partial c} \right]^{-1}. \tag{15} \]

In (14) and (15) one can pass directly to the limit of continuous measurement.
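For a model linear in the parameters, \(\boldsymbol{\varphi}=Jc\) with a constant \(N\times n\) matrix \(J=\partial\boldsymbol{\varphi}/\partial c\), system (14) can be solved in closed form. The sketch below (hypothetical basis, \(H\) and model \(\varphi=c_1+c_2e^{-t}\), with \(m>n\)) computes the estimate and checks that the covariance of this linear estimator under the error correlation \(K_0(\delta y)=A^THA\) of (4) coincides with (15).

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, N = 2, 4, 60                                        # first case: m > n
t = np.linspace(0.0, 1.0, N)
A = np.vstack([(2 * t - 1) ** i for i in range(m)])       # hypothetical basis, m x N
B = rng.standard_normal((m, m))
H = B @ B.T + np.eye(m)                                   # hypothetical SPD H
J = np.column_stack([np.ones(N), np.exp(-t)])             # dphi/dc for phi = c1 + c2*exp(-t)

S0inv = np.linalg.inv(A @ A.T)
Q = A.T @ S0inv @ np.linalg.inv(H) @ S0inv @ A            # weight matrix in (14)

def estimate_case1(y_meas):
    # Solve J^T Q (y_meas - J c) = 0 for c (system (14) for a linear model).
    return np.linalg.solve(J.T @ Q @ J, J.T @ Q @ y_meas)

K_c = np.linalg.inv(J.T @ Q @ J)                          # formula (15)

# Covariance of the linear estimator c_hat = L y when K(dy) = K0 = A^T H A, as in (4).
L = K_c @ J.T @ Q
K0 = A.T @ H @ A
K_c_sandwich = L @ K0 @ L.T
```

Only the \(m\times m\) matrices \(AA^T\) and \(H\) are inverted; the singular \(N\times N\) matrix \(K_0(\delta y)\) is used solely for the covariance check.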

Second case, \(m<n\). Since \(N\gg m\), the rank of the \(n\times m\) matrix

\[ \frac{\partial \varphi^T}{\partial c}A^T(AA^T)^{-1} \]

equals \(m<n\). Hence the Jacobian with respect to \(c\) of the leading (\(\lambda\)-independent) term of the expansion (13) vanishes:

\[ \det\left[ \frac{\partial \varphi^T}{\partial c} A^T(AA^T)^{-1}H^{-1}(AA^T)^{-1}A \frac{\partial \varphi}{\partial c} \right]=0. \]
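The rank deficiency in the second case can be seen numerically. In the sketch below (hypothetical basis and \(H\); a random matrix stands in for \(\partial\boldsymbol{\varphi}/\partial c\)), the \(n\times n\) matrix of the leading term has rank \(m<n\).

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, N = 3, 2, 40                                     # second case: m < n
t = np.linspace(0.0, 1.0, N)
A = np.vstack([(2 * t - 1) ** i for i in range(m)])    # hypothetical basis, m x N
H = np.array([[1.0, 0.2], [0.2, 0.5]])                 # hypothetical SPD H
J = rng.standard_normal((N, n))                        # random stand-in for dphi/dc

S0inv = np.linalg.inv(A @ A.T)
rank_JA = np.linalg.matrix_rank(J.T @ A.T @ S0inv)     # the n x m matrix in the text

Q = A.T @ S0inv @ np.linalg.inv(H) @ S0inv @ A
leading = J.T @ Q @ J                                  # n x n Jacobian of the leading term
s = np.linalg.svd(leading, compute_uv=False)           # singular values, descending
numerical_rank = int(np.sum(s > 1e-9 * s[0]))
```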

We now show that the Jacobian with respect to \(c\) of system (13) itself is nonzero. For this it suffices to show that the following matrix is nonsingular:

\[ E-A^T(AA^T)^{-1}A - A^T(AA^T)^{-1}H^{-1}(AA^T)^{-1}H^{-1}(AA^T)^{-1}A. \]

Indeed, direct multiplication shows that the inverse of this matrix is

\[ E-A^T(AA^T)^{-1}A - A^T HAA^T HA. \]
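The product can be checked by hand using \(AP=A\) and \((E-P)A^T=0\) for the projector \(P=A^T(AA^T)^{-1}A\). A numerical version of the same check, with a hypothetical basis and a random symmetric positive-definite \(H\):

```python
import numpy as np

rng = np.random.default_rng(5)
m, N = 3, 15
t = np.linspace(0.0, 1.0, N)
A = np.vstack([(2 * t - 1) ** i for i in range(m)])   # hypothetical basis, m x N
B = rng.standard_normal((m, m))
H = B @ B.T + np.eye(m)                               # hypothetical nonsingular H
E = np.eye(N)
S0 = A @ A.T
S0inv, Hinv = np.linalg.inv(S0), np.linalg.inv(H)
P = A.T @ S0inv @ A                                   # projector onto the row space of A

D = E - P - A.T @ S0inv @ Hinv @ S0inv @ Hinv @ S0inv @ A   # matrix to be inverted
F = E - P - A.T @ H @ S0 @ H @ A                            # its stated inverse
```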

Multiply system (13) on the left by the nonsingular matrix

\[ \begin{aligned} &\frac{1}{\lambda}E -\frac{1}{\lambda}\frac{\partial \varphi^T}{\partial c}A^T \left( A\frac{\partial \varphi}{\partial c}R^{-1}\frac{\partial \varphi^T}{\partial c}A^T \right)^{-1} A\frac{\partial \varphi}{\partial c}R^{-1} \\ &\quad + \frac{\partial \varphi^T}{\partial c}A^T \left( A\frac{\partial \varphi}{\partial c}R^{-1}\frac{\partial \varphi^T}{\partial c}A^T \right)^{-1} AA^THAA^T \left( A\frac{\partial \varphi}{\partial c}R^{-1}\frac{\partial \varphi^T}{\partial c}A^T \right)^{-1} A\frac{\partial \varphi}{\partial c}R^{-1}, \end{aligned} \]

where

\[ R=\frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c} -\frac{\partial\varphi^T}{\partial c}A^T(AA^T)^{-1}A\frac{\partial\varphi}{\partial c}. \]

Then, in the limit \(\lambda\to 0\), the likelihood equations take the form

\[ \begin{aligned} \Bigl\{ & \frac{\partial\varphi^T}{\partial c}\left[E-A^T(AA^T)^{-1}A\right] +\frac{\partial\varphi^T}{\partial c}A^T \left[ A\frac{\partial\varphi}{\partial c} \left(\frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c}\right)^{-1} \frac{\partial\varphi^T}{\partial c}A^T \right]^{-1} \\ & \times A \left[ E-\frac{\partial\varphi}{\partial c} \left(\frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c}\right)^{-1} \frac{\partial\varphi^T}{\partial c} \right] \Bigr\} (y_{\mathrm{meas}}-\varphi)=0. \end{aligned} \tag{16} \]

System (16) has a unique solution for \(c\). For this it suffices to show that the matrix

\[ \frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c} -\frac{\partial\varphi^T}{\partial c}A^T(AA^T)^{-1}A\frac{\partial\varphi}{\partial c} \]

is invertible. Indeed, its inverse is

\[ \begin{aligned} &\left(\frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c}\right)^{-1} + \left(\frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c}\right)^{-1} \frac{\partial\varphi^T}{\partial c}A^T \\ &\quad\times \left[ AA^T- A\frac{\partial\varphi}{\partial c} \left(\frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c}\right)^{-1} \frac{\partial\varphi^T}{\partial c}A^T \right]^{-1} A\frac{\partial\varphi}{\partial c} \left(\frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c}\right)^{-1}. \end{aligned} \]
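With \(J=\partial\boldsymbol{\varphi}/\partial c\) and \(C=J^TJ\), the matrix in question is \(C-J^TA^T(AA^T)^{-1}AJ\), and its inverse is the Woodbury-type expression \(C^{-1}+C^{-1}J^TA^T[AA^T-AJC^{-1}J^TA^T]^{-1}AJC^{-1}\). A quick numerical check, with a hypothetical basis and a random stand-in for \(J\) (\(m<n\)):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, N = 3, 2, 40
t = np.linspace(0.0, 1.0, N)
A = np.vstack([(2 * t - 1) ** i for i in range(m)])    # hypothetical basis, m x N
J = rng.standard_normal((N, n))                        # random stand-in for dphi/dc

C_inv = np.linalg.inv(J.T @ J)
R = J.T @ J - J.T @ A.T @ np.linalg.solve(A @ A.T, A @ J)

# Woodbury-type inverse of R.
core = A @ A.T - A @ J @ C_inv @ J.T @ A.T
R_inv = C_inv + C_inv @ J.T @ A.T @ np.linalg.solve(core, A @ J @ C_inv)
```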

The correlation matrix of the parameter estimates \(c\) obtained from (16) can be written as

\[ K(\delta c)=B^TAA^THAA^TB, \tag{17} \]

where

\[ B= \left[ A\frac{\partial\varphi}{\partial c} \left(\frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c}\right)^{-1} \frac{\partial\varphi^T}{\partial c}A^T \right]^{-1} A\frac{\partial\varphi}{\partial c} \left(\frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c}\right)^{-1}. \]

Equations (16) and (17) also admit passage to the limit of continuous measurement.

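Third case, \(m=n\). In this case the matrix \(\dfrac{\partial\varphi^T}{\partial c}A^T\) is square and nonsingular: its rank equals that of \(AA^T\), i.e., \(m=n\). Multiplying the leading term of (13) on the left by the inverse of the nonsingular matrix \(\dfrac{\partial\varphi^T}{\partial c}A^T(AA^T)^{-1}H^{-1}(AA^T)^{-1}\), we find that the likelihood equations take the form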

\[ A(y_{\mathrm{meas}}-\varphi)=0. \tag{18} \]

The correlation matrix of the parameter estimates obtained from (18) is

\[ K(\delta c)= \left(A\frac{\partial\varphi}{\partial c}\right)^{-1} AA^THAA^T \left(\frac{\partial\varphi^T}{\partial c}A^T\right)^{-1}. \tag{19} \]
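In the third case, for a model linear in the parameters (\(\boldsymbol{\varphi}=Jc\), \(J\) constant), equation (18) gives \(c=(AJ)^{-1}A\mathbf{y}_{\text{meas}}\) directly. The sketch below (hypothetical basis, \(H\), and a random stand-in for \(J\), with \(m=n\)) computes this estimate and checks (19) against the covariance of the linear estimator under \(K_0(\delta y)=A^THA\).

```python
import numpy as np

rng = np.random.default_rng(7)
n = m = 3                                               # third case: m = n
N = 50
t = np.linspace(0.0, 1.0, N)
A = np.vstack([(2 * t - 1) ** i for i in range(m)])     # hypothetical basis, m x N
B = rng.standard_normal((m, m))
H = B @ B.T + np.eye(m)                                 # hypothetical SPD H
J = rng.standard_normal((N, n))                         # stand-in for dphi/dc (linear model)

AJ = A @ J                                              # square n x n, assumed nonsingular

def estimate_case3(y_meas):
    # Equation (18) for phi = J c: A (y_meas - J c) = 0, so c = (A J)^{-1} A y_meas.
    return np.linalg.solve(AJ, A @ y_meas)

S0 = A @ A.T
K_c = np.linalg.solve(AJ, S0 @ H @ S0) @ np.linalg.inv(AJ).T   # formula (19)
```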

Formulas (18) and (19) also admit passage to the limit of continuous measurement. For instants spaced with step \(\Delta t\), the matrices occurring in (14)–(19) have the following limits:

\[ \begin{aligned} \lim_{\Delta t\to 0} A\frac{\partial\varphi}{\partial c}\,\Delta t &= \int_0^T \mathbf{a}(t)\frac{\partial\varphi}{\partial c}(c,t)\,dt, \\ \lim_{\Delta t\to 0} AA^T\,\Delta t &= \int_0^T \mathbf{a}(t)\mathbf{a}^T(t)\,dt, \\ \lim_{\Delta t\to 0} \frac{\partial\varphi^T}{\partial c}\frac{\partial\varphi}{\partial c}\,\Delta t &= \int_0^T \frac{\partial\varphi^T}{\partial c}(c,t)\frac{\partial\varphi}{\partial c}(c,t)\,dt, \\ \lim_{\Delta t\to 0} A(y_{\mathrm{meas}}-\varphi)\,\Delta t &= \int_0^T \mathbf{a}(t)\,[y_{\mathrm{meas}}(t)-\varphi(c,t)]\,dt. \end{aligned} \tag{20} \]
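A sketch of the passage to the limit for the first case, under assumed choices: \(T=1\), basis \(\mathbf{a}(t)=(1,t,t^2)^T\), model \(\varphi(c,t)=c_1+c_2t^3\) and a hypothetical \(H\). With polynomials the integrals in (20) are exact, \(\int_0^T t^k\,dt=T^{k+1}/(k+1)\). In (15) every factor \(\Delta t\) cancels, since \(\frac{\partial\varphi^T}{\partial c}A^T(AA^T)^{-1}=\bigl(\frac{\partial\varphi^T}{\partial c}A^T\Delta t\bigr)(AA^T\Delta t)^{-1}\), so the discrete matrix computed on a uniform midpoint grid should approach its continuous counterpart as \(N\) grows, at the \(O(\Delta t^2)\) rate of the midpoint rule.

```python
import numpy as np

T = 1.0
H = np.array([[1.0, 0.2, 0.0],
              [0.2, 0.8, 0.1],
              [0.0, 0.1, 0.5]])        # hypothetical H in R(t, tau) = a(t)^T H a(tau)
Hinv = np.linalg.inv(H)
a_pows = np.arange(3)                  # a(t) = (1, t, t^2)
g_pows = np.array([0, 3])              # dphi/dc = (1, t^3) for phi = c1 + c2 * t^3

def mono_integral(k):
    # int_0^T t^k dt for an array of integer powers k
    return T ** (k + 1) / (k + 1)

# Continuous limits (20): int a a^T dt and int a (dphi/dc) dt.
S_cont = mono_integral(a_pows[:, None] + a_pows[None, :])   # m x m
W_cont = mono_integral(a_pows[:, None] + g_pows[None, :])   # m x n

def K_c_from(S, W):
    # Formula (15): [W^T S^{-1} H^{-1} S^{-1} W]^{-1}; a common factor dt cancels.
    SW = np.linalg.solve(S, W)
    return np.linalg.inv(SW.T @ Hinv @ SW)

K_c_continuous = K_c_from(S_cont, W_cont)

def K_c_discrete(N):
    # Discrete (15) on the N midpoints t_j = (j + 1/2) T / N.
    t = (np.arange(N) + 0.5) * T / N
    A = t[None, :] ** a_pows[:, None]          # m x N
    Jm = t[:, None] ** g_pows[None, :]         # N x n
    return K_c_from(A @ A.T, A @ Jm)

def rel_err(N):
    diff = K_c_discrete(N) - K_c_continuous
    return np.linalg.norm(diff) / np.linalg.norm(K_c_continuous)
```

Choosing \(\partial\varphi/\partial c\) outside the span of the basis (here \(t^3\)) keeps the comparison non-trivial: if every component of \(\partial\varphi/\partial c\) were a combination of the \(a_i(t)\), the discrete and continuous matrices would coincide for every grid.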

Substituting the limits (20) into (14)–(19), we obtain the likelihood equations for processing Gaussian processes together with the correlation matrices of the estimated parameters.

Received October 2, 1968

CITED LITERATURE

  1. Yu. V. Linnik, The Method of Least Squares and the Foundations of the Theory of Observation Processing, Moscow, 1962.
  2. H. Cramér, Mathematical Methods of Statistics, IL, 1948.
  3. F. R. Gantmakher, The Theory of Matrices, “Nauka,” 1966.
