UDC 518.512.25
MATHEMATICS
Yu. A. KUZNETSOV
ON THE THEORY OF ITERATION PROCESSES
(Presented by Academician L. V. Kantorovich on 15 V 1968)
We shall denote by \(V_n\) the space of complex \(n\)-dimensional vectors with scalar product
\[ (\varphi,\psi)=\sum_{i=1}^{n}\varphi_i\bar{\psi}_i \]
and norm \(\|\varphi\|=(\varphi,\varphi)^{1/2}\). Let \(Q\subset V_n\). A matrix \(B\)* is called positive definite on \(Q\) \((B\underset{Q}{>}0)\) if \((B\varphi,\varphi)>0\) for all \(\varphi\in Q\) \((\varphi\ne\theta\), where \(\theta\) is the zero vector), and positive semidefinite on \(Q\) \((B\underset{Q}{\geq}0)\) if \((B\varphi,\varphi)\geq 0\) for all \(\varphi\in Q\). The set of all eigenvalues of the matrix \(B\) will be denoted by \(\sigma(B)\), and the quantity
\[ \rho(B)=\max_{\lambda\in\sigma(B)}|\lambda| \]
will be called its spectral radius (see [1]). The relation \(\alpha\leq\sigma(B)\leq\beta\), where \(\alpha\) and \(\beta\) are real numbers, means that all eigenvalues of the matrix \(B\) belong to the interval \([\alpha;\beta]\), and moreover \(\alpha,\beta\in\sigma(B)\).
For solving the system of linear algebraic equations
\[ A\varphi=f, \tag{1} \]
where \(A\) is a nonsingular matrix and \(f\in V_n\) is a given vector, the iteration process
\[ \varphi^{k+1}=\widetilde{T}\varphi^k\qquad (k=0,1,\ldots) \tag{2} \]
is proposed. Here \(\widetilde{T}\) is some, generally speaking, nonlinear operator.
By convergence of the iteration process (2) we shall mean convergence of the sequence \(\{\varphi^k\}\) to the vector \(\varphi^*=A^{-1}f\), which is the unique solution of system (1). The iteration process (2) can always be written in the equivalent form
\[ \psi^{k+1}=T\psi^k, \]
where \(\psi^k=\varphi^k-\varphi^*\), \(T\psi^k=\widetilde{T}(\psi^k+\varphi^*)-\varphi^*\) \((k=0,1,\ldots)\). The operator \(T\), defined in this way, is called the transition operator of the iteration process (2). Consider a closed subset \(U^0\) of the space \(V_n\), possessing the property that \(\varphi^k\in U^0\) \((k=0,1,\ldots)\).
Introduce the sets
\[ U=\{\psi:\psi=\varphi-\varphi^*,\ \varphi\in U^0\},\qquad U_A=\{\psi:\psi=A\varphi-f,\ \varphi\in U^0\}. \]
If the iteration process (2) converges for any \(\varphi^0\in U^0\), then the quantity
\[ R_{\infty}^{U}(T)=-\lim_{k\to\infty}\left[\frac{1}{k}\ln\sup_{\psi^0\in U}\frac{\|T^k\psi^0\|}{\|\psi^0\|}\right] \]
is called its asymptotic rate of convergence on the set \(U\). The asymptotic rate of convergence in \(V_n\) will be denoted by \(R_\infty(T)\). Note that, since all norms in a finite-dimensional space are equivalent, any norm may be used to find \(R_\infty^U(T)\). In particular, if \(U=V_n\) and \(T\) is a matrix, then \(R_\infty(T)=-\ln\rho(T)\). We shall say that the iteration process (2) with transition operator \(T_1\) on the set \(U_1\) is asymptotically faster than the process with transition operator \(T_2\) on the set \(U_2\) if \(R_\infty^{U_1}(T_1)>R_\infty^{U_2}(T_2)\).
* In the present paper all matrices are assumed to be square of order \(n\).
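The identity \(R_\infty(T)=-\ln\rho(T)\) for a matrix \(T\) can be checked numerically. The following is a minimal sketch, not part of the original text: the \(2\times2\) matrix `T`, the start vector, and all function names are illustrative assumptions, and the empirical estimate uses a single start vector rather than the supremum over \(U\).

```python
import math

def mat_vec(T, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def spectral_radius_2x2(T):
    """Spectral radius of a real 2x2 matrix via its characteristic polynomial."""
    (a, b), (c, d) = T
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4.0 * det
    if disc >= 0:
        r = math.sqrt(disc)
        return max(abs((tr + r) / 2), abs((tr - r) / 2))
    return math.sqrt(det)  # complex conjugate pair: |lambda|^2 = det

def empirical_rate(T, psi0, k):
    """-(1/k) ln(||T^k psi0|| / ||psi0||) for one start vector (no supremum)."""
    psi = list(psi0)
    for _ in range(k):
        psi = mat_vec(T, psi)
    return -math.log(norm(psi) / norm(psi0)) / k

# Hypothetical example matrix with spectral radius below 1.
T = [[0.5, 0.2], [0.1, 0.3]]
exact = -math.log(spectral_radius_2x2(T))
approx = empirical_rate(T, [1.0, 1.0], 200)
```

For a start vector with a nonzero component along the dominant eigenvector, `approx` approaches `exact` as `k` grows; the difference decays like \(1/k\).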
A functional \(J(\varphi)\), defined and continuous on the set \(U^0\), is called subordinate to system (1) on this set if the following hold:
1) \(J(\varphi)>J(\varphi^*)\) for all \(\varphi\in U^0\), except \(\varphi=\varphi^*=A^{-1}f\);
2) the Lebesgue sets \(Z_c=\{\varphi:J(\varphi)\le c\}\) of the functional \(J(\varphi)\) are bounded for all \(c\ge J(\varphi^*)\).
Let the functional \(J(\varphi)\) be subordinate on \(U^0\) to system (1). Then, for any sequence \(\{\varphi^k\}\) \((\varphi^k\in U^0,\ k=0,1,\ldots)\), the relation \(\|\varphi^k-\varphi^*\|\to0\) as \(k\to\infty\) holds if and only if \(|J(\varphi^k)-J(\varphi^*)|\to0\) as \(k\to\infty\).
In what follows it will always be assumed that the operator \(\widetilde T\) of the iteration process (2) is defined and continuous at every \(\varphi\in U^0\), with the possible exception of the vector \(\varphi=\varphi^*\), and that \(\widetilde T\varphi\in U^0\) for all \(\varphi\in U^0\) \((\varphi\ne\varphi^*)\). In particular, this is always the case for the iterative methods proposed below.
Theorem 1. If there exists a functional \(J(\varphi)\), subordinate to system (1) on the set \(U^0\), such that \(J(\widetilde T\varphi)\le J(\varphi)\) for all \(\varphi\in U^0\) \((\varphi\ne\varphi^*)\), and for every \(\varphi\in U^0\) \((\varphi\ne\varphi^*)\) there is an integer \(p=p(\varphi)\ge1\) for which \(J(\widetilde T^p\varphi)<J(\varphi)\), then the iteration process (2) converges for any initial vector \(\varphi^0\in U^0\).
The proof of the theorem is based on the finite-dimensionality of the space \(V_n\) and on the properties of a subordinate functional.
The functional
\[ J_D(\varphi)=(D(A\varphi-f),A\varphi-f), \]
where \(A\) is a matrix, \(f\) is the right-hand side of system (1), and \(D\) is a Hermitian matrix, is subordinate to system (1) on the set \(U^0\) if \(J_D(\varphi)\ge \alpha\|A\varphi-f\|^2\) \((\alpha>0)\) for any \(\varphi\in U^0\). Suppose that the latter holds. Then, if the iteration process (2) converges for any \(\varphi^0\in U^0\), then
\[ R_\infty^U(T)=-\lim_{k\to\infty}\left[\frac{1}{k}\ln\sup_{\varphi\in U^0}\left(\frac{J_D(\widetilde T^k\varphi)}{J_D(\varphi)}\right)^{1/2}\right]. \]
Consider the iteration process
\[ \varphi^{k+1}=\varphi^k-\tau_k B(A\varphi^k-f),\quad \varphi^k\in U^0\quad (k=0,1,\ldots). \tag{3} \]
Here \(B\) is a certain matrix for which the matrix \(B^*A^*DAB>0\) on \(U_A\), and \(\tau_k=\tau(\varphi^k)\), where
\[ \tau(\varphi)=\frac{((DAB)^*(A\varphi-f),A\varphi-f)}{(DAB(A\varphi-f),AB(A\varphi-f))}. \]
It is not difficult to see that, with this choice of \(\tau_k\), for the iteration process (3)
\[
J_D(\varphi^{k+1})/J_D(\varphi^k)
=
\left[1-\frac{|(DAB\xi^k,\xi^k)|^2}{(DAB\xi^k,AB\xi^k)(D\xi^k,\xi^k)}\right]\le1
\]
\[
(\xi^k=A\varphi^k-f,\quad k=0,1,\ldots).
\]
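The process (3) and the ratio formula above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's implementation: it is restricted to real data (so the scalar product needs no conjugation), and the matrices `A`, `B`, `D`, the vector `f`, and all function names are hypothetical choices for which \(DAB\) is positive definite.

```python
def mat_vec(M, v):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def dot(u, v):
    """Real scalar product; the complex case would conjugate v."""
    return sum(x * y for x, y in zip(u, v))

def residual(A, f, phi):
    """xi = A phi - f."""
    return [a - b for a, b in zip(mat_vec(A, phi), f)]

def J_D(A, D, f, phi):
    """J_D(phi) = (D xi, xi) with xi = A phi - f."""
    xi = residual(A, f, phi)
    return dot(mat_vec(D, xi), xi)

def step(A, B, D, f, phi):
    """One step of process (3); returns the next iterate and the predicted J_D ratio."""
    xi = residual(A, f, phi)
    b_xi = mat_vec(B, xi)
    w = mat_vec(A, b_xi)    # AB xi
    dw = mat_vec(D, w)      # DAB xi
    c = dot(dw, xi)         # (DAB xi, xi) = ((DAB)^* xi, xi) for real data
    g = dot(dw, w)          # (DAB xi, AB xi)
    tau = c / g
    phi_next = [p - tau * b for p, b in zip(phi, b_xi)]
    ratio = 1.0 - c * c / (g * dot(mat_vec(D, xi), xi))
    return phi_next, ratio

def iterate(A, B, D, f, phi0, max_steps=200, tol=1e-20):
    """Run process (3) until J_D drops below tol; returns iterate and J_D history."""
    phi = list(phi0)
    history = [J_D(A, D, f, phi)]
    while len(history) <= max_steps and history[-1] > tol:
        phi, _ = step(A, B, D, f, phi)
        history.append(J_D(A, D, f, phi))
    return phi, history

# Hypothetical test data: D = I, B = inverse of the diagonal of A.
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[0.25, 0.0], [0.0, 1.0 / 3.0]]
D = [[1.0, 0.0], [0.0, 1.0]]
f = [1.0, 2.0]
phi, history = iterate(A, B, D, f, [0.0, 0.0])
```

The stopping tolerance guards against a zero denominator in \(\tau\) once the residual vanishes; `history` records the values \(J_D(\varphi^k)\), which should be non-increasing.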
Lemma 1. Definiteness of the matrix \(DAB\) on the set \(U_A\) (either \(DAB>0\) on \(U_A\), or \(-DAB>0\) on \(U_A\)) is sufficient for convergence of the iteration process (3) for any \(\varphi^0\in U^0\); if \(DAB=(DAB)^*\) and \(U_A\) is a convex set, or \(DAB\) is a real matrix and \(U_A\) is a convex set of real vectors, then this condition is also necessary.
We note that if, for all \(\xi\in U_A\), \(|(DAB\xi,\xi)|\ge a(\xi,\xi)\) \((a>0)\), then for the iteration process (3) \(R_\infty^U(T)\ge\beta>0\). This, for example, occurs when \(U_A\) is a linear space and \(DAB>0\) on \(U_A\).
Let us consider three particular cases of interest.
1) Let \(A \underset{V_n}{>} 0\) and \(D=A^{-1}\). Then the condition \(B \underset{U_A}{>} 0\) is sufficient for convergence of the iterative process (3) for any \(\varphi^0 \in U^0\). For the case \(B \underset{V_n}{>} 0\) and \(U^0=V_n\), we give the formula for its asymptotic rate of convergence. Since \(0<m\leq \sigma(BA)\leq M=\rho(BA)\), it is not difficult, following [5], to show that
\[ R_{\infty}(T)=-\ln\left[\frac{M-m}{M+m}\right], \tag{4} \]
where \(T=I-\tau BA\) (\(I\) here and below is the identity matrix) is the transition operator of the iterative process (3). For \(B=I\) we arrive at the method of steepest descent, and relation (4) easily follows from [4].
If the corresponding matrix iterative process
\[ \varphi^{k+1}=\varphi^k-B(A\varphi^k-f),\qquad \varphi^k\in V_n\ (k=0,1,\ldots) \tag{5} \]
converges, then it is not difficult to see that \(R_{\infty}(T)\geq R_{\infty}(I-BA)\), and equality is possible only in the case \(2-m=M\) (this occurs, for example, when \(I-BA\) is a weakly cyclic matrix of index 2 (see [3])). In particular, for \(0\leq \sigma(I-BA)\leq \rho(I-BA)<1\) we have
\[ R_{\infty}(T)=-\ln\left[\frac{\rho(I-BA)}{2-\rho(I-BA)}\right] > R_{\infty}(I-BA)=-\ln\rho(I-BA). \]
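The comparison between the two rates can be tabulated directly from \(m\) and \(M\). The sketch below is an illustration only: the function names are hypothetical, and it assumes, as in case 1, that the spectrum of \(BA\) is real with extreme eigenvalues \(m\) and \(M\), so that \(\rho(I-BA)=\max(|1-m|,|1-M|)\).

```python
import math

def rate_process_3(m, M):
    """Relation (4): R = -ln((M - m)/(M + m)) for 0 < m <= sigma(BA) <= M."""
    if m == M:
        return math.inf
    return -math.log((M - m) / (M + m))

def rate_process_5(m, M):
    """R = -ln rho(I - BA) with rho = max(|1 - m|, |1 - M|) (real spectrum assumed)."""
    rho = max(abs(1.0 - m), abs(1.0 - M))
    if rho >= 1.0:
        raise ValueError("process (5) does not converge: rho(I - BA) >= 1")
    return math.inf if rho == 0.0 else -math.log(rho)

def compare(m, M):
    """Return the pair (rate of process (3), rate of process (5))."""
    return rate_process_3(m, M), rate_process_5(m, M)
```

For instance, `compare(0.5, 1.5)` gives two equal rates, since \(2-m=M\), while `compare(0.2, 1.0)` corresponds to \(\rho(I-BA)=0.8\) and gives a strictly larger rate for process (3).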
2) Suppose that \(B \underset{V_n}{>} 0\). Then, for \(D=B\), the iterative process (3) converges for any \(\varphi^0\in U^0\), if \(A \underset{U_B}{>} 0\), where \(U_B=\{\varphi:\varphi=B\xi,\ \xi\in U_A\}\). It follows from this that, when the matrices \(A,B\) and the vector \(f\) are real, \((A+A^*) \underset{V_n}{>} 0\) is a sufficient condition for convergence of the iterative process (3) for any initial real vector \(\varphi^0\). As before, it is not difficult to show that when \(A \underset{V_n}{>} 0\) and \(0<m\leq \sigma(BA)\leq M\),
\[ R_{\infty}(T)=-\ln\left[\frac{M-m}{M+m}\right]. \]
Hence, for example, it follows that the asymptotic rates of convergence of the methods of steepest descent and of minimal residuals coincide. This also easily follows from [5].
3) Let the matrix \(A\) of even order \(n\) have the form:
\[ A= \begin{bmatrix} A_1 & -A_2\\ -A_3 & A_4 \end{bmatrix}, \]
where \(A_1=A_4^*\), \(A_2 \underset{V_n}{\geq} 0\) and \(A_3 \underset{V_n}{\geq} 0\) are \(\frac n2\times \frac n2\)-matrices. Systems of equations with matrices of this form arise, for example, in the finite-difference approximation of kinetic equations of neutron transport. Define the matrices \(B\) and \(D\) by the relations:
\[ B^{-1}= \begin{bmatrix} A_1 & 0\\ -A_3 & A_4 \end{bmatrix}, \qquad D= \begin{bmatrix} A_2^+ & 0\\ 0 & 0 \end{bmatrix}, \]
where \(A_2^+\) is the pseudoinverse of the matrix \(A_2\) (see [2]). Then, with the choice \(\varphi^0=Bf\), the sequence \(\{\varphi^k\}\) of the iterative process (3) belongs to the set \(U^0=\{\varphi:A\varphi-f=D^+\psi,\ \psi\in V_n\}\). In this case \(U_A=\{\xi:\xi=D^+\varphi,\ \varphi\in V_n\}\) is a linear space and \(D \underset{U_A}{>} 0\). It can be shown that if \(f\in V_n\), then the iterative process (3) with the above-indicated choice of \(\varphi^0\) converges if and only if the corresponding matrix iterative process (5) converges, i.e. \(\rho(I-BA)<1\). (The latter occurs, for example, when \(A\) is an \(M\)-matrix; see [3].) If this is fulfilled, then \(0<m\leqslant \sigma(BA)\leqslant M\leqslant 1\) and
\[ R_\infty^U(T)\geqslant -\ln\left[\frac{M-m}{M+m}\right] \geqslant -\ln\left[\frac{1-m}{1+m}\right] > R_\infty(I-BA)=-\ln(1-m). \]
Consequently, the iteration process (3) is asymptotically faster in the space \(U\) than the corresponding iteration process (5) in the space \(V_n\), and for \(M=1\)
\[ \lim_{m\to 0}\frac{R_\infty^U(T)}{R_\infty(I-BA)}\geqslant 2. \]
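The limit can be checked by a short series expansion. The sketch below uses only the lower bound for \(R_\infty^U(T)\) from the chain of inequalities above with \(M=1\), together with \(R_\infty(I-BA)=-\ln(1-m)\); the Taylor expansions are standard and are not part of the original text.
\[ -\ln\left[\frac{1-m}{1+m}\right]=\ln(1+m)-\ln(1-m)=2m+\frac{2m^3}{3}+O(m^5),\qquad -\ln(1-m)=m+\frac{m^2}{2}+O(m^3), \]
\[ \frac{R_\infty^U(T)}{R_\infty(I-BA)}\geqslant\frac{2m+O(m^3)}{m+O(m^2)}\longrightarrow 2\qquad (m\to 0). \]
In other words, for small \(m\) the bound suggests that process (3) needs at most about half as many iterations as process (5) for the same asymptotic error reduction.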
We note that, for the realization of the iteration process (3), knowledge of the matrix \(A_2^+\) is not necessary.
In conclusion, the author expresses deep gratitude to G. I. Marchuk for valuable advice and constant attention to the work.
Computing Center of the Siberian Branch of the Academy of Sciences of the USSR
Received 8 V 1968
REFERENCES
1. D. K. Faddeev, V. N. Faddeeva, Computational Methods of Linear Algebra, Moscow, 1963.
2. F. R. Gantmacher, The Theory of Matrices, Moscow, 1966.
3. R. S. Varga, Matrix Iterative Analysis, New Jersey, 1963.
4. L. V. Kantorovich, Uspekhi Mat. Nauk, 3, No. 2, 89 (1948).
5. M. A. Krasnosel’skii, S. G. Krein, Mat. Sbornik, 31 (73), No. 2, 315 (1952).