UDC 517.948:513.88:518
MATHEMATICS
B. A. VERTGEIM
APPROXIMATE SOLUTION OF NONLINEAR EQUATIONS AS A CONTROLLED PROCESS
(Presented by Academician L. V. Kantorovich on 4 II 1970)
For the solution of nonlinear operator equations, many approximate methods have recently been proposed and studied, most of them prompted by the fundamental works of L. V. Kantorovich (see \((^{1-4,14})\), where further references are given).
The abundance of methods brings to the fore the problem of choosing optimally among them (depending, of course, on the particular situation), and a certain uniformity of these methods creates convenient prerequisites for considering them jointly. The present note studies some aspects of this question, which is part of the general problem of optimal search, a problem known to be extremely difficult both to formulate and to solve (see, for example, \((^5)\)).
1. Consider a class \(\mathfrak A\) of mappings \(P:\Omega \to Y,\ \Omega \subset X\), where \(X\) and \(Y\) are Banach spaces and \(\Omega\) is a given subset, together with the corresponding class of operator equations
\[ P(x)=0,\qquad P\in\mathfrak A . \tag{1} \]
By a method of approximations we shall, for simplicity in this item, understand a mapping \(\Phi:\mathfrak A\times\Omega\to\Omega\), which to each \(P\in\mathfrak A\) and \(x\in\Omega\) assigns \(\Phi(P,x)\) (if \(x\) is the initial approximation to the solution of equation (1), then \(\Phi(P,x)\) is the next approximation). In the general case the formulation just given obviously needs refinement, since it may happen that not every \(x\in\Omega\) is suitable as an initial approximation; this is connected with the fact that usually the mapping \(\Phi\) is obtained as a composition of several operations, and the problem arises of effectively describing the domain of definition of \(\Phi\).
Let a family \((\Phi_i),\ i\in I\), of approximate methods be chosen (here \(I\) is an index set), and let there be given a mapping \(u:N\to I\) of the set of natural numbers \(N=\{0,1,2,\ldots\}\) into the set \(I\). Construct the sequence \(x_n\equiv x_n(u)\):
\[ x_{n+1}=\Phi_{u(n)}(P,x_n),\qquad n=0,1,2,\ldots . \tag{2} \]
The process (2) is carried out by alternating methods from the family \((\Phi_i)\) with the aid of the control \(u\). Suppose the cost \(c_i(n,u)\) of one iteration of the method \(\Phi_i\) at step \(n\) under the control \(u\) is known; usually \(c_i\) depends only on the values of \(u\) on the set \(\{0,1,\ldots,n-1\}\). The following problems naturally arise.
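As a minimal sketch of process (2), assuming scalar equations and costs that do not depend on \(n\) or \(u\) (a simplification of \(c_i(n,u)\)), the controlled iteration and its accumulated cost can be written as follows. The names `run_controlled`, `newton`, and `chord`, the test equation \(x^2-2=0\), the fixed slope, and the cost values are illustrative assumptions, not taken from the note.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Equation = Callable[[float], float]
Method = Callable[[Equation, float], float]  # Phi_i: (P, x) -> next approximation


def newton(P: Equation, x: float, h: float = 1e-6) -> float:
    """Newton step with a central-difference derivative (illustrative Phi_1)."""
    dP = (P(x + h) - P(x - h)) / (2.0 * h)
    return x - P(x) / dP


def chord(P: Equation, x: float, slope: float = 3.0) -> float:
    """Step with a fixed, hypothetical slope (illustrative Phi_2)."""
    return x - P(x) / slope


def run_controlled(
    P: Equation,
    x0: float,
    methods: Dict[int, Method],
    control: Sequence[int],
    costs: Dict[int, float],
) -> Tuple[List[float], float]:
    """Apply x_{k+1} = Phi_{u(k)}(P, x_k) and sum the cost of each step."""
    xs = [x0]
    total = 0.0
    for i in control:          # control[k] plays the role of u(k)
        xs.append(methods[i](P, xs[-1]))
        total += costs[i]      # assumed constant per method
    return xs, total


# One Newton step followed by two cheaper fixed-slope steps.
xs, total = run_controlled(
    lambda x: x * x - 2.0, 1.5, {1: newton, 2: chord}, [1, 2, 2], {1: 2.0, 2: 1.0}
)
```

Here the control is a finite list rather than a mapping on all of \(N\), which is enough for an \(n\)-step process.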
Problem A. Given: a concrete equation (1), its solution \(x^*\), an approximation \(x_0\), a number \(Q>0\), and a certain class \(K\) of controls \(u\). Find the number of iterations \(n\) (if it is not known from the description of the class \(K\)) and a control \(u\in K\) such that
\[ \sum_{k=0}^{n-1} c_{u(k)}(k,u)\le Q,\qquad \|x_n(u)-x^*\|\to \min . \tag{3} \]
Problem B. Under the conditions of Problem A, for an additionally prescribed number \(\Delta > 0\), find \(n\) and \(u \in K\) such that
\[ \|x_n(u)-x^*\|\leq \Delta,\qquad \sum_{k=0}^{n-1} c_{u(k)}(k,u)\to \min . \tag{4} \]
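For a very small class \(K\), Problem B can be solved by direct enumeration. The sketch below is only an illustration under stated assumptions: the scalar equation \(x^2-2=0\) with known root, two hypothetical methods (a Newton step and a fixed-slope step), constant costs, and all controls of length at most `n_max` as the class \(K\). Problem A is handled in the same way by exchanging the roles of the cost bound and the accuracy.

```python
from itertools import product
import math

X_STAR = math.sqrt(2.0)  # known root of the illustrative equation x^2 - 2 = 0


def newton_step(x: float) -> float:
    return x - (x * x - 2.0) / (2.0 * x)


def chord_step(x: float) -> float:
    return x - (x * x - 2.0) / 3.0  # fixed, hypothetical slope


METHODS = {1: newton_step, 2: chord_step}
COSTS = {1: 2.0, 2: 1.0}  # assumed independent of n and u


def solve_problem_b(x0: float, delta: float, n_max: int):
    """Cheapest control u with |x_n(u) - x*| <= delta, by exhaustive search."""
    best = None
    for n in range(1, n_max + 1):
        for u in product(METHODS, repeat=n):   # every control of length n
            x = x0
            for i in u:
                x = METHODS[i](x)
            cost = sum(COSTS[i] for i in u)
            feasible = abs(x - X_STAR) <= delta
            if feasible and (best is None or cost < best[0]):
                best = (cost, u)
    return best  # None if no control of length <= n_max reaches delta
```

The search grows as \(|I|^n\), which illustrates why solving the problem for a concrete \(P\) is usually harder than finding the root itself.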
Since the solution of these problems for a concretely given \(P\) is, as a rule, more difficult than finding the root \(x^*\), the question of the stability of the characteristics of the optimal control \(u\) under variation of the operator \(P\) is of particular interest. On the other hand, modifications of the problem are possible in which the whole class \(\mathfrak A\) is considered and minimax criteria are used (see \((^5)\)).
2. Consider the following special case of Problem A, useful for its solution in the general situation. Let \(I=\{1,2\}\), \(\Phi_1\) be the basic, and \(\Phi_2\) the modified Newton–Kantorovich processes. For a given control \(u\), the set \(u^{-1}\{1\}=\{0,n_1,n_2,\ldots\}\) is composed of the indices for which \(P'(x_{n_i})\) is computed; we agree that \(n_1<n_2<\cdots\). In this case, for indices \(k\) such that \(n_i\leq k<n_{i+1}\), the modified process using \(P'(x_{n_i})\) is applied. Thus,
\[ x_{k+1}=x_k-[P'(x_{n_i})]^{-1}P(x_k),\qquad n_i\leq k<n_{i+1}, \tag{5} \]
(see \((^{1,9,11})\)). Consider the class \(\mathfrak A_0\) of operators \(P\) and equations of the form
\[ P(x)\equiv a-x+\lambda\varphi(x)=0,\qquad \varphi(0)=\varphi'(0)=0; \tag{6} \]
here the mapping \(\varphi:X\to Y\) has a sufficient number of derivatives in a neighborhood of the point \(a\in X\); \(\lambda:Y\to X\) is a continuous linear operator of small norm; \(a\) and \(\lambda\) are parameters. It can be shown that there exists a neighborhood \(S\) of the point \(a\) such that, for sufficiently small \(\|\lambda\|\), equation (6) has a unique solution in \(S\), \(x^*=x^*(\lambda)\in S\), to which the approximations of the Newton–Kantorovich process converge for any control \(u\) and any initial point \(x_0\in S\).
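Process (5) for equation (6) can be sketched in the scalar case as follows. This is an illustration only: the default \(\varphi(x)=x^2\), the starting point \(x_0=0\), and the names `controlled_newton` and `model_root` are assumptions made here, and the closed-form root is valid only for this quadratic model with \(4\lambda a<1\).

```python
import math


def controlled_newton(a, lam, basic_steps, n, x0=0.0,
                      phi=lambda x: x * x, dphi=lambda x: 2.0 * x):
    """Run n steps of process (5) for P(x) = a - x + lam*phi(x).

    basic_steps is the set u^{-1}{1}: the indices k at which P'(x_k) is
    recomputed. At all other steps the last computed derivative is reused.
    """
    if 0 not in basic_steps:
        raise ValueError("the derivative must be computed at step 0")
    x = x0
    slope = None
    for k in range(n):
        if k in basic_steps:
            slope = -1.0 + lam * dphi(x)           # P'(x_k)
        x = x - (a - x + lam * phi(x)) / slope     # step (5)
    return x


def model_root(a, lam):
    """Root of a - x + lam*x**2 = 0 that tends to a as lam -> 0."""
    return 2.0 * a / (1.0 + math.sqrt(1.0 - 4.0 * lam * a))
```

For instance, `controlled_newton(1.0, 0.01, {0, 1}, 3)` and `controlled_newton(1.0, 0.01, {0, 2}, 3)` run the two three-step controls with two derivative evaluations each, and their distances to `model_root(1.0, 0.01)` can be compared directly.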
Problem \(A_0\). Let the class \(K_0\) of admissible controls for solving equation (6) by method (5) consist of the controls of an \(n\)-step process in which the basic Newton process is applied exactly \(l\) times, i.e. \(u^{-1}\{1\}=\{0,n_1,n_2,\ldots,n_{l-1}\}\), where \(l\) and \(n\) are prescribed, \(n>l>1\), and the number \(Q\) in (3) is sufficiently large. It is required to find the indices \(n_i\) that minimize \(\|x_n(u)-x^*\|\).
We first formulate a result concerning the case \(X=Y=R\).
Theorem 1. Suppose the following conditions are satisfied: \(\varphi'(a)\ne 0\),
\[ (\forall x)\ \varphi''(x)\ne 0,\qquad B^2|\varphi(a)\varphi''(a)|<1\quad (B\varphi'(a)=1). \]
Then there exists a number \(\delta\), depending only on \(\varphi,a,\lambda\), such that for \(|\lambda|<\delta\) the optimal control \(u\) of Problem \(A_0\) for equation (6) is determined by the relations
\[ n_i-n_{i-1}=\left[\frac{n+i-1}{l}\right]\qquad (n\geq 2l,\ i=1,2,\ldots,l;\ n_0=0); \tag{7} \]
\[ u^{-1}\{2\}=\{2l-n+1,\ 2l-n+3,\ldots,\ n-3,\ n-1\}\qquad (n<2l). \tag{8} \]
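Relations (7) and (8) translate directly into a rule for listing the steps at which the derivative is recomputed. The sketch below reads the brackets in (7) as the integer part and takes \(n_l=n\) to mark the end of the process, so that the basic steps are \(0,n_1,\ldots,n_{l-1}\); both readings, and the name `optimal_basic_steps`, are assumptions made for this illustration.

```python
def optimal_basic_steps(n: int, l: int) -> list:
    """Indices u^{-1}{1} of the basic steps given by relations (7) and (8)."""
    if not (n > l > 1):
        raise ValueError("Problem A_0 assumes n > l > 1")
    if n >= 2 * l:
        # (7): n_i - n_{i-1} = floor((n + i - 1) / l), with n_0 = 0
        steps = [0]
        for i in range(1, l):
            steps.append(steps[-1] + (n + i - 1) // l)
        return steps
    # (8): modified steps are 2l-n+1, 2l-n+3, ..., n-1; the rest are basic
    modified = set(range(2 * l - n + 1, n, 2))
    return [k for k in range(n) if k not in modified]
```

For \(n=2l\) the two branches agree: (7) gives blocks of length two, and (8) would mark every odd step as modified.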
The proof is first carried out for \(l=2\), \(n=3\) and \(n=4\). Applying series expansions (in particular, the Lagrange series), we have
\[ x^*-x_3(0,1)=\tfrac12\varphi^3\varphi''^2\lambda^5+o(\lambda^5),\quad x^*-x_3(0,2)=\tfrac12\varphi^2\varphi'^2\varphi''\lambda^5+o(\lambda^5), \tag{9} \]
\[ x^*-x_4(0,1)=\tfrac12\varphi^4\varphi''^3\lambda^7+o(\lambda^7),\quad x^*-x_4(0,2)=\tfrac12\varphi^3\varphi'^3\varphi''^2\lambda^8+o(\lambda^8). \]
Here \(\varphi\equiv\varphi(a)\), \(\varphi^{(i)}\equiv\varphi^{(i)}(a)\); in a notation such as \(x_3(0,1)\), the arguments are the indices at which the basic process is applied. Next, for \(l=2\), \(n\geq 4\), \(2n_1<n\), we show that a control \(u\) such that \(u^{-1}\{1\}=\{0,n_1-1\}\) is better than the control \(u_1\), where \(u_1^{-1}\{1\}=\{0,n_1\}\). The general case of arbitrary \(l>2\) is studied by repeated application of the properties proved for \(l=2\).
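As an illustration of how expansions of this type arise, the leading term of \(x^*-x_3(0,2)\) can be recovered by hand in the scalar case. The derivation below is a sketch that assumes the starting point \(x_0=0\) (suggested by the normalization \(\varphi(0)=\varphi'(0)=0\), but not stated explicitly in the note), so that \(P'(x_0)=-1\). Steps 0 and 1 both use \(P'(x_0)\):

\[ x_1=x_0+P(x_0)=a,\qquad x^*-x_1=\lambda\varphi(x^*)=\varphi\lambda+o(\lambda), \]
\[ x_2=x_1+P(x_1)=a+\lambda\varphi(a),\qquad x^*-x_2=\lambda\bigl(\varphi(x^*)-\varphi(a)\bigr)=\varphi\varphi'\lambda^2+o(\lambda^2). \]

Step 2 is a basic Newton step at \(x_2\), for which Taylor's formula gives \(x^*-x_3=-\tfrac12[P'(x_2)]^{-1}P''(\xi)(x^*-x_2)^2\) with \(\xi\) between \(x_2\) and \(x^*\). Since \(P'(x_2)=-1+O(\lambda)\) and \(P''(\xi)=\lambda\varphi''(\xi)\),

\[ x^*-x_3(0,2)=\tfrac12\lambda\varphi''\,(\varphi\varphi'\lambda^2)^2+o(\lambda^5)=\tfrac12\varphi^2\varphi'^2\varphi''\lambda^5+o(\lambda^5), \]

in agreement with the second formula in (9).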
Theorem 1 expresses the intuitively obvious tendency to apply the basic process first and the modified process afterwards; this tendency, however, is realized in a nonobvious way, since, for example, for \(n \ge 2l\) the iterations of the two methods at first alternate. The above makes it possible to clarify a remark of A. M. Ostrowski (see \((^3)\)) concerning the modified process, which is called into question in that book because of its comparatively slow convergence. It is interesting to note that \((^3)\) considers, from a point of view different from the present one, an alternating method for which \(u^{-1}\{1\}=\{0,2,4,6,\ldots\}\), and shows that this method is of third order for the approximations \(y_n \equiv x_{2n}\). A similar fact for functional equations was obtained in \((^{12})\), where, however, overly restrictive sufficient conditions for convergence are given. We note that the theorems of \((^{12})\), as well as those of \((^{11})\), admit generalizations to the class of equations studied in \((^8)\), i.e., to the case when \(P'\) satisfies a Hölder condition.
3. Passing to the case of functional equations, we note that the small-parameter method leads here to results similar to Theorem 1 under certain additional assumptions, for example \(\varphi''(h_1,h_2)=0 \Rightarrow h_1=h_2=0\), with the kernels of the operators \(\varphi'\) and \(\lambda\) trivial. In the computations we use a certain analogue of the Lagrange series for the functional case. We give one more result for a problem of type \(A_0\), pertaining to equations of the form (1).
Theorem 2. Let the operator \(P\) in equation (1) be analytic, \(P(x^*)=0\), \(\Gamma^*=[P'(x^*)]^{-1}\), \(\|\Gamma^*\|<\infty\), \([P'']^{-1}\{0\}=(0,0)\). Then there exists a neighborhood \(U(x^*)\) of the point \(x^*\) such that, for problem \(A_0\), for any choice of the initial approximation \(x_0 \in U(x^*)\), the optimal control is determined by relations (7) or (8) (depending on the sign of the number \(n-2l\)).
The analyticity condition can be weakened. Similar results can also be obtained for the coupling of other methods, such as Chebyshev’s method, secants, tangent hyperbolas and parabolas \((^{2-7,13,14})\), as well as for methods of minimizing functionals.
On the basis of Theorems 1 and 2 one can solve Problems A and B. One approach to Problem A, for example, is as follows. We compile a table of the function \(\psi(n,l)=|x^*-x_n|\) over some range of values of \(n\) and \(l\) for the model equation (6) with \(\varphi(x)=x^2\) (see \((^2)\)), where \(x_n\) is obtained by process (5) under the optimal control characterized in Theorem 1, and then, by elementary means, minimize \(\psi(n,l)\) subject to the condition (see (3)) \(c_1l+c_2(n-l)\le Q\); here the costs \(c_1\) and \(c_2\) are assumed not to depend on \(n\) and \(u\). One can also estimate the closeness of \(x_n\) to \(x^*\) by means of expansions of type (9), without assuming \(\varphi(x)=x^2\). In this case one determines \(\nu=\nu(n,l)\), the smallest power of \(\lambda\) that enters the expansion of \(x^*-x_n\) with a nonzero coefficient, and then, by simple enumeration, maximizes \(\nu(n,l)\) subject to the condition \(c_1l+c_2(n-l)\le Q\).
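The tabulation approach can be sketched for the model equation with \(\varphi(x)=x^2\). The code below is an illustration under stated assumptions: the starting point \(x_0=0\), the reading of (7) with integer parts and \(n_l=n\), constant costs `c1` and `c2`, and the names `psi` and `solve_problem_a` are all choices made here. High-precision decimals are used because the errors quickly fall below double precision; for larger `n_max` or smaller \(\lambda\) the working precision would have to be raised.

```python
from decimal import Decimal, getcontext

getcontext().prec = 600  # errors fall far below double precision


def optimal_basic_steps(n, l):
    """Indices of the basic steps according to (7)-(8), with [.] as floor."""
    if n >= 2 * l:
        steps = [0]
        for i in range(1, l):
            steps.append(steps[-1] + (n + i - 1) // l)
        return steps
    modified = set(range(2 * l - n + 1, n, 2))
    return [k for k in range(n) if k not in modified]


def psi(n, l, a, lam):
    """|x* - x_n| for a - x + lam*x**2 = 0 under the control above, x_0 = 0."""
    x_star = 2 * a / (1 + (1 - 4 * lam * a).sqrt())
    basic = set(optimal_basic_steps(n, l))
    x = Decimal(0)
    slope = None
    for k in range(n):
        if k in basic:
            slope = -1 + 2 * lam * x               # P'(x_k)
        x = x - (a - x + lam * x * x) / slope      # step (5)
    return abs(x_star - x)


def solve_problem_a(Q, c1, c2, a, lam, n_max):
    """Minimize psi(n, l) subject to c1*l + c2*(n - l) <= Q, n > l > 1."""
    best = None
    for n in range(3, n_max + 1):
        for l in range(2, n):
            if c1 * l + c2 * (n - l) <= Q:
                err = psi(n, l, a, lam)
                if best is None or err < best[0]:
                    best = (err, n, l)
    return best  # (error, n, l), or None if no pair is affordable
```

A call such as `solve_problem_a(6, 2, 1, Decimal(1), Decimal("0.01"), 6)` enumerates every affordable pair \((n,l)\) and returns the one with the smallest tabulated error.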
For equations (1), the method of dynamic programming turns out to be applicable to Problem B in the analysis of a difference system obtained for the iterative process (2) by one of L. V. Kantorovich's methods (direct or majorant). The passage to a recurrence system is analyzed in \((^{10})\), whose abstract scheme makes no provision for switching from one approximate method to another. On the basis of \((^{2,10,14})\), one can construct a general scheme, convenient for analyzing such switchings, for passing to a difference system that majorizes process (2) in a certain sense. In a number of typical cases the solution of, for example, Problem B for this majorant difference system gives an approximate solution of Problem B for the original equation of type (1).
Novosibirsk State University
Received 15 I 1970
References
1. L. V. Kantorovich, Uspekhi Mat. Nauk, 3, No. 1, 89 (1948).
2. L. V. Kantorovich, G. P. Akilov, Functional Analysis in Normed Spaces, Moscow, 1959.
3. A. M. Ostrowski, Solving Equations and Systems, IL, 1964.
4. M. A. Krasnosel’skii, G. M. Vainikko et al., Approximate Solution of Operator Equations, “Nauka,” 1969.
5. R. Bellman, S. Dreyfus, Applied Problems of Dynamic Programming, “Nauka,” 1965.
6. G. S. Salekhov, Dokl. Akad. Nauk SSSR, 82, No. 4, 525 (1952).
7. M. I. Nechepurenko, Uspekhi Mat. Nauk, 11, No. 2, 163 (1954).
8. B. A. Vertgeim, Dokl. Akad. Nauk SSSR, 110, No. 5, 719 (1956).
9. R. G. Bartle, Proc. Am. Math. Soc., 6, No. 5, 827 (1955).
10. W. C. Rheinboldt, SIAM J. Numer. Anal., 5, No. 1, 42 (1968).
11. J. E. Dennis, Jr., ibid., 6, No. 3, 493 (1969).
12. W. E. Bosarge, Jr., P. Falb, JOTA, 4, No. 3, 156 (1969).
13. H. Ehrmann, Arch. Rat. Mech. Anal., 4, No. 1, 45 (1959).
14. L. Collatz, Functional Analysis and Computational Mathematics, Moscow, 1969.