ON THE APPLICATION OF THE METHOD OF LEAST SQUARES TO NONLINEAR EQUATIONS
Submitted 1962-01-01 | RussiaRxiv: ru-196201.15290 | Translated from Russian


A. LANGENBACH


(Presented by Academician V. I. Smirnov, 9 XI 1961)

1. Suppose that in a Hilbert space \(H\) an equation is given

\[ Pu=f. \tag{1.1} \]

We regard the domain \(D_P\) of the operator \(P\) as a linear set, dense in the space \(H_1 \subset H\). The method of least squares consists in taking, as an approximate solution of equation (1.1), the linear combination \(u_n\) of given elements \(\varphi_1,\ldots,\varphi_n\) of the space \(H_1\) for which \(\|Pu_n-f\|_H\) attains its minimum. Hitherto the method of least squares has been applied chiefly to linear equations \((^1)\). In the present note we investigate whether the method is applicable to the approximate solution of nonlinear equations.

Suppose first that the range \(P(D_P)=R_P\) of the operator \(P\) is dense in \(H\) and that the system of coordinate functions \(\varphi_1,\varphi_2,\ldots\) forms a basis in \(H_1\) (condition (A)). Then there exists at least one sequence

\[ \{u_n\}\subset H_1,\qquad u_n=\sum_{k=1}^n a_k\varphi_k \tag{1.2} \]

such that \(\|Pu_n-f\|_H \to 0\) as \(n\to\infty\).

Theorem 1. Let condition (A) be fulfilled, together with the condition*

\[ (Pu_1-Pu_2,u_1-u_2)_H \ge \gamma^2\|u_1-u_2\|_{H_1}^2 \tag{B} \]

for all \(u_1,u_2\in D_P\); moreover, assume that the norm in \(H_1\) is not weaker than the norm in \(H\):

\[ \frac{\|u\|_H}{\|u\|_{H_1}}\le K,\qquad u\in D_P. \tag{C} \]

Then every sequence (1.2) for which \(\|Pu_n-f\|_H\to 0\) converges in \(H_1\).

Proof. From the convergence of \(Pu_n\) to \(f\) it follows that
\[ \|Pu_n-Pu_m\|_H \to 0 \quad (n,m\to\infty). \]
Applying condition (B) and Schwarz’s inequality to \(u_n,u_m\), we obtain
\[ \|u_n-u_m\|_{H_1}^2 \le \frac{1}{\gamma^2}\|Pu_n-Pu_m\|_H\|u_n-u_m\|_H. \]
Condition (C) then gives
\[ \|u_n-u_m\|_{H_1} \le \frac{K}{\gamma^2}\|Pu_n-Pu_m\|_H, \]
Hence \(\{u_n\}\) is a Cauchy sequence in \(H_1\); since \(H_1\) is complete, it converges, which proves the theorem.

If the operator \(P\) has on \(D_P\) a linear Gâteaux differential, continuous on every straight line,
\[ P'(u)h=\frac{d}{dt}\,[P(u+th)]_{t=0}, \]
then (B) follows from the condition

\[ (P'(u)h,h)\ge \gamma^2\|h\|_{H_1}^2,\qquad u,h\in D_P. \tag{B'} \]

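Indeed, the derivative of \(t\mapsto P\bigl(tu_1+(1-t)u_2\bigr)\) is, by the definition of the Gâteaux differential, \(P'\bigl(tu_1+(1-t)u_2\bigr)(u_1-u_2)\), and it is continuous in \(t\); hence

\[ Pu_1-Pu_2= \int_0^1 P'\bigl(tu_1+(1-t)u_2\bigr)(u_1-u_2)\,dt. \tag{1.3} \]

Taking the scalar product of (1.3) with \(u_1-u_2\) and interchanging the scalar product with the integration over \(t\), we obtain (B) as a consequence of (B′).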
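Formula (1.3) is easy to check numerically on a finite-dimensional example. The example is hypothetical and not from the text: take \(H=\mathbb{R}^m\) and \(P(u)=Au+u^3\), with the cube taken componentwise and \(A\) symmetric positive definite, so that \(P'(u)h=Ah+3u^2h\). The integral over \(t\) is evaluated by Gauss-Legendre quadrature. This is exact here because the integrand is a quadratic polynomial in \(t\).

```python
import numpy as np

# Hypothetical operator on H = R^m: P(u) = A u + u**3 (cube taken componentwise),
# with Gateaux differential P'(u) h = A h + 3 u**2 h.
rng = np.random.default_rng(0)
m = 6
B = rng.standard_normal((m, m))
A = B @ B.T + m * np.eye(m)               # symmetric positive definite

def P(u):
    return A @ u + u**3

def dP(u, h):
    return A @ h + 3.0 * u**2 * h

u1, u2 = rng.standard_normal(m), rng.standard_normal(m)

# Right-hand side of (1.3). The integrand is quadratic in t, so
# 3-point Gauss-Legendre quadrature (exact up to degree 5) is exact.
nodes, weights = np.polynomial.legendre.leggauss(3)
t, wt = 0.5 * (nodes + 1.0), 0.5 * weights    # map [-1, 1] onto [0, 1]
rhs = sum(wk * dP(tk * u1 + (1.0 - tk) * u2, u1 - u2) for tk, wk in zip(t, wt))

print(np.max(np.abs(P(u1) - P(u2) - rhs)))    # agreement up to rounding error
```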

* It is sufficient to require that condition (B) be fulfilled for those \(u_1,u_2\in D_P\) for which \(\|Pu_1-Pu_2\|_H\) is small.

Suppose we know a method for constructing a minimizing sequence (1.2), and equation (1.1) is solvable*. Then Theorem 1 guarantees convergence of the approximations \(u_n\) to the solution \(u\) of equation (1.1) in the metric of \(H_1\). Indeed, applying conditions (B) and (C) to \(u\) and \(u_n\) as in the proof of Theorem 1, we obtain
\[ \|u-u_n\|_{H_1}\leq \frac{K}{\gamma^2}\|Pu-Pu_n\|_{H}. \]
Replacing \(Pu\) by \(f\), we obtain the error estimate
\[ \|u-u_n\|_{H_1}\leq \frac{K}{\gamma^2}\|f-Pu_n\|_{H}. \]
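This estimate is a posteriori: its right-hand side involves only the residual of the computed approximation. Below is a minimal sketch with the same kind of hypothetical operator, \(P(u)=Au+u^3\) on \(H=H_1=\mathbb{R}^m\) with the Euclidean norm, so that \(K=1\). Since \((a^3-b^3)(a-b)\ge 0\), condition (B) holds with \(\gamma^2\) equal to the smallest eigenvalue of \(A\). The exact solution is manufactured only so that the bound can be compared with the true error.

```python
import numpy as np

# Hypothetical operator P(u) = A u + u**3 on H = H_1 = R^m (Euclidean norm, K = 1).
# (B) holds with gamma^2 = smallest eigenvalue of A, since (a^3 - b^3)(a - b) >= 0.
rng = np.random.default_rng(1)
m = 6
B = rng.standard_normal((m, m))
A = B @ B.T + m * np.eye(m)
gamma2 = np.linalg.eigvalsh(A)[0]         # eigenvalues in ascending order
K = 1.0

def P(u):
    return A @ u + u**3

u_exact = rng.standard_normal(m)          # manufactured solution
f = P(u_exact)
u_n = u_exact + 0.05 * rng.standard_normal(m)   # some approximation

true_error = np.linalg.norm(u_exact - u_n)
bound = K / gamma2 * np.linalg.norm(f - P(u_n))  # computable without u_exact
print(true_error, bound)
```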

Let us now consider the case where equation (1.1) is solvable in \(D_P\) not for every \(f\in H\). This situation is often encountered in the theory of boundary-value problems. An operator \(P\) satisfying conditions (B) and (C) has an inverse operator \(P^{-1}\) satisfying a Lipschitz condition. Indeed, let \(v_1,v_2\in R_P\). By (B) they correspond to unique elements \(u_1,u_2\in D_P\) such that \(Pu_1=v_1\), \(Pu_2=v_2\). Just as in the proof of Theorem 1, we obtain the inequality
\[ \|u_1-u_2\|_{H_1}\leq \frac{K}{\gamma^2}\|v_1-v_2\|_{H}, \]
which gives an estimate for the Lipschitz constant. The operator \(P^{-1}\), whose domain of definition \(R_P\) is dense in \(H\) by assumption, hence extends by continuity to the whole of \(H\), with values in the complete space \(H_1\). This extension determines an extension of the original operator \(P\). The method of least squares may therefore be regarded as a method for extending the operator \(P\).

Let us now turn to the construction of the minimizing sequence (1.2). Here we assume that the operator \(P\) has a linear Gâteaux differential. We require that \(\|Pu_n-f\|_H^2\), regarded as a function of \(a_1,\ldots,a_n\), attain a minimum; at a minimum point
\[ \frac{\partial}{\partial a_k}\|Pu_n-f\|_H^2=0,\qquad k=1,2,\ldots,n. \tag{1.4} \]
We write (1.4) in the form
\[ \begin{aligned} \lim_{\Delta a_k\to 0}\frac{1}{\Delta a_k}\Bigl\{&\bigl(P(a_1\varphi_1+\cdots+(a_k+\Delta a_k)\varphi_k+\cdots+a_n\varphi_n)-f,\\ &\qquad P(a_1\varphi_1+\cdots+(a_k+\Delta a_k)\varphi_k+\cdots+a_n\varphi_n)-f\bigr)_H\\ &-\bigl(P(a_1\varphi_1+\cdots+a_k\varphi_k+\cdots+a_n\varphi_n)-f,\\ &\qquad P(a_1\varphi_1+\cdots+a_k\varphi_k+\cdots+a_n\varphi_n)-f\bigr)_H\Bigr\}=0, \end{aligned}\qquad k=1,2,\ldots,n. \tag{1.5} \]
Since \(P\) is continuous on every line and has a linear Gâteaux differential, expanding the scalar products and passing to the limit, we obtain the following system for determining the coefficients \(a_k\):
\[ (P'(u_n)\varphi_k,\,Pu_n-f)_H=0,\qquad k=1,2,\ldots,n. \tag{1.6} \]
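As an illustration, here is a minimal sketch of the method on a hypothetical model problem: \(P(u)=-u''+u^3\) on \((0,1)\) with \(u(0)=u(1)=0\), \(H=L_2(0,1)\) approximated by the midpoint rule, and \(\varphi_k=\sin k\pi x\). The right-hand side \(f\) is chosen so that \(u=\sin\pi x\) is the exact solution. The coefficients are found by Gauss-Newton iteration. That solver is our choice and is not prescribed by the text. Each column of the Jacobian is \(P'(u_n)\varphi_k\), so at a stationary point the discrete form of (1.6) holds, and the last lines check this.

```python
import numpy as np

# Hypothetical model problem (not from the text): P(u) = -u'' + u**3 on (0, 1),
# u(0) = u(1) = 0, H = L_2(0, 1), coordinate functions phi_k(x) = sin(k*pi*x).
N = 400                                   # midpoint-rule nodes for the L_2 inner product
x = (np.arange(N) + 0.5) / N
w = np.full(N, 1.0 / N)                   # quadrature weights
n = 5                                     # number of coordinate functions
k = np.arange(1, n + 1)
phi = np.sin(np.pi * np.outer(x, k))      # phi[:, j] = phi_{j+1} at the nodes
minus_phi_xx = phi * (k * np.pi) ** 2     # -phi_k'' at the nodes

# Right-hand side manufactured so that u = sin(pi x) is the exact solution.
f = np.pi**2 * np.sin(np.pi * x) + np.sin(np.pi * x) ** 3

def residual(a):
    u = phi @ a
    return minus_phi_xx @ a + u**3 - f    # P(u_n) - f at the nodes

def jacobian(a):
    u = phi @ a
    return minus_phi_xx + 3.0 * (u**2)[:, None] * phi   # column k: P'(u_n) phi_k

def least_squares(a0, iters=30):
    """Gauss-Newton minimization of the discrete ||P u_n - f||_H^2."""
    a = a0.copy()
    sw = np.sqrt(w)
    for _ in range(iters):
        r, J = residual(a), jacobian(a)
        step, *_ = np.linalg.lstsq(sw[:, None] * J, -sw * r, rcond=None)
        a += step
    return a

a = least_squares(np.zeros(n))
# Discrete left-hand sides of (1.6): (P'(u_n) phi_k, P u_n - f)_H, k = 1..n.
grad = jacobian(a).T @ (w * residual(a))
print(a, np.abs(grad).max())
```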

Theorem 2. Let the operator \(P\) have a linear Gâteaux differential, let conditions (B) and (C) be satisfied, and let the coordinate functions \(\varphi_1,\ldots,\varphi_n\) be orthonormal in \(H_1\). Then the system (1.6) of the method of least squares is solvable for every \(n\).

Proof. Consider \(\|Pu_n\|_H\) as a function of the coefficients \(a_k\). By Schwarz’s inequality and (B), we have
\[ \|Pu_n\|_H\|u_n\|_H\geq (Pu_n,u_n)_H\geq \gamma^2\|u_n\|_{H_1}^2. \tag{1.7} \]
Condition (C) allows us to rewrite (1.7) in the form
\[ \|Pu_n\|_H\geq \frac{\gamma^2}{K}\|u_n\|_{H_1}. \tag{1.8} \]
The coordinate functions \(\varphi_1,\ldots,\varphi_n\) are orthonormal in \(H_1\), and consequently
\[ \|u_n\|_{H_1}=\left(\sum_{k=1}^n a_k^2\right)^{1/2}. \]
The functional \(\|Pu_n-f\|_H^2\) is a nonnegative, continuous, and even differentiable function of the variables \(a_1,\ldots,a_n\) in every ball
\[ \sum_{k=1}^n a_k^2\leq R^2. \]
The inequality
\[ \|Pu_n-f\|_H^2\geq \|Pu_n\|_H\bigl(\|Pu_n\|_H-2\|f\|_H\bigr)+\|f\|_H^2 \]
with allowance for inequality (1.8) shows that this function is greater than, say, \(\|f\|_H^2\) on the sphere \(\sum_{k=1}^n a_k^2=R_f^2\) of sufficiently large radius \(R_f\). Consequently, \(\|Pu_n-f\|_H^2\) attains its minimum over the ball

\[ \sum_{k=1}^{n} a_k^2 \leq R_f^2 \]

at an interior point, where the conditions (1.6) are satisfied. The theorem is proved.

* By condition (B), equation (1.1) has at most one solution.
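The radius \(R_f\) can be given explicitly. The following sketch assumes \(P0=0\). This assumption is used tacitly in passing from (B) to (1.7), which takes \(u_1=u_n\) and \(u_2=0\). On the sphere \(\sum_k a_k^2=R^2\) with \(R>2K\|f\|_H/\gamma^2\), inequality (1.8) gives

\[ \begin{aligned} \|Pu_n\|_H &\geq \frac{\gamma^2}{K}\,R > 2\|f\|_H,\\ \|Pu_n-f\|_H^2 &\geq \|Pu_n\|_H\bigl(\|Pu_n\|_H-2\|f\|_H\bigr)+\|f\|_H^2 > \|f\|_H^2=\|P0-f\|_H^2 . \end{aligned} \]

Thus any \(R_f>2K\|f\|_H/\gamma^2\) will do. On the sphere the function exceeds its value at the center \(a_1=\cdots=a_n=0\), so its minimum over the closed ball is attained at an interior point, where all the partial derivatives (1.4) vanish.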

It is interesting to compare the system (1.6) of the method of least squares with the systems of other direct methods of mathematical physics. The application of the Ritz method to equations of the form (1.1) was discussed in \((^2)\). Suppose an operator \(Q\) has a linear differential \(Q'(u)h\) that is continuous on “planes” and satisfies, together with (B′), the symmetry condition

\[ (Q'(u)h_1,h_2)_H=(Q'(u)h_2,h_1)_H \tag{D} \]

for all \(h_1,h_2,u\) from \(D_Q\), then the equation \(Qu=f\) is put in correspondence with the energy functional

\[ \Phi(u)=\int_0^1 \bigl(Q(tu),u\bigr)_H\,dt-(f,u)_H. \]
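Condition (D) is what makes \(\Phi\) a potential for \(Q\). The derivative of \(\Phi\) at \(u\) in the direction \(h\) is \((Qu-f,h)_H\), so (1.9) expresses stationarity of \(\Phi\) on the span of \(\varphi_1,\ldots,\varphi_n\). Below is a hypothetical finite-dimensional check. Take \(Q(u)=Au+u^3\) on \(\mathbb{R}^m\) with \(A\) symmetric, so that \(Q'(u)=A+3\,\mathrm{diag}(u^2)\) is symmetric. The \(t\)-integral is computed by Gauss-Legendre quadrature, and the gradient of \(\Phi\) by central differences.

```python
import numpy as np

# Hypothetical Q(u) = A u + u**3 on R^m with A symmetric: Q'(u) = A + 3 diag(u**2)
# is symmetric, so condition (D) holds.
rng = np.random.default_rng(2)
m = 5
B = rng.standard_normal((m, m))
A = B @ B.T + np.eye(m)
f = rng.standard_normal(m)

def Q(u):
    return A @ u + u**3

def Phi(u):
    # Phi(u) = int_0^1 (Q(t u), u) dt - (f, u). The integrand is a cubic in t,
    # so 2-point Gauss-Legendre quadrature on [0, 1] integrates it exactly.
    nodes, weights = np.polynomial.legendre.leggauss(2)
    t, wt = 0.5 * (nodes + 1.0), 0.5 * weights
    return sum(wk * Q(tk * u) @ u for tk, wk in zip(t, wt)) - f @ u

u = rng.standard_normal(m)
eps = 1e-6
# Central finite differences of Phi along the coordinate directions.
fd_grad = np.array([(Phi(u + eps * e) - Phi(u - eps * e)) / (2 * eps) for e in np.eye(m)])
print(np.max(np.abs(fd_grad - (Q(u) - f))))   # small: grad Phi(u) = Q(u) - f
```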

The Ritz system for this functional is

\[ (Qu_n-f,\varphi_k)_H=0,\qquad k=1,2,\ldots,n. \tag{1.9} \]

The Galerkin method leads to the same system. The coincidence of the two systems was noted in \((^3)\). Comparing the systems (1.6) and (1.9), we arrive at the following conclusion:

Theorem 3. Let conditions (B′) and (D) be satisfied for the operator \(P\), and let \(f\in D_P\). Then the application of the method of least squares to equation (1.1) is equivalent to the application of the Ritz or Galerkin method to the equation

\[ P'(u)Pu+\bigl[P'(0)-P'(u)\bigr]f=P'(0)f. \tag{1.10} \]
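The equivalence can be traced in one line. Assume in addition that \(Pu_n-f\in D_P\), so that (D) may be applied with \(h_1=\varphi_k\) and \(h_2=Pu_n-f\). Using (D) and the linearity of \(P'(u_n)\), the left-hand side of (1.6) becomes

\[ \begin{aligned} (P'(u_n)\varphi_k,\,Pu_n-f)_H &= \bigl(P'(u_n)(Pu_n-f),\,\varphi_k\bigr)_H\\ &= \bigl(P'(u_n)Pu_n+[P'(0)-P'(u_n)]f-P'(0)f,\;\varphi_k\bigr)_H . \end{aligned} \]

Hence (1.6) is the Galerkin system (1.9) for the operator \(u\mapsto P'(u)Pu+[P'(0)-P'(u)]f\) with the fixed right-hand side \(P'(0)f\), that is, for equation (1.10). Writing the right-hand side as \(P'(0)f\) keeps it independent of \(u\), and this is where the hypothesis \(f\in D_P\) is needed.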

Theorem 3 makes it possible to formulate convergence conditions for the method of least squares in terms of the well-studied Ritz and Galerkin methods.

2. Examples. Consider a bounded domain \(\Omega\) of \(n\)-dimensional space with smooth boundary \(S\). In \(\Omega\) the equation

\[ -\sum_{i=1}^{n}\frac{\partial}{\partial x_i}\,a_i(x_j,u_{x_j})+a(x_j,u)=f(x_j),\qquad j=1,2,\ldots,n, \tag{2.1} \]

with sufficiently smooth coefficients is given, together with the boundary condition

\[ u\big|_S=0. \tag{2.2} \]

Let equation (2.1) be elliptic, i.e.,

\[ \sum_{i,k=1}^{n}\frac{\partial a_i(x_j,u_{x_j})}{\partial u_{x_k}}\,\xi_i\xi_k \geq \mu^2\sum_{k=1}^{n}\xi_k^2,\qquad \mu>0, \tag{2.3} \]

and let the derivative \(\partial a(x_j,u)/\partial u\) be bounded below by a sufficiently large number \(k\) (not necessarily positive). The left-hand side of equation (2.1) defines an operator \(P\) on the manifold \(M\) of functions that are twice continuously differentiable in \(\overline{\Omega}=\Omega\cup S\) and vanish on \(S\); \(P\) has a linear Gâteaux differential. Integrating by parts and using \(h|_S=0\), we find, for \(u,h\in M\),

\[ (P'(u)h,h)_{L_2} = \int_{\Omega} \left[ \sum_{i,k=1}^{n} \frac{\partial a_i(x_j,u_{x_j})}{\partial u_{x_k}}\, h_{x_k}h_{x_i} + \frac{\partial a(x_j,u)}{\partial u}\,h^2 \right]d\Omega. \tag{2.4} \]

Taking into account Friedrichs’ inequality

\[ \int_{\Omega} h^2\,d\Omega \leq \frac{1}{\nu^2} \int_{\Omega}\sum_{i=1}^{n} h_{x_i}^2\,d\Omega \tag{2.5} \]

and the ellipticity condition (2.3), we obtain the estimate

\[ (P'(u)h,h)_{L_2} \geq \gamma^2 \int_{\Omega}\sum_{i=1}^{n} h_{x_i}^2\,d\Omega, \tag{2.6} \]

if \(\partial a(x_j,u)/\partial u\geq k\), where \(\gamma^2\) is any number with \(0<\gamma^2\leq\mu^2+k/\nu^2\) when \(k<0\), and \(\gamma^2=\mu^2\) when \(k\geq 0\). Condition (B′) for the operator \(P\) is satisfied if we take as \(\overset{\circ}{H}{}_1\) the closure of the set \(M\) in the metric of the space \(W_2^{(1)}(\Omega)\) (S. L. Sobolev’s theorem \((^4)\)). Condition (C) follows from inequality (2.5). Finally, condition (D) is satisfied if the relations
\(\partial a_i(x_j,u_{x_j})/\partial u_{x_k}=\partial a_k(x_j,u_{x_j})/\partial u_{x_i}\), \(i,k=1,2,\ldots,n\), hold. Sufficient, and in a certain sense also necessary, conditions for the solvability of problem (2.1), (2.2) for smooth \(f(x_j)\) are given in \((^5)\). These conditions are at the same time conditions for the density of \(R_P\) in \(L_2(\Omega)\) (condition (A)). We note that \((^2)\) investigated the application of the Ritz method to the problem of elastic-plastic torsion, which is a special case of problem (2.1), (2.2).
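The step from (2.3) and (2.5) to (2.6) can be written out as follows. This is a sketch, with \(\nabla h=(h_{x_1},\ldots,h_{x_n})\). Applying (2.3) with \(\xi_i=h_{x_i}\), together with the bound \(\partial a/\partial u\geq k\), to (2.4) gives

\[ (P'(u)h,h)_{L_2}\ \geq\ \mu^2\int_\Omega|\nabla h|^2\,d\Omega + k\int_\Omega h^2\,d\Omega . \]

For \(k\geq 0\) the last term may be dropped, which yields (2.6) with \(\gamma^2=\mu^2\). For \(k<0\), multiplying (2.5) by \(k\) reverses the inequality, so

\[ (P'(u)h,h)_{L_2}\ \geq\ \Bigl(\mu^2+\frac{k}{\nu^2}\Bigr)\int_\Omega|\nabla h|^2\,d\Omega . \]

This is (2.6) for every \(\gamma^2\leq\mu^2+k/\nu^2\). A positive \(\gamma^2\) exists only if \(k>-\mu^2\nu^2\), which is the precise sense in which \(k\) must be "sufficiently large". In one dimension, for \(\Omega=(0,l)\), the best constant in (2.5) is \(\nu=\pi/l\).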

As a second example, let us consider the operator equation

\[ Qu=f, \tag{2.7} \]

in which the operator \(Q\) can be represented in the form

\[ Q=L+A, \tag{2.8} \]

where we assume that \(D_Q=D_L=D_A\) is dense in \(H\). In the representation (2.8), let \(L\) be a linear, symmetric, positive-definite operator:

\[ (Lu,u)_H \geq m^2\|u\|_H^2,\qquad u\in D_Q. \tag{2.9} \]

The inner product \((u,v)_{H_1}=(Lu,v)_H\) defines the space \(H_1\) (the completion of \(D_Q\) in the corresponding norm). Finally, let

\[ \gamma_1^2\|u_1-u_2\|_{H_1}\|h\|_{H_1} \geq (Au_1-Au_2,h)_H \geq \gamma_2^2\|u_1-u_2\|_{H_1}\|h\|_{H_1}, \tag{2.10} \]

\[ 1>\gamma_1>\gamma_2>0,\qquad u_1,u_2,h\in D_Q. \]

Equation (2.7) can be represented in the space \(H_1\) in the form

\[ (Lu,h)_H+(Au,h)_H=(f,h)_H, \tag{2.11} \]

for all \(h\in D_Q\), or, by the Riesz representation theorem, as the equation

\[ \overline{Q}u=u+\overline{A}u=\overline{f} \tag{2.12} \]

in which \(\overline{A}\) is a contraction operator:

\[ \|\overline{A}u_1-\overline{A}u_2\|_{H_1}\leq \gamma_1^2\|u_1-u_2\|_{H_1}. \tag{2.13} \]

By the contraction mapping principle, equation (2.12) is solvable for every \(\overline{f}\in H_1\) once the operator \(\overline{A}\) is extended to the whole space \(H_1\); the extended operator obviously also satisfies (2.13). It follows, among other things, that \(R_{\overline{Q}}\) is dense in \(H_1\) (condition (A)). Indeed, let \(f_0\) be an arbitrary element of \(H_1\), let \(u_0\in H_1\) be the corresponding solution of equation (2.12) (this solution is unique), and let \(\{u_n\}\subset D_{\overline{Q}}=D_Q\) be a sequence defining the element \(u_0\): \(\|u_n-u_0\|_{H_1}\to 0\). From (2.13) it follows that

\[ \|\overline{Q}u_n-f_0\|_{H_1} =\|\widetilde{Q}u_n-\widetilde{Q}u_0\|_{H_1} =\|(I+\widetilde{A})u_n-(I+\widetilde{A})u_0\|_{H_1} \leq (1+\gamma_1^2)\|u_n-u_0\|_{H_1} \leq 2\|u_n-u_0\|_{H_1}\to 0, \]

where \(\widetilde{Q},\widetilde{A}\) denote the extensions of the operators \(\overline{Q},\overline{A}\). Condition (B) follows from (2.10) with \(h=u_1-u_2\); condition (C) holds trivially (with \(K=1\)), since equation (2.12) is considered in \(H_1\) alone.
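The solvability argument can be illustrated in finite dimensions. This is a hypothetical sketch. \(L\) is taken to be the finite-difference Dirichlet Laplacian on \((0,1)\), and \(A(u)=c\tanh u\) acts componentwise. With \((u,v)_{H_1}=v^{\mathsf T}Lu\), the Riesz representers are \(\overline{f}=L^{-1}f\) and \(\overline{A}u=L^{-1}A(u)\). Since \(\tanh\) is 1-Lipschitz, \(\|\overline{A}u_1-\overline{A}u_2\|_{H_1}\leq(c/\lambda_{\min})\|u_1-u_2\|_{H_1}\), so the choice \(c=\lambda_{\min}/2\) gives (2.13) with \(\gamma_1^2=1/2\). Equation (2.12) is then solved by the plain fixed-point iteration of the contraction principle, not by least squares.

```python
import numpy as np

def dirichlet_laplacian(m: int) -> np.ndarray:
    """3-point finite-difference -d^2/dx^2 on m interior nodes of (0, 1); SPD."""
    h = 1.0 / (m + 1)
    return (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2

def solve_fixed_point(L, A, f, rtol=1e-10, max_iter=500):
    """Iterate u <- fbar - Abar(u) for u + Abar(u) = fbar, i.e. equation (2.12).

    With (u, v)_{H_1} = v^T L u the Riesz representers are
    fbar = L^{-1} f and Abar(u) = L^{-1} A(u).
    """
    h1_norm = lambda v: np.sqrt(v @ L @ v)
    fbar = np.linalg.solve(L, f)
    u = np.zeros_like(f)
    for it in range(1, max_iter + 1):
        u_new = fbar - np.linalg.solve(L, A(u))
        step = h1_norm(u_new - u)          # step size measured in the H_1 norm
        u = u_new
        if step <= rtol * h1_norm(u):
            return u, it
    raise RuntimeError("no convergence: Abar is probably not a contraction")

m = 50
L = dirichlet_laplacian(m)
lam_min = np.linalg.eigvalsh(L)[0]        # plays the role of m^2 in (2.9)
c = 0.5 * lam_min                         # hypothetical: contraction constant <= 0.5
A = lambda u: c * np.tanh(u)              # hypothetical monotone nonlinearity
x = np.linspace(0.0, 1.0, m + 2)[1:-1]    # interior nodes
f = 100.0 * np.sin(np.pi * x)

u, iters = solve_fixed_point(L, A, f)
print(iters, np.linalg.norm(L @ u + A(u) - f))
```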

Many problems in the theory of elastic-plastic deformation lead to equations of the type considered. In \((^6)\) the contraction operator arises from the hypothesis of saturation of the stress field. A more general hypothesis, the finiteness of the variable modulus of elasticity, leads to a contraction operator in problems formulated in terms of stresses as well.

Humboldt University
Berlin, GDR

Received
1 XI 1961

REFERENCES

  1. S. G. Mikhlin, Variational Methods in Mathematical Physics, Moscow, 1957.
  2. A. Langenbach, Wiss. Zs. Humboldt-Univ. Berlin, Math. Nat. R., IX (1959/60).
  3. L. N. Gagen-Torn, S. G. Mikhlin, DAN, 138, No. 2 (1961).
  4. S. L. Sobolev, Some Applications of Functional Analysis in Mathematical Physics, Leningrad, 1950.
  5. O. A. Ladyzhenskaya, N. N. Ural’tseva, Uspekhi Mat. Nauk, 16, No. 1 (1961).
  6. I. I. Vorovich, Yu. P. Krasovskii, DAN, 126, No. 4 (1959).
