MATHEMATICS
I. Petersen
ON THE CONVERGENCE OF GRADIENT METHODS FOR FINDING A LOCAL CONSTRAINED MINIMUM OF A NONLINEAR FUNCTIONAL UNDER LINEAR CONDITIONS IN HILBERT SPACE
(Presented by Academician A. A. Dorodnitsyn, 23 January 1963)
Let \(f(x)\) be a functional defined in some convex domain \(D\) of a real Hilbert space \(H\), and let \(\pi\) be a plane in \(H\) specified by the equations
\[ (c^i, x)=b_i,\qquad i=1,2,\ldots,m, \tag{1} \]
and having a nonempty intersection with the domain \(D\). We assume that the elements \(c^i\) are orthonormal. We use the notation
\[ f'(x)y=(\nabla f(x),y),\qquad f''(x)yz=(H_f(x)y,z). \tag{2} \]
If \(x_0\in D\cap\pi\) is an initial approximation to a constrained local minimum of the functional \(f(x)\) under conditions (1), it is natural to seek the point \(x^*\) at which this minimum is attained by moving, approximately, along the curve of "steepest descent" within \(\pi\), which is defined by the differential equation
\[ x'(t)=-\nabla f(x(t))+\sum_{i=1}^{m}(\nabla f(x(t)),c^i)c^i \tag{3} \]
and the initial condition \(x(0)=x_0\).
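Because the \(c^i\) are orthonormal, the right-hand side of (3) is minus the orthogonal projection of \(\nabla f(x)\) onto the subspace orthogonal to all \(c^i\); hence \((c^i,x'(t))=0\) and a trajectory that starts on \(\pi\) stays on \(\pi\). A minimal sketch, assuming \(\mathbb{R}^n\) as a finite-dimensional stand-in for \(H\); the helper names `dot` and `projected_direction` are hypothetical:

```python
def dot(u, v):
    """Inner product (u, v) in R^n."""
    return sum(a * b for a, b in zip(u, v))

def projected_direction(grad, cs):
    """Right-hand side of (3): -grad + sum_i (grad, c^i) c^i.

    Assumes the vectors in `cs` are orthonormal, as in the text."""
    d = [-g for g in grad]
    for c in cs:
        coef = dot(grad, c)
        d = [di + coef * ci for di, ci in zip(d, c)]
    return d

# Example in R^3 with one constraint normal c^1 = e_3 (m = 1).
cs = [[0.0, 0.0, 1.0]]
d = projected_direction([1.0, -2.0, 3.0], cs)
print(d)                          # [-1.0, 2.0, 0.0]
print([dot(d, c) for c in cs])    # [0.0]: the direction is tangent to the plane
```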
Theorem 1. If the functional \(f(x)\) is twice continuously differentiable in the convex domain \(D\cap\pi\), \(x_0\in\pi\), the closed ball \(S[x_0,r]\) is contained in \(D\), where \(r\geq MG(x_0)\) and
\[ G^2(x)=\|\nabla f(x)\|^2-\sum_{i=1}^{m}(\nabla f(x),c^i)^2 \tag{4} \]
(so that \(G(x)\geq 0\) is the norm of the right-hand side of (3)), \(\nabla f(x)\) satisfies the Lipschitz condition in the ball \(S[x_0,r]\cap\pi\), and for any \(x\in D\cap\pi\) and \(y\in H\) the condition
\[ (H_f(x)y,y)\geq \frac{1}{M}\|y\|^2\qquad (M>0) \tag{5} \]
is fulfilled, and \(x(t)\) is a solution of equation (3), then for \(t\geq 0\)
\[ x(t)\in S[x_0,r]\cap\pi, \tag{6} \]
the limits
\[ \lim_{t\to+\infty}x(t)=x^*,\qquad \lim_{t\to+\infty}f(x(t))=c \tag{7} \]
exist, and for \(t>0\) the following estimates hold:
\[ \|x(t)-x^*\|\leq MG(x_0)\exp\left(-\frac{t}{M}\right), \tag{8} \]
\[ 0\leq f(x(t))-c\leq \frac{M}{2}G^2(x_0)\exp\left(-\frac{2t}{M}\right), \tag{9} \]
and for all \(x\in D\cap\pi\)
\[ f(x)\geq c+\frac{\|x-x^*\|^2}{2M}. \tag{10} \]
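Estimates (8) and (9) can be checked numerically on a problem where (3) has a closed-form solution. The sketch below is our own illustration, not taken from the paper: a diagonal quadratic in \(\mathbb{R}^3\) (a finite-dimensional stand-in for \(H\)) with a single constraint \(x_3=1\); the data and names (`Q`, `x_of_t`, `c_val`) are hypothetical. Here \(G(x_0)\) is computed as the norm of the right-hand side of (3) at \(x_0\).

```python
import math

# Hypothetical problem: f(x) = 1/2 (Qx, x) - (g, x) in R^3 with Q = diag(2, 3, 5),
# g = (1, 1, 1), and one constraint x_3 = 1 (c^1 = e_3, b_1 = 1).
# Condition (5) holds with 1/M = 2 (smallest eigenvalue of Q), i.e. M = 0.5.
Q = [2.0, 3.0, 5.0]
M = 0.5

def f(x):
    return sum(0.5 * q * xi * xi - xi for q, xi in zip(Q, x))

x0 = [0.0, 0.0, 1.0]
x_star = [1.0 / Q[0], 1.0 / Q[1], 1.0]   # constrained minimizer on the plane x_3 = 1
c_val = f(x_star)                        # the limit value c in (7)

# G(x0): norm of the projected gradient (the e_3 component is removed).
grad0 = [q * xi - 1.0 for q, xi in zip(Q, x0)]
G0 = math.hypot(grad0[0], grad0[1])

def x_of_t(t):
    """Exact solution of (3) here: each free coordinate relaxes independently."""
    free = [x_star[i] + (x0[i] - x_star[i]) * math.exp(-Q[i] * t) for i in range(2)]
    return free + [1.0]

for t in [0.0, 0.5, 1.0, 2.0, 4.0]:
    xt = x_of_t(t)
    dist = math.dist(xt, x_star)
    gap = f(xt) - c_val
    bound8 = M * G0 * math.exp(-t / M)
    bound9 = 0.5 * M * G0 ** 2 * math.exp(-2.0 * t / M)
    print(f"t={t:4.1f}  |x-x*|={dist:.3e} <= {bound8:.3e}   f-c={gap:.3e} <= {bound9:.3e}")
```

Using the exact trajectory isolates the content of Theorem 1 from any discretization error; the discrete methods of Theorems 2–4 add that error back in a controlled way.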
In the case of an unconstrained minimum \((m=0)\), Theorem 1 reduces to the main theorem of P. S. Rosenbloom's paper \((^1)\).
The connection established by Theorem 1 between the solution of differential equation (3) and the point \(x^*\) of the constrained minimum of \(f(x)\) under the linear conditions (1) makes it possible to construct various methods for the approximate determination of \(x^*\), based on methods for the approximate solution of equation (3), and to investigate their convergence.
We introduce the notation: \(p(\delta)\) is a polynomial of degree \(s-2\) with positive coefficients,
\[ q(h,\delta)=(m+1)Ah^s p(\delta)+\exp\left(-\frac{h}{M}\right), \tag{11} \]
\[ r(h,\delta)= \frac{h^s p(\delta)+M\left[1-\exp\left(-\frac{h}{M}\right)\right]} {1-q(h,\delta)}\,\delta, \tag{12} \]
and the sequence \(\{\delta_n\}\) is defined by the formula
\[ \delta_{n+1}=q(h,\delta_n)\delta_n \qquad (n=0,1,\ldots). \tag{13} \]
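The quantities (11)–(13) are straightforward to evaluate. A minimal sketch follows; the function names and the parameter values are hypothetical, and \(p\) is passed as a callable. The example takes \(s=2\) and the constant polynomial \(p(\delta)=(m+1)A/2\), the choice for which (16) with \(\delta_0=G(x_0)\) becomes the Euler estimate (21).

```python
import math

def q(h, delta, p, s, m, A, M):
    """q(h, delta) of (11); p is a callable returning the polynomial p(delta)."""
    return (m + 1) * A * h**s * p(delta) + math.exp(-h / M)

def r(h, delta, p, s, m, A, M):
    """r(h, delta) of (12); only meaningful when q(h, delta) < 1."""
    qq = q(h, delta, p, s, m, A, M)
    if qq >= 1.0:
        raise ValueError("r(h, delta) requires q(h, delta) < 1")
    return (h**s * p(delta) + M * (1.0 - math.exp(-h / M))) / (1.0 - qq) * delta

def delta_sequence(delta0, n, h, p, s, m, A, M):
    """delta_0, ..., delta_n from the recurrence (13): delta_{k+1} = q(h, delta_k) delta_k."""
    ds = [delta0]
    for _ in range(n):
        ds.append(q(h, ds[-1], p, s, m, A, M) * ds[-1])
    return ds

# Illustrative constants (hypothetical): s = 2 and the constant polynomial
# p(delta) = (m + 1) A / 2, i.e. the Euler case of Theorem 3.
m, A, M, h = 1, 2.0, 0.5, 0.1

def p(delta):
    return (m + 1) * A / 2

print(q(h, 1.0, p, 2, m, A, M))        # q < 1 here, so delta_n decreases
print(r(h, 1.0, p, 2, m, A, M))
print(delta_sequence(1.0, 5, h, p, 2, m, A, M))
```

With a constant \(p\), \(q\) does not depend on \(\delta\), so \(\delta_n=q^n\delta_0\) is geometric; for \(s\geq 3\) the polynomial \(p\) is nonconstant and the recurrence (13) is genuinely nonlinear.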
For one-step methods we have the following result:
Theorem 2. Let the functional \(f(x)\) be twice continuously differentiable in the convex domain \(D\cap\pi\), let \(x_0\in\pi\), and let the closed ball \(S[x_0,r(h,\delta_0)]\) be contained in \(D\), where \(\delta_0\geqslant G(x_0)\); suppose that \(\nabla f(x)\) satisfies the Lipschitz condition with constant \(A\) in the ball \(S[x_0,r(h,\delta_0)]\cap\pi\), and that condition (5) is fulfilled in \(D\cap\pi\). Further, let \(h^*\) be the positive root of the equation
\[ q(h,\delta_0)=1, \tag{14} \]
let \(h\) be any number with \(0<h<h^*\), and let a sequence of elements of the plane \(\pi\) be constructed by the recurrence formula
\[ x_{n+1}=F(x_n,h) \qquad (n=0,1,\ldots), \tag{15} \]
where, for all \(n\),
\[ \|F(x_n,h)-x_n(h)\|\leqslant h^s p(G(x_n))\,G(x_n),\qquad s\geqslant 2, \]
and \(x_n(t)\) denotes the solution of differential equation (3) with the initial condition \(x_n(0)=x_n\).
Then \(x_n\in S[x_0,r(h,\delta_0)]\cap\pi\) for all \(n\), and the sequence \(\{x_n\}\) converges to the unique constrained minimum point \(x^*\) of the functional \(f(x)\) in the domain \(D\) under conditions (1), with the rate estimate
\[ \|x_{n+1}-x^*\|\leqslant \left[h^s p(\delta_n)+M\exp\left(-\frac{h}{M}\right)\right]\delta_n. \tag{16} \]
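For a given value \(p(\delta_0)\), the root \(h^*\) of (14) can be located numerically. The function \(\varphi(h)=q(h,\delta_0)-1\) vanishes at \(h=0\), has derivative \(-1/M\) there, is convex for \(s\geq 2\), and grows without bound, so it has exactly one positive root and a bracketing bisection suffices. A minimal sketch; the name `h_star` and the constants are hypothetical:

```python
import math

def h_star(p0, s, m, A, M, tol=1e-13):
    """Positive root h* of q(h, delta_0) = 1 (equation (14)), where p0 = p(delta_0) > 0.

    phi(h) = q(h, delta_0) - 1 is negative on (0, h*) and positive beyond h*,
    so we bracket the root and then bisect.
    """
    def phi(h):
        return (m + 1) * A * h**s * p0 + math.exp(-h / M) - 1.0

    lo = M
    while phi(lo) >= 0.0:   # shrink until lo lies in (0, h*)
        lo /= 2.0
    hi = 2.0 * lo
    while phi(hi) < 0.0:    # grow until hi lies beyond h*
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if phi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Euler case with illustrative constants (hypothetical): s = 2, p0 = (m + 1) A / 2.
m, A, M = 1, 2.0, 0.5
hs = h_star((m + 1) * A / 2, 2, m, A, M)
h19 = 2 * M / (1 + (m + 1)**2 * A**2 * M**2)   # right-hand side of (19)
print(hs, h19)
```

For these constants the explicit bound in (19) falls slightly below \(h^*\), consistent with (19) playing the role of the condition \(0<h<h^*\) in Theorem 3.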
As examples of the application of Theorem 2, we give two theorems corresponding to the approximate solution of equation (3) by Euler's method and by the improved Euler–Cauchy method (Heun's method).
Theorem 3. If the functional \(f(x)\) is twice continuously differentiable in the convex domain \(D\cap\pi\), \(x_0\in\pi\), the closed ball \(S[x_0,r]\) is contained in \(D\), where
\[ r\geqslant \frac{2M[1-\exp(-h/M)]+(m+1)Ah^2} {2[1-\exp(-h/M)]-(m+1)^2A^2h^2}\,G(x_0), \tag{17} \]
in the ball \(S[x_0,r]\cap\pi\) condition (5) and the condition
\[ \|H_f(x)\|\leqslant A \tag{18} \]
are fulfilled, and \(h\) satisfies the inequality
\[ 0<h<\frac{2M}{1+(m+1)^2A^2M^2}, \tag{19} \]
then the sequence
\[ x_{n+1}=x_n-h\left[\nabla f(x_n)-\sum_{i=1}^{m}(\nabla f(x_n),c^i)c^i\right] \tag{20} \]
converges to the unique constrained minimum point \(x^*\) of the functional \(f(x)\) in the domain \(D\) under conditions (1), with the rate estimate
\[ \|x_{n+1}-x^*\|\leq \left[\frac{m+1}{2}Ah^2+M\exp\left(-\frac{h}{M}\right)\right] \left[\frac{(m+1)^2}{2}A^2h^2+\exp\left(-\frac{h}{M}\right)\right]^n G(x_0). \tag{21} \]
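A minimal sketch of iteration (20) on a hypothetical strongly convex quadratic in \(\mathbb{R}^3\) (our choice, not the paper's). With \(Q=\mathrm{diag}(2,3,5)\), condition (5) holds with \(M=0.5\) and (18) with \(A=5\); for \(m=1\), bound (19) then requires \(h<1/26\). The exact minimizer is obtained from the Lagrange condition \(\nabla f(x^*)=\lambda c^1\); all helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(v, cs):
    """v - sum_i (v, c^i) c^i, assuming the c^i are orthonormal."""
    out = list(v)
    for c in cs:
        coef = dot(v, c)
        out = [oi - coef * ci for oi, ci in zip(out, c)]
    return out

def euler_step(x, grad_f, cs, h):
    """One step of (20): x - h [grad f(x) - sum_i (grad f(x), c^i) c^i]."""
    d = project(grad_f(x), cs)
    return [xi - h * di for xi, di in zip(x, d)]

# Hypothetical problem: f(x) = 1/2 (Qx, x) - (g, x), constraint (c^1, x) = b_1.
Q = [2.0, 3.0, 5.0]
g = [1.0, 1.0, 1.0]
c = [1.0 / math.sqrt(3.0)] * 3
cs = [c]
b1 = 1.0
M, A, m = 0.5, 5.0, 1   # 1/M = smallest and A = largest eigenvalue of Q

def grad_f(x):
    return [q * xi - gi for q, xi, gi in zip(Q, x, g)]

# Exact constrained minimizer from Q x - g = lam c^1 together with (c^1, x) = b_1.
lam = (b1 - dot(c, [gi / q for gi, q in zip(g, Q)])) / dot(c, [ci / q for ci, q in zip(c, Q)])
x_star = [(gi + lam * ci) / q for gi, ci, q in zip(g, c, Q)]

h = 0.03                   # satisfies (19): h < 2M / (1 + (m+1)^2 A^2 M^2) = 1/26
x = [b1 * ci for ci in c]  # a starting point on the plane (c^1, x) = b_1
for n in range(300):
    x = euler_step(x, grad_f, cs, h)
print(math.dist(x, x_star), dot(c, x) - b1)
```

Since each step moves along a projected gradient, the iterates stay on \(\pi\) up to rounding, which the second printed number reflects.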
Theorem 4. Let the functional \(f(x)\) be twice continuously differentiable in the convex domain \(D\cap\pi\), let \(x_0\in\pi\), and let the closed ball \(S[x_0,r]\) be contained in \(D\), where
\[ r\geq \frac{12M[1-\exp(-h/M)]+h^3\left[5BG(x_0)+2(m+1)^2A^2\right]} {12[1-\exp(-h/M)]-(m+1)Ah^3\left[5BG(x_0)+2(m+1)^2A^2\right]} \,G(x_0), \tag{22} \]
and suppose that conditions (5) and (18) are satisfied in the ball \(S[x_0,r]\cap\pi\), that \(f(x)\) is three times continuously differentiable in the ball \(S[x_0,r+hG(x_0)]\cap\pi\), and that in this ball
\[ \|H'_f(x)\|\leq B. \tag{23} \]
If, in addition, \(h\) satisfies the inequality
\[ 0<h< \frac{2\sqrt{3}\,M} {2.118117+\sqrt{5(m+1)ABM^3G(x_0)+2(m+1)^3A^3M^3-2}}, \tag{24} \]
then the sequence
\[ x_{n+1}=x_n-\frac{h}{2} \left[ \nabla f(x_n)+\nabla f(y_n) -\sum_{i=1}^{m}(\nabla f(x_n)+\nabla f(y_n),c^i)c^i \right], \tag{25} \]
where
\[ y_n=x_n-h\left[ \nabla f(x_n)-\sum_{i=1}^{m}(\nabla f(x_n),c^i)c^i \right], \tag{26} \]
converges to the unique constrained minimum point \(x^*\) of the functional \(f(x)\) in the domain \(D\) under conditions (1), with the rate estimate
\[ \|x_{n+1}-x^*\|\leq \left[ h^3\left(\frac{5}{12}B\delta_n+\frac{(m+1)^2}{6}A^2\right) +M\exp\left(-\frac{h}{M}\right) \right]\delta_n, \tag{27} \]
where \(\delta_0\geq G(x_0)\) and
\[ \delta_k= \frac{5}{12}(m+1)h^3AB\delta_{k-1}^{\,2} + \left[ \frac{(m+1)^3}{6}h^3A^3+\exp\left(-\frac{h}{M}\right) \right]\delta_{k-1}\qquad (k=1,2,\ldots). \tag{28} \]
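A minimal sketch of one step of (25) and (26): \(y_n\) is an Euler predictor of the form (20), and \(x_{n+1}\) averages the projected gradients at \(x_n\) and \(y_n\). The test problem is a hypothetical quadratic (our choice, not the paper's) with \(Q=\mathrm{diag}(2,3,5)\), so \(M=0.5\), \(A=5\), and \(B=0\) because the Hessian is constant; for \(m=1\) the right-hand side of (24) is then about \(0.097\), and the step \(h=0.05\) used below lies under it. All helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(v, cs):
    """v - sum_i (v, c^i) c^i for orthonormal c^i (removes the normal components)."""
    out = list(v)
    for c in cs:
        coef = dot(v, c)
        out = [oi - coef * ci for oi, ci in zip(out, c)]
    return out

def heun_step(x, grad_f, cs, h):
    """One step of (25)-(26): Euler predictor y, then the averaged corrector."""
    gx = grad_f(x)
    y = [xi - h * di for xi, di in zip(x, project(gx, cs))]            # (26)
    gsum = [a + b for a, b in zip(gx, grad_f(y))]
    return [xi - 0.5 * h * di for xi, di in zip(x, project(gsum, cs))]  # (25)

# Hypothetical quadratic: f(x) = 1/2 (Qx, x) - (g, x), constraint (c^1, x) = b_1.
Q = [2.0, 3.0, 5.0]
g = [1.0, 1.0, 1.0]
c = [1.0 / math.sqrt(3.0)] * 3
cs = [c]
b1 = 1.0

def grad_f(x):
    return [q * xi - gi for q, xi, gi in zip(Q, x, g)]

# Exact constrained minimizer from Q x - g = lam c^1 together with (c^1, x) = b_1.
lam = (b1 - dot(c, [gi / q for gi, q in zip(g, Q)])) / dot(c, [ci / q for ci, q in zip(c, Q)])
x_star = [(gi + lam * ci) / q for gi, ci, q in zip(g, c, Q)]

h = 0.05
x = [b1 * ci for ci in c]   # a starting point on the plane
errors = []
for n in range(300):
    x = heun_step(x, grad_f, cs, h)
    errors.append(math.dist(x, x_star))
print(errors[0], errors[-1], dot(c, x) - b1)
```

Each step costs two gradient evaluations instead of one, in exchange for the local error of order \(h^3\) (that is, \(s=3\)) reflected in (27).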
Analogous methods, and theorems on their convergence, can be obtained from higher-order Runge–Kutta methods. If one takes \(m=0\) in Theorems 2–4 (all sums \(\sum_{i=1}^{m}\) then being empty), one obtains theorems on the convergence of methods for approximately finding unconstrained local minima of the functional \(f(x)\).
Institute of Cybernetics
Academy of Sciences of the Estonian SSR
Received 21 January 1963
CITED LITERATURE
1. P. S. Rosenbloom, Numerical Analysis, Proc. Symposia in Appl. Math., 6, N. Y., 1956.