I. M. GLAZMAN
Submitted 1965-01-01 | RussiaRxiv: ru-196501.19120 | Translated from Russian

RELAXATION ON SURFACES WITH SADDLE POINTS

(Presented by Academician L. V. Kantorovich, 27 X 1964)

  1. The present note is devoted to the construction of algorithms for minimizing functionals \(\Phi(\mathbf{x})\) of a vector \(\mathbf{x}(x_1, x_2, \ldots, x_p)\) in Euclidean space \(\mathcal{E}_p\) that possess saddle points.

The principal one of these algorithms, \(\mathfrak{R}_1\), is universal for the class \(S\) of all functionals \(\Phi(\mathbf{x})\) having the following properties:
\(1^\circ\). \(\Phi(\mathbf{x})\) is three times continuously differentiable.
\(2^\circ\). \(\Phi(\infty)=\infty\).
\(3^\circ\). \(\Phi(\mathbf{x})\) has a finite number of stationary points.
\(4^\circ\). At stationary points, \(d^2\Phi(\mathbf{x})\) is not a degenerate sign-constant quadratic form.

An example of a functional from the class is the function

\[ \Phi(x,y)=|P(x+iy)|^2, \tag{1} \]

where \(P(z)\) is a polynomial of degree \(n\) with complex coefficients, under the assumption that \(P(z)\) and \(P'(z)\), as well as \(P'(z)\) and \(P''(z)\), have no common roots. Obviously, the functional (1) has altogether \(2n-1\) stationary points: \(n\) minima (at the roots of \(P(z)\)) and \(n-1\) saddle points (at the roots of \(P'(z)\)).
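
The count of stationary points can be checked numerically. The following is a minimal sketch, not part of the original note: it assumes NumPy, a hypothetical helper name `stationary_points`, and an illustrative polynomial \(P(z)=z^3-3z+1\) chosen so that the no-common-roots assumptions hold. It uses the fact that the eigenvalues of the Hessian of \(|P|^2\) are \(2(|P'|^2 \pm |PP''|)\).

```python
import numpy as np

def stationary_points(coeffs):
    """Stationary points of Phi(x, y) = |P(x + iy)|^2 with Hessian eigenvalues.

    coeffs: coefficients of P, highest degree first (numpy.poly1d convention).
    Returns two lists of (point, (lowest eigenvalue, highest eigenvalue)).
    """
    P = np.poly1d(coeffs)
    dP, d2P = P.deriv(1), P.deriv(2)

    def hessian_eigs(z):
        a = abs(dP(z)) ** 2          # |P'|^2
        b = abs(P(z) * d2P(z))       # |P P''|
        return 2 * (a - b), 2 * (a + b)

    # grad Phi = 2 P conj(P') vanishes at the roots of P and at the roots of P'
    minima = [(z, hessian_eigs(z)) for z in P.roots]
    saddles = [(z, hessian_eigs(z)) for z in dP.roots]
    return minima, saddles

# Illustrative polynomial (an assumption): P(z) = z^3 - 3z + 1, so n = 3
minima, saddles = stationary_points([1, 0, -3, 1])
```

For this polynomial the sketch finds \(n=3\) points with both eigenvalues positive and \(n-1=2\) points with eigenvalues of opposite sign, in agreement with the count \(2n-1\).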

When applied to \(\Phi(\mathbf{x})\in S\), the constructed algorithm, for any initial approximation \(\mathbf{x}_0\), leads to one of the minima of \(\Phi(\mathbf{x})\) and, if the minimum is unique, to the least value of the functional \(\Phi(\mathbf{x})\).

The rate of convergence of the principal algorithm \(\mathfrak{R}_1\), as well as of the algorithm \(\mathfrak{R}_2\) indicated below in Sec. 2, is no less than the rate of convergence of a certain geometric progression.

  2. For functionals (1) it is possible to dispense with requirement \(4^\circ\), that is, to drop the assumption that \(P'(z)\) and \(P''(z)\) have no common roots. Thus one obtains a universal algorithm \(\mathfrak{R}_2\) for solving algebraic equations by the method of descent.

Existing methods of descent on the surface of the squared modulus of a polynomial (see, for example, \((^2)\)) do not exhaust the question so long as their convergence has not been proved. In a recent investigation \((^3)\) a new descent algorithm was constructed for the functional (1), and its convergence was established provided that, for the initial approximation \(z_0\), the value \(|P(z_0)|\) is less than the value of \(|P(z)|\) at any saddle point. In the general case the convergence of this algorithm should be regarded as not clarified, since at one point the convergence theorem established in \((^3)\) does not correspond to the algorithm described there.*

* Namely, upon entering a neighborhood of a saddle point, the algorithm in \((^3)\) provides for moving to an arbitrary point of a certain circle on which the value of \(|P(z)|\) is less than at the center (which is accomplished by a finite number of trials), whereas in proving convergence in \((^3)\) it is assumed that the move is made to the point with the minimal value of \(|P(z)|\) on the aforementioned circle.

  3. The algorithms described below are connected with the use of infinite control sequences. The use of such sequences appears unavoidable even in the simplest case of a single stationary point (see \((^1)\)) and is due to the fact that the definition of the class \(S\) contains only qualitative characteristics of \(\Phi(\mathbf{x})\), while quantitative characteristics, such as, for example, an upper bound for the norm of the Hessian matrix \((\partial^2\Phi/\partial x_j\partial x_k)\) in a bounded domain, are different for different \(\Phi(\mathbf{x})\in S\) and are not assumed to be known. However, the infinite control sequences constructed in \((^1)\) and here are such that, when the algorithm is applied to each given functional \(\Phi(x) \in S\), only a finite part of them “fires,” and the length of that part depends on \(\Phi\).

In machine realization of the algorithms of the present note, the efficiency of the computations will depend on a successful choice of control sequences. To this end, a preliminary “tuning” of the algorithm will be required, in which adaptation of the universal algorithm to the given computer and \(\Phi(x)\) will be achieved.

  4. Passing to the construction of the main algorithm \(\mathfrak R_1\), let us denote by \(\mathfrak R\) the gradient relaxation algorithm of note \((^1)\). In addition to the control sequence of the algorithm \(\mathfrak R\) (see \((^1)\)), we introduce two more control sequences \(Q_\delta\) and \(Q_\rho\).

The algorithm \(\mathfrak R_1\) consists in carrying out successive gradient steps of the algorithm \(\mathfrak R\), except when an indication of a special step appears. The sequence \(Q_\delta\) signals when the indication of a special step must be checked. As \(Q_\delta\) one may take any sequence \(\delta_k \downarrow 0\), and as \(Q_\rho\), any sequence \(\rho_k > 0\) for which zero is a limit point of its limit points. The sequence \(Q_\rho\) controls the choice of the length of the special step. To determine the direction of the special step, one selects a finite set \(\Sigma\) of schemes \(T\) for completing squares when reducing, by Lagrange’s method, a quadratic form in \(p\) variables with literal coefficients to canonical form. The set \(\Sigma\) of computational schemes must be complete in the sense that at least one scheme \(T \in \Sigma\) is applicable to any quadratic form in \(p\) variables with numerical coefficients.

The algorithm \(\mathfrak R_1\) is as follows.

1) We compare \(|\nabla \Phi(x_k)|\) with the current \(\delta \in Q_\delta\). If \(|\nabla \Phi(x_k)| \geq \delta\), we perform a gradient step according to \(\mathfrak R\). Otherwise we check the indication of a special step.

2) The indication of a special step is the sign-indefiniteness of the form \(d^2 \Phi(x_k)\). If there is no indication, then we perform a gradient step according to \(\mathfrak R\).* If there is an indication, then we delete the number \(\delta\) from \(Q_\delta\) and test, for applicability to \(d^2 \Phi(x_k)\), all schemes \(T \in \Sigma\). From the schemes applicable to \(d^2 \Phi(x_k)\) we choose the optimal one (or one of the optimal ones). A scheme \(T\) is regarded as optimal if, among all denominators appearing in the process of transforming the form \(d^2 \Phi(x_k)\) by means of \(T\), the smallest takes the largest value.

3) We apply the scheme \(T\) to the form \(d^2 \Phi(x_k)\), select the negative coefficient of greatest absolute value (if there are several, the first of them) and, setting all variables equal to zero except the one multiplied by this coefficient, find a unit vector \(\vec{\tau}\) giving the direction of the special step, oriented so that \((\nabla \Phi(x_k), \vec{\tau}) \leq 0\).

4) We test the next multiplier \(\rho\) from \(Q_\rho\) for relaxation. This means checking the inequality

\[ \Phi(x_k+\rho\gamma \vec{\tau}_k) < \Phi(x_k), \tag{2} \]

where \(\gamma\) is the length of the first special step, obtained with the aid of a finite number of tests for relaxation. If the next multiplier \(\rho \in Q_\rho\) does not satisfy inequality (2), then we pass to the next multiplier, and so on, until (2) is satisfied. The multiplier \(\rho\) thus found is deleted from the sequence \(Q_\rho\), and the next step is carried out according to the formula \(x_{k+1}=x_k+\rho\gamma \vec{\tau}_k\).
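
Steps 1) to 4) can be illustrated by the following rough sketch, which is not the author's implementation and makes several loudly stated assumptions. The gradient step of \(\mathfrak R\) is not specified in this note, so a plain step-halving rule stands in for it. The direction of negative curvature is taken from an eigendecomposition (`numpy.linalg.eigh`) instead of the Lagrange schemes \(T \in \Sigma\). The control sequences are passed as finite prefixes, \(\gamma\) is simply a parameter, and the two cases of the footnote to step 2) are not treated separately. All function names and the test functional are hypothetical.

```python
import numpy as np

def special_direction(g, H):
    """Unit vector tau of negative curvature with (g, tau) <= 0,
    or None if the Hessian H is not sign-indefinite."""
    w, V = np.linalg.eigh(H)              # eigenvalues in ascending order
    if not (w[0] < 0 < w[-1]):
        return None
    tau = V[:, 0]                         # eigenvector of the most negative eigenvalue
    return -tau if g @ tau > 0 else tau

def gradient_step(phi, x, g):
    """Stand-in for one step of R (assumption): halve the step until phi decreases."""
    t = 1.0
    for _ in range(60):
        if phi(x - t * g) < phi(x):
            return x - t * g
        t *= 0.5
    return None                           # no decrease found

def relax_r1(phi, grad, hess, x0, deltas, rhos, gamma=1.0, max_steps=1000):
    x = np.asarray(x0, dtype=float)
    deltas, rhos = list(deltas), list(rhos)   # finite prefixes of Q_delta, Q_rho
    for _ in range(max_steps):
        g = grad(x)
        tau = None
        if deltas and np.linalg.norm(g) < deltas[0]:   # step 1)
            tau = special_direction(g, hess(x))        # steps 2) and 3)
            if tau is not None:
                deltas.pop(0)                          # delete the used delta
        if tau is None:
            x_new = gradient_step(phi, x, g)
            if x_new is None:
                break
            x = x_new
            continue
        for i, rho in enumerate(rhos):                 # step 4): test inequality (2)
            if phi(x + rho * gamma * tau) < phi(x):
                x = x + rho * gamma * tau
                del rhos[i]                            # delete the multiplier that worked
                break
        else:
            break                                      # the finite prefix was exhausted
    return x

# Illustrative functional (an assumption): minima at (+-1, 0), saddle at (0, 0)
phi = lambda x: (x[0] ** 2 - 1) ** 2 + x[1] ** 2
grad = lambda x: np.array([4 * x[0] * (x[0] ** 2 - 1), 2 * x[1]])
hess = lambda x: np.array([[12 * x[0] ** 2 - 4, 0.0], [0.0, 2.0]])
x_min = relax_r1(phi, grad, hess, [0.0, 1.0],
                 deltas=[10.0 ** -k for k in range(1, 12)],
                 rhos=[2.0 ** -k for k in range(40)])
```

Starting from \((0,1)\), pure gradient descent for this functional moves along the line \(x_1=0\) toward the saddle point at the origin; in the sketch the special step is what carries the iterate off that line and on to one of the minima \((\pm 1, 0)\).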

Theorem 1. For any functional \(\Phi(x) \in S\) and an arbitrary initial approximation \(x_0 \in \mathcal{E}_p\), the sequence \(x_k\) constructed by the algorithm \(\mathfrak R_1\) converges to one of the points \(x\) at which \(\Phi(x)\) has a local minimum.

* Either an arbitrary relaxation step in any direction (if \(\nabla \Phi(x_k)=0\) and the form \(d^2\Phi(x_k)\) is negative), or stopping (if \(\nabla \Phi(x_k)=0\) and the form \(d^2\Phi(x_k)\) is positive).

  5. Turning to the functionals (1), we first free the polynomial from multiple roots and then determine the maximal multiplicity \(r\) of the roots of its derivative. In algorithm \(\mathfrak{R}_2\), instead of \(Q_\delta\) and \(\Sigma\), one uses the sequence \(Q_\tau\) of \(r\)-dimensional vectors \(\vec{\varkappa}_k\) with coordinates \((\varkappa_k^{(1)}, \varkappa_k^{(2)}, \ldots, \varkappa_k^{(r)})\), where \(\varkappa_k^{(1)} \downarrow 0\), \(\varkappa_k^{(s)}=[\varkappa_k^{(1)}]^{\beta_s}\), \(\beta_s=\alpha^{s-1}\) and \(\alpha>2\). With such a choice of the control sequence \(Q_\tau\), the directions of special steps \(\tau_k\), beginning with some \(k\), will correspond to the degree of degeneracy of the saddle point lying near \(z_k\). Put also \(u_j=|P'|+|P''|+\cdots+|P^{(r+1-j)}|\) and \(u=(u_1,u_2,\ldots,u_r)\).
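
As a small illustration of these definitions, the sketch below builds one control vector \(\vec{\varkappa}\) and the vector \(u\). The function names and the sample numbers are assumptions made for the example only.

```python
def kappa_vector(kappa1, r, alpha=3.0):
    """Coordinates kappa^(s) = kappa1 ** beta_s with beta_s = alpha ** (s - 1), s = 1..r."""
    if alpha <= 2:
        raise ValueError("the note requires alpha > 2")
    return [kappa1 ** (alpha ** (s - 1)) for s in range(1, r + 1)]

def u_vector(deriv_moduli):
    """deriv_moduli = [|P'|, |P''|, ..., |P^(r)|] evaluated at the current point.

    Returns [u_1, ..., u_r] with u_j = |P'| + ... + |P^(r+1-j)|.
    """
    r = len(deriv_moduli)
    return [sum(deriv_moduli[: r + 1 - j]) for j in range(1, r + 1)]

# Sample values (assumptions): r = 3, alpha = 3, first coordinate 0.1
kappa = kappa_vector(0.1, 3, alpha=3.0)   # approximately [1e-1, 1e-3, 1e-9]
u = u_vector([1.0, 2.0, 4.0])             # [7.0, 3.0, 1.0]
```

The rapidly shrinking coordinates of \(\vec{\varkappa}\) mean that the thresholds for sums involving fewer derivatives are much stricter than those for sums involving more.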

Algorithm \(\mathfrak{R}_2\) consists in the following.

1) Compare \(u\) with the next \(\vec{\varkappa}\in Q_\tau\). If* \(u\geq \vec{\varkappa}\), then perform a gradient step according to \(\mathfrak{R}\) (now \(\nabla\Phi=2P\overline{P'}\)). Otherwise, check the criterion for a special step.

2) The criterion for a special step is the inequality \(|P'|^2<|PP''|\). If the criterion is not satisfied, perform a gradient step according to \(\mathfrak{R}\) (or stop if \(P=0\)). If the criterion is satisfied, find the minimal \(j=j_0\) for which \(u_j<\varkappa^{(j)}\), remove \(\vec{\varkappa}\) from \(Q_\tau\), and pass to the determination of \(\tau\).

3) If \(j_0=r\), then choose \(\tau\) collinear with \(|P\overline{P''}|-P\overline{P''}\), so that \((P\overline{P'},\tau)\leq 0\). If \(j_0<r\), put \(\omega=\arg\bigl(P\,\overline{P^{(2+r-j_0)}}\bigr)\) and, as \(\tau\), choose from the \(2+r-j_0\) vectors
\[ \exp\left\{\frac{i[\omega+(2s-1)\pi]}{2+r-j_0}\right\} \qquad (s=1,2,\ldots,2+r-j_0) \]
the one least deviating from \(-P\overline{P'}\).

4) Test the next multiplier \(\rho\in Q_\rho\) for relaxation by means of relation (2), and then proceed as in step 4) of Sec. 4. The multiplier \(\rho\) ultimately found is removed from \(Q_\rho\), and the next step is carried out by the formula \(z_{k+1}=z_k+\rho\gamma\tau_k\).
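
The choice of direction in step 3) for the case \(j_0<r\) can be sketched as follows. This is an illustration under assumptions, not the author's code: the function name is hypothetical, `m` stands for \(2+r-j_0\), "least deviating" is read as "largest real inner product with \(-P\overline{P'}\)", and the case \(j_0=r\) is not covered. The sample polynomial \(P(z)=z^3+1\) at \(z_k=0\) (where \(r=2\), \(j_0=1\), \(m=3\)) is also an assumption.

```python
import cmath
import math

def special_direction_r2(P, dP, dmP, m):
    """Direction tau of a special step at a degenerate saddle point.

    P, dP : values of P and P' at z_k
    dmP   : value of the derivative P^(m) at z_k, where m = 2 + r - j0
    """
    omega = cmath.phase(P * dmP.conjugate())      # omega = arg(P * conj(P^(m)))
    target = -P * dP.conjugate()                  # the vector -P conj(P')
    candidates = [cmath.exp(1j * (omega + (2 * s - 1) * math.pi) / m)
                  for s in range(1, m + 1)]
    # Pick the candidate with the largest real inner product with the target
    return max(candidates, key=lambda t: (target.conjugate() * t).real)

# Sample polynomial (an assumption): P(z) = z^3 + 1 at z_k = 0
poly = lambda z: z ** 3 + 1
tau = special_direction_r2(poly(0j), 0j, 6 + 0j, 3)
decreased = abs(poly(0.1 * tau)) < abs(poly(0j))  # a short step along tau lowers |P|
```

Along each candidate direction the leading term \(P^{(m)}\rho^m\tau^m/m!\) of the Taylor expansion points opposite to \(P\), which is why a sufficiently short step along \(\tau\) reduces \(|P|\).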

Theorem 2. For any polynomial \(P(z)\) having no multiple roots, and for an arbitrary initial approximation \(z_0\), the sequence \(z_k\) constructed by algorithm \(\mathfrak{R}_2\) converges to one of the roots of \(P(z)\).

  6. For the proof of Theorems 1 and 2 it is first shown that the criterion for a special step can appear only a finite number of times. Therefore, beginning with some step, algorithms \(\mathfrak{R}_1\) and \(\mathfrak{R}_2\) pass into \(\mathfrak{R}\), after which convergence is established as in \((^1)\).

Thus, the control sequences of algorithms \(\mathfrak{R}_1\) and \(\mathfrak{R}_2\), intended a priori for infinitely repeated use, are a posteriori invoked only a finite number of times. For a given \(\Phi(x)\in S\), this number does not depend on the number of steps realized by algorithms \(\mathfrak{R}_1\) and \(\mathfrak{R}_2\). It can be shown that the control sequence of algorithm \(\mathfrak{R}\) is of the same kind, whence follows

Theorem 3. The rate of convergence of minimizing sequences constructed by algorithms \(\mathfrak{R}_1\) and \(\mathfrak{R}_2\) is not less than the rate of convergence of some geometric progression.**

Physical-Technical Institute of Low Temperatures
Academy of Sciences of the USSR

Received 20 X 1964

REFERENCES

  1. I. M. Glazman, DAN, 154, No. 5 (1964).
  2. J. N. Lance, Numerical Methods for High-Speed Computers, IL, 1962.
  3. V. V. Voevodin, Journal of Computational Mathematics and Mathematical Physics, 2, No. 2, 187 (1962).

* The inequality \(u\geq\vec{\varkappa}\) means that \(u_j\geq \varkappa^{(j)}\) for all \(j\).

** The main difficulties in the proof of this theorem were overcome by Yu. I. Lyubich.
