Reports of the Academy of Sciences of the USSR
MATHEMATICS
Submitted 1967-01-01 | RussiaRxiv: ru-196701.96945 | Translated from Russian

1967. Volume 173, No. 4

UDC 512.25/26+519.3:330.115

I. I. EREMIN

THE “PENALTY” METHOD IN CONVEX PROGRAMMING

(Presented by Academician A. I. Maltsev on 6 VI 1966)

The article is devoted to establishing estimates relating the optimal solutions and optimal values of the problems
\[ \max_{x\in M_0\cap M} (c,x) \]
(the original problem) and
\[ \sup_{x\in M}\bigl[(c,x)-\Phi(x,M_0,\mathbf K)\bigr], \]
where \(M_0\) and \(M\) are convex sets in \(R^n\), defined by systems of convex inequalities; \(\mathbf K\) is a vector-parameter; and \(\Phi\) is a function, chosen in one way or another, reflecting the measure of the “penalty” for violation of the constraints defining the set \(M_0\). The meaning of considering these estimates is that, for certain constructions of the function \(\Phi\), the optimal values of the indicated problems either coincide (for some fixed \(\mathbf K\)), or converge under certain regimes of variation of the vector \(\mathbf K\).

The idea of the “penalty” method is natural and has been used, in one form or another, in some earlier works as well. It appears, for example, in the procedure of the direct differential gradient method \((^1)\). Connected with this idea are methods of asymptotic reduction of convex programming problems to unconstrained extremum problems (or the corresponding numerical realizations) \((^{2-6})\). The use of the “penalty” method for solving a transportation problem with prohibitions is found in \((^7)\). A mathematical justification of the applicability of this method to solving optimal-control problems is given in \((^8)\).

In the present note, the relations between the problems under consideration are established in the form of estimates. Moreover, unlike the method of asymptotic reduction of a convex programming problem to an unconstrained-extremum problem, the construction of the function \((c,x)-\Phi(x,M_0,\mathbf K)\) here allows an arbitrary part of the constraints of the original system to be moved into the penalty, while the optimization is carried out over the solution set of the remaining subsystem. This makes it possible to apply special algorithms to optimization problems whose constraint systems are “almost” special. The approach described can also be used for solving large-scale problems.

I. Problem C. Find \(\max (c,x)\) subject to the constraints
\[ f_j(x)\leqslant 0\quad (j\in J_1);\qquad f_j(x)=0\quad (j\in J_2), \tag{1} \]
where \(J_1\cup J_2=\overline{1,l}\); the \(f_j(x)\) are convex differentiable functions defined on \(R^n\), with \(f_j(x)\) linear for \(j\in J_2\). Let \(\overline{1,l}=S_0\cup S\), \(S_0\cap S=\varnothing\); \(K_j>0\) \((j\in S_0)\).

Problem C\(_1\). Find
\[ \sup\bigl[(c,x)-\Phi_1(x,\mathbf K)\bigr], \]
\[ \Phi_1(x,\mathbf K)= \sum_{J_1\cap S_0} K_j\delta_j(x)f_j^2(x) + \sum_{J_2\cap S_0} K_j f_j^2(x) \]
subject to the constraints
\[ f_j(x)\leqslant 0\quad (j\in J_1\cap S);\qquad f_j(x)=0\quad (j\in J_2\cap S); \tag{2} \]

here

\[ \delta_j(x)= \begin{cases} 1, & f_j(x)>0,\\ 0, & f_j(x)\leqslant 0. \end{cases} \]

We shall assume that problem C satisfies the regularity condition \((^9)\).

Introduce the notation: \(M\) is the set of solutions of the system (2); \(\widetilde m\) and \(\widetilde m_1(\mathbf K)\) are the optimal values of problems C and C\(_1\), respectively; \(\widetilde x\) and \(\overline x_1(\mathbf K)\) are their optimal solutions.

If \(\max (c,x)\) in problem C exists and is attained (at the point \(\widetilde x\)), then, by the regularity condition, there will be numbers \(u_j^0\) (dual estimates) such that

\[ u_j^0\geqslant 0\ (j\in J_1),\qquad c=\sum_1^l u_j^0\nabla f_j(\widetilde x),\qquad \widetilde m=\sum u_j^0\bigl[(\nabla f_j(\widetilde x),\widetilde x)-f_j(\widetilde x)\bigr]; \tag{3} \]

here \(\nabla\) is the gradient symbol.
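
As a small illustration of (3), consider a hypothetical example that is not taken from the original: maximize \((c,x)=x_1+2x_2\) subject to \(f_1(x)=x_1+x_2-1\leqslant 0\), \(f_2(x)=-x_1\leqslant 0\), \(f_3(x)=-x_2\leqslant 0\). Its optimal solution is \(\widetilde x=(0,1)\) with \(\widetilde m=2\), and the relations (3) can be checked directly:

\[ u^0=(2,1,0),\qquad c=(1,2)=2\,(1,1)+1\,(-1,0)+0\,(0,-1), \]

\[ \sum_{j=1}^{3}u_j^0\bigl[(\nabla f_j(\widetilde x),\widetilde x)-f_j(\widetilde x)\bigr] =2\,[1-0]+1\,[0-0]+0\,[-1-(-1)]=2=\widetilde m. \]

The third dual estimate vanishes because the constraint \(f_3\) is inactive at \(\widetilde x\).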

Theorem 1. Under the assumptions made above, the following assertions are valid:

1) if the optimal set of problem C is nonempty and bounded, then problem C\(_1\) also has this property;

2)

\[ \widetilde m\leqslant \widetilde m_1(\mathbf K)\leqslant \widetilde m+\sum_{S_0}\frac{(u_j^0)^2}{4K_j}; \]

3)

\[ \max_j\{\delta_j(\overline x_1)f_j(\overline x_1),\ j\in J_1;\ |f_j(\overline x_1)|,\ j\in J_2\} \leqslant \frac{\sqrt2+1}{2K}\,\|u^0\|, \]

where \(\|u^0\|=\left[\sum_{S_0}(u_j^0)^2\right]^{1/2}\), \(K=\min_j K_j\), \(\overline x_1=\overline x_1(\mathbf K)\);

4)

\[ \bigl|(c,\overline x_1)-\widetilde m\bigr| \leqslant |S_0|\frac{\sqrt2+2}{2K}\,\|u^0\|^2 \quad \text{for } K_j=K>0, \]

where \(|S_0|\) is the number of elements of the set \(S_0\).

Of the listed assertions we shall prove 2).

Let \(F_1(x,\mathbf K)=(c,x)-\Phi_1(x,\mathbf K)\). Since \(F_1(\widetilde x,\mathbf K)=(c,\widetilde x)=\widetilde m\) and \(\widetilde x\in M\), we have

\[ \widetilde m_1(\mathbf K)=\sup_{x\in M}F_1(x,\mathbf K)\geqslant \widetilde m. \]

Next, let \(x'\in M\), \(\gamma_j=(\nabla f_j(\widetilde x),x'-\widetilde x)+f_j(\widetilde x)\). Note the obvious relations:

\[ \gamma_j\leqslant f_j(x')\quad \text{(by the convexity of } f_j(x)),\qquad \gamma_j=f_j(x')\quad \text{for } j\in J_2; \tag{4} \]

\[ \sum_{J_1\cap S}u_j^0\gamma_j\leqslant 0,\qquad \sum_{J_2\cap S}u_j^0\gamma_j=0,\qquad \sum_{J_2\cap S_0}u_j^0\gamma_j= \sum_{J_2\cap S_0}u_j^0f_j(x'). \]

Taking (3) and (4) into account, we obtain

\[ \begin{aligned} F_1(x',\mathbf K) &=(c,x')-\Phi_1(x',\mathbf K) =\widetilde m+\sum_1^l u_j^0\gamma_j-\Phi_1(x',\mathbf K) \\ &\leqslant \widetilde m-\Phi_1(x',\mathbf K) +\sum_{J_1\cap S_0}u_j^0 f_j(x') +\sum_{J_2\cap S_0}u_j^0 f_j(x') \\ &=\widetilde m -\sum_{J_1\cap S_0\cap S(x')} \left[\sqrt{K_j}f_j(x')-\frac{u_j^0}{2\sqrt{K_j}}\right]^2 +\sum_{J_1\cap S_0\cap S(x')}\frac{(u_j^0)^2}{4K_j} +\sum_{(J_1\cap S_0)\setminus S(x')}u_j^0 f_j(x') \\ &\quad -\sum_{J_2\cap S_0} \left[\sqrt{K_j}f_j(x')-\frac{u_j^0}{2\sqrt{K_j}}\right]^2 +\sum_{J_2\cap S_0}\frac{(u_j^0)^2}{4K_j} \\ &\leqslant \widetilde m+\sum_{S_0}\frac{(u_j^0)^2}{4K_j}; \end{aligned} \]

here \(S(x')=\{j\mid f_j(x')>0\}\).

But since \(x'\) is an arbitrary element of \(M\), we have

\[ \widetilde m_1(\mathbf K)=\sup_{x\in M}F_1(x,\mathbf K)\leq \widetilde m+\sum_{S_0}\frac{(u_j^0)^2}{4K_j}. \]

Assertion 2) is proved.
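
The two-sided estimate of assertion 2) can be checked numerically on a one-dimensional toy problem. The sketch below is an illustration under stated assumptions, not part of the original note: the problem (maximize \(x\) subject to \(x-1\leqslant 0\), so that \(\widetilde m=1\) and \(u^0=1\)), the function names, and the use of a ternary search are all hypothetical choices. For this problem the penalized maximizer is \(1+1/(2K)\) and the upper bound \(\widetilde m+(u^0)^2/(4K)\) is attained exactly.

```python
def penalized_objective(x, K):
    """F_1(x, K) = x - K * max(0, x - 1)**2 for the toy problem: max x s.t. x - 1 <= 0."""
    violation = max(0.0, x - 1.0)
    return x - K * violation ** 2


def argmax_unimodal(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


def penalty_report(K):
    """Return (x1, m1, upper): penalized maximizer, penalized optimum, bound of assertion 2)."""
    m_opt, u0 = 1.0, 1.0  # optimal value and dual estimate of the toy problem (assumed data)
    x1 = argmax_unimodal(lambda x: penalized_objective(x, K), -5.0, 5.0)
    m1 = penalized_objective(x1, K)
    upper = m_opt + u0 ** 2 / (4.0 * K)
    return x1, m1, upper


if __name__ == "__main__":
    for K in (1.0, 10.0, 100.0):
        x1, m1, upper = penalty_report(K)
        print(f"K={K:6.1f}  x1={x1:.6f}  m1={m1:.6f}  upper bound={upper:.6f}")
```

As \(K\) grows, the printed values of `m1` approach \(\widetilde m=1\) from above, in line with the estimate just proved.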

Corollary 1. \(\widetilde m_1(\mathbf K)\to \widetilde m\) as \(\min_j K_j\to +\infty\).

Corollary 2. If the optimal set \(\widetilde M\) of problem \(C\) is bounded, then

\[ |\overline x_1(\mathbf K)-\widetilde M| =\inf_{y\in \widetilde M}|\overline x_1(\mathbf K)-y|\to 0 \quad \text{as } \min_j K_j\to +\infty . \]

Corollary 3. If in system (1) all constraints are linear, then for \(K_j=K>0\),

\[ |\overline x_1(\mathbf K)-\widetilde M| \leq C_0\frac{1}{K}\max\left\{ \frac{\sqrt2+1}{2}\|u^0\|,\ |S_0|\frac{\sqrt2+2}{2}\|u^0\|^2 \right\}, \]

where \(C_0\) is a constant depending only on the coefficient matrix of system (1).

II. Problem \(C_2\). Find

\[ \sup_{x\in M}[(c,x)-\Phi_2(x,\mathbf K)], \]

where

\[ \Phi_2(x,\mathbf K)= \sum_{J_1\cap S_0}K_j\delta_j(x)f_j(x) +\sum_{J_2\cap S_0}K_j|f_j(x)|, \]

and \(M\) is the set of solutions of system (2).

Theorem 2. If \(\widetilde m_2(\mathbf K)\) is the optimal value of problem \(C_2\) and \(K_j>|u_j^0|\) for \(j\in S_0\), then

1) \(\widetilde m_2(\mathbf K)=\widetilde m\);

2) the optimal sets of problems \(C\) and \(C_2\) coincide.

We prove the first assertion of the theorem. Let \(x'\in M\). Using (3) and (4), we estimate

\[ F_2(x',\mathbf K)=(c,x')-\Phi_2(x',\mathbf K). \]

We have

\[ \begin{aligned} F_2(x',\mathbf K) &=\sum u_j^0(\nabla f_j(\widetilde x),x')-\Phi_2(x',\mathbf K) \\ &=\widetilde m+\sum_1^l u_j^0\gamma_j-\Phi_2(x',\mathbf K) \\ &\leq \widetilde m+\sum_{J_1\cap S_0}u_j^0 f_j(x') +\sum_{J_2\cap S_0}u_j^0 f_j(x') \\ &\quad -\sum_{J_1\cap S_0}K_j\delta_j(x')f_j(x') -\sum_{J_2\cap S_0}K_j|f_j(x')| \\ &\leq \widetilde m -\sum_{J_1\cap S_0}(K_j-u_j^0)\delta_j(x')f_j(x') -\sum_{J_2\cap S_0}(K_j-|u_j^0|)|f_j(x')| \\ &\leq \widetilde m . \end{aligned} \]

Thus \(F_2(x',\mathbf K)\leq \widetilde m\). Since \(x'\) is an arbitrary element of \(M\), we have

\[ \widetilde m_2(\mathbf K)\leq \widetilde m . \tag{5} \]

On the other hand, \(\widetilde x\in M\) and \(F_2(\widetilde x,\mathbf K)=\widetilde m\). Hence,

\[ \widetilde m_2(\mathbf K)=\sup_{x\in M}F_2(x,\mathbf K)\geq \widetilde m. \]

Comparing the last inequality with (5), we obtain the relation to be proved.
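
The threshold \(K_j>|u_j^0|\) in Theorem 2 can be observed on a small example. The sketch below is an illustration only, with hypothetical data and function names: maximize \(x_1+2x_2\) subject to \(x_1+x_2-1\leqslant 0\) (moved into the penalty) and \(x\geqslant 0\) (kept as the set \(M\)). For this problem \(\widetilde x=(0,1)\), \(\widetilde m=2\), and the dual estimate of the penalized constraint is \(u_1^0=2\). A brute-force grid search stands in for a real optimizer.

```python
def exact_penalty_objective(x1, x2, K):
    """F_2 = (c, x) - K * max(0, f_1(x)) with c = (1, 2) and f_1(x) = x1 + x2 - 1."""
    return x1 + 2.0 * x2 - K * max(0.0, x1 + x2 - 1.0)


def grid_maximum(K, upper=3.0, step=0.25):
    """Maximize F_2 over a grid on M = {x >= 0}, truncated to the square [0, upper]^2."""
    n = int(round(upper / step))
    best_value, best_point = float("-inf"), None
    for i in range(n + 1):
        for j in range(n + 1):
            x1, x2 = i * step, j * step
            value = exact_penalty_objective(x1, x2, K)
            if value > best_value:
                best_value, best_point = value, (x1, x2)
    return best_value, best_point


if __name__ == "__main__":
    # K above the dual estimate u_1^0 = 2: the grid maximum equals the original optimum.
    print(grid_maximum(2.5))
    # K below u_1^0: the penalty is too weak and the penalized value exceeds the optimum.
    print(grid_maximum(1.5))
```

With \(K=2.5\) the search returns the value \(2\) at the point \((0,1)\), matching both assertions of the theorem on this grid; with \(K=1.5\) the value already exceeds \(\widetilde m\) inside the truncated square, and without truncation the supremum would be unbounded.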

Remark. The function \(\Phi_1(x,\mathbf K)\) is smooth. Therefore, reducing problem \(C\) to problem \(C_1\) is expedient when problem \(C_1\) is solved by a method that relies on continuity of the gradient of the objective function (and most deterministic methods are of precisely this kind). The choice of a “penalty” function in the form \(\Phi_2(x,\mathbf K)\), on the other hand, is expedient, for example, in probabilistic optimization methods.

III. Theorems 1 and 2 remain valid if, in the functions \(\Phi_1(x,\mathbf K)\) and \(\Phi_2(x,\mathbf K)\), terms of the form \(K_j\delta_j(x)f_j^2(x)\) and \(K_j\delta_j(x)f_j(x)\) are partially or completely replaced, respectively, by \(K_j[f_j(x)+z_j]^2\) and \(K_j|f_j(x)+z_j|\), where \(z_j\) are auxiliary variables on which nonnegativity conditions are imposed. This fact can be used successfully in constructing rational numerical schemes that lead to the solution of a problem of type \(C\) by reducing it to a problem of type \(C_1\). Suppose, for example, that a linear programming problem is given: find \(\max(c,x)\) for

\[ x\in \{x\mid (c_j,x)-a_j\leqslant 0\ (j\in J_1),\ (c_j,x)-a_j=0\ (j\in J_2),\ x\geqslant 0\}. \]

If we set

\[ \Phi(x, z, K)=\sum_{J_1} K_j \left[(c_j,x)-a_j+z_j\right]^2+\sum_{J_2} K_j \left[(c_j,x)-a_j\right]^2, \]

then the solution of the original problem can be replaced by the solution of the problem

\[ \max_{(x,z)\geqslant 0} \left[(c,x)-\Phi(x,z,K)\right]. \]

The function optimized in the latter problem is a concave quadratic function, whose maximization over the set of nonnegative values of the variables can be carried out efficiently, for example, by the gradient projection method.
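
A minimal sketch of this scheme is given below. It is an illustration under stated assumptions and is not the M-20 program mentioned in the note: the function name `solve_lp_by_penalty`, the fixed step \(1/L\), the zero starting point, and the sample data are hypothetical, and only inequality rows (the set \(J_1\)) with a single common penalty coefficient \(K\) are handled. The projection onto the nonnegative orthant is a componentwise clipping at zero.

```python
def solve_lp_by_penalty(c, A, b, K, iters=2000):
    """Projected-gradient ascent on (c, x) - K * sum_j ((A_j, x) - b_j + z_j)**2
    over x >= 0, z >= 0, where z_j are the auxiliary nonnegative variables."""
    n, m = len(c), len(A)
    x = [0.0] * n
    z = [0.0] * m
    # Upper bound on the Lipschitz constant of the gradient: 2K * sum_j (|A_j|^2 + 1).
    lipschitz = 2.0 * K * sum(sum(a * a for a in row) + 1.0 for row in A)
    step = 1.0 / lipschitz
    for _ in range(iters):
        # Residuals r_j = (A_j, x) - b_j + z_j of the penalized rows.
        r = [sum(a * xi for a, xi in zip(row, x)) - bj + zj
             for row, bj, zj in zip(A, b, z)]
        # Gradient of the penalized objective with respect to x and z.
        gx = [c[i] - 2.0 * K * sum(r[j] * A[j][i] for j in range(m)) for i in range(n)]
        gz = [-2.0 * K * rj for rj in r]
        # Gradient step followed by projection onto the nonnegative orthant.
        x = [max(0.0, xi + step * g) for xi, g in zip(x, gx)]
        z = [max(0.0, zj + step * g) for zj, g in zip(z, gz)]
    return x, z


if __name__ == "__main__":
    # Hypothetical data: max x1 + 2*x2 subject to x1 + x2 <= 1, x >= 0 (optimum 2 at (0, 1)).
    x, z = solve_lp_by_penalty([1.0, 2.0], [[1.0, 1.0]], [1.0], K=10.0)
    print(x, z)
```

For the sample data the penalized maximizer is \((0,\ 1+1/K)\) with \(z=0\), so the computed point approaches the optimal point \((0,1)\) of the linear program as \(K\) increases. The number of iterations needed grows with \(K\), since the step is proportional to \(1/K\).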

Computations for a number of linear programming problems (using a program for the M-20 computer), carried out in accordance with the “penalty” method, made it possible to find, with high accuracy, the optimal values (and optimal points) of the problems being solved.

Sverdlovsk Branch
of the V. A. Steklov Mathematical Institute
Academy of Sciences of the USSR

Received 24 V 1966

REFERENCES

\(^{1}\) P. Wolfe, Recent Developments in Nonlinear Programming, 1962.
\(^{2}\) T. Pietrzykowski, Prace ZAM, ser. A, No. 13 (1961).
\(^{3}\) T. Pietrzykowski, Algorytmy, 1, No. 1 (1962).
\(^{4}\) A. V. Fiacco, G. P. McCormick, Manag. Sci., 10, No. 2 (1964).
\(^{5}\) N. P. Buslenko, G. A. Sokolov, Economics and Mathematical Methods, 1, No. 1 (1965).
\(^{6}\) M. V. Vilkov, Automation and Telemechanics, 26, No. 11 (1965).
\(^{7}\) D. B. Yudin, E. G. Golshtein, Problems and Methods of Linear Programming, Moscow, 1964.
\(^{8}\) Okamura Kiychisa, J. Soc. Ind. and Appl. Math. A2, No. 3 (1965).
\(^{9}\) G. Zoutendijk, Methods of Feasible Directions, IL, 1963.
\(^{10}\) A. J. Hoffman, J. Res. Nat. Bur. Stand., 49, No. 4 (1952).
