A quasi-Newton augmented Lagrangian algorithm for constrained optimization problems

Abstract

In this paper, we propose a method for finding an optimal solution of the nonlinear optimization problem with equality constraints. The original problem is replaced by a sequence of unconstrained problems defined by the augmented Lagrangian function. The subproblems are minimized by quasi-Newton methods. The Hessian matrix of the augmented Lagrangian is updated by a new secant approximation that is positive definite at every iteration. Computational results on test problems from the CUTEr collection are presented.

Keywords

Constrained optimization; augmented Lagrangian methods; quasi-Newton methods; equality constraints

1 Introduction

Many applications in engineering, decision science, operations research and other fields are formulated as constrained optimization problems. In these applications, the variables may be interrelated by physical laws such as the conservation of mass or energy, Kirchhoff's voltage and current laws, and other system equalities or inequalities that must be satisfied [3 , 43].

We consider the nonlinear equality constrained problem $\min_{x \in ℝ^{n}} f (x) \quad \text{subject to} \quad c_{i} (x) = 0, \; i = 1, 2, \ldots, m,$ (1) where the functions $f : ℝ^{n} ⟶ ℝ$ and $c_{i} : ℝ^{n} ⟶ ℝ$ are all twice continuously differentiable. For convenience, let c (x) = (c₁ (x), c₂ (x),…, c_m (x)) ^T, and let f_k denote f (x_k), etc.

One of the major developments and successes in continuous optimization is the family of efficient quasi-Newton methods for solving minimization problems. The main reasons for this success are:

Positive definite quasi-Newton updates are compatible with line search rules that maintain global convergence. The Hessian matrix can be approximated at every iteration while positive definiteness is preserved.

In a neighborhood of a strong local minimizer the Hessian matrix maintains positive definiteness.

A unit step size supports rapid local convergence.

Since we are concerned with equality constraints, any inequality constraints are assumed to have been converted to equalities by adding slack variables. The idea of transforming a constrained optimization problem into a sequence of unconstrained problems has played a main role in the formulation of algorithms since the 1960s [18].

An effective method for solving (1) uses an unconstrained function based on the quadratic penalty function, which consists of the function f plus the sum of the squares of the constraint violations weighted by a penalty parameter. At every iteration an easier, unconstrained problem is minimized, and the solution is obtained as the penalty parameter approaches infinity. Penalty methods have developed rapidly in past years. These methods consist of two levels, inner and outer iterations: in the inner iteration, we start from a given point and use any local minimization method to minimize the penalty function, and in the outer iteration, we test for convergence and set the value of the penalty parameter. These methods are now regarded as unfashionable and numerically unstable, fundamentally because of the inevitable ill-conditioning that occurs as the penalty parameter tends to infinity.
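As a small illustration of why the penalty parameter has to grow, consider the hypothetical one-dimensional problem min x² subject to x - 1 = 0 (chosen here for illustration; it is not one of the paper's test problems). Its quadratic penalty function Q(x; μ) = x² + (μ/2)(x - 1)² has the closed-form minimizer x(μ) = μ/(2 + μ), so the constraint violation 2/(2 + μ) vanishes only as μ grows, while the curvature 2 + μ grows without bound. In several variables this growth affects only the directions spanned by the constraint gradients, which is the source of the ill-conditioning. A minimal sketch:

```python
def penalty_minimizer(mu):
    """Minimizer of Q(x; mu) = x**2 + 0.5 * mu * (x - 1)**2 (toy problem)."""
    # Q'(x) = 2 x + mu (x - 1) = 0  =>  x = mu / (2 + mu)
    return mu / (2.0 + mu)


for mu in (1.0, 10.0, 100.0, 1.0e4, 1.0e6):
    x = penalty_minimizer(mu)
    violation = abs(x - 1.0)   # |c(x)| = 2 / (2 + mu)
    curvature = 2.0 + mu       # second derivative of Q, grows with mu
    print(f"mu={mu:9.1e}  x={x:.8f}  |c(x)|={violation:.2e}  Q''={curvature:.1e}")
```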

One of the most important classes of penalty methods in nonlinear programming is the augmented Lagrangian methods, also known as multiplier methods. They have gained popularity for the following reasons:

They avoid the ill-conditioning difficulties encountered by classical penalty algorithms, such as the Courant penalty method or log-barrier approaches, which arise as the penalty parameter goes to infinity [4–6 , 19].

They dispense with the need for iterates to stay strictly feasible with respect to the inequality constraints [36].

They do not introduce nonsmoothness, unlike the Sl₁QP and Sl_∞QP methods [36].

They can be applied matrix-free [2 , 11] and still have fast local convergence under relatively weak assumptions [17, 29].

Another feature of the augmented Lagrangian approach is that the inner iteration can be solved by algorithms that handle a large number of variables without any matrix factorization.

The augmented Lagrangian method was first proposed by Hestenes [28], Powell [39] and Rockafellar [40], and was later adapted by Conn et al. [11]. The augmented Lagrangian function is an unconstrained function that contains both a Lagrange multiplier term and a quadratic penalty term, and convergence does not require the penalty parameter to approach infinity. Augmented Lagrangian methods reduce the likelihood of ill-conditioning by introducing Lagrange multiplier estimates into the minimized function at every outer iteration. Bertsekas [5] shows that the method converges at least linearly, and at a rate that is more satisfactory than that of the quadratic penalty method.

Since the 1990s, augmented Lagrangian methods have attracted much attention, and many algorithms that use the augmented Lagrangian as the objective function for successive unconstrained minimization have been proposed.

The implementations can be built from the methods and software for unconstrained optimization. Effective algorithms for solving nonlinear minimization problems based on the augmented Lagrangian function have been proposed [13–15 , 31]. One important practical implementation of the augmented Lagrangian method is the package LANCELOT [12], which is based on the paper by Conn et al. [11]. The LANCELOT algorithm treats problems with equality constraints and bound constraints and finds an approximate solution of the constrained problem. Similarly, the package MINOS, developed by Murtagh and Saunders [34], is software for large-scale optimization problems; it solves a sequence of subproblems in which the constraints are linearized and the objective is an augmented Lagrangian function.

Augmented Lagrangian functions can also be used as merit functions for sequential quadratic programming (SQP) methods [9 , 42]. These methods find an approximate solution of a sequence of quadratic programming subproblems in which a quadratic model of the objective function is minimized subject to a linearization of the constraints. Evolutionary algorithms, such as genetic algorithms, evolutionary programming, evolution strategies, particle swarm optimization, and their variants, are a class of stochastic search methods that have also been applied successfully to optimization problems [1 , 37].

A filter-type method was presented by Niu and Yuan [35] for nonlinear constrained optimization problems. Its inner iteration finds a solution of a quadratic approximation to the augmented Lagrangian function within a trust region, instead of minimizing the standard augmented Lagrangian function. Good numerical results are given, but no convergence results are presented in that paper.

Standard trust region SQP methods may face infeasible subproblems [44]. The Sl₁QP and Sl_∞QP methods are designed to handle this situation, but both approaches still involve difficulties arising from nonsmoothness [44]. The augmented Lagrangian method has attracted much attention due to its applications to sparse optimization in compressive sensing and to low-rank matrix optimization problems. It therefore seems a reasonable choice to minimize the augmented Lagrangian function, but with a new quasi-Newton update for the Hessian matrix. In this way we resolve the difficulties caused by infeasibility and avoid the nonsmoothness problem.

The purpose of this paper is to propose, analyze, and present an algorithm for constrained minimization problems based on the sequential minimization of the augmented Lagrangian function, in order to overcome the disadvantages described above. The original problem is replaced by a sequence of unconstrained minimization problems. The subproblems are minimized by a new quasi-Newton method in which the approximation to the Hessian matrix of the augmented Lagrangian is positive definite at every iteration.

This paper is organized as follows. In Section 2 the algorithm and the technique used to approximate the Hessian matrix are proposed. The stability of the search direction is established in Section 3. Applications of the algorithm to nonlinear equality constrained optimization problems are reported in Section 4. Conclusions are presented in Section 5. Throughout the paper, ∥.∥ denotes the Euclidean norm of vectors.

2 Algorithm description

The augmented Lagrangian function for the constrained optimization problem (1) is defined by $L (x, λ, μ) = f (x) - \sum_{i = 1}^{m} λ_{i} c_{i} (x) + \frac{1}{2} \sum_{i = 1}^{m} μ_{i} (c_{i} (x))^{2},$ (2) where λ = (λ₁, λ₂,…, λ_m) is the Lagrange multiplier vector and μ = (μ₁, μ₂,…, μ_m), with μ_i > 0, i = 1,…, m, is the penalty parameter vector.
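Definition (2) translates directly into code. The following is a minimal sketch, not the authors' Fortran implementation; the function name `augmented_lagrangian` and the toy problem min x₁ + x₂ subject to x₁² + x₂² - 2 = 0 are assumptions made for illustration. For this toy problem the solution is x_∗ = (-1, -1) with multiplier λ_∗ = -1/2 under the sign convention of (2).

```python
def augmented_lagrangian(f, c, x, lam, mu):
    """Evaluate L(x, lam, mu) = f(x) - sum_i lam_i c_i(x) + 0.5 sum_i mu_i c_i(x)**2."""
    value = f(x)
    for lam_i, mu_i, c_i in zip(lam, mu, c(x)):   # c(x) returns the m constraint values
        value += -lam_i * c_i + 0.5 * mu_i * c_i ** 2
    return value


# Hypothetical toy problem: min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0
f = lambda x: x[0] + x[1]
c = lambda x: [x[0] ** 2 + x[1] ** 2 - 2.0]

# At the feasible point (-1, -1) both extra terms vanish and L equals f.
L_feasible = augmented_lagrangian(f, c, [-1.0, -1.0], lam=[-0.5], mu=[10.0])
# At the infeasible point (0, 0), c = -2, so L = 0 - (-0.5)(-2) + 0.5 * 10 * 4.
L_infeasible = augmented_lagrangian(f, c, [0.0, 0.0], lam=[-0.5], mu=[10.0])
```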

If λ_∗ is the Lagrange multiplier vector associated with a local minimum x_∗ and the penalty parameters are sufficiently large, then x_∗ is a strict local minimizer of L (x, λ_∗, μ) [36]. Based on this theoretical result, given a point x_k, a new iterate x_{k+1} can be found by solving the subproblem $\min_{x \in ℝ^{n}} L (x, λ^{k}, μ^{k}) .$ (3) Conn, Gould, and Toint [11] proposed that an exact solution of (3) is not required; instead, an iterative method is used to find an approximate minimizer x_{k+1} such that $∥ \nabla_{x} L (x_{k + 1}, λ^{k}, μ^{k}) ∥ \leq ∊_{1}^{k},$ (4) where $∊_{1}^{k}$ is small. By forcing $∊_{1}^{k} ⟶ 0$ together with adaptive updates of λ^k and μ^k, the algorithm finds KKT points.

Therefore, by using the augmented Lagrangian function we generate a sequence of points {x_k}, each of which is an approximate solution of (3), and we update λ^k and μ^k so as to make x_{k+1} approximately feasible, that is, to reduce ∥c (x_{k+1})∥ below a prescribed tolerance.

The unconstrained subproblem can be minimized by quasi-Newton methods combined with line search methods. The step size is obtained from the one-dimensional problem $\min_{α} L (x_{k} + α d_{k}, λ^{k}, μ^{k}),$ (5) where d_k is the search direction in iteration k.

Quasi-Newton methods are efficient for solving (3). In quasi-Newton methods the Hessian matrix does not need to be computed, so they can be used when the Hessian is unavailable or difficult to calculate. Several recent computational studies have shown that quasi-Newton methods are effective for solving minimization problems. These methods use only first derivatives to build an approximation to the inverse of the Hessian matrix at each step, instead of performing the computational work of evaluating and inverting the Hessian.

The idea of a line search is to control the length of the step along the search direction by solving the one-dimensional problem (5). Since the gradient of the objective in (3) can be computed, we can effectively perform a one-dimensional search with derivatives. When the search direction is a descent direction, which is the case if the Hessian approximation is positive definite, the line search can find a positive step α > 0. Since the search direction is not exactly the right direction, there is no need to find an exact minimum; usually it is enough to achieve a sufficient decrease at each iteration. One condition that measures such progress is the Armijo condition, which we use to find an acceptable step length.

For the stopping criterion, given a sequence $\{∊_{1}^{k}\}$ converging to zero, each inner iteration terminates at x_{k+1} as an approximate solution to (3) if $∥ \nabla_{x} L (x_{k + 1}, λ^{k}, μ^{k}) ∥ \leq ∊_{1}^{k} .$ (6) In the main algorithm, a minimizer x_{k+1} is considered a solution of the constrained optimization problem (1) if $∥ \nabla_{x} L (x_{k + 1}, λ^{k}, μ^{k}) ∥ \leq ∊_{1}, \quad ∥ c (x_{k + 1}) ∥ \leq ∊_{2},$ (7) where ∊₁ and ∊₂ are positive constants. If either of these conditions is not satisfied, the penalty parameters and the multiplier vector are updated and a new iterate is computed.
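The Armijo condition mentioned above accepts a step α when L(x_k + α d_k) ≤ L(x_k) + c₁ α g_k^T d_k. The paper does not state its line search parameters, so the sufficient-decrease constant `c1 = 1e-4`, the halving factor and the backtracking strategy in the following sketch are assumptions, as is the function name `armijo_step`.

```python
def armijo_step(phi, phi0, slope0, alpha=1.0, c1=1e-4, shrink=0.5, max_backtracks=50):
    """Backtrack until phi(alpha) <= phi0 + c1 * alpha * slope0 (Armijo condition).

    phi    : callable, alpha -> L(x_k + alpha * d_k, lam, mu)
    phi0   : phi(0)
    slope0 : directional derivative g_k^T d_k, which must be negative
    """
    if slope0 >= 0.0:
        raise ValueError("d_k is not a descent direction")
    for _ in range(max_backtracks):
        if phi(alpha) <= phi0 + c1 * alpha * slope0:
            return alpha
        alpha *= shrink
    return alpha   # last trial step; the condition may not hold if the budget ran out


# Toy one-dimensional model: phi(alpha) = (alpha - 0.3)**2, so phi'(0) = -0.6
phi = lambda a: (a - 0.3) ** 2
alpha = armijo_step(phi, phi0=phi(0.0), slope0=-0.6)
```

In this toy model the unit step overshoots (φ(1) = 0.49 exceeds φ(0) = 0.09), so one halving is needed before the condition holds.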

2.1 Update of penalty parameters and vectors of multipliers

A critical point in the development of the algorithm is the strategy for updating the penalty parameters and the Lagrange multipliers, since each update rule affects the efficiency of the algorithm differently. The algorithm must generate a nondecreasing sequence {μ^k} to maintain convergence. The algorithm uses m different penalty parameters, one for each component of c (x).

The following strategy is applied: the penalty parameter is increased if sufficient improvement of the constraint violation is not obtained. To be more exact, if ∣c_i (x_{k+1})∣ is not sufficiently smaller than ∣c_i (x_k)∣, then the penalty parameter is increased according to $μ_{i}^{k + 1} = β μ_{i}^{k}$ , β > 1. In the algorithm we take β = 10 and decide that the current value of $μ_{i}^{k}$ is doing a good job if the following condition is satisfied: $∣ c_{i} (x_{k + 1}) ∣ \leq ∣ c_{i} (x_{k}) ∣ / 4 .$

We update the multiplier vector based on the current information. More precisely, it is kept constant while the augmented Lagrangian function is minimized and is then updated in the outer iteration. We employ a first-order formula for updating the Lagrange multiplier vector, starting with λ⁰ = 0: $λ_{i}^{k + 1} = λ_{i}^{k} - μ_{i}^{k} c_{i} (x_{k + 1}) .$
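The componentwise penalty rule can be written in a few lines. This is a sketch of the rule as stated in the text, with β = 10 and the factor 1/4 taken from the paper; the function name `update_penalties` and the sample constraint values are made up for illustration.

```python
def update_penalties(mu, c_old, c_new, beta=10.0, ratio=0.25):
    """Keep mu_i if |c_i(x_{k+1})| <= ratio * |c_i(x_k)|, otherwise multiply it by beta."""
    return [
        mu_i if abs(cn) <= ratio * abs(co) else beta * mu_i
        for mu_i, co, cn in zip(mu, c_old, c_new)
    ]


# First constraint improved by a factor of 8 (penalty kept),
# second only from 0.8 to 0.5 (penalty increased).
mu_next = update_penalties(mu=[1.0, 1.0], c_old=[0.8, 0.8], c_new=[0.1, 0.5])
```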
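To see the first-order multiplier update at work, consider again a hypothetical one-dimensional problem, min x² subject to x - 1 = 0 (an illustration, not one of the paper's test problems), whose solution is x_∗ = 1 with λ_∗ = 2. The augmented Lagrangian x² - λ(x - 1) + (μ/2)(x - 1)² has the exact minimizer x = (λ + μ)/(2 + μ), and substituting it into the update gives λ^{k+1} - 2 = 2(λ^k - 2)/(2 + μ). The multiplier error therefore contracts linearly with factor 2/(2 + μ) while μ stays fixed, consistent with the linear convergence discussed in the Introduction. A minimal sketch, with the penalty parameter deliberately held constant:

```python
def al_minimizer(lam, mu):
    """Exact minimizer of x**2 - lam * (x - 1) + 0.5 * mu * (x - 1)**2."""
    # stationarity: 2 x - lam + mu (x - 1) = 0
    return (lam + mu) / (2.0 + mu)


lam, mu = 0.0, 10.0        # lambda^0 = 0 as in the text; mu is held fixed here
history = []
for k in range(8):
    x = al_minimizer(lam, mu)      # inner problem solved exactly for this toy case
    c = x - 1.0                    # constraint value c(x_{k+1})
    lam = lam - mu * c             # first-order multiplier update
    history.append((x, lam))

# Ratio of successive multiplier errors; equals 2 / (2 + mu) = 1/6 for this problem.
rate = (history[1][1] - 2.0) / (history[0][1] - 2.0)
```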

2.2 Update of Hessian matrix

In the inner iteration the unconstrained subproblem is solved by minimizing the augmented Lagrangian with a quasi-Newton method, starting from the current iterate x_k and generating a better approximate minimizer x_{k+1}.

Let A_{k+1} be an approximation of $\nabla_{xx}^{2} L (x_{k + 1}, λ^{k}, μ^{k})$ , let σ_k = x_{k+1} - x_k and $y_{k} = \nabla_{x} L (x_{k + 1}, λ^{k}, μ^{k}) - \nabla_{x} L (x_{k}, λ^{k}, μ^{k}),$ and let J (x) denote the Jacobian of c at x, that is, $J (x) = (\nabla c_{1} (x), \nabla c_{2} (x), \ldots, \nabla c_{m} (x))^{T} .$ Then the Hessian approximation is updated by a secant approximation to satisfy

$A_{k + 1} σ_{k} = y_{k} .$ (8) Differentiating (2) with respect to x, we obtain

$\nabla_{x} L (x, λ, μ) = \nabla f (x) - \sum_{i = 1}^{m} λ_{i} \nabla c_{i} (x) + \sum_{i = 1}^{m} μ_{i} c_{i} (x) \nabla c_{i} (x)$ (9)

and

$\begin{aligned} \nabla_{xx}^{2} L (x, λ, μ) = {} & \nabla^{2} f (x) - \sum_{i = 1}^{m} λ_{i} \nabla^{2} c_{i} (x) + \sum_{i = 1}^{m} μ_{i} c_{i} (x) \nabla^{2} c_{i} (x) + \sum_{i = 1}^{m} μ_{i} \nabla c_{i} (x) \nabla c_{i}^{T} (x) \\ = {} & \nabla^{2} f (x) - \sum_{i = 1}^{m} [λ_{i} - μ_{i} c_{i} (x)] \nabla^{2} c_{i} (x) + \sum_{i = 1}^{m} μ_{i} \nabla c_{i} (x) \nabla c_{i}^{T} (x) . \end{aligned}$ (10) At a minimizer (x_∗, λ_∗), where c (x_∗) = 0, the Hessian (10) takes the form

$\nabla_{xx}^{2} L (x_{*}, λ_{*}, μ) = \nabla_{xx}^{2} l (x_{*}, λ_{*}) + \sum_{i = 1}^{m} μ_{i} \nabla c_{i} (x_{*}) \nabla c_{i}^{T} (x_{*}),$ (11) where l (x, λ) = f (x) - λ^Tc (x). This matrix is positive definite for all μ larger than a certain threshold value μ_∗. The last term in (11) adds positive curvature to the Lagrangian on the space spanned by the columns of J (x) ^T while leaving the curvature on the null space of J (x) unchanged [36]. We can therefore choose a quasi-Newton approximation A_{k+1} of $\nabla_{xx}^{2} L (x_{k + 1}, λ^{k}, μ^{k})$ that is always positive definite and can be used directly to minimize (3).
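The gradient formula (9) can be checked numerically against central finite differences of (2). The sketch below is illustrative only: the helper names and the toy problem min x₁ + x₂ subject to x₁² + x₂² - 2 = 0 (with solution x_∗ = (-1, -1), λ_∗ = -1/2) are assumptions, not part of the paper.

```python
def grad_aug_lagrangian(grad_f, c, jac_c, x, lam, mu):
    """Gradient (9), grouped as grad f - sum_i (lam_i - mu_i c_i) grad c_i."""
    g = list(grad_f(x))
    for lam_i, mu_i, c_i, row in zip(lam, mu, c(x), jac_c(x)):   # row = gradient of c_i
        coeff = lam_i - mu_i * c_i
        for j in range(len(g)):
            g[j] -= coeff * row[j]
    return g


# Hypothetical toy problem: min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0
f = lambda x: x[0] + x[1]
grad_f = lambda x: [1.0, 1.0]
c = lambda x: [x[0] ** 2 + x[1] ** 2 - 2.0]
jac_c = lambda x: [[2.0 * x[0], 2.0 * x[1]]]


def L(x, lam, mu):
    """Augmented Lagrangian (2) for the toy problem."""
    return f(x) + sum(-l * ci + 0.5 * m * ci ** 2 for l, m, ci in zip(lam, mu, c(x)))


x, lam, mu, h = [0.5, -1.5], [-0.5], [10.0], 1e-6
g = grad_aug_lagrangian(grad_f, c, jac_c, x, lam, mu)

# Central finite differences of L as an independent check of (9).
g_fd = []
for j in range(2):
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    g_fd.append((L(xp, lam, mu) - L(xm, lam, mu)) / (2.0 * h))

# The gradient vanishes at the solution when lam equals the optimal multiplier.
g_star = grad_aug_lagrangian(grad_f, c, jac_c, [-1.0, -1.0], [-0.5], [10.0])
```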

Let B_{k+1} be the inverse of A_{k+1} at x_{k+1}. We present a rank-two update formula as follows [41]: $\begin{aligned} B_{k + 1} = {} & B_{k} + φ \frac{σ_{k} σ_{k}^{T}}{σ_{k}^{T} y_{k}} - (1 - φ) \frac{B_{k} y_{k} y_{k}^{T} B_{k}}{y_{k}^{T} B_{k} y_{k}} \\ & + \frac{((1 - φ) σ_{k} - φ B_{k} y_{k}) ((1 - φ) σ_{k} - φ B_{k} y_{k})^{T}}{((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} y_{k}}, \end{aligned}$ (12)

or in the form $\begin{aligned} B_{k + 1} = {} & B_{k} + \left( \left( \frac{φ}{σ_{k}^{T} y_{k}} + \frac{(1 - φ)^{2}}{((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} y_{k}} \right) σ_{k} - \frac{φ (1 - φ)}{((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} y_{k}} B_{k} y_{k} \right) σ_{k}^{T} \\ & - \left( \frac{φ (1 - φ)}{((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} y_{k}} σ_{k} - \left( \frac{φ^{2}}{((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} y_{k}} - \frac{(1 - φ)}{y_{k}^{T} B_{k} y_{k}} \right) B_{k} y_{k} \right) (B_{k} y_{k})^{T} . \end{aligned}$ (13) This formula represents a class of approximating matrices as a function of a scalar parameter φ that includes the DFP and BFGS methods as special cases: φ = 1 gives the DFP formula, and $φ = 1 + \frac{y_{k}^{T} B_{k} y_{k} + \sqrt{(y_{k}^{T} B_{k} y_{k})^{2} + 4 σ_{k}^{T} y_{k} y_{k}^{T} B_{k} y_{k}}}{2 σ_{k}^{T} y_{k}}$ gives the BFGS method. The significance of the parameter φ appears in the search direction: when φ ≥ 1, the direction does not vanish, and so the algorithm does not stall, as long as ∇_xL (x_{k+1}, λ^k, μ^k) ≠ 0.
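A direct transcription of (12) makes two properties easy to verify numerically: the updated matrix satisfies the secant equation in inverse form, B_{k+1} y_k = σ_k, for any admissible φ, and φ = 1 reduces to the DFP update. The sketch below uses plain Python lists; the function names and the sample vectors are assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def update_inverse_hessian(B, sigma, y, phi=1.0):
    """Rank-two update (12) of the inverse Hessian approximation B_k (B symmetric)."""
    n = len(sigma)
    By = matvec(B, y)
    sy, yBy = dot(sigma, y), dot(y, By)
    w = [(1.0 - phi) * s - phi * b for s, b in zip(sigma, By)]   # (1 - phi) sigma - phi B y
    wy = dot(w, y)                                               # = (1 - phi) sy - phi yBy
    return [
        [
            B[i][j]
            + phi * sigma[i] * sigma[j] / sy
            - (1.0 - phi) * By[i] * By[j] / yBy
            + w[i] * w[j] / wy
            for j in range(n)
        ]
        for i in range(n)
    ]


B = [[1.0, 0.0], [0.0, 1.0]]
sigma, y = [1.0, 0.5], [2.0, 0.5]        # sample data with sigma^T y = 2.25 > 0
B_new = update_inverse_hessian(B, sigma, y, phi=2.0)

# Secant equation (8) in inverse form: B_{k+1} y_k - sigma_k should be zero.
residual = [r - s for r, s in zip(matvec(B_new, y), sigma)]
```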

2.3 Algorithm

From the previous discussions the inner and outer iterations are summarized in the following algorithms.

Algorithm 2.1. (Outer iteration)

Step 0. Given $μ_{i}^{0} > 0, i = 1, \ldots, m$ ; tolerances ∊₁ ≥ 0 and ∊₂ ≥ 0; β > 1; starting points $x_{0} \in ℝ^{n}$ and $λ^{0} \in ℝ^{m}$ . Set k = 0.

Step 1. Find an approximate minimizer x_{k+1} of (3) by using Algorithm 2.2.

Step 2. If ∥∇_xL (x_{k+1}, λ^k, μ^k)∥ ≤ ∊₁ and ∥c (x_{k+1})∥ ≤ ∊₂, stop with x_{k+1} as an approximate minimizer of the constrained optimization problem (1).

Step 3. Update the Lagrange multiplier vector with $λ_{i}^{k + 1} = λ_{i}^{k} - μ_{i}^{k} c_{i} (x_{k + 1}), i = 1, \ldots, m .$

Step 4. Choose new penalty parameters: for i = 1,…, m, set $μ_{i}^{k + 1} = \begin{cases} μ_{i}^{k}, & \text{if } ∣ c_{i} (x_{k + 1}) ∣ \leq ∣ c_{i} (x_{k}) ∣ / 4; \\ β μ_{i}^{k}, & \text{otherwise}, \end{cases}$ where x_k is the approximate solution of the current iteration before entering Algorithm 2.2 in step 1.

Step 5. Set k = k + 1 and go to step 1.
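The outer iteration can be sketched as follows. This is a minimal illustration and not the authors' Fortran code: the function `outer_iteration` and its argument names are assumptions, and the inner solver is passed in as a callable. For the hypothetical toy problem min x₁² + x₂² subject to x₁ + x₂ - 2 = 0 used here, the inner minimizer is available in closed form, which stands in for Algorithm 2.2; the solution is x_∗ = (1, 1) with λ_∗ = 2.

```python
def outer_iteration(c, grad_L, inner_solve, x0, lam0, mu0,
                    eps1=1e-6, eps2=1e-6, beta=10.0, max_outer=50):
    """Sketch of Algorithm 2.1; inner_solve(x, lam, mu) plays the role of Algorithm 2.2."""
    x, lam, mu = list(x0), list(lam0), list(mu0)
    for k in range(max_outer):
        c_old = c(x)
        x_new = inner_solve(x, lam, mu)                               # step 1
        c_new = c(x_new)
        grad_norm = sum(g * g for g in grad_L(x_new, lam, mu)) ** 0.5
        c_norm = sum(ci * ci for ci in c_new) ** 0.5
        if grad_norm <= eps1 and c_norm <= eps2:                      # step 2
            return x_new, lam, mu, k + 1
        lam_new = [l - m * ci for l, m, ci in zip(lam, mu, c_new)]    # step 3
        mu = [m if abs(cn) <= abs(co) / 4.0 else beta * m             # step 4
              for m, co, cn in zip(mu, c_old, c_new)]
        x, lam = x_new, lam_new                                       # step 5
    raise RuntimeError("outer iteration limit reached")


# Hypothetical toy problem: min x1^2 + x2^2  s.t.  x1 + x2 - 2 = 0
c = lambda x: [x[0] + x[1] - 2.0]


def grad_L(x, lam, mu):
    coeff = lam[0] - mu[0] * c(x)[0]
    return [2.0 * x[0] - coeff, 2.0 * x[1] - coeff]


def inner_solve(x, lam, mu):
    # Closed-form minimizer of the augmented Lagrangian for this toy problem
    # (a stand-in for the quasi-Newton inner iteration).
    t = (lam[0] + 2.0 * mu[0]) / (2.0 + 2.0 * mu[0])
    return [t, t]


x_star, lam_star, mu_star, n_outer = outer_iteration(
    c, grad_L, inner_solve, x0=[0.0, 0.0], lam0=[0.0], mu0=[1.0])
```

In this run the penalty parameter is increased once, after the first outer iteration, because the constraint violation only halves; afterwards each outer iteration reduces the violation by a factor of 11 and the penalty parameter stays at 10.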

The iterative procedure of the inner iteration which is used in step 1 of Algorithm 2.1 can be given as follows:

Algorithm 2.2. (Inner iteration)

Step 0. Start with an initial point $x_{k} \in ℝ^{n}$ obtained from Algorithm 2.1, an n × n symmetric positive definite matrix B_k that approximates the inverse of the Hessian matrix of L (x_k, λ^k, μ^k), and a tolerance $∊_{1}^{k} \geq 0$ . In the absence of additional information, B_k is taken as the identity matrix I.

Step 1. Compute the gradient of the augmented Lagrangian function, g_k, at the point x_k, and set $d_{k} = - B_{k} g_{k} .$ (14)

Step 2. Find an acceptable step size α_k in the direction d_k and set $x_{k + 1} = x_{k} + α_{k} d_{k} .$ (15)

Step 3. Test the new point x_{k+1} for optimality. If $∥ \nabla_{x} L (x_{k + 1}, λ^{k}, μ^{k}) ∥ \leq ∊_{1}^{k}$ , terminate the iterative process and return to step 2 in Algorithm 2.1. Otherwise, go to step 4.

Step 4. Update the approximation of the inverse Hessian matrix by (12) to obtain B_{k+1}. Set x_k = x_{k+1}, B_k = B_{k+1} and go to step 1.

In Algorithm 2.2, $\{∊_{1}^{0}, ∊_{1}^{1}, \ldots\}$ is a sequence decreasing to 0. If we use the Hessian matrix instead of its inverse, the matrix B_k is replaced by A_k as an approximation to the Hessian matrix, and equation (14) in step 1 becomes $A_{k} d_{k} = - g_{k},$ (16) so that the search direction is obtained by solving the linear system (16).
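The inner iteration can be sketched in the same spirit. The code below is an illustration under stated assumptions rather than the authors' implementation: it uses a simple backtracking Armijo search with constant `c1 = 1e-4`, takes φ = 1 (the DFP member of the family (12)) by default, and skips the update when σ_k^T y_k is not safely positive, a common safeguard that the paper does not describe. It is applied to subproblem (3) for the hypothetical toy problem min x₁² + x₂² subject to x₁ + x₂ - 2 = 0 with λ = 0 and μ = 10 held fixed, whose minimizer is (10/11, 10/11).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def update_B(B, s, y, phi=1.0):
    """Update (12); phi = 1 is the DFP member of the family."""
    By = matvec(B, y)
    sy, yBy = dot(s, y), dot(y, By)
    w = [(1.0 - phi) * si - phi * bi for si, bi in zip(s, By)]
    wy, n = dot(w, y), len(s)
    return [[B[i][j] + phi * s[i] * s[j] / sy - (1.0 - phi) * By[i] * By[j] / yBy
             + w[i] * w[j] / wy for j in range(n)] for i in range(n)]


def inner_iteration(L, grad, x, eps=1e-6, phi=1.0, c1=1e-4, max_iter=100):
    """Sketch of Algorithm 2.2 with a backtracking Armijo line search."""
    n = len(x)
    B = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # step 0: B = I
    g = grad(x)
    for it in range(max_iter):
        if dot(g, g) ** 0.5 <= eps:                     # step 3: optimality test (6)
            return x, it
        d = [-v for v in matvec(B, g)]                  # step 1: d_k = -B_k g_k, (14)
        slope, alpha, L0 = dot(g, d), 1.0, L(x)
        for _ in range(60):                             # step 2: Armijo backtracking
            x_new = [xi + alpha * di for xi, di in zip(x, d)]            # (15)
            if L(x_new) <= L0 + c1 * alpha * slope:
                break
            alpha *= 0.5
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:                           # step 4: update only if sigma^T y > 0
            B = update_B(B, s, y, phi)
        x, g = x_new, g_new
    raise RuntimeError("inner iteration limit reached")


lam, mu = 0.0, 10.0   # multiplier and penalty parameter held fixed in the subproblem


def L(x):
    c = x[0] + x[1] - 2.0
    return x[0] ** 2 + x[1] ** 2 - lam * c + 0.5 * mu * c ** 2


def grad(x):
    coeff = lam - mu * (x[0] + x[1] - 2.0)
    return [2.0 * x[0] - coeff, 2.0 * x[1] - coeff]


x_min, iters = inner_iteration(L, grad, [2.0, 0.0])
```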

3 Stability

It is usual for descent methods to be stable, because each step decreases the function to be minimized. It will be shown in this section that the search direction -B_kg_k is downhill, so α_k can always be chosen to be positive. The direction is a descent direction if and only if $- g_{k}^{T} B_{k} g_{k} < 0,$ which holds for every g_k ≠ 0 when B_k is positive definite. It therefore suffices to show that positive definiteness is preserved by the update, that is, that x^TB_{k+1}x > 0 for all nonzero vectors x under the assumption that B_k is positive definite.

Theorem 3.1. Let B_k be positive definite and let φ ≥ 1. Then B_{k+1} is positive definite.

Proof. Let x be an arbitrary nonzero vector. Then from (12) we have

$\begin{aligned} x^{T} B_{k + 1} x = {} & x^{T} B_{k} x + φ \frac{x^{T} σ_{k} σ_{k}^{T} x}{σ_{k}^{T} y_{k}} - (1 - φ) \frac{x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x}{y_{k}^{T} B_{k} y_{k}} \\ & + \frac{x^{T} ((1 - φ) σ_{k} - φ B_{k} y_{k}) ((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} x}{((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} y_{k}} . \end{aligned}$

Let

$\begin{aligned} C = {} & φ \frac{x^{T} σ_{k} σ_{k}^{T} x}{σ_{k}^{T} y_{k}} + φ \frac{x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x}{y_{k}^{T} B_{k} y_{k}} \\ & + \frac{x^{T} ((1 - φ) σ_{k} - φ B_{k} y_{k}) ((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} x}{((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} y_{k}}, \end{aligned}$

and $D = x^{T} B_{k} x - \frac{x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x}{y_{k}^{T} B_{k} y_{k}},$ (17) then we have $x^{T} B_{k + 1} x = C + D .$ (18)

First we consider C. Writing its three terms over a common denominator, and noting that $((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} y_{k} = (1 - φ) σ_{k}^{T} y_{k} - φ y_{k}^{T} B_{k} y_{k}$, we obtain

$\begin{aligned} C = {} & \frac{1}{σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} \, ((1 - φ) σ_{k}^{T} y_{k} - φ y_{k}^{T} B_{k} y_{k})} \\ & \times \Big\{ φ \, y_{k}^{T} B_{k} y_{k} \, ((1 - φ) σ_{k}^{T} y_{k} - φ y_{k}^{T} B_{k} y_{k}) \, x^{T} σ_{k} σ_{k}^{T} x \\ & \quad + φ \, σ_{k}^{T} y_{k} \, ((1 - φ) σ_{k}^{T} y_{k} - φ y_{k}^{T} B_{k} y_{k}) \, x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x \\ & \quad + σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} \, x^{T} ((1 - φ) σ_{k} - φ B_{k} y_{k}) ((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} x \Big\} . \end{aligned}$ (19)

But since

$\begin{aligned} & x^{T} ((1 - φ) σ_{k} - φ B_{k} y_{k}) ((1 - φ) σ_{k} - φ B_{k} y_{k})^{T} x \\ & \quad = (1 - φ)^{2} x^{T} σ_{k} σ_{k}^{T} x - φ (1 - φ) x^{T} σ_{k} y_{k}^{T} B_{k} x - φ (1 - φ) x^{T} B_{k} y_{k} σ_{k}^{T} x + φ^{2} x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x, \end{aligned}$ (20)

equation (19) becomes

$\begin{aligned} C = {} & \frac{1}{σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} \, ((1 - φ) σ_{k}^{T} y_{k} - φ y_{k}^{T} B_{k} y_{k})} \\ & \times \Big\{ ((1 - φ) σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} - φ^{2} (y_{k}^{T} B_{k} y_{k})^{2}) \, x^{T} σ_{k} σ_{k}^{T} x \\ & \quad - φ (1 - φ) \, σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} \, (x^{T} σ_{k} y_{k}^{T} B_{k} x + x^{T} B_{k} y_{k} σ_{k}^{T} x) \\ & \quad + φ (1 - φ) (σ_{k}^{T} y_{k})^{2} \, x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x \Big\} . \end{aligned}$ (21)

Since φ ≥ 1, we multiply the numerator and the denominator of (21) by -1 and then subtract the nonnegative quantity $(φ - 1) (σ_{k}^{T} y_{k})^{2} x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x$ from the numerator. Because the denominator is positive when $σ_{k}^{T} y_{k} > 0$ (a condition discussed below), this gives

$\begin{aligned} C \geq {} & \frac{1}{σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} \, ((φ - 1) σ_{k}^{T} y_{k} + φ y_{k}^{T} B_{k} y_{k})} \\ & \times \Big\{ ((φ - 1) σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} + φ^{2} (y_{k}^{T} B_{k} y_{k})^{2}) \, x^{T} σ_{k} σ_{k}^{T} x \\ & \quad - φ (φ - 1) \, σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} \, (x^{T} σ_{k} y_{k}^{T} B_{k} x + x^{T} B_{k} y_{k} σ_{k}^{T} x) \\ & \quad + φ (φ - 1) (σ_{k}^{T} y_{k})^{2} \, x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x \\ & \quad - (φ - 1) (σ_{k}^{T} y_{k})^{2} \, x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x \Big\}, \end{aligned}$

and, after completing the square,

$\begin{aligned} C \geq {} & \frac{1}{σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} \, ((φ - 1) σ_{k}^{T} y_{k} + φ y_{k}^{T} B_{k} y_{k})} \\ & \times \Big\{ (φ - 1) \, σ_{k}^{T} y_{k} \, y_{k}^{T} B_{k} y_{k} \, | x^{T} σ_{k} |^{2} + | x^{T} (φ \, y_{k}^{T} B_{k} y_{k} \, σ_{k} - (φ - 1) \, σ_{k}^{T} y_{k} \, B_{k} y_{k}) |^{2} \Big\} . \end{aligned}$

For a general function, we have $σ_{k}^{T} y_{k} = σ_{k}^{T} g_{k + 1} - σ_{k}^{T} g_{k} .$ Note that $σ_{k}^{T} g_{k} < 0$ because B_k is positive definite. With an exact line search, $σ_{k}^{T} g_{k + 1} = 0$ and hence $σ_{k}^{T} y_{k} > 0$ . With an inexact line search the condition $σ_{k}^{T} y_{k} > 0$ can also be satisfied: as long as we increase the precision of the line search, we can make $σ_{k}^{T} g_{k + 1}$ as small in magnitude as desired. Hence with exact or inexact line search, $C > 0 .$ (23)

Now we consider $D = x^{T} B_{k} x - \frac{x^{T} B_{k} y_{k} y_{k}^{T} B_{k} x}{y_{k}^{T} B_{k} y_{k}} .$ (24) Since B_k is a positive definite matrix, it has a positive definite square root. Let W be the square root of B_k; then we can rewrite (24) as $D = x^{T} W W^{T} x - \frac{x^{T} W W^{T} y_{k} y_{k}^{T} W W^{T} x}{y_{k}^{T} W W^{T} y_{k}} .$ (25) With u = W^Tx and v = W^Ty_k, equation (25) becomes

$D = u^{T} u - \frac{(u^{T} v) (v^{T} u)}{v^{T} v} = \frac{(u^{T} u) (v^{T} v) - (u^{T} v)^{2}}{∥ v ∥^{2}} \geq 0,$ (26) where ∥v∥² is always positive and the numerator on the right side is always nonnegative by the Cauchy-Schwarz inequality. From (18), (23) and (26), x^TB_{k+1}x = C + D > 0, so B_{k+1} is positive definite.
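Theorem 3.1 can also be checked numerically on a small example. The sketch below applies update (12) to a hand-picked 2 × 2 positive definite matrix with σ_k^T y_k > 0 and tests positive definiteness through the leading principal minors (Sylvester's criterion). The matrix, the vectors and the helper names are assumptions chosen for illustration, and the check complements the proof rather than replacing it.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def update_B(B, s, y, phi):
    """Rank-two update (12) of the inverse Hessian approximation."""
    By = matvec(B, y)
    sy, yBy = dot(s, y), dot(y, By)
    w = [(1.0 - phi) * si - phi * bi for si, bi in zip(s, By)]
    wy, n = dot(w, y), len(s)
    return [[B[i][j] + phi * s[i] * s[j] / sy - (1.0 - phi) * By[i] * By[j] / yBy
             + w[i] * w[j] / wy for j in range(n)] for i in range(n)]


def is_positive_definite_2x2(M):
    """Sylvester's criterion for a symmetric 2 x 2 matrix."""
    return M[0][0] > 0.0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] > 0.0


B = [[2.0, 0.5], [0.5, 1.0]]            # positive definite: minors 2 and 1.75
sigma, y = [1.0, 0.5], [2.0, 0.5]       # sigma^T y = 2.25 > 0

results = {phi: is_positive_definite_2x2(update_B(B, sigma, y, phi))
           for phi in (1.0, 1.5, 3.0, 10.0)}
```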

4 Numerical results

The main aim of this section is to examine the numerical performance of the algorithm on a set of test problems. The proposed algorithm is implemented in Fortran 90 and run on a PC with Ubuntu 14.04.2. Algorithm 2.1 is compared with the results given in [25]; accordingly, we test 71 equality constrained optimization problems from CUTEr [27]. In our implementation, the maximum number of iterations is 500, the maximum number of function evaluations is 4000, and we use the default starting point x₀ for each problem. The Armijo condition is used to determine the step length, and the termination condition for each problem is $∥ \nabla_{x} L (x_{k + 1}, λ^{k}, μ^{k}) ∥ < 10^{- 6}, \quad ∥ c (x_{k + 1}) ∥ < 10^{- 6} .$

Tables 1 and 2 show the computational results, where the column headings have the following meanings:

n: the number of variables;

m: the number of equality constraints;

Iter: the total number of iterations;

n_f: the total number of function evaluations.

In CUTEr the function value and the constraint values can be obtained simultaneously by calling the subroutine "cfn", so n_f denotes the number of "cfn" calls. A table entry "F" means that the corresponding algorithm terminated unsuccessfully: the optimum point could not be reached because the number of function evaluations or the number of iterations exceeded the maximum.

Table 1
Test results for problems (A–H)

	Prob. Dim.		SNOPT		pdSQP		Algorithm 2.1
Problem	n	m	Iter	n_f	Iter	n_f	Iter	n_f
BT1	2	1	10	21	6	9	4	40
BT2	3	1	15	16	13	14	2	47
BT3	5	3	6	7	1	2	6	38
BT4	3	2	7	10	7	14	5	48
BT5	3	2	8	11	5	7	4	23
BT6	5	2	14	16	9	11	3	45
BT7	5	3	19	36	41	57	12	184
BT8	5	2	11	13	19	20	2	22
BT9	4	2	18	30	14	23	6	56
BT10	2	2	13	23	6	7	6	38
BT11	5	3	11	14	7	8	7	59
BT12	5	3	8	9	4	5	3	36
BYRDSPHR	3	2	10	14	30	59	4	30
COOLHANS	9	9	19	28	13	14	2	83
DIXCHLNG	10	5	29	30	40	43	2	77
EIGENA2	110	55	3	4	4	5	3	136
EIGENACO	110	55	3	4	8	10	2	82
EIGENB2	110	55	3	4	22	37	2	355
EIGENBCO	110	55	3	4	74	125	2	496
EIGENC2	462	231	243	290	44	83	6	2063
EIGENCCO	462	231	208	253	48	93	2	1892
ELEC	600	200	359	403	59	115	F	F
GRIDNETE	60	36	37	38	3	4	7	268
GRIDNETH	60	36	72	73	5	6	7	276
HS6	2	1	6	7	11	26	1	26
HS7	2	1	17	30	8	11	3	19
HS8	2	2	5	6	4	6	2	14
HS9	2	1	6	8	4	5	2	9
HS26	3	1	23	24	18	19	1	22
HS27	3	1	20	23	10	15	3	22
HS28	3	1	10	11	1	2	1	11
HS39	4	2	18	30	14	23	6	44
HS40	4	3	6	7	4	5	5	27
HS42	4	2	6	8	5	6	8	39
HS46	5	2	25	26	18	20	2	32
HS47	5	3	22	31	17	22	1	25
HS48	5	2	7	8	1	2	2	23
HS49	5	2	31	32	16	17	1	28
HS50	5	3	19	21	9	10	2	29
HS51	5	3	7	8	1	2	2	24
HS52	5	3	6	8	1	2	7	44
HS56	7	4	10	14	5	6	4	50
HS61	3	2	68	174	13	19	4	31
HS77	5	2	12	14	9	11	4	46
HS78	5	3	6	7	4	5	4	31
HS79	5	3	11	14	4	5	4	33
HS100LNP	7	2	14	19	18	22	3	55
HS111LNP	10	3	49	104	18	34	5	99

Table 2

Test results for problems (I–Z)

	Prob. Dim.		SNOPT		pdSQP		Algorithm 2.1
Problem	n	m	Iter	n_f	Iter	n_f	Iter	n_f
LUKVLE1	100	98	11	13	10	11	2	297
LUKVLE3	100	2	35	36	9	10	8	938
LUKVLE6	99	49	28	29	F	F	15	869
LUKVLE7	100	4	33	34	8	12	7	942
LUKVLE8	100	98	18	24	F	F	F	F
LUKVLE9	100	6	F	F	74	129	3	2320
LUKVLE10	100	98	F	F	10	13	5	413
LUKVLE13	98	64	68	76	F	F	10	1201
LUKVLE14	98	64	33	38	F	F	8	1231
LUKVLE16	97	72	160	342	F	F	2	287
LCH	300	1	F	F	17	19	6	1883
MSS1	90	73	59	69	F	F	9	995
MARATOS	2	1	7	13	3	4	4	13
MWRIGHT	5	3	9	10	7	9	5	44
ORTHRDM2	203	100	7	10	6	8	3	2263
ORTHRDS2	203	100	80	161	F	F	2	3749
ORTHREGA	133	64	19	21	F	F	10	1307
ORTHREGB	27	6	6	8	4	5	2	208
ORTHREGC	25	10	14	16	7	8	6	198
ORTHREGD	23	10	442	1462	6	8	2	279
ORTHRGDM	23	10	11	14	6	8	2	366
ORTHRGDS	155	76	F	F	11	17	3	3177
S316-322	2	1	6	10	24	25	9	34

From Tables 1 and 2, we can see that the total number of iterations of Algorithm 2.1 is less than that of both other methods for most of the problems. Algorithm pdSQP was unable to solve 8 problems (MSS1, LUKVLE6, LUKVLE8, LUKVLE13, LUKVLE14, LUKVLE16, ORTHRDS2, and ORTHREGA), and SNOPT left 4 problems unsolved (LUKVLE9, LUKVLE10, LCH, and ORTHRGDS). Algorithm 2.1 was more robust than pdSQP or SNOPT, with only 2 unsolved problems (ELEC and LUKVLE8); for all other problems the solution point was reached successfully. Finally, for some problems Algorithm 2.1 requires fewer function evaluations than pdSQP or SNOPT. Therefore we can say that the new algorithm is superior to both pdSQP and SNOPT to a certain extent.

5 Conclusions

An augmented Lagrangian algorithm is proposed for solving nonlinear equality constrained optimization problems. The Hessian matrix of the augmented Lagrangian function in the unconstrained subproblems is updated using new Hessian approximations. The search direction based on these matrices is a descent direction, since the Hessian approximation is positive definite for all values of φ ≥ 1. The formula (12), or equivalently (13), represents a class of approximating matrices as a function of a scalar parameter φ that includes the DFP and BFGS methods as special cases. Numerical results for equality constrained problems from CUTEr are presented. The algorithm has been compared with the algorithms pdSQP and SNOPT, and the numerical tests of Algorithm 2.1 seem to be promising.

References

1. Ameli, Alfi and Aghaebrahimi, A fuzzy discrete harmony search algorithm applied to annual cost reduction in radial distribution systems, Engineering Optimization 48 (2016), 1529–1549.

2. Andreani, Birgin E.G., Martínez J.M. and Schuverdt M.L., Augmented Lagrangian methods under the constant positive linear dependence constraint qualification, Mathematical Programming 111 (2008), 5–32.

3. Antoniou and Lu W.S., Practical Optimization: Algorithms and Engineering Applications, Springer, 2007.

4. Bazaraa M.S., Sherali H.D. and Shetty C.M., Nonlinear Programming Theory and Algorithms, 2nd edition, John Wiley & Sons, New York, 1993.

5. Bertsekas D.P., On penalty and multiplier methods for constrained minimization, SIAM Journal on Control and Optimization 14(2) (1976), 216–235.

6. Bertsekas D.P., Nonlinear Programming, Athena Scientific, Belmont, 1995.

7. Bhattacharya and Vasant, Soft-sensing of level of satisfaction in TOC product-mix decision heuristic using robust fuzzy-LP, European Journal of Operational Research 177 (2007), 55–70.

8. Birgin E.G. and Martínez J.M., Augmented Lagrangian method with nonmonotone penalty parameters for constrained optimization, Computational Optimization and Applications 51 (2012), 941–965.

9. Boggs P.T., Kearsley A.J. and Tolle J.W., A global convergence analysis of an algorithm for large-scale nonlinear optimization problems, SIAM Journal on Optimization 9(4) (1999), 833–862.

10. Byrd R.H., Tapia R.A. and Zhang, An SQP augmented Lagrangian BFGS algorithm for constrained optimization, SIAM Journal on Optimization 2 (1992), 210–241.

11. Conn A.R., Gould N.I.M. and Toint Ph.L., A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds, SIAM Journal on Numerical Analysis 28 (1991), 545–572.

12. Conn A.R., Gould N.I.M. and Toint Ph.L., LANCELOT: A Fortran Package for Large-scale Nonlinear Optimization (Release A), Springer, New York, 1992.

13. Dolgopolik M.V., Existence of augmented Lagrange multipliers: Reduction to exact penalty functions and localization principle, Mathematical Programming 166 (2017), 297–326.

14. Dolgopolik M.V., Augmented Lagrangian functions for cone constrained optimization: The existence of global saddle points and exact penalty property, Journal of Global Optimization (2018). 10.1007/s10898-017-0603-0.

15. Dussault J.P., Augmented penalty algorithms, IMA Journal of Numerical Analysis 18 (1998), 355–372.

16. Elamvazuthi, Ganesan, Vasant and Webb J.F., Application of a fuzzy programming technique to production planning in the textile industry, International Journal of Computer Science and Information Security 6 (2009), 238–243.

17. Fernández and Solodov, Local convergence of exact and inexact augmented Lagrangian methods under the second-order sufficiency condition, IMPA, preprint A677, 2010.

18. Fiacco A.V. and McCormick G.P., Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York, 1968.

19. Fletcher, Practical Methods of Optimization, 2nd edition, John Wiley & Sons, New York, 1987.

20. Fontecilla, Steihaug and Tapia R.A., A convergence theory for a class of quasi-Newton methods for constrained optimization, SIAM Journal on Numerical Analysis 24 (1987), 1133–1151.

21.

Ganesan

, Vasant

and Elamvazuthi

, Optimization of nonlinear geological structure mapping using hybrid neuro– genetic techniques, Mathematical and Computer Modelling 54 (2011), 2913–2922.

22.

and Chen

, A penalty– free method with line search for nonlinear equality constrained optimization, Applied Mathematical Modelling 37 (2013), 9934–9949.

23.

Gill

P.E.

, Murray

and Saunders

M.A.

, SNOPT: An SQP algorithm for large– scale constrained optimization, SIAM Review 47 (2005), 99–131.

24.

Ghasemishabankareh

, Li

and Ozlen

, Cooperative coevolutionary differential evolution with improved augmented Lagrangian to solve constrained optimisation problems, Information Sciences 369 (2016), 441–456.

25.

Gill

P.E.

and Robinson

D.P.

, A primal– dual augmented Lagrangian, Computational Optimization and Applications 51 (2012), 1–25.

26.

Gould

N.I.M.

, On the convergence of a sequential penalty function method for constrained minimization, SIAM Journal on Numerical Analysis 26 (1989), 107–128.

27.

Gould

N.I.M.

, Orban

and Toint

Ph.L.

, CUTEr and SifDec: A constrained and unconstrained testing environment, revisite), ACM Transactions on Mathematical Software 29 (2003), 373–394.

28.

Hestenes

M.R.

, Multiplier and gradient methods, Journal of Optimization Theory and Applications 4 (1969), 303–320.

29.

Izmailov

A.F.

and Solodov

M.V.

, On attraction of linearly constrained Lagrangian methods and of stabilized and quasi-Newton SQP methods to critical multipliers, Mathematical Programming 126 (2011), 231–257.

30.

and Ma

, An accelerated augmented Lagrangian method for linearly constrained convex programming with the rate of convergence O(1/k²), Applied Mathematics– A Journal of Chinese Universities 32 (2017), 117–126.

31.

Krejić

, Martínez

J.M.

, Mello

and Pilotta

E.A.

, Validation of an augmented Lagrangian algorithm with a Gauss– Newton Hessian approximation using a set of hard-spheres problems, Computational Optimization and Applications 16 (2000), 247–263.

32.

Díaz-Madroñero

, Peidro

and Vasant

, Vendor selection problem by using an interactive fuzzy multi-objective approach with modified S-curve membership functions, Computers and Mathematics with Applications 60 (2010), 1038–1048.

33.

Mousavi

and Alfi

, A memetic algorithm applied to trajectory control by tuning of fractional order proportional-integralderivative controllers, Applied Soft Computing 36 (2015), 599–617.

34.

Murtagh

B.A.

, Saunders

M.A.

Saunders, MINOS 5.5 User’s Guide. Report SOL 83-20R. Department of Operations Research, Stanford University, Stanford. CA. (Revised 1998).

35.

Niu

L.F.

and Yuan

, A new trust region algorithm for nonlinear constrained optimization, Journal of Computational Mathematics 28 (2010), 72–86.

36.

Nocedal

, Wright

S.J.

, Numerical Optimization. Springer, New York, 2006.

37.

Pahnehkolaei

S.M.A.

, Alfi

, Sadollah

and Kim

J.H.

, Gradient-basedWater cycle algorithm with evaporation rate applied to chaos suppression, Applied Soft Computing 53 (2017), 420–440.

38.

Peidro

and Vasant

, Transportation planning with modified s-curve membership functions using an interactive fuzzy multiobjective approach, Applied Soft Computing 11 (2011), 2656–2663.

39.

Powell

M.J.D.

, A method for nonlinear constraints in minimization problems, in: Optimization. R. Fletcher, ed. Academic Press, London. 1969. pp. 283–298.

40.

Rockafellar

R.T.

, The multiplier method of Hestenes and Powell applied to convex programming, Journal of Optimization Theory and Applications 12 (1973), 555–562.

41.

Salim

M.S.

and Ahmed

A.I.

, A family of quasi-Newton methods for unconstrained optimization problems. Optimization, (2018), 10.1080/02331934.2018.1487423.

42.

Tapia

R.A.

, Diagonalized multiplier methods and quasi-Newton methods for constrained optimization, Journal of Optimization Theory and Applications 22 (1977), 135–194.

43.

Vasant

, Ganesan

and Elamvazuthi

, Fuzzy linear programming using modified logistic membership function, Journal of Engineering and Applied Sciences 5 (2010), 239–245.

44.

Wanga

and Yuan

, An augmented Lagrangian trust region method for equality constrained optimization, Optimization Methods and Software 30 (2015), 559–582.