Operations Research

(1)

Research Reports on Mathematical and Computing Sciences

Department of

Mathematical and Computing Sciences

Tokyo Institute of Technology

Operations Research

ISSN 1342-2804

Lagrangian-Conic Relaxations, Part II:

Applications to

Polynomial Optimization Problems Naohiko Arima, Sunyoung Kim, Masakazu Kojima, and Kim-Chuan Toh

January 2014, B–476

(2)

B-476

Lagrangian-Conic Relaxations, Part II: Applications to Polyno- mial Optimization Problems

Naohiko Arima^⋆, Sunyoung Kim^†, Masakazu Kojima^‡, and Kim-Chuan Toh^♯ January 2014

Abstract. We present the moment cone (MC) relaxation and a hierarchy of sparse Lagrangian- SDP relaxations of polynomial optimization problems (POPs) using the unified framework established in Part I. The MC relaxation is derived for a POP of minimizing a polynomial subject to a nonconvex cone constraint and polynomial equality constraints. It is an extension of the completely positive programming relaxation for QOPs. Under a copositivity condition, we characterize the equivalence of the optimal values between the POP and its MC relaxation. A hierarchy of sparse Lagrangian-SDP relaxations, which is parameterized by a positive integer ω called the relaxation order, is proposed for an equality constrained POP. It is obtained by combining a sparse variant of Lasserre’s hierarchy of SDP relaxation of POPs and the basic idea behind the conic and Lagrangian-conic relaxations from the unified framework. We prove under a certain assumption that the optimal value of the Lagrangian-SDP relaxation with the Lagrangian multiplier λ and the relaxation order ω in the hierarchy converges to that of the POP as λ → ∞ and ω → ∞. The hierarchy of sparse Lagrangian-SDP relaxations is designed to be used in combination with the bisection and 1-dimensional Newton methods, which was proposed in Part I, for solving large-scale POPs efficiently and effectively.

Keywords. Polynomial optimization problem, moment cone relaxation, SOS relaxation, a hierarchy of the Lagrangian-SDP relaxations, exploiting sparsity.

AMS Classification. 90C22, 90C25, 90C26.

⋆ Research and Development Initiative & JST CREST, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo 112-8551 Japan. The research of this author was par- tially supported by the Japan Science and Technology Agency (JST), the Core Research of Evolutionary Science and Technology (CREST) Research Project.

(nao [email protected]).

† Department of Mathematics, Ewha W. University, 52 Ewhayeodae-gil, Sudaemoon- gu, Seoul 120-750 Korea. The research of this author was supported by NRF 2012- R1A1A2-038982. ([email protected]).

‡ Research and Development Initiative & JST CREST, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo 112-8551 Japan. The research of this author was par- tially supported by the Japan Science and Technology Agency (JST), the Core Research of Evolutionary Science and Technology (CREST) Research Project.

([email protected]).

♯ Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076. Research supported in part by Ministry of Ed- ucation Academic Research Fund R-146-000-168-112 ([email protected]).

(3)

1 Introduction

A unified framework for conic and Lagrangian-conic relaxations in Part I [5] was proposed for quadratic optimization problems (QOPs) and polynomial optimization problems (POPs).

From a theoretical viewpoint, this framework is intended to unify and generalize the existing results on the completely positive (CPP) relaxation of QOPs [2, 8, 9, 16] and its extension to POPs, which was called the moment cone (MC) relaxation in [4]. From a practical viewpoint, the Lagrangian-conic relaxation is proposed to solve large-scale conic optimization problems (COPs) obtained from effective conic relaxations, including doubly nonnegative (DNN) and semidefinite programming (SDP) relaxations of QOPs and POPs, by first-order algorithms.

In particular, the CPP relaxation and the sparse DNN relaxation for QOPs in Part I were discussed from these viewpoints.

We consider a general class of POPs of the following form:

ζ^∗ = inf

f⁰(x)

x ∈ J, f^k(x) = 0 (k = 1, 2, . . . , m)

, (1)

where J denotes a closed (but not necessarily convex) cone in the n-dimensional Eu- clidean space Rⁿ, and f^k(x) a real valued polynomial in x = (x1, x2, . . . , xn) ∈ Rⁿ (k = 0, 1, 2, . . . , m).

The first purpose of this paper is to derive the moment cone (MC) relaxation for POP (1). The MC relaxation introduced by Arima, Kim and Kojima [4] for this form of POP under a hierarchy of copositivity conditions is an extension of the CPP relaxation. The main emphasis of our discussion here is on a unified treatment of the CPP relaxation of QOPs and their MC relaxation of POPs. More precisely, we first convert POP (1) into an equivalent COP which serves as the starting optimization model in the framework. Then, we can apply the general results established on the COP in Part I [5] to derive not only the MC relaxation but also the Lagrange-MC relaxation of (1).

Let V be a finite dimensional vector space endowed with an inner product h·, ·i, and K ⊂ V a (not necessarily convex) cone. Let H⁰ ∈ V and Q^k ∈ V (k = 0, 1, 2, . . . , m). The primal COP is of the form:

COP(K): ζ^p(K) = inf

hQ⁰, Xi

X ∈ K, hH⁰, Xi = 1,

hQ^k, Xi = 0 (k = 1, 2, . . . , m)

. (2)

As a basic assumption, the following copsitivity condition (Condition (I)) is assumed: O 6=

H⁰ ∈ K^∗ and Q^k ∈ K^∗ (k = 1, 2, . . . , m), where K^∗ = {Y ∈ V : hX, Y i ≥ 0 for every X ∈ K} (the dual of K). The copositivity condition assumed here is stronger than the hierarchy of copositivity conditions of [4]. The stronger condition is necessary for deriving the Lagrangian-conic relaxation consistently.

For the application of the framework to POP (1), we construct a nonconvex cone Γ in the space V of symmetric matrices with an appropriate dimension, and symmetric matrices H⁰, Q^k∈ V (k = 0, 1, 2, . . . , m) so that COP(Γ) in (2) represents POP (1), i.e., ζ^∗ = ζ^p(Γ).

Then, the MC relaxation of POP (1) is derived as its convexification, i.e., “the strongest convex relaxation” of COP(Γ) in (2) by replacing Γ with its convex hull, co Γ. For the equivalence between COP(Γ) in (2) and its convexification COP(co Γ), or for the identity ζ^p(Γ) = ζ^p(co Γ), we assume the copositivitiy condition for K = Γ. This condition is satisfied for any cone J ∈ Rⁿif each polynomial f^k(x) is a sum of squares polynomials, thus,

(4)

if f^k(x) is replaced by f^k(x)²(k = 1, 2, . . . , m) in POP (1). Under the copositivity condition, we provide a condition (Condition (IV)) that characterizes the equivalence between POP (1) and its MC relaxation based on Theorem 3.1 of Part I [5].

The second purpose of applying the unified framework to POPs is to establish a theoretical foundation for effective, efficient and stable numerical methods, which can be implemented with first-order algorithms, for solving large-scale POPs. The MC relaxation, i.e., COP(co Γ) in (2), can be regarded as the most effective relaxation in terms of the quality of the lower bound provided for the optimal value of POP (1). It is, however, numerically intractable even when J is a simple closed convex cone such as Rⁿ or Rⁿ₊ (the nonnegative orthant of Rⁿ) in (1). As a numerically tractable relaxation of COP(co Γ) in (2) for J = Rⁿ₊, the doubly nonnegative (DNN) relaxation obtained by choosing a DNN cone for K in COP (2) and the Lagrangian-DNN relaxation of POP (1) can be considered. In the recent paper [16], the Lagrangian-DNN relaxation for a class of QOPs with complementarity constraints was implemented with first-order algorithms. It was shown through numerical results on some well-known test problems for nonconvex QOPs that the Lagrangian-DNN relaxation combined with first-order algorithms provided tight lower bounds for their optimal values efficiently and stably. See also Section 5 of Part I [5]. Those results indicate that applying the methods in [16] to POP (1) with J = Rⁿ₊ is an important subject that needs further investigation.

For an alternative to the MC relaxation, a sparse variant [17, 20, 22, 28] of the hierarchy of SDP relaxations proposed by Lasserre [21] can be used for an equality and inequality constrained POP. The hierarchy is parameterized by a positive integer ω called the relaxation order. This relaxation method is numerically tractable and as effective as the MC relaxation, in the sense that under a certain assumption, the optimal value of the SDP relaxation with the order ω converges to the optimal value of the given POP as ω → ∞. This is a theoretically strong result, but solving the SDP relaxations of increasing size with ω is known to be numerically hard. In fact, SDP solvers [12, 26, 27] based on primal-dual interior-point methods have difficulties in solving SDP relaxation problems with the dimension larger than several thousands.

We propose a hierarchy of sparse Lagrangian-SDP relaxations for an equality constrained POP of the form (1) with J = Rⁿ, by combining the hierarchy of sparse SDP relaxations and the Lagrangian-conic relaxation in the unified framework. Notice that inequality constraints can also be dealt with in the form (1) since any polynomial inequality g(x) ≥ 0 is equivalent to the equality g(x) − v² = 0 with a slack variable v ∈ R. Assuming that the polynomials f^k(x) (k = 0, 1, . . . , m) are sparse, we first embed the structured sparsity characterized by a chordal graph as in [17, 20, 22, 28], and derive a sparse sum of squares (SOS) problem equivalent to POP (1) with J = Rⁿ by applying a sparse variant (Corollary 3.3 of [22], Theorem 1 of [20]) of Putinar’s lemma (Lemma 4.1 of [25]) for representing positive polynomials by SOS polynomials. We then transform the SOS problem into a simpler SOS problem with a structure that leads to the copositivity condition in the unified framework.

The transformed SOS problem is still not numerically solvable because SOS polynomials with any degree can be used in the problem. By restricting the degrees of SOS polynomials by the relaxation order ω, we construct a hierarchy of numerically solvable SOS problems from the transformed SOS problem. Finally, we convert the hierarchy of SOS problems into the hierarchy of SDPs in the linear space Vω of symmetric matrices with the dimension determined by the maximum degree of the polynomials f^k(x) (k = 0, 1, 2, . . . , m) and ω.

(5)

Each SDP with the relaxation order ω in the hierarchy is of the form:

ζ_ω^d(Kω) = sup (

y0

Q⁰_ω− H⁰_ωy0+ Xm k=1

Q^k_ωyk ∈ K^∗_ω )

(3) for some cone Kω ⊂ Vω, symmetric matrices H⁰_ω, Q^k_ω ∈ Vω (k = 0, 1, 2, . . . , m). This problem corresponds to the dual of (2). It is constructed so that it satisfies the copositivity condition for K = Kω. Thus, the entire theory of the unified framework can be applied.

In particular, we can derive the following primal-dual pair of Lagrangian-SDP relaxation problems:

ζ_ω^p(λ, Kω) = inf

hQ⁰_ω+ λH¹_ω, Xi

X ∈ Kω, hH⁰_ω, Xi = 1

, (4)

ζ_ω^d(λ, Kω) = sup y0

y0 ∈ R, Q⁰_ω − H⁰_ωy0+ λH¹_ω ∈ K^∗_ω

, (5)

where H¹_ω = Pm

k=1Q^k_ω, and λ ∈ R denotes a Lagrangian multiplier (or parameter) pre- scribed for the problems. These problems are very simple so that efficient first-order algorithms can be designed to solve the problems. In fact, the dual problem (5) involves only one variable, which makes it possible to effectively utilize the bisection and 1-dimensional Newton methods proposed in Part I [5]; see Also [16]. They also inherit the structured sparsity characterized by a chordal graph from the one embedded in POP (1) with J = Rⁿ, for example, K^∗ is described as the Minkovski sum of positive semidefinite cones of small dimensions and linear subspaces of Vω. Moreover, the following theoretical results will be established: for every relaxation order ω and Lagrange multiplier λ,

• the optimal value ζ_ω^d(Kω) of (3) bounds the optimal value ζ^∗ of POP (1) with J = Rⁿ from below, and it monotonically converges to ζ^∗ as ω → ∞,

• the optimal value ζ_ω^d(λ, Kω) of (5) bounds the optimal value ζ_ω^d(Kω) of (3) from below, and it monotonically converges to ζ_ω^d(Kω) as λ → ∞,

• the primal problem (4) is strictly feasible (i.e., there exists a primal feasible solution that lies in the relative interior of the cone Kω) and it has an optimal solution with the optimal value ζ_ω^d(λ, Kω) = ζ_ω^p(λ, Kω) (the strong duality).

The first two results imply that the lower bound ζ_ω^d(λ, Kω) for the optimal value ζ^∗ of POP (1) satisfies ζ^∗− ǫ < ζ_ω^d(λ, Kω) for any ǫ > 0 if sufficiently large ω and λ are taken. The last result contributes to the numerical stability of first-order algorithms.

In section 2, we review the results shown in Part I [5] and describe the notation and symbols used in this paper. We also present how to represent polynomials with symmetric matrices of monomials, and introduce SOS of polynomials. Section 3 includes the discussion on the MC relaxation of POP (1), and section 4 presents the hierarchy of sparse Lagrangian- SDP relaxations of POP (1) with J = Rⁿ.

2 Preliminaries

2.1 Conic and Lagrangian-conic optimization problems

The results in Part I [5] are summarized in this subsection. We first list some notation used in Part I. Let V be a finite dimensional vector space endowed with an inner product h·, ·i

(6)

and its induced norm k · k, and K a nonempty (but not necessarily convex nor closed) cone in V. We denote the dual of K by K^∗, i.e., K^∗ = {Y ∈ V : hX, Y i ≥ 0 for every X ∈ K}, and the convex hull of K by co K.

For H⁰, Q^k∈ V (k = 0, , 2, . . . , m), H¹ = Pm

k=1Q^k, let F (K) =

X ∈ V

X ∈ K, hH⁰, Xi = 1, hQ^k, Xi = 0 (k = 1, 2, . . . , m) , F₀(K) =

X ∈ V

X ∈ K, hH⁰, Xi = 0, hQ^k, Xi = 0 (k = 1, 2, . . . , m) . In Part I, we introduced the following conditions:

Condition (I) O 6= H⁰ ∈ K^∗, Q^k ∈ K^∗ (k = 1, 2, . . . , m) (copositivity condition).

Condition (II) K is closed and convex.

Condition (III) n

X ∈ F (K) : hQ⁰, Xi ≤ ˜ζo

is nonempty and bounded for some ˜ζ ∈ R.

Condition (IV) hQ⁰, Xi ≥ 0 for every X ∈ F0(K).

For the primal-dual pairs of problems, ζ^p(K) := inf

hQ⁰, Xi | X ∈ F (K)

= inf

hQ⁰, Xi

X ∈ K, hH⁰, Xi = 1,

hQ^k, Xi = 0 (k = 1, 2, . . . , m)

, (6)

ζ^d(K) := sup (

z0

Q⁰+ Xm k=1

Q^kzk− H⁰z0 ∈ K^∗ )

, (7)

η^p(λ, K) := inf

h(Q⁰+ λH¹), Xi X ∈ K, hH⁰, Xi = 1

, (8)

η^d(λ, K) := sup y0

Q⁰+ λH¹− H⁰y0 ∈ K^∗

, (9)

the following results are shown.

Theorem 2.1.

(i) η^d(λ, K)↑λ = ζ^d(K) ≤ ζ^p(K) and η^d(λ, K) ≤ η^p(λ, K)

↑λ ≤ ζ^p(K) under Condition (I).

(ii) η^d(λ, K) = η^p(λ, K)

↑λ = ζ^d(K) ≤ ζ^p(K) under Conditions (I) and (II).

(iii) η^d(λ, K) = η^p(λ, K)

↑λ = ζ^d(K) = ζ^p(K) under Conditions (I), (II) and (III).

(iv) ζ^p(K) = ζ^p(co K) under Conditions (I) and (IV).

Here ↑λ means “increases monotonically as λ → ∞”.

(7)

2.2 Notation and symbols

For the application of the unified framework to POPs, we use the following notation: Let R denote the set of real numbers, R+ the set of nonnegative real numbers, and Z+ the set of nonnegative integers. Let |α| = Pn

i=1αi for each α ∈ Zⁿ₊, where αi denotes the ith element of α ∈ Zⁿ₊. R[x] is the set of real-valued multivariate polynomials in xi ∈ R (i = 1, . . . , n), where x = (x1, . . . , xn) ∈ Rⁿ. Each polynomial f (x) ∈ R[x] is represented as f (x) = P

α∈Ffαx^α, where F ⊂ Zⁿ₊ is a nonempty finite set, fα (α ∈ F ) real coefficients, x^α = x^α₁¹x^α₂²· · · x^α_nⁿ and α = (α1, α2, . . . , αn) ∈ Zⁿ₊. We assume that x⁰_i = 1 even if xi = 0, in particular, x⁰ = 1 for any x ∈ Rⁿ. The support of f (x) is defined by supp(f (x)) = {α ∈ F : fα6= 0} ⊂ Zⁿ₊, and the degree of f (x) ∈ R[x] is defined by deg(f (x)) = max{|α| : α ∈ supp(f (x))}. For each nonempty subsets F and G of Zⁿ₊, let F + G denote their Minkowski sum {α + β : α ∈ F, β ∈ G}, and let R[x, F] = {f (x) ∈ R[x] : supp(f (x)) ⊂ F}.

Let F be a nonempty finite subset of Zⁿ₊. |F | stands for the number of elements of F . Let R^F denote the |F|-dimensional Euclidean space whose coordinate are indexed by α ∈ F. Each vector of R^F with elements wα (α ∈ F ) is denoted by (wα : α ∈ F) or simply (wα : F ). We assume that (wα : F) is a column vector when it is multiplied by a matrix. If x ∈ Rⁿ, x^F = (x^α : F ) denotes the |F|-dimensional (column) vector of monomials x^α ∈ R[x] (α ∈ F). Hence each polynomial f (x) ∈ R[x, F] is represented as f (x) = h(fα : F ), x^Fi. S^F denotes the linear space of |F| × |F | symmetric matrices with elements ξαβ (α ∈ F, β ∈ F). We use the notation ✷F for the set F × F = {(α, β) : α, β ∈ F}.

Each matrix of S^F is written as (ξαβ : (α, β) ∈ ✷F) or simply (ξ^αβ : ✷F). If x ∈ Rⁿ, x^F(x^F)^T = (x^α : F)(x^α : F )^T = (x^α+β : (α, β) ∈ ✷F) is a rank-1 symmetric matrix of monomials x^α+β ∈ R[x] ((α, β) ∈ ✷F), which is denoted by x^✷F. (x^F)^T denotes the row vector obtained by taking the transpose of the column vector x^F.

For every pair of Q = (Qαβ : ✷F) and X = (Xαβ : ✷F) ∈ S^F, hQ, Xi denotes the matrix inner product, i.e., hQ, Xi = trace(Q^TX) = P

(α,β)∈✷FQαβξαβ. With this notation, we often write the quadratic form (x^F)^TQx^F as hQ, x^✷Fi to indicate that x^✷F = x^F(x^F)^T will be replaced by X = (Xαβ : ✷F) ∈ S^F.

Let

S^F₊ = the cone of positive semidefinite matrices in S^F

=

(ξαβ : ✷F) ∈ S^F : (wα : F)^T(ξαβ : ✷F)(w^α: F) ≥ 0 for every (wα: F) ∈ R^F

, L^F =

(ξαβ : ✷F) ∈ S^F : ξαβ = ξγδ if α + β = γ + δ .

Then S^F₊ forms a closed convex cone in S^F, and L^F a linear subspace of S^F. We also see that x^✷F ∈ S^F₊ ∩ L^F for every x ∈ Rⁿ. This relation is used repeatedly in the subsequent discussions.

2.3 Representing polynomials with symmetric matrices of mono- mials and sums of squares of polynomials

For a nonempty finite subset G of Zⁿ₊, a given polynomial f (x) ∈ R[x, G] is usually represented as the inner product of its coefficient vector (fα : G) and the vector x^G = (x^α : G)

(8)

of monomials in the polynomial, i.e., g(x) = h(fα : G), x^Gi. However, representing a polynomial with a symmetric matrix of monomials is more convenient in the subsequent discussions, in particular, when discussing nonnegative polynomials on Rⁿ and Rⁿ₊, and exploiting sparsity in conic and Lagrangian-conic relaxations. For the representation of a polynomial f (x) ∈ R[x, G] using a symmetric matrix of monomials, we need to choose a finite subset F of Zⁿ₊ satisfying the property G ⊂ F +F. In fact, a smaller-sized F satisfying this property is preferable for numerical efficiency. See [19] for details of choosing such an F . See also [4].

Let F be a nonempty subset of Zⁿ₊ and f (x) ∈ R[x, F + F ]. Then we can represent the polynomial f (x) using the rank-1 symmetric matrix x^✷F = x^F(x^F)^T of monomials x^α+β ((α, β) ∈ ✷F) and some Q ∈ S^F such that f (x) = hQ, x^✷Fi. Note that x^✷F contains all monomials x^α (α ∈ F + F), and that the choice of such a Q ∈ S^F is not unique as shown in the following example.

Example 2.1. Consider the polynomial f¹(x) = f¹(x1, x2) in two real variables such that f¹(x) = 1 − 2x1− 2x₂+ x²₁+ x²₂+ 2x²₁x2+ 2x1x²₂+ x²₁x²₂.

Let

G = {(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (2, 1), (1, 2), (2, 2)}, (f_α¹ : G) = (1, −2, −2, 1, 1, 2, 2, 1),

x^G = (x^α : G) = (1, x₁, x₂, x²₁, x²₂, x²₁x₂, x²₂.x²₁x²₂).

Then R[x, G] ∋ f¹(x) = h(f_α¹ : G), x^Gi. To represent f¹(x) using a symmetric matrix of monomials, we can take F = {(0, 0), (1, 0), (0, 1), (1, 1)} so that

G ⊂ F + F = {(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1), (2, 1), (1, 2), (2, 2)}.

Let

Q =







1 −1 −1 0

−1 1 0 1

−1 0 1 1

0 1 1 1





 ∈S^F, P =







0 0 0 −1

0 0 1 0

0 1 0 0

−1 0 0 0





 ∈S^F,

x^✷F =







1 x1 x2 x1x2

x₁ x²₁ x₁x₂ x²₁x₂ x2 x1x2 x²₂ x1x²₂ x1x2 x²₁x2 x1x²₂ x²₁x²₂





 .

Then, R[x, F + F ] ∋ f¹(x) = hQ + µP , x^✷Fi for every µ ∈ R.

Lemma 2.1. Let F be a nonempty finite subset of Zⁿ₊, Q ∈ S^F and P ∈ S^F. Then, R[x, F + F] ∋ hQ, x^✷Fi = hQ + P , x^✷Fi if and only if P ∈ L^F⊥

. Proof. We first show that the linear subspace of S^F generated by

x^✷F : x ∈ Rⁿ , i.e., L=

λx^✷F + µy^✷F : x, y ∈ Rⁿ, λ, µ ∈ R ,

(9)

coincides with L^F. By the definition of L^F, we know that

x^✷F : x ∈ Rⁿ

⊂ L^F, hence L ⊂ L^F. It suffices to show that dim(L) = dim(L^F). Let ℓ = dim(L^F), which is equivalent to the number of different elements in F + F. Let M be the linear subspace of R^ℓ generated by the set {x^{F +F} = (x^γ : γ ∈ F + F)} ⊂ R^ℓ. Then we can identify the linear space L as the linear space M since each matrix X = λx^✷F + µy^✷F ∈ L corresponds to a vector λx^{F +F} + µy^{F +F} ∈ M and vice versa. As a result, dim(L) = dim(M). On the other hand, we see that dim(M) = ℓ since there is no nonzero (gα : F + F ) such that h(gα : F + F ), x^{F +F}i = 0 for all x ∈ Rⁿ. Therefore, we obtain that dim(L) = dim(L^F) = ℓ. Thus we have shown that L^F = L. Now assume that R[x, F + F] ∋ hQ, x^✷Fi = hQ + P , x^✷Fi.

Then hP , Xi = 0 for all X ∈ L = L^F, which implies that P ∈ L^F⊥

. Conversely if P ∈ L^F⊥

, then R[x, F + F] ∋ hQ, x^✷Fi = hQ + P , x^✷Fi holds from x^✷F ∈ L^F for every x∈ Rⁿ.

Lemma 2.1 implies that, for each Q ∈ S^F, {Q + P : P ∈ L^F⊥

} forms an equivalent class in S^F represented by the common polynomial f (x) = hQ, x^✷Fi.

We introduce some additional notation for SOS of polynomials. Let

SOS[x, F] = ( _r

X

i=1

(ϕⁱ(x))² : ϕⁱ(x) ∈ R[x, F] (i = 1, . . . , r) ∃r ∈ Z+

)

for every F ⊂ Zⁿ₊, SOS[x] = SOS[x, Zⁿ₊].

We call SOS[x] the cone of SOS of polynomials, and each f (x) ∈ SOS[x] an SOS polynomial.

The following lemma provides a characterization of SOS polynomials.

Lemma 2.2. [10] Let F be a nonempty finite subset of Zⁿ₊. Then SOS[x, F] =

hQ, x^✷Fi : Q ∈ S^F₊ .

In Example 2.1, the matrix Q ∈ S^F itself is not positive semidefinite. But if we choose µ = 1, then the matrix Q¹ = Q + µP ∈ S^F is positive semidefinite. Hence f¹(x) = hQ¹, x^✷Fi ∈ SOS[x, F] by the lemma above. In fact, we see that

SOS[x, F] ∋ f¹(x) = hQ¹, x^✷Fi = (x1+ x2+ x1x2− 1)², (10) where

F = {(0, 0), (1, 0), (0, 1), (1, 1)}, Q¹ =







1 −1 −1 −1

−1 1 1 1





 ∈S^F₊.

The following lemma will be used in Sections 3 and 4.

Lemma 2.3. Let F be a nonempty finite subset of Zⁿ₊ and M a linear subspace of S^F. Then, (S^F₊∩ M) + ( L^F⊥

∩ M) is closed.

(10)

Proof. We first show that the set {X ∈ S^F : X + Y ∈ B, X ∈ S^F₊, Y ∈ L^F⊥

} is bounded for every bounded subset B of S^F. Let q = |F|. Since the set of monomials x^α (α ∈ F ) is independent, i.e., there is no nonzero (gα : F ) such that (gα : F )^T(x^α : F ) is identically zero for all x ∈ Rⁿ, there exist xj ∈ Rⁿ (j = 1, . . . , q) such that the q × q matrix A= (x^α₁ : F), (x^α₂ : F ), . . . , (x^α_q : F )

is nonsingular. For some bounded B of S^F, assume on the contrary that there exists a sequence {Z^p = X^p+ Y^p : p = 1, 2, . . . } satisfying Z^p = X^p + Y^p ∈ B, X^p ∈ S^F₊, Y^p ∈ L^F⊥

(p = 1, 2, . . . ) and kX^pk → ∞ as p → ∞. We may assume without loss of generality that S^F₊ ∋ X^p/kX^pk converges to some nonzero X ∈ S^F₊ as p → ∞. Hence L^F⊥

∋ Y^p/kX^pk = Z^p/kX^pk − X^p/kX^pk converges to

−X as p → ∞. Since L^F⊥

is a linear subspace of S^F, we obtain that X ∈ S^F₊∩ L^F⊥

. Recall that x^F_j = (x^α_j : F)(x^α_j : F )^T ∈ L^F. Hence hAA^T, Xi = hPq

j=1(x^α_j : F )(x^α_j : F )^T, Xi =Pq

j=1h(x^α_j : F)(x^α_j : F )^T, Xi = 0. Since AA^T is a q × q positive definite matrix and X ∈ S^F₊, the identity hAA^T, Xi = 0 implies that X = O. This is a contradiction.

Thus we have shown that {X ∈ S^F : X + Y ∈ B, X ∈ S^F₊, Y ∈ L^F⊥

} is bounded for every bounded subset B of S^F.

Now, we show that (S^F₊∩ M) + ( L^F⊥

∩ M) is closed. Suppose that Z^p = X^p+ Y^p, X^p ∈ S^F₊∩ M, Y^p ∈ L^F⊥

∩ M (p = 1, 2, . . . ) and Z^p → Z for some Z ∈ S^F as p → ∞.

Since the sequence {Z^p = X^p+ Y^p : p = 1, 2, . . . , } is bounded and X^p ∈ S^F₊, Y^p ∈ L^F⊥

(p = 1, 2, . . . ), the sequence {X^p : p = 1, 2, . . . } is bounded. Hence we may assume that it converges to some X ∈ S^F. It follows that Y^p = Z^p−X^p → Y for some Y ∈ S^F as p → ∞.

Since both S^F₊∩ M and L^F⊥

∩ M are closed subsets of S^F, we know that X ∈ S^F₊∩ M and Y ∈ L^F⊥

∩M. Therefore, we have shown that Z = X +Y ∈ (S^F₊∩M)+( L^F⊥

∩M).

3 A class of polynomial optimization problems and their covexification

Consider a class of POPs of the form (1). Assume that J is a nonempty closed (but not necessarily convex) cone in Rⁿ, and f^k(x) ∈ R[x, F + F] (k = 0, 1, . . . , m) for a nonempty finite subset F of Zⁿ₊ including 0 ∈ Zⁿ₊. For practical applications, we focus on Rⁿ, Rⁿ₊ and R^ℓ× R^n−ℓ with 1 ≤ ℓ ≤ n − 1 for the cone J, but the theoretical results in this section are valid for any closed cone in Rⁿ.

We transform POP (1) into the COP of the form (6) to present the moment cone (MC) relaxation of POPs in the unified framework of Section 2, Part I [5]. The MC relaxation was proposed by [4] as an extension of the completely positive (CPP) relaxation [2, 8] for a class of linearly constrained QOPs in continuous and binary variables. We also recall that the CPP relaxation for the class of QOPs was discussed in Section 4 of [16].

Let us take S^F for the underlying linear space V, and represent each polynomial f^k(x) ∈ R[x, F +F] as f^k(x) = hQ^k, x^✷Fi (k = 0, 1, . . . , m), where Q^k ∈ S^F. Let ∆^F₁ =

x^✷F ∈ S^F : x ∈ J} . Then, POP (1) can be rewritten as

ζ^∗ := inf

hQ⁰, Xi

X ∈ ∆^F₁ , hQ^k, Xi = 0 (k = 1, 2, . . . , m)

. (11)

We consider the following illustrative example:

(11)

Example 3.1. As in Example 2.1, we take n = 2 and F = {(0, 0), (1, 0), (0, 1), (1, 1)}. Let m = 1, R[x, F + F ] ∋ f⁰(x) = hQ⁰, x^✷Fi for some Q⁰ ∈ S^F, and let f¹(x) ∈ R[x, F + F]

be an SOS polynomial given in (10). Let J = R²₊. In this case, it is obvious that the feasible region of POP (14) is bounded and contains two points x = (1, 0) and x = (0, 1), thus, its finite minimum value ζ^∗ is attained at some feasible solution. By definition, we see that

∆^F₁ = {x^✷F ∈ S^F : x ∈ R²₊}.

The problem (11) is in a form similar to COP (6), but we still need to embed ∆^F₁ in a cone K ⊂ S^F and introduce an inhomogeneous equality constraint hH⁰, Xi = 1 such that

∆^F₁ = {X ∈ K : hH⁰, Xi = 1}. This can be achieved by two methods. The first method is to take the conic hull of ∆^F₁ such that

∆^F =

µX ∈ S^F : µ ≥ 0, X ∈ ∆^F₁

=

µx^✷F ∈ S^F : µ ≥ 0, x ∈ J . The second method is to homogenize ∆^F₁ such that

Γ^F = n

(x^τ−|α|₀ x^α : F )(x^τ−|α|₀ x^α : F )^T ∈ S^F : (x0, x) ∈ R+× Jo , where τ = max{|α| : α ∈ F }. We note that, for x₀ = 0 and x ∈ Rⁿ,

x^τ−|α|₀ x^α =

0 if τ > |α|

x^α otherwise, i.e., if τ = |α|.

Both ∆^F and Γ^F are cones in S^F. The first construction of the cone ∆^F was (implicitly) employed in [24], while the second Γ^F in [4].

Let H⁰ be a matrix in S^F with 1 in the (0, 0)th element and 0 elsewhere. Then, we see that hH⁰, x^✷Fi = hH⁰, (x^α : F)(x^α : F )^Ti = x⁰x⁰ = 1 for every x ∈ Rⁿ, and that hH⁰, Xi = X⁰⁰= x^2τ₀ for every X = (x^τ−|α|₀ x^α : F)(x^τ−|α|₀ x^α : F)^T ∈ Γ^F. It follows that

X ∈ ∆^F : hH⁰, Xi = 1

=

µx^✷F ∈ S^F : x ∈ J, µ ≥ 0, hH⁰, µx^✷Fi = 1

= ∆^F₁,

X ∈ Γ^F : hH⁰, Xi = 1

=n

(x^τ−|α|₀ x^α: F)(x^τ−|α|₀ x^α: F )^T : (x0, x) ∈ R+× J, x0 = 1o

= ∆^F₁.

(12)

Therefore, both COP (6) with K = ∆^F and COP (6) with K = Γ^F are equivalent to POP (11), and ζ^p(∆^F) = ζ^p(Γ^F) = ζ^∗.

For Example 3.1, we see that

∆^F =















µ µx1 µx2 µx1x2

µx₁ µx²₁ µx₁x₂ µx²₁x₂ µx2 µx1x2 µx²₂ µx1x²₂ µx1x2 µx²₁x2 µx1x²₂ µx²₁x²₂





 ∈S^F₊ : (µ, x₁, x₂) ∈ R³₊







 ,

Γ^F =















x⁴₀ x³₀x1 x³₀x2 x²₀x1x2

x³₀x1 x²₀x²₁ x²₀x1x2 x0x²₁x2

x³₀x₂ x²₀x₁x₂ x²₀x²₂ x₀x₁x²₂ x²₀x1x2 x0x²₁x2 x0x1x²₂ x²₁x²₂





 ∈S^F₊ : (x0, x1, x2) ∈ R³₊









. (13)

(12)

Obviously, for every x1 > 0 and every x2 > 0, the matrix







0 0 0 0

0 0 0 x²₁x²₂







is contained in Γ^F but not in ∆^F.

Now, we are ready to apply all the discussions in Section 3 of Part I [5]. In particular, the following theorem is obtained directly from Theorem 3.1 of [5].

Theorem 3.1. Suppose that Condition (I) holds for K = Γ^F. Then, (i) ζ^p(co ∆^F) = ζ^∗;

(ii) Assume that ζ^∗ is finite, we have that ζ^p(co Γ^F) =

ζ^∗ if Condition (IV) holds for K = Γ^F,

−∞ otherwise.

Next we investigate Conditions (I) and (IV) in detail for K = Γ^F, and in particular, J = Rⁿ and J = Rⁿ₊. First, suppose that J = Rⁿ. Then it follows that Γ^F ⊂ S^F₊∩ L^F, and Γ^F∗

⊃ cl

S^F₊ + L^F⊥

= S^F₊ + L^F⊥

. (See Lemma 2.3 for the last equality).

Therefore, if Q^k ∈ S^F₊+ L^F⊥

, or, equivalently, if f^k(x) = hQ^k, x^✷Fi is a sum of squares of polynomials (k = 1, 2, . . . , m) (see Lemma 2.1 and 2.2), then Condition (I) is satisfied for K = Γ^F. (Recall that O 6= H⁰ ∈ S^F₊ by definition). If a polynomial equation g(x) = 0 is given, it is equivalent to have (g(x))² = 0. Thus, Condition (I) for K = Γ^F is not a strong assumption.

Now suppose that J = Rⁿ₊. Then,

Γ^F ⊂ (C^F)^∗∩ L^F ⊂ S^F₊∩ N^F ∩ L^F, and

Γ^F∗

⊃ cl C^F + (L^F)^⊥

⊃ cl S^F₊+ N^F + (L^F)^⊥ , where

C^F =

Y ∈ S^F : (ξα : F )^TY(ξα : F ) ≥ 0 for every (ξα : F ) ≥ 0

(the copositive cone), (C^F)^∗ =

X ∈ S^F : hX, Y i ≥ 0 for every Y ∈ C^F

= co

(ξα: F)(ξα : F )^T ∈ S^F : (ξα : F) ≥ 0 (the completely positive cone),

N^F =

X ∈ S^F : Xαβ ≥ 0 ((α, β) ∈ ✷F) (the cone of nonnegative matrices).

(13)

Therefore, if Q^k ∈ S^F₊ + N^F + (L^F)^⊥ (or, less restrictively, if Q^k ∈ C^F + (L^F)^⊥) (k = 1, 2, . . . , m), then Condition (I) is satisfied for K = Γ^F.

Now we focus on Condition (IV) with K = Γ^F. Recall that τ = max{|α| : α ∈ F }. Let F = {α ∈ F : |α| = τ }. Suppose that

X = (x^τ−|α|₀ x^α : F)(x^τ−|α|₀ x^α : F )^T ∈ F0(Γ^F) Then,

Xαβ =

x^α+β if α, β ∈ F , 0 otherwise.

As a result, if we define Q^k = (Q^k_αβ : ✷F) ∈ S^F and ¯f^k(x) ∈ R[x, F + F] such that Q^k_αβ =

Q^k_αβ if α, β ∈ F

0 otherwise and ¯f^k(x) = hQ^k, x^✷Fi (k = 0, 1, . . . , m), we can rewrite Condition (IV) for K = Γ^F as

f¯⁰(x) ≥ 0 if x ∈ J and ¯f^k(x) = 0 (k = 1, 2, . . . , m).

Usual cases where the condition above holds are:

(a) Q⁰ = O ∈ S^F or ¯f⁰(x) is an identically zero polynomial, i.e., deg(f⁰(x)) < 2τ . (b)

x∈ J : ¯f^k(x) = 0 (k = 1, 2, . . . , m)

= {0}.

We note that (a) can be always satisfied by choosing a nonempty finite subset F of Zⁿ₊such that f^k(x) ∈ R[x, F + F] (k = 0, 1, 2, . . . , m) and deg(f⁰(x)) < 2τ , and that (b) implies that the feasible region of POP (1) is bounded.

For Example 3.1, we observe that

F0(Γ^F) =







 X =







0 0 0 0

0 0 0 x²₁x²₂





 ∈S^F₊ : (x1, x2) ∈ R²₊,

hQ¹, Xi = Q¹_(1,1)(1,1)X_(1,1)(1,1) = 0









= {O},

where Q¹ is given as in (10). Thus, Condition (IV) is satisfied for K = Γ^F.

If the cone co Γ is closed, i.e., Condition (II) is satisfied for K = co Γ in addition to Condition (I), then we can introduce the primal-dual pair of Lagrangian-conic relaxation problems (8) and (9) for K = co Γ, and apply the discussions given in Section 3 of Part I [4]. In particular, the relation η^d(λ, K) = η^p(λ, K)

↑λ = ζ^d(K) ≤ ζ^p(K) follows; see Theorem 2.1. The cone Γ^F as well as its convex hull co Γ^F, however, are not necessarily closed. In fact, Γ^F given in (13) is not closed. To see this, let

X =







0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1





 .

Then X 6∈ Γ^F, but the matrix (x^τ−|α|₀ x^α : α ∈ F )(x^τ−|α|₀ x^α : α ∈ F)^T ∈ Γ with (x0, x1, x2) = (ǫ, ǫ, 1/ǫ) and ǫ > 0 converges to X.

(14)

Lemma 3.1. Assume that 0 ∈ F and τ ei ∈ F (i = 1, 2, . . . , n), where ei denotes the i-th unit coordinate vector in Rⁿ. Then, Γ^F and co Γ^F are closed.

Proof. Let {X^p : p = 1, 2, . . . } be a sequence in Γ^F converging to X ∈ S^F. By X^p ∈ Γ^F, there exists (x^p₀, x^p) ∈ R+×J such that X_αβ^p = (x^p₀)^τ−|α|(x^p)^α(x^p₀)^τ−|β|(x^p)^β, which converges to Xαβ as p → ∞ for ((α, β) ∈ ✷F). Specifically, (x^p0)^τ−|α|(x^p)^α(x^p₀)^τ−|α|(x^p)^α converges to Xαα for α = 0 ∈ F and α = τ ei ∈ F (i = 1, 2, . . . , n). Thus, (x^p₀)^2τ and (x^p_i)^2τ (i = 1, 2, . . . , n) converge to X00 and X_(τeⁱ)(τeⁱ) (i = 1, 2, . . . , n), respectively. This implies that the sequence {(x^p₀, x^p) : p = 1, 2, . . . } is bounded, and we can take a subsequence which converges to (¯x0, ¯x) ∈ R+× J. Therefore,

X = ((¯x0)^τ−|α|(¯x)^α : F )((¯x0)^τ−|α|(¯x)^α : F)^T ∈ Γ^F,

and we have shown that Γ^F is closed. The closedness of co Γ^F follows from Lemma 3.1 of [4].

Without loss of generality, the assumption of the previous lemma can be satisfied by adding 0 ∈ Zⁿ₊ and τ ei ∈ Zⁿ₊ (i = 1, 2, . . . , n) to F if necessary. For Example 3.1, one can add (2, 0) and (0, 2) to F to satisfy the assumption.

The MC relaxation problem proposed in [4] as an extension of the CPP relaxation for QOPs to POPs is essentially equivalent to COP (6) with K = co Γ^F. A hierarchy of copositivity conditions assumed there is weaker than Condition (I) and can be regarded as a generalization of Condition (I). We have assumed a stronger condition here, Condition (I), to consistently derive the Lagrangian-conic relaxation (8) in the unified framework. On the other hand, the additional condition on zeros at infinity assumed in [4], which was also assumed in [24] for a canonical convexification procedure for a class of POPs, is stronger than Condition (IV). See Condition (IV)’ and Lemma 3.1 of [5].

If F =

α∈ Zⁿ₊ : |α| ≤ 1

, then POP (1) becomes a QOP. In this case, L^F = S^F and (L^F)^⊥ = {0}, and the previous discussions correspond to Section 4.2 of Part I [5], where the convexification of a linearly constrained QOP with complementarity condition was discussed.

4 A hierarchy of sparse Lagrangian-SDP relaxations for POPs

The primal COP (6) and its Lagrangian-conic relaxation (8) have been mainly used when applying the unified framework of Part I [5] to QOPs and POPs, and their duals. As a result of this application, the problem of computing tight lower bounds for the optimal values of general POPs is reduced to the dual COP (7).

In this section, we propose a hierarchy of Lagrangian-SDP relaxations for POPs by combining the approach in [3, 5, 16] for deriving the Lagrangian-CPP and Lagrangian-DNN relaxations for a class of QOPs with the hierarchy of SDP relaxations proposed by [21]

for POPs. The motivation for combining these two approaches, which have been studied almost independently, is to develop efficient and effective numerical methods for POPs. We simultaneously take account the important issue of exploiting sparsity [17, 22, 28] in the proposed hierarchy of Largangian-SDP relaxations for POPs. As a result, the description

(15)

may be slightly complicated, but exploiting sparsity is essential for numerical efficiency of solving large-scale POPs.

4.1 A class of equality constrained POPs with a structured spar- sity

Let us fix J = Rⁿ in POP (1) throughout this section. We deal with a class of equality constrained POP of the form:

ζ^∗ = inf

f⁰(x)

x ∈ Rⁿ, fⁱ(x) = 0 (i = 1, 2, . . . , m0)

. (14)

As mentioned in Section 1, we can include inequality constraints since a polynomial inequality g(x) ≥ 0 can be rewritten as g(x) − v² = 0 with a slack variable v ∈ R. We assume that POP (14) is sparse. More precisely, each equality constraint f^k(x) = 0 involves only a small subset of the variables x1, x2, . . . , xn and the objective polynomial f⁰(x) consists of monomials whose variables include only a small number of pairs of xi and xj. Under this assumption, a structured sparsity [17, 22, 28] can be embed in POP (14). Let

C^k ⊂ {1, . . . , n}, Ck 6= ∅, (k = 1, . . . , m), [m k=1

C^k = {1, . . . , n}, Aτ =

α∈ Zⁿ₊ : |α| ≤ τ

(τ ∈ Z+), A^k_τ =

α∈ Aτ : αi = 0 if i 6∈ C^k

(τ ∈ Z+, k = 1, . . . , m), A^k_∞ = [

τ∈Z+

A^k_τ =

α∈ Zⁿ₊: αi = 0 if i 6∈ Ck

.

Then, the sparsity structure of POP (14) is described as f⁰(x) ∈

Xm k=1

R[x, A^k_∞] and fⁱ(x) ∈ [m k=1

R[x, A^k_∞] (i = 1, 2, . . . , m0).

From the second inclusion relation, we can partition the index set {1, 2, . . . , m0} of equality constraints in (14) into m sets I^k (k = 1, 2, . . . , m) such that

fⁱ(x) ∈ R[x, A^k_∞] for every i ∈ I^k. Now, we rewrite POP (14) as

ζ^∗ = inf

f⁰(x)

x ∈ Rⁿ, fⁱ(x) = 0 (i ∈ I^k, k = 1, 2, . . . , m)

. (15)

As an illustrative example, we consider the same problem as Example 3.1.

Example 4.1. Let

n = 4, m0 = 3,

F = {0, e1, e2, e1+ e2, 2e3, 2e4} ⊂ Z⁴₊,

where ei ∈ Z⁴₊ denotes the ith unit coordinate vector, f⁰(x) ∈ R[x, F⁰+ F⁰], where F0 = {0, e1, e2, e1+ e2},

f¹(x) = x1− x²₃, f²(x) = x2 − x²₄, f³(x) = x1+ x2 + x1x2− 1.

(16)

Then, the POP in Example 3.1 can be rewritten as minimize

f⁰(x)

f¹(x) = x1− x²₃ = 0, f²(x) = x2− x²₄ = 0, f³(x) = x1 + x2+ x1x2− 1 = 0

.

Define I^k= {k} (k = 1, 2, 3), C¹ = {1, 3}, C² = {2, 4} and C³ = {1, 2}. Then, the problem above can be expressed as in (15).

We impose the following two conditions on POP (15) in the subsequent discussions.

Condition (A)

(a) Each C^k (k = 1, 2, . . . , m) is a nonempty maximal subset among the family {C¹, C², . . . , C^m}, i.e., there is no C^s (s 6= k) that includes C^k.

(b) For every k ∈ {1, . . . , m − 1}, there is an s ≥ k + 1 such that C^k∩ C^k+1∪ · · · ∪ C^m

⊂ C^s and 6= C^s.

Condition (B) For each k = 1, 2, . . . , m, there is an i ∈ IX k such that fⁱ(x) = ρk −

i∈C^k

x²_i for some ρk> 0.

Let G(N, E) be an undirected graph with the node set N = {1, 2, . . . , n} and the edge set E = {(i, j) ∈ N × N : i 6= j and (i, j) ∈ C^k for some k = 1, 2, . . . , m},

where each edge (i, j) ∈ E is identified with (j, i) ∈ E. It is known that Condition (A) provides a necessary and sufficient condition for G(N, E) to be a chordal graph and for C¹, C², . . . , C^m to be its maximal cliques. The property (b) of Condition (A) is called the running intersection property. See [7] for the definition of a chordal graph and its properties.

Condition (B) implies that the feasible region of POP (15) is bounded. Conversely, if the feasible region is bounded, we can add the constraints

ρk− X

i∈C^k

x²_i − x²_n+k = 0 (k = 1, 2, . . . , m)

for some ρk > 0, and replace x by (x, xn+1, . . . , xn+m) and C^k by C^k ∪ {n + k} (k = 1, 2, . . . , m), where xn+k denotes a slack variable (k = 1, 2, . . . , m). See [17, 22, 28] for more details on how the structured sparsity can be constructed from a given POP. Condition (B) can be weakened if Theorem 1 of [20] is used instead of Corollary 3.3 of [22] in the proof of Lemma 4.1. But its description is slightly complicated, so we prefer to use Condition (B).

We can easily verify that C¹, C² and C³ constructed in Example 4.1 satisfy Condition (A). In this case, the corresponding graph G(N, E) does not have any cycle, which is a trivial chordal graph. Although Example 4.1 can be modified to satisfy Condition (B), we use Example 4.1 for ease of discussion.