Research Reports on Mathematical and Computing Sciences
Department of
Mathematical and Computing Sciences
Tokyo Institute of Technology
Operations Research
ISSN 1342-2804
Lagrangian-Conic Relaxations, Part II:
Applications to
Polynomial Optimization Problems Naohiko Arima, Sunyoung Kim, Masakazu Kojima, and Kim-Chuan Toh
January 2014, B–476
B-476
Lagrangian-Conic Relaxations, Part II: Applications to Polyno- mial Optimization Problems
Naohiko Arima⋆, Sunyoung Kim†, Masakazu Kojima‡, and Kim-Chuan Toh♯ January 2014
Abstract. We present the moment cone (MC) relaxation and a hierarchy of sparse Lagrangian- SDP relaxations of polynomial optimization problems (POPs) using the unified framework established in Part I. The MC relaxation is derived for a POP of minimizing a polynomial subject to a nonconvex cone constraint and polynomial equality constraints. It is an ex- tension of the completely positive programming relaxation for QOPs. Under a copositivity condition, we characterize the equivalence of the optimal values between the POP and its MC relaxation. A hierarchy of sparse Lagrangian-SDP relaxations, which is parameterized by a positive integer ω called the relaxation order, is proposed for an equality constrained POP. It is obtained by combining a sparse variant of Lasserre’s hierarchy of SDP relaxation of POPs and the basic idea behind the conic and Lagrangian-conic relaxations from the unified framework. We prove under a certain assumption that the optimal value of the Lagrangian-SDP relaxation with the Lagrangian multiplier λ and the relaxation order ω in the hierarchy converges to that of the POP as λ → ∞ and ω → ∞. The hierarchy of sparse Lagrangian-SDP relaxations is designed to be used in combination with the bisection and 1-dimensional Newton methods, which was proposed in Part I, for solving large-scale POPs efficiently and effectively.
Keywords. Polynomial optimization problem, moment cone relaxation, SOS relaxation, a hierarchy of the Lagrangian-SDP relaxations, exploiting sparsity.
AMS Classification. 90C22, 90C25, 90C26.
⋆ Research and Development Initiative & JST CREST, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo 112-8551 Japan. The research of this author was par- tially supported by the Japan Science and Technology Agency (JST), the Core Research of Evolutionary Science and Technology (CREST) Research Project.
(nao [email protected]).
† Department of Mathematics, Ewha W. University, 52 Ewhayeodae-gil, Sudaemoon- gu, Seoul 120-750 Korea. The research of this author was supported by NRF 2012- R1A1A2-038982. ([email protected]).
‡ Research and Development Initiative & JST CREST, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo 112-8551 Japan. The research of this author was par- tially supported by the Japan Science and Technology Agency (JST), the Core Research of Evolutionary Science and Technology (CREST) Research Project.
♯ Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076. Research supported in part by Ministry of Ed- ucation Academic Research Fund R-146-000-168-112 ([email protected]).
1 Introduction
A unified framework for conic and Lagrangian-conic relaxations in Part I [5] was proposed for quadratic optimization problems (QOPs) and polynomial optimization problems (POPs).
From a theoretical viewpoint, this framework is intended to unify and generalize the existing results on the completely positive (CPP) relaxation of QOPs [2, 8, 9, 16] and its extension to POPs, which was called the moment cone (MC) relaxation in [4]. From a practical viewpoint, the Lagrangian-conic relaxation is proposed to solve large-scale conic optimization problems (COPs) obtained from effective conic relaxations, including doubly nonnegative (DNN) and semidefinite programming (SDP) relaxations of QOPs and POPs, by first-order algorithms.
In particular, the CPP relaxation and the sparse DNN relaxation for QOPs in Part I were discussed from these viewpoints.
We consider a general class of POPs of the following form:
ζ∗ = inf
f0(x)
x ∈ J, fk(x) = 0 (k = 1, 2, . . . , m)
, (1)
where J denotes a closed (but not necessarily convex) cone in the n-dimensional Eu- clidean space Rn, and fk(x) a real valued polynomial in x = (x1, x2, . . . , xn) ∈ Rn (k = 0, 1, 2, . . . , m).
The first purpose of this paper is to derive the moment cone (MC) relaxation for POP (1). The MC relaxation introduced by Arima, Kim and Kojima [4] for this form of POP under a hierarchy of copositivity conditions is an extension of the CPP relaxation. The main emphasis of our discussion here is on a unified treatment of the CPP relaxation of QOPs and their MC relaxation of POPs. More precisely, we first convert POP (1) into an equivalent COP which serves as the starting optimization model in the framework. Then, we can apply the general results established on the COP in Part I [5] to derive not only the MC relaxation but also the Lagrange-MC relaxation of (1).
Let V be a finite dimensional vector space endowed with an inner product h·, ·i, and K ⊂ V a (not necessarily convex) cone. Let H0 ∈ V and Qk ∈ V (k = 0, 1, 2, . . . , m). The primal COP is of the form:
COP(K): ζp(K) = inf
hQ0, Xi
X ∈ K, hH0, Xi = 1,
hQk, Xi = 0 (k = 1, 2, . . . , m)
. (2)
As a basic assumption, the following copsitivity condition (Condition (I)) is assumed: O 6=
H0 ∈ K∗ and Qk ∈ K∗ (k = 1, 2, . . . , m), where K∗ = {Y ∈ V : hX, Y i ≥ 0 for every X ∈ K} (the dual of K). The copositivity condition assumed here is stronger than the hierar- chy of copositivity conditions of [4]. The stronger condition is necessary for deriving the Lagrangian-conic relaxation consistently.
For the application of the framework to POP (1), we construct a nonconvex cone Γ in the space V of symmetric matrices with an appropriate dimension, and symmetric matrices H0, Qk∈ V (k = 0, 1, 2, . . . , m) so that COP(Γ) in (2) represents POP (1), i.e., ζ∗ = ζp(Γ).
Then, the MC relaxation of POP (1) is derived as its convexification, i.e., “the strongest convex relaxation” of COP(Γ) in (2) by replacing Γ with its convex hull, co Γ. For the equivalence between COP(Γ) in (2) and its convexification COP(co Γ), or for the identity ζp(Γ) = ζp(co Γ), we assume the copositivitiy condition for K = Γ. This condition is satisfied for any cone J ∈ Rnif each polynomial fk(x) is a sum of squares polynomials, thus,
if fk(x) is replaced by fk(x)2(k = 1, 2, . . . , m) in POP (1). Under the copositivity condition, we provide a condition (Condition (IV)) that characterizes the equivalence between POP (1) and its MC relaxation based on Theorem 3.1 of Part I [5].
The second purpose of applying the unified framework to POPs is to establish a theo- retical foundation for effective, efficient and stable numerical methods, which can be imple- mented with first-order algorithms, for solving large-scale POPs. The MC relaxation, i.e., COP(co Γ) in (2), can be regarded as the most effective relaxation in terms of the quality of the lower bound provided for the optimal value of POP (1). It is, however, numerically intractable even when J is a simple closed convex cone such as Rn or Rn+ (the nonnegative orthant of Rn) in (1). As a numerically tractable relaxation of COP(co Γ) in (2) for J = Rn+, the doubly nonnegative (DNN) relaxation obtained by choosing a DNN cone for K in COP (2) and the Lagrangian-DNN relaxation of POP (1) can be considered. In the recent paper [16], the Lagrangian-DNN relaxation for a class of QOPs with complementarity constraints was implemented with first-order algorithms. It was shown through numerical results on some well-known test problems for nonconvex QOPs that the Lagrangian-DNN relaxation combined with first-order algorithms provided tight lower bounds for their optimal values efficiently and stably. See also Section 5 of Part I [5]. Those results indicate that applying the methods in [16] to POP (1) with J = Rn+ is an important subject that needs further investigation.
For an alternative to the MC relaxation, a sparse variant [17, 20, 22, 28] of the hierarchy of SDP relaxations proposed by Lasserre [21] can be used for an equality and inequality constrained POP. The hierarchy is parameterized by a positive integer ω called the relaxation order. This relaxation method is numerically tractable and as effective as the MC relaxation, in the sense that under a certain assumption, the optimal value of the SDP relaxation with the order ω converges to the optimal value of the given POP as ω → ∞. This is a theoretically strong result, but solving the SDP relaxations of increasing size with ω is known to be numerically hard. In fact, SDP solvers [12, 26, 27] based on primal-dual interior-point methods have difficulties in solving SDP relaxation problems with the dimension larger than several thousands.
We propose a hierarchy of sparse Lagrangian-SDP relaxations for an equality constrained POP of the form (1) with J = Rn, by combining the hierarchy of sparse SDP relaxations and the Lagrangian-conic relaxation in the unified framework. Notice that inequality con- straints can also be dealt with in the form (1) since any polynomial inequality g(x) ≥ 0 is equivalent to the equality g(x) − v2 = 0 with a slack variable v ∈ R. Assuming that the polynomials fk(x) (k = 0, 1, . . . , m) are sparse, we first embed the structured sparsity char- acterized by a chordal graph as in [17, 20, 22, 28], and derive a sparse sum of squares (SOS) problem equivalent to POP (1) with J = Rn by applying a sparse variant (Corollary 3.3 of [22], Theorem 1 of [20]) of Putinar’s lemma (Lemma 4.1 of [25]) for representing positive polynomials by SOS polynomials. We then transform the SOS problem into a simpler SOS problem with a structure that leads to the copositivity condition in the unified framework.
The transformed SOS problem is still not numerically solvable because SOS polynomials with any degree can be used in the problem. By restricting the degrees of SOS polynomials by the relaxation order ω, we construct a hierarchy of numerically solvable SOS problems from the transformed SOS problem. Finally, we convert the hierarchy of SOS problems into the hierarchy of SDPs in the linear space Vω of symmetric matrices with the dimension determined by the maximum degree of the polynomials fk(x) (k = 0, 1, 2, . . . , m) and ω.
Each SDP with the relaxation order ω in the hierarchy is of the form:
ζωd(Kω) = sup (
y0
Q0ω− H0ωy0+ Xm k=1
Qkωyk ∈ K∗ω )
(3) for some cone Kω ⊂ Vω, symmetric matrices H0ω, Qkω ∈ Vω (k = 0, 1, 2, . . . , m). This problem corresponds to the dual of (2). It is constructed so that it satisfies the copositivity condition for K = Kω. Thus, the entire theory of the unified framework can be applied.
In particular, we can derive the following primal-dual pair of Lagrangian-SDP relaxation problems:
ζωp(λ, Kω) = inf
hQ0ω+ λH1ω, Xi
X ∈ Kω, hH0ω, Xi = 1
, (4)
ζωd(λ, Kω) = sup y0
y0 ∈ R, Q0ω − H0ωy0+ λH1ω ∈ K∗ω
, (5)
where H1ω = Pm
k=1Qkω, and λ ∈ R denotes a Lagrangian multiplier (or parameter) pre- scribed for the problems. These problems are very simple so that efficient first-order algo- rithms can be designed to solve the problems. In fact, the dual problem (5) involves only one variable, which makes it possible to effectively utilize the bisection and 1-dimensional Newton methods proposed in Part I [5]; see Also [16]. They also inherit the structured sparsity characterized by a chordal graph from the one embedded in POP (1) with J = Rn, for example, K∗ is described as the Minkovski sum of positive semidefinite cones of small dimensions and linear subspaces of Vω. Moreover, the following theoretical results will be established: for every relaxation order ω and Lagrange multiplier λ,
• the optimal value ζωd(Kω) of (3) bounds the optimal value ζ∗ of POP (1) with J = Rn from below, and it monotonically converges to ζ∗ as ω → ∞,
• the optimal value ζωd(λ, Kω) of (5) bounds the optimal value ζωd(Kω) of (3) from below, and it monotonically converges to ζωd(Kω) as λ → ∞,
• the primal problem (4) is strictly feasible (i.e., there exists a primal feasible solution that lies in the relative interior of the cone Kω) and it has an optimal solution with the optimal value ζωd(λ, Kω) = ζωp(λ, Kω) (the strong duality).
The first two results imply that the lower bound ζωd(λ, Kω) for the optimal value ζ∗ of POP (1) satisfies ζ∗− ǫ < ζωd(λ, Kω) for any ǫ > 0 if sufficiently large ω and λ are taken. The last result contributes to the numerical stability of first-order algorithms.
In section 2, we review the results shown in Part I [5] and describe the notation and symbols used in this paper. We also present how to represent polynomials with symmetric matrices of monomials, and introduce SOS of polynomials. Section 3 includes the discussion on the MC relaxation of POP (1), and section 4 presents the hierarchy of sparse Lagrangian- SDP relaxations of POP (1) with J = Rn.
2 Preliminaries
2.1 Conic and Lagrangian-conic optimization problems
The results in Part I [5] are summarized in this subsection. We first list some notation used in Part I. Let V be a finite dimensional vector space endowed with an inner product h·, ·i
and its induced norm k · k, and K a nonempty (but not necessarily convex nor closed) cone in V. We denote the dual of K by K∗, i.e., K∗ = {Y ∈ V : hX, Y i ≥ 0 for every X ∈ K}, and the convex hull of K by co K.
For H0, Qk∈ V (k = 0, , 2, . . . , m), H1 = Pm
k=1Qk, let F (K) =
X ∈ V
X ∈ K, hH0, Xi = 1, hQk, Xi = 0 (k = 1, 2, . . . , m) , F0(K) =
X ∈ V
X ∈ K, hH0, Xi = 0, hQk, Xi = 0 (k = 1, 2, . . . , m) . In Part I, we introduced the following conditions:
Condition (I) O 6= H0 ∈ K∗, Qk ∈ K∗ (k = 1, 2, . . . , m) (copositivity condition).
Condition (II) K is closed and convex.
Condition (III) n
X ∈ F (K) : hQ0, Xi ≤ ˜ζo
is nonempty and bounded for some ˜ζ ∈ R.
Condition (IV) hQ0, Xi ≥ 0 for every X ∈ F0(K).
For the primal-dual pairs of problems, ζp(K) := inf
hQ0, Xi | X ∈ F (K)
= inf
hQ0, Xi
X ∈ K, hH0, Xi = 1,
hQk, Xi = 0 (k = 1, 2, . . . , m)
, (6)
ζd(K) := sup (
z0
Q0+ Xm k=1
Qkzk− H0z0 ∈ K∗ )
, (7)
ηp(λ, K) := inf
h(Q0+ λH1), Xi X ∈ K, hH0, Xi = 1
, (8)
ηd(λ, K) := sup y0
Q0+ λH1− H0y0 ∈ K∗
, (9)
the following results are shown.
Theorem 2.1.
(i) ηd(λ, K)↑λ = ζd(K) ≤ ζp(K) and ηd(λ, K) ≤ ηp(λ, K)
↑λ ≤ ζp(K) under Condition (I).
(ii) ηd(λ, K) = ηp(λ, K)
↑λ = ζd(K) ≤ ζp(K) under Conditions (I) and (II).
(iii) ηd(λ, K) = ηp(λ, K)
↑λ = ζd(K) = ζp(K) under Conditions (I), (II) and (III).
(iv) ζp(K) = ζp(co K) under Conditions (I) and (IV).
Here ↑λ means “increases monotonically as λ → ∞”.
2.2 Notation and symbols
For the application of the unified framework to POPs, we use the following notation: Let R denote the set of real numbers, R+ the set of nonnegative real numbers, and Z+ the set of nonnegative integers. Let |α| = Pn
i=1αi for each α ∈ Zn+, where αi denotes the ith element of α ∈ Zn+. R[x] is the set of real-valued multivariate polynomials in xi ∈ R (i = 1, . . . , n), where x = (x1, . . . , xn) ∈ Rn. Each polynomial f (x) ∈ R[x] is represented as f (x) = P
α∈Ffαxα, where F ⊂ Zn+ is a nonempty finite set, fα (α ∈ F ) real coefficients, xα = xα11xα22· · · xαnn and α = (α1, α2, . . . , αn) ∈ Zn+. We assume that x0i = 1 even if xi = 0, in particular, x0 = 1 for any x ∈ Rn. The support of f (x) is defined by supp(f (x)) = {α ∈ F : fα6= 0} ⊂ Zn+, and the degree of f (x) ∈ R[x] is defined by deg(f (x)) = max{|α| : α ∈ supp(f (x))}. For each nonempty subsets F and G of Zn+, let F + G denote their Minkowski sum {α + β : α ∈ F, β ∈ G}, and let R[x, F] = {f (x) ∈ R[x] : supp(f (x)) ⊂ F}.
Let F be a nonempty finite subset of Zn+. |F | stands for the number of elements of F . Let RF denote the |F|-dimensional Euclidean space whose coordinate are indexed by α ∈ F. Each vector of RF with elements wα (α ∈ F ) is denoted by (wα : α ∈ F) or simply (wα : F ). We assume that (wα : F) is a column vector when it is multiplied by a matrix. If x ∈ Rn, xF = (xα : F ) denotes the |F|-dimensional (column) vector of monomials xα ∈ R[x] (α ∈ F). Hence each polynomial f (x) ∈ R[x, F] is represented as f (x) = h(fα : F ), xFi. SF denotes the linear space of |F| × |F | symmetric matrices with elements ξαβ (α ∈ F, β ∈ F). We use the notation ✷F for the set F × F = {(α, β) : α, β ∈ F}.
Each matrix of SF is written as (ξαβ : (α, β) ∈ ✷F) or simply (ξαβ : ✷F). If x ∈ Rn, xF(xF)T = (xα : F)(xα : F )T = (xα+β : (α, β) ∈ ✷F) is a rank-1 symmetric matrix of monomials xα+β ∈ R[x] ((α, β) ∈ ✷F), which is denoted by x✷F. (xF)T denotes the row vector obtained by taking the transpose of the column vector xF.
For every pair of Q = (Qαβ : ✷F) and X = (Xαβ : ✷F) ∈ SF, hQ, Xi denotes the matrix inner product, i.e., hQ, Xi = trace(QTX) = P
(α,β)∈✷FQαβξαβ. With this notation, we often write the quadratic form (xF)TQxF as hQ, x✷Fi to indicate that x✷F = xF(xF)T will be replaced by X = (Xαβ : ✷F) ∈ SF.
Let
SF+ = the cone of positive semidefinite matrices in SF
=
(ξαβ : ✷F) ∈ SF : (wα : F)T(ξαβ : ✷F)(wα: F) ≥ 0 for every (wα: F) ∈ RF
, LF =
(ξαβ : ✷F) ∈ SF : ξαβ = ξγδ if α + β = γ + δ .
Then SF+ forms a closed convex cone in SF, and LF a linear subspace of SF. We also see that x✷F ∈ SF+ ∩ LF for every x ∈ Rn. This relation is used repeatedly in the subsequent discussions.
2.3 Representing polynomials with symmetric matrices of mono- mials and sums of squares of polynomials
For a nonempty finite subset G of Zn+, a given polynomial f (x) ∈ R[x, G] is usually repre- sented as the inner product of its coefficient vector (fα : G) and the vector xG = (xα : G)
of monomials in the polynomial, i.e., g(x) = h(fα : G), xGi. However, representing a polynomial with a symmetric matrix of monomials is more convenient in the subsequent discussions, in particular, when discussing nonnegative polynomials on Rn and Rn+, and exploiting sparsity in conic and Lagrangian-conic relaxations. For the representation of a polynomial f (x) ∈ R[x, G] using a symmetric matrix of monomials, we need to choose a finite subset F of Zn+ satisfying the property G ⊂ F +F. In fact, a smaller-sized F satisfying this property is preferable for numerical efficiency. See [19] for details of choosing such an F . See also [4].
Let F be a nonempty subset of Zn+ and f (x) ∈ R[x, F + F ]. Then we can represent the polynomial f (x) using the rank-1 symmetric matrix x✷F = xF(xF)T of monomials xα+β ((α, β) ∈ ✷F) and some Q ∈ SF such that f (x) = hQ, x✷Fi. Note that x✷F contains all monomials xα (α ∈ F + F), and that the choice of such a Q ∈ SF is not unique as shown in the following example.
Example 2.1. Consider the polynomial f1(x) = f1(x1, x2) in two real variables such that f1(x) = 1 − 2x1− 2x2+ x21+ x22+ 2x21x2+ 2x1x22+ x21x22.
Let
G = {(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (2, 1), (1, 2), (2, 2)}, (fα1 : G) = (1, −2, −2, 1, 1, 2, 2, 1),
xG = (xα : G) = (1, x1, x2, x21, x22, x21x2, x22.x21x22).
Then R[x, G] ∋ f1(x) = h(fα1 : G), xGi. To represent f1(x) using a symmetric matrix of monomials, we can take F = {(0, 0), (1, 0), (0, 1), (1, 1)} so that
G ⊂ F + F = {(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1), (2, 1), (1, 2), (2, 2)}.
Let
Q =
1 −1 −1 0
−1 1 0 1
−1 0 1 1
0 1 1 1
∈SF, P =
0 0 0 −1
0 0 1 0
0 1 0 0
−1 0 0 0
∈SF,
x✷F =
1 x1 x2 x1x2
x1 x21 x1x2 x21x2 x2 x1x2 x22 x1x22 x1x2 x21x2 x1x22 x21x22
.
Then, R[x, F + F ] ∋ f1(x) = hQ + µP , x✷Fi for every µ ∈ R.
Lemma 2.1. Let F be a nonempty finite subset of Zn+, Q ∈ SF and P ∈ SF. Then, R[x, F + F] ∋ hQ, x✷Fi = hQ + P , x✷Fi if and only if P ∈ LF⊥
. Proof. We first show that the linear subspace of SF generated by
x✷F : x ∈ Rn , i.e., L=
λx✷F + µy✷F : x, y ∈ Rn, λ, µ ∈ R ,
coincides with LF. By the definition of LF, we know that
x✷F : x ∈ Rn
⊂ LF, hence L ⊂ LF. It suffices to show that dim(L) = dim(LF). Let ℓ = dim(LF), which is equivalent to the number of different elements in F + F. Let M be the linear subspace of Rℓ generated by the set {xF +F = (xγ : γ ∈ F + F)} ⊂ Rℓ. Then we can identify the linear space L as the linear space M since each matrix X = λx✷F + µy✷F ∈ L corresponds to a vector λxF +F + µyF +F ∈ M and vice versa. As a result, dim(L) = dim(M). On the other hand, we see that dim(M) = ℓ since there is no nonzero (gα : F + F ) such that h(gα : F + F ), xF +Fi = 0 for all x ∈ Rn. Therefore, we obtain that dim(L) = dim(LF) = ℓ. Thus we have shown that LF = L. Now assume that R[x, F + F] ∋ hQ, x✷Fi = hQ + P , x✷Fi.
Then hP , Xi = 0 for all X ∈ L = LF, which implies that P ∈ LF⊥
. Conversely if P ∈ LF⊥
, then R[x, F + F] ∋ hQ, x✷Fi = hQ + P , x✷Fi holds from x✷F ∈ LF for every x∈ Rn.
Lemma 2.1 implies that, for each Q ∈ SF, {Q + P : P ∈ LF⊥
} forms an equivalent class in SF represented by the common polynomial f (x) = hQ, x✷Fi.
We introduce some additional notation for SOS of polynomials. Let
SOS[x, F] = ( r
X
i=1
(ϕi(x))2 : ϕi(x) ∈ R[x, F] (i = 1, . . . , r) ∃r ∈ Z+
)
for every F ⊂ Zn+, SOS[x] = SOS[x, Zn+].
We call SOS[x] the cone of SOS of polynomials, and each f (x) ∈ SOS[x] an SOS polynomial.
The following lemma provides a characterization of SOS polynomials.
Lemma 2.2. [10] Let F be a nonempty finite subset of Zn+. Then SOS[x, F] =
hQ, x✷Fi : Q ∈ SF+ .
In Example 2.1, the matrix Q ∈ SF itself is not positive semidefinite. But if we choose µ = 1, then the matrix Q1 = Q + µP ∈ SF is positive semidefinite. Hence f1(x) = hQ1, x✷Fi ∈ SOS[x, F] by the lemma above. In fact, we see that
SOS[x, F] ∋ f1(x) = hQ1, x✷Fi = (x1+ x2+ x1x2− 1)2, (10) where
F = {(0, 0), (1, 0), (0, 1), (1, 1)}, Q1 =
1 −1 −1 −1
−1 1 1 1
−1 1 1 1
−1 1 1 1
∈SF+.
The following lemma will be used in Sections 3 and 4.
Lemma 2.3. Let F be a nonempty finite subset of Zn+ and M a linear subspace of SF. Then, (SF+∩ M) + ( LF⊥
∩ M) is closed.
Proof. We first show that the set {X ∈ SF : X + Y ∈ B, X ∈ SF+, Y ∈ LF⊥
} is bounded for every bounded subset B of SF. Let q = |F|. Since the set of monomials xα (α ∈ F ) is independent, i.e., there is no nonzero (gα : F ) such that (gα : F )T(xα : F ) is identically zero for all x ∈ Rn, there exist xj ∈ Rn (j = 1, . . . , q) such that the q × q matrix A= (xα1 : F), (xα2 : F ), . . . , (xαq : F )
is nonsingular. For some bounded B of SF, assume on the contrary that there exists a sequence {Zp = Xp+ Yp : p = 1, 2, . . . } satisfying Zp = Xp + Yp ∈ B, Xp ∈ SF+, Yp ∈ LF⊥
(p = 1, 2, . . . ) and kXpk → ∞ as p → ∞. We may assume without loss of generality that SF+ ∋ Xp/kXpk converges to some nonzero X ∈ SF+ as p → ∞. Hence LF⊥
∋ Yp/kXpk = Zp/kXpk − Xp/kXpk converges to
−X as p → ∞. Since LF⊥
is a linear subspace of SF, we obtain that X ∈ SF+∩ LF⊥
. Recall that xFj = (xαj : F)(xαj : F )T ∈ LF. Hence hAAT, Xi = hPq
j=1(xαj : F )(xαj : F )T, Xi =Pq
j=1h(xαj : F)(xαj : F )T, Xi = 0. Since AAT is a q × q positive definite matrix and X ∈ SF+, the identity hAAT, Xi = 0 implies that X = O. This is a contradiction.
Thus we have shown that {X ∈ SF : X + Y ∈ B, X ∈ SF+, Y ∈ LF⊥
} is bounded for every bounded subset B of SF.
Now, we show that (SF+∩ M) + ( LF⊥
∩ M) is closed. Suppose that Zp = Xp+ Yp, Xp ∈ SF+∩ M, Yp ∈ LF⊥
∩ M (p = 1, 2, . . . ) and Zp → Z for some Z ∈ SF as p → ∞.
Since the sequence {Zp = Xp+ Yp : p = 1, 2, . . . , } is bounded and Xp ∈ SF+, Yp ∈ LF⊥
(p = 1, 2, . . . ), the sequence {Xp : p = 1, 2, . . . } is bounded. Hence we may assume that it converges to some X ∈ SF. It follows that Yp = Zp−Xp → Y for some Y ∈ SF as p → ∞.
Since both SF+∩ M and LF⊥
∩ M are closed subsets of SF, we know that X ∈ SF+∩ M and Y ∈ LF⊥
∩M. Therefore, we have shown that Z = X +Y ∈ (SF+∩M)+( LF⊥
∩M).
3 A class of polynomial optimization problems and their covexification
Consider a class of POPs of the form (1). Assume that J is a nonempty closed (but not necessarily convex) cone in Rn, and fk(x) ∈ R[x, F + F] (k = 0, 1, . . . , m) for a nonempty finite subset F of Zn+ including 0 ∈ Zn+. For practical applications, we focus on Rn, Rn+ and Rℓ× Rn−ℓ with 1 ≤ ℓ ≤ n − 1 for the cone J, but the theoretical results in this section are valid for any closed cone in Rn.
We transform POP (1) into the COP of the form (6) to present the moment cone (MC) relaxation of POPs in the unified framework of Section 2, Part I [5]. The MC relaxation was proposed by [4] as an extension of the completely positive (CPP) relaxation [2, 8] for a class of linearly constrained QOPs in continuous and binary variables. We also recall that the CPP relaxation for the class of QOPs was discussed in Section 4 of [16].
Let us take SF for the underlying linear space V, and represent each polynomial fk(x) ∈ R[x, F +F] as fk(x) = hQk, x✷Fi (k = 0, 1, . . . , m), where Qk ∈ SF. Let ∆F1 =
x✷F ∈ SF : x ∈ J} . Then, POP (1) can be rewritten as
ζ∗ := inf
hQ0, Xi
X ∈ ∆F1 , hQk, Xi = 0 (k = 1, 2, . . . , m)
. (11)
We consider the following illustrative example:
Example 3.1. As in Example 2.1, we take n = 2 and F = {(0, 0), (1, 0), (0, 1), (1, 1)}. Let m = 1, R[x, F + F ] ∋ f0(x) = hQ0, x✷Fi for some Q0 ∈ SF, and let f1(x) ∈ R[x, F + F]
be an SOS polynomial given in (10). Let J = R2+. In this case, it is obvious that the feasible region of POP (14) is bounded and contains two points x = (1, 0) and x = (0, 1), thus, its finite minimum value ζ∗ is attained at some feasible solution. By definition, we see that
∆F1 = {x✷F ∈ SF : x ∈ R2+}.
The problem (11) is in a form similar to COP (6), but we still need to embed ∆F1 in a cone K ⊂ SF and introduce an inhomogeneous equality constraint hH0, Xi = 1 such that
∆F1 = {X ∈ K : hH0, Xi = 1}. This can be achieved by two methods. The first method is to take the conic hull of ∆F1 such that
∆F =
µX ∈ SF : µ ≥ 0, X ∈ ∆F1
=
µx✷F ∈ SF : µ ≥ 0, x ∈ J . The second method is to homogenize ∆F1 such that
ΓF = n
(xτ−|α|0 xα : F )(xτ−|α|0 xα : F )T ∈ SF : (x0, x) ∈ R+× Jo , where τ = max{|α| : α ∈ F }. We note that, for x0 = 0 and x ∈ Rn,
xτ−|α|0 xα =
0 if τ > |α|
xα otherwise, i.e., if τ = |α|.
Both ∆F and ΓF are cones in SF. The first construction of the cone ∆F was (implicitly) employed in [24], while the second ΓF in [4].
Let H0 be a matrix in SF with 1 in the (0, 0)th element and 0 elsewhere. Then, we see that hH0, x✷Fi = hH0, (xα : F)(xα : F )Ti = x0x0 = 1 for every x ∈ Rn, and that hH0, Xi = X00= x2τ0 for every X = (xτ−|α|0 xα : F)(xτ−|α|0 xα : F)T ∈ ΓF. It follows that
X ∈ ∆F : hH0, Xi = 1
=
µx✷F ∈ SF : x ∈ J, µ ≥ 0, hH0, µx✷Fi = 1
= ∆F1,
X ∈ ΓF : hH0, Xi = 1
=n
(xτ−|α|0 xα: F)(xτ−|α|0 xα: F )T : (x0, x) ∈ R+× J, x0 = 1o
= ∆F1.
(12)
Therefore, both COP (6) with K = ∆F and COP (6) with K = ΓF are equivalent to POP (11), and ζp(∆F) = ζp(ΓF) = ζ∗.
For Example 3.1, we see that
∆F =
µ µx1 µx2 µx1x2
µx1 µx21 µx1x2 µx21x2 µx2 µx1x2 µx22 µx1x22 µx1x2 µx21x2 µx1x22 µx21x22
∈SF+ : (µ, x1, x2) ∈ R3+
,
ΓF =
x40 x30x1 x30x2 x20x1x2
x30x1 x20x21 x20x1x2 x0x21x2
x30x2 x20x1x2 x20x22 x0x1x22 x20x1x2 x0x21x2 x0x1x22 x21x22
∈SF+ : (x0, x1, x2) ∈ R3+
. (13)
Obviously, for every x1 > 0 and every x2 > 0, the matrix
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 x21x22
is contained in ΓF but not in ∆F.
Now, we are ready to apply all the discussions in Section 3 of Part I [5]. In particular, the following theorem is obtained directly from Theorem 3.1 of [5].
Theorem 3.1. Suppose that Condition (I) holds for K = ΓF. Then, (i) ζp(co ∆F) = ζ∗;
(ii) Assume that ζ∗ is finite, we have that ζp(co ΓF) =
ζ∗ if Condition (IV) holds for K = ΓF,
−∞ otherwise.
Next we investigate Conditions (I) and (IV) in detail for K = ΓF, and in particular, J = Rn and J = Rn+. First, suppose that J = Rn. Then it follows that ΓF ⊂ SF+∩ LF, and ΓF∗
⊃ cl
SF+ + LF⊥
= SF+ + LF⊥
. (See Lemma 2.3 for the last equality).
Therefore, if Qk ∈ SF++ LF⊥
, or, equivalently, if fk(x) = hQk, x✷Fi is a sum of squares of polynomials (k = 1, 2, . . . , m) (see Lemma 2.1 and 2.2), then Condition (I) is satisfied for K = ΓF. (Recall that O 6= H0 ∈ SF+ by definition). If a polynomial equation g(x) = 0 is given, it is equivalent to have (g(x))2 = 0. Thus, Condition (I) for K = ΓF is not a strong assumption.
Now suppose that J = Rn+. Then,
ΓF ⊂ (CF)∗∩ LF ⊂ SF+∩ NF ∩ LF, and
ΓF∗
⊃ cl CF + (LF)⊥
⊃ cl SF++ NF + (LF)⊥ , where
CF =
Y ∈ SF : (ξα : F )TY(ξα : F ) ≥ 0 for every (ξα : F ) ≥ 0
(the copositive cone), (CF)∗ =
X ∈ SF : hX, Y i ≥ 0 for every Y ∈ CF
= co
(ξα: F)(ξα : F )T ∈ SF : (ξα : F) ≥ 0 (the completely positive cone),
NF =
X ∈ SF : Xαβ ≥ 0 ((α, β) ∈ ✷F) (the cone of nonnegative matrices).
Therefore, if Qk ∈ SF+ + NF + (LF)⊥ (or, less restrictively, if Qk ∈ CF + (LF)⊥) (k = 1, 2, . . . , m), then Condition (I) is satisfied for K = ΓF.
Now we focus on Condition (IV) with K = ΓF. Recall that τ = max{|α| : α ∈ F }. Let F = {α ∈ F : |α| = τ }. Suppose that
X = (xτ−|α|0 xα : F)(xτ−|α|0 xα : F )T ∈ F0(ΓF) Then,
Xαβ =
xα+β if α, β ∈ F , 0 otherwise.
As a result, if we define Qk = (Qkαβ : ✷F) ∈ SF and ¯fk(x) ∈ R[x, F + F] such that Qkαβ =
Qkαβ if α, β ∈ F
0 otherwise and ¯fk(x) = hQk, x✷Fi (k = 0, 1, . . . , m), we can rewrite Condition (IV) for K = ΓF as
f¯0(x) ≥ 0 if x ∈ J and ¯fk(x) = 0 (k = 1, 2, . . . , m).
Usual cases where the condition above holds are:
(a) Q0 = O ∈ SF or ¯f0(x) is an identically zero polynomial, i.e., deg(f0(x)) < 2τ . (b)
x∈ J : ¯fk(x) = 0 (k = 1, 2, . . . , m)
= {0}.
We note that (a) can be always satisfied by choosing a nonempty finite subset F of Zn+such that fk(x) ∈ R[x, F + F] (k = 0, 1, 2, . . . , m) and deg(f0(x)) < 2τ , and that (b) implies that the feasible region of POP (1) is bounded.
For Example 3.1, we observe that
F0(ΓF) =
X =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 x21x22
∈SF+ : (x1, x2) ∈ R2+,
hQ1, Xi = Q1(1,1)(1,1)X(1,1)(1,1) = 0
= {O},
where Q1 is given as in (10). Thus, Condition (IV) is satisfied for K = ΓF.
If the cone co Γ is closed, i.e., Condition (II) is satisfied for K = co Γ in addition to Condition (I), then we can introduce the primal-dual pair of Lagrangian-conic relaxation problems (8) and (9) for K = co Γ, and apply the discussions given in Section 3 of Part I [4]. In particular, the relation ηd(λ, K) = ηp(λ, K)
↑λ = ζd(K) ≤ ζp(K) follows; see Theorem 2.1. The cone ΓF as well as its convex hull co ΓF, however, are not necessarily closed. In fact, ΓF given in (13) is not closed. To see this, let
X =
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
.
Then X 6∈ ΓF, but the matrix (xτ−|α|0 xα : α ∈ F )(xτ−|α|0 xα : α ∈ F)T ∈ Γ with (x0, x1, x2) = (ǫ, ǫ, 1/ǫ) and ǫ > 0 converges to X.
Lemma 3.1. Assume that 0 ∈ F and τ ei ∈ F (i = 1, 2, . . . , n), where ei denotes the i-th unit coordinate vector in Rn. Then, ΓF and co ΓF are closed.
Proof. Let {Xp : p = 1, 2, . . . } be a sequence in ΓF converging to X ∈ SF. By Xp ∈ ΓF, there exists (xp0, xp) ∈ R+×J such that Xαβp = (xp0)τ−|α|(xp)α(xp0)τ−|β|(xp)β, which converges to Xαβ as p → ∞ for ((α, β) ∈ ✷F). Specifically, (xp0)τ−|α|(xp)α(xp0)τ−|α|(xp)α converges to Xαα for α = 0 ∈ F and α = τ ei ∈ F (i = 1, 2, . . . , n). Thus, (xp0)2τ and (xpi)2τ (i = 1, 2, . . . , n) converge to X00 and X(τei)(τei) (i = 1, 2, . . . , n), respectively. This implies that the sequence {(xp0, xp) : p = 1, 2, . . . } is bounded, and we can take a subsequence which converges to (¯x0, ¯x) ∈ R+× J. Therefore,
X = ((¯x0)τ−|α|(¯x)α : F )((¯x0)τ−|α|(¯x)α : F)T ∈ ΓF,
and we have shown that ΓF is closed. The closedness of co ΓF follows from Lemma 3.1 of [4].
Without loss of generality, the assumption of the previous lemma can be satisfied by adding 0 ∈ Zn+ and τ ei ∈ Zn+ (i = 1, 2, . . . , n) to F if necessary. For Example 3.1, one can add (2, 0) and (0, 2) to F to satisfy the assumption.
The MC relaxation problem proposed in [4] as an extension of the CPP relaxation for QOPs to POPs is essentially equivalent to COP (6) with K = co ΓF. A hierarchy of copositivity conditions assumed there is weaker than Condition (I) and can be regarded as a generalization of Condition (I). We have assumed a stronger condition here, Condition (I), to consistently derive the Lagrangian-conic relaxation (8) in the unified framework. On the other hand, the additional condition on zeros at infinity assumed in [4], which was also assumed in [24] for a canonical convexification procedure for a class of POPs, is stronger than Condition (IV). See Condition (IV)’ and Lemma 3.1 of [5].
If F =
α∈ Zn+ : |α| ≤ 1
, then POP (1) becomes a QOP. In this case, LF = SF and (LF)⊥ = {0}, and the previous discussions correspond to Section 4.2 of Part I [5], where the convexification of a linearly constrained QOP with complementarity condition was discussed.
4 A hierarchy of sparse Lagrangian-SDP relaxations for POPs
The primal COP (6) and its Lagrangian-conic relaxation (8) have been mainly used when applying the unified framework of Part I [5] to QOPs and POPs, and their duals. As a result of this application, the problem of computing tight lower bounds for the optimal values of general POPs is reduced to the dual COP (7).
In this section, we propose a hierarchy of Lagrangian-SDP relaxations for POPs by combining the approach in [3, 5, 16] for deriving the Lagrangian-CPP and Lagrangian-DNN relaxations for a class of QOPs with the hierarchy of SDP relaxations proposed by [21]
for POPs. The motivation for combining these two approaches, which have been studied almost independently, is to develop efficient and effective numerical methods for POPs. We simultaneously take account the important issue of exploiting sparsity [17, 22, 28] in the proposed hierarchy of Largangian-SDP relaxations for POPs. As a result, the description
may be slightly complicated, but exploiting sparsity is essential for numerical efficiency of solving large-scale POPs.
4.1 A class of equality constrained POPs with a structured spar- sity
Let us fix J = Rn in POP (1) throughout this section. We deal with a class of equality constrained POP of the form:
ζ∗ = inf
f0(x)
x ∈ Rn, fi(x) = 0 (i = 1, 2, . . . , m0)
. (14)
As mentioned in Section 1, we can include inequality constraints since a polynomial inequal- ity g(x) ≥ 0 can be rewritten as g(x) − v2 = 0 with a slack variable v ∈ R. We assume that POP (14) is sparse. More precisely, each equality constraint fk(x) = 0 involves only a small subset of the variables x1, x2, . . . , xn and the objective polynomial f0(x) consists of monomials whose variables include only a small number of pairs of xi and xj. Under this assumption, a structured sparsity [17, 22, 28] can be embed in POP (14). Let
Ck ⊂ {1, . . . , n}, Ck 6= ∅, (k = 1, . . . , m), [m k=1
Ck = {1, . . . , n}, Aτ =
α∈ Zn+ : |α| ≤ τ
(τ ∈ Z+), Akτ =
α∈ Aτ : αi = 0 if i 6∈ Ck
(τ ∈ Z+, k = 1, . . . , m), Ak∞ = [
τ∈Z+
Akτ =
α∈ Zn+: αi = 0 if i 6∈ Ck
.
Then, the sparsity structure of POP (14) is described as f0(x) ∈
Xm k=1
R[x, Ak∞] and fi(x) ∈ [m k=1
R[x, Ak∞] (i = 1, 2, . . . , m0).
From the second inclusion relation, we can partition the index set {1, 2, . . . , m0} of equality constraints in (14) into m sets Ik (k = 1, 2, . . . , m) such that
fi(x) ∈ R[x, Ak∞] for every i ∈ Ik. Now, we rewrite POP (14) as
ζ∗ = inf
f0(x)
x ∈ Rn, fi(x) = 0 (i ∈ Ik, k = 1, 2, . . . , m)
. (15)
As an illustrative example, we consider the same problem as Example 3.1.
Example 4.1. Let
n = 4, m0 = 3,
F = {0, e1, e2, e1+ e2, 2e3, 2e4} ⊂ Z4+,
where ei ∈ Z4+ denotes the ith unit coordinate vector, f0(x) ∈ R[x, F0+ F0], where F0 = {0, e1, e2, e1+ e2},
f1(x) = x1− x23, f2(x) = x2 − x24, f3(x) = x1+ x2 + x1x2− 1.
Then, the POP in Example 3.1 can be rewritten as minimize
f0(x)
f1(x) = x1− x23 = 0, f2(x) = x2− x24 = 0, f3(x) = x1 + x2+ x1x2− 1 = 0
.
Define Ik= {k} (k = 1, 2, 3), C1 = {1, 3}, C2 = {2, 4} and C3 = {1, 2}. Then, the problem above can be expressed as in (15).
We impose the following two conditions on POP (15) in the subsequent discussions.
Condition (A)
(a) Each Ck (k = 1, 2, . . . , m) is a nonempty maximal subset among the family {C1, C2, . . . , Cm}, i.e., there is no Cs (s 6= k) that includes Ck.
(b) For every k ∈ {1, . . . , m − 1}, there is an s ≥ k + 1 such that Ck∩ Ck+1∪ · · · ∪ Cm
⊂ Cs and 6= Cs.
Condition (B) For each k = 1, 2, . . . , m, there is an i ∈ IX k such that fi(x) = ρk −
i∈Ck
x2i for some ρk> 0.
Let G(N, E) be an undirected graph with the node set N = {1, 2, . . . , n} and the edge set E = {(i, j) ∈ N × N : i 6= j and (i, j) ∈ Ck for some k = 1, 2, . . . , m},
where each edge (i, j) ∈ E is identified with (j, i) ∈ E. It is known that Condition (A) provides a necessary and sufficient condition for G(N, E) to be a chordal graph and for C1, C2, . . . , Cm to be its maximal cliques. The property (b) of Condition (A) is called the running intersection property. See [7] for the definition of a chordal graph and its properties.
Condition (B) implies that the feasible region of POP (15) is bounded. Conversely, if the feasible region is bounded, we can add the constraints
ρk− X
i∈Ck
x2i − x2n+k = 0 (k = 1, 2, . . . , m)
for some ρk > 0, and replace x by (x, xn+1, . . . , xn+m) and Ck by Ck ∪ {n + k} (k = 1, 2, . . . , m), where xn+k denotes a slack variable (k = 1, 2, . . . , m). See [17, 22, 28] for more details on how the structured sparsity can be constructed from a given POP. Condition (B) can be weakened if Theorem 1 of [20] is used instead of Corollary 3.3 of [22] in the proof of Lemma 4.1. But its description is slightly complicated, so we prefer to use Condition (B).
We can easily verify that C1, C2 and C3 constructed in Example 4.1 satisfy Condition (A). In this case, the corresponding graph G(N, E) does not have any cycle, which is a trivial chordal graph. Although Example 4.1 can be modified to satisfy Condition (B), we use Example 4.1 for ease of discussion.