Noncrossing varying coefficient support vector quantile regression †
Jooyong Shim 1 · Changha Hwang 2 · Insuk Sohn 3 · Kyungha Seok 4
1,4 Department of Statistics, Inje University
2 Department of Applied Statistics, Dankook University
3 Arontier
Received 12 July 2020, revised 12 August 2020, accepted 22 August 2020
Abstract
Quantile regression fits specified percentiles of the response, such as the 90th percentile, and can potentially describe the entire conditional distribution of the response. Quantile functions estimated at different quantile levels can sometimes cross each other. Varying coefficient models are a useful extension of classical linear models. We propose a new noncrossing varying coefficient support vector quantile regression method based on a location-scale model. To choose the hyper-parameters, we apply a model selection method that uses cross validation techniques. The proposed method provides a good solution for estimating noncrossing quantile regression functions when several quantiles are required. Real examples are provided to show the usefulness of the proposed method.
Keywords: Location-scale model, non-crossing quantile regression, quantile regression, support vector quantile regression, varying coefficient quantile regression.
1. Introduction
Quantile regression, which was introduced by Koenker and Bassett (1978), fits specified percentiles of the response, such as the 90th percentile, and can potentially describe the entire conditional distribution of the response. It has been used widely for estimating the quantiles of the conditional distribution of the response variable given the values of the input variables. Just as classical linear regression, which minimizes the sum of squared residuals, allows one to estimate models for conditional mean functions, quantile regression methods allow one to estimate models for the conditional median function and the full range of other conditional quantile functions.
† This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2018R1D1A1B07042349, NRF-2017R1D1A1B03029792 and NRF-2017R1E1A1A01075541).
1 Adjunct Professor, Institute of Statistical Information, Department of Statistics, Inje University, Gyungnam 50834, Korea.
2 Professor, Department of Applied Statistics, Dankook University, Yongin, Gyeonggido 16890, Korea.
3 Chief research officer, Arontier, Seoul 06735, Korea.
4 Corresponding author: Professor, Institute of Statistical Information, Department of Statistics, Inje University, Gyungnam 50834, Korea. E-mail: [email protected]
Cole (1990) introduced a parametric LMS method which is based on the Box-Cox power transformation (L), the mean or median function (M), and the coefficient of variation (S).
Koenker and Bassett (1978) used a nonparametric approach based on M-estimation, similar to least absolute deviation methods, which yields consistent estimates of the quantile regression function under general conditions without requiring that the form of the distribution of the output variable be specified. However, its main weakness is that separate specifications and estimates are needed for each quantile of interest. Although true quantile functions never cross by definition, quantile functions estimated at different quantile levels can cross each other unless special restrictions are imposed. He (1997) proposed the restricted regression quantile (RRQ) method, which is based on a location-scale model, to ensure that quantile regression functions do not cross. It can be used in a wide range of models, including linear heteroscedastic models and nonlinear quantile regression models. In RRQ, the non-crossing constraint is transformed into a positivity constraint. To carry out this transformation, some restrictions must be imposed on the conditional moment structure of the problem, which is not desirable from a nonparametric modeling viewpoint. Heagerty and Pepe (1999) modeled the location and scale as flexible regression spline functions and proposed a semi-parametric method that allows the error distribution to change as a function of the input variables. Their method combines the strengths of the parametric LMS method with the advantages of the nonparametric method of Koenker and Bassett (1978). Takeuchi (2004) and Takeuchi et al. (2006) imposed the non-crossing constraint as a simple linear constraint within the support vector machine (SVM; Vapnik, 1995) framework to obtain a non-crossing quantile regression method. This SVM approach shows good performance; see Hwang and Shim (2017a, 2017b) and Shim et al. (2016, 2017) for more information on SVM applications. However, the non-crossing quantile regression method based on the SVM has the disadvantage that all pairs of adjacent conditional quantile functions must be computed when multiple quantiles are required. Shim et al. (2009) proposed a non-crossing quantile regression method using the doubly penalized kernel machine. They estimate both location and scale functions simultaneously from the basic heteroscedastic location-scale model.
Introduced by Hastie and Tibshirani (1993), varying coefficient models are a useful extension of classical linear models. They arise naturally when one wishes to examine how regression coefficients change over different groups characterized by certain covariates such as age.
In this paper, we propose a new support vector quantile regression method for varying coefficient non-crossing quantile regression, which is based on a location-scale model and uses a step-wise strategy. To choose the hyper-parameters, which are important to the performance of the proposed method, we use a model selection method based on the generalized approximate cross validation function. The proposed method provides a good solution for estimating noncrossing quantile regression functions when multiple quantiles are required for high dimensional data. Real examples reveal the usefulness of the proposed method.
The remainder of this paper is organized as follows. In Section 2 we present the varying coefficient support vector quantile regression. In Section 3 we state the proposed non-crossing varying coefficient support vector quantile regression. In Section 4 we perform numerical studies through two real examples. In Section 5 we give the conclusions.
2. Varying coefficient support vector quantile regression
In this section we briefly present the varying coefficient support vector quantile regression proposed by Shim et al. (2016). We denote the training data set by $\{(\boldsymbol{x}_i, \boldsymbol{u}_i, y_i)\}_{i=1}^{n}$ with each input vector $\boldsymbol{x}_i \in R^{d_x}$ including a constant 1, and the output $y_i \in R$, which is linearly related to the input vector $\boldsymbol{x}_i$ conditionally on the smoothing variable vector $\boldsymbol{u}_i \in R^{d_u}$. We consider the varying coefficient quantile regression model as follows:
\[
q_\theta(\boldsymbol{x}_i, \boldsymbol{u}_i) = \sum_{k=0}^{d_x} x_{ik}\,\beta_k(\boldsymbol{u}_i) \quad \text{for } \theta \in (0, 1), \tag{2.1}
\]
where $x_{i0} = 1$. In the varying coefficient quantile regression model (2.1) we assume that $\beta_k(\boldsymbol{u}_i)$ is nonlinearly related to the smoothing variable vector $\boldsymbol{u}_i$ such that $\beta_k(\boldsymbol{u}_i) = \boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b_k$ for $k = 0, \cdots, d_x$, where $\boldsymbol{\omega}_k$ is the corresponding $d_f \times 1$ weight vector. Here the nonlinear feature mapping function $\phi : R^{d_u} \to R^{d_f}$ maps the input space to the feature space, where the feature dimension $d_f$ is defined implicitly. From Mercer (1909), we know that an inner product in the feature space has an equivalent kernel in the input space, $\phi(\boldsymbol{u}_i)'\phi(\boldsymbol{u}_j) = K(\boldsymbol{u}_i, \boldsymbol{u}_j)$.
Several options of the kernel K(·, ·) are available.
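As an illustration of two common kernel choices, the following minimal Python sketch evaluates a Gaussian kernel and a polynomial kernel and builds the matrix of values $K(\boldsymbol{u}_i, \boldsymbol{u}_j)$. The function names, the bandwidth parameterization `sigma`, and the toy smoothing variables are assumptions made for this example only; the paper does not prescribe them.

```python
import math

def rbf_kernel(u, v, sigma=1.0):
    """Gaussian kernel exp(-||u - v||^2 / sigma^2) (one common parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / sigma ** 2)

def poly_kernel(u, v, degree=2):
    """Polynomial kernel (1 + u'v)^degree."""
    return (1.0 + sum(a * b for a, b in zip(u, v))) ** degree

def gram_matrix(U, kernel):
    """n x n matrix with entries K(u_i, u_j)."""
    return [[kernel(ui, uj) for uj in U] for ui in U]

# Toy one-dimensional smoothing variables (hypothetical values).
U = [[0.0], [0.5], [1.0]]
K = gram_matrix(U, rbf_kernel)
```

Only kernel evaluations of this kind enter the dual problem, so the feature map $\phi$ never has to be computed explicitly.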
We consider the nonlinear case, in which the $\theta$th quantile regression function given $(\boldsymbol{x}_i, \boldsymbol{u}_i)$ can be expressed as a nonlinear function of the smoothing variable vector $\boldsymbol{u}_i$ such that
\[
q_\theta(\boldsymbol{x}_i, \boldsymbol{u}_i) = \sum_{k=0}^{d_x} x_{ik}\,(\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b_k) \quad \text{for } \theta \in (0, 1).
\]
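To make the structure of the model concrete, the sketch below evaluates $\sum_k x_{ik}\beta_k(\boldsymbol{u}_i)$ for a scalar smoothing variable. The coefficient functions used here are hypothetical stand-ins chosen for illustration; in the method itself each $\beta_k$ is estimated through the kernel expansion rather than specified in closed form.

```python
import math

def varying_coef_quantile(x, u, betas):
    """Evaluate sum_k x_k * beta_k(u); x[0] is the constant 1."""
    return sum(xk * beta(u) for xk, beta in zip(x, betas))

# Hypothetical coefficient functions of a scalar smoothing variable u.
betas = [lambda u: math.sin(u), lambda u: 1.0 + u ** 2]

x = [1.0, 2.0]  # x_0 = 1 (intercept), x_1 = 2
q_at_0 = varying_coef_quantile(x, 0.0, betas)  # sin(0) + 2 * (1 + 0)
q_at_1 = varying_coef_quantile(x, 1.0, betas)  # sin(1) + 2 * (1 + 1)
```

The same input vector yields different fitted values at different values of the smoothing variable, which is the sense in which the coefficients vary.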
The $\theta$th quantile regression function can be defined as a function of any solution to the following optimization problem:
\[
\min L = \frac{1}{2}\sum_{k=0}^{d_x} \|\boldsymbol{\omega}_k\|^2 + C\sum_{i=1}^{n} \rho_\theta\Big(y_i - \sum_{k=0}^{d_x} x_{ik}\,(\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b_k)\Big), \tag{2.2}
\]
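where $\rho_\theta(r) = \theta r I(r \ge 0) - (1 - \theta) r I(r < 0)$ is the check function and $C > 0$ is a penalty parameter which controls the trade-off between the smoothness and the goodness of fit of the estimator. We can express the optimization problem in the SVM formulation as follows:
\[
\min L = \frac{1}{2}\sum_{k=0}^{d_x} \|\boldsymbol{\omega}_k\|^2 + C\theta\sum_{i=1}^{n} \xi_i + C(1 - \theta)\sum_{i=1}^{n} \xi_i^*
\]
subject to
\[
y_i - \sum_{k=0}^{d_x} x_{ik}\,(\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b_k) \le \xi_i, \quad
-y_i + \sum_{k=0}^{d_x} x_{ik}\,(\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b_k) \le \xi_i^*, \quad
\xi_i, \xi_i^* \ge 0, \quad i = 1, \cdots, n.
\]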
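The equivalence between the check-function objective (2.2) and the slack-variable formulation can be checked numerically for a single residual. The sketch below assumes nonnegative slacks and takes the smallest slacks that satisfy the two constraints; the function names are introduced here for illustration only.

```python
def check_loss(r, theta):
    """rho_theta(r) = theta * r if r >= 0, else (theta - 1) * r."""
    return theta * r if r >= 0 else (theta - 1.0) * r

def slack_cost(r, theta):
    """Cost theta * xi + (1 - theta) * xi_star at the smallest feasible slacks."""
    xi = max(r, 0.0)        # smallest xi >= 0 with r <= xi
    xi_star = max(-r, 0.0)  # smallest xi_star >= 0 with -r <= xi_star
    return theta * xi + (1.0 - theta) * xi_star

theta = 0.9
residuals = [-2.0, -0.5, 0.0, 0.5, 2.0]
losses = [check_loss(r, theta) for r in residuals]
```

For $\theta = 0.9$ a positive residual is penalized nine times as heavily as a negative residual of the same size, which pushes the fitted function toward the 90th percentile.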
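We construct a Lagrange function as follows:
\[
\begin{aligned}
L = {} & \frac{1}{2}\sum_{k=0}^{d_x} \|\boldsymbol{\omega}_k\|^2 + C\theta\sum_{i=1}^{n} \xi_i + C(1 - \theta)\sum_{i=1}^{n} \xi_i^*
 - \sum_{i=1}^{n} \alpha_i\Big(\xi_i - y_i + \sum_{k=0}^{d_x} x_{ik}\,(\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b_k)\Big) \\
 & - \sum_{i=1}^{n} \alpha_i^*\Big(\xi_i^* + y_i - \sum_{k=0}^{d_x} x_{ik}\,(\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b_k)\Big)
 - \sum_{i=1}^{n} (\eta_i \xi_i + \eta_i^* \xi_i^*),
\end{aligned} \tag{2.3}
\]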
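where $\alpha_i^{(*)}, \eta_i^{(*)} \ge 0$. Taking partial derivatives of equation (2.3) with regard to the primal variables $(\boldsymbol{\omega}_k, \xi_i^{(*)}, b_k)$, we have
\[
\begin{aligned}
\frac{\partial L}{\partial \boldsymbol{\omega}_k} = \boldsymbol{0} \;&\rightarrow\; \boldsymbol{\omega}_k = \sum_{i=1}^{n} x_{ik}\,\phi(\boldsymbol{u}_i)(\alpha_i - \alpha_i^*), \quad k = 0, \cdots, d_x, \\
\frac{\partial L}{\partial \xi_i} = 0 \;&\rightarrow\; C\theta = \alpha_i + \eta_i, \quad i = 1, \cdots, n, \\
\frac{\partial L}{\partial \xi_i^*} = 0 \;&\rightarrow\; C(1 - \theta) = \alpha_i^* + \eta_i^*, \quad i = 1, \cdots, n, \\
\frac{\partial L}{\partial b_k} = 0 \;&\rightarrow\; \sum_{i=1}^{n} x_{ik}(\alpha_i - \alpha_i^*) = 0, \quad k = 0, \cdots, d_x.
\end{aligned}
\]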
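As an intermediate step, the following worked substitution sketches how these stationarity conditions simplify (2.3). It assumes that the inner sums in (2.3) run over $k = 0, \cdots, d_x$ and that $\eta_i, \eta_i^*$ are the multipliers attached to the nonnegativity of the slack variables; the shorthand $\delta_i$ is introduced here only for brevity.

```latex
% Shorthand (introduced here): \delta_i = \alpha_i - \alpha_i^*.
\begin{align*}
\frac{1}{2}\sum_{k=0}^{d_x}\|\boldsymbol{\omega}_k\|^2
  &= \frac{1}{2}\sum_{i,j=1}^{n}\delta_i\delta_j
     \sum_{k=0}^{d_x}x_{ik}x_{jk}K(\boldsymbol{u}_i,\boldsymbol{u}_j),\\
\sum_{i=1}^{n}\delta_i\sum_{k=0}^{d_x}x_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i)
  &= \sum_{i,j=1}^{n}\delta_i\delta_j
     \sum_{k=0}^{d_x}x_{ik}x_{jk}K(\boldsymbol{u}_i,\boldsymbol{u}_j),\\
\sum_{i=1}^{n}\delta_i\sum_{k=0}^{d_x}x_{ik}b_k
  &= \sum_{k=0}^{d_x}b_k\sum_{i=1}^{n}x_{ik}\delta_i = 0,\\
\alpha_i = C\theta-\eta_i \le C\theta,
  &\qquad \alpha_i^* = C(1-\theta)-\eta_i^* \le C(1-\theta).
\end{align*}
```

The slack terms drop out because their coefficients $C\theta - \alpha_i - \eta_i$ and $C(1 - \theta) - \alpha_i^* - \eta_i^*$ are zero. What remains is the quadratic form with coefficient $\frac{1}{2} - 1 = -\frac{1}{2}$ plus the linear term $\sum_{i} y_i \delta_i$, and the last line gives the upper bounds on the multipliers.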
Plugging the above results into (2.3), we have the optimization problem as follows:
\[
\max \; -\frac{1}{2}\sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\sum_{k=0}^{d_x} x_{ik} x_{jk} K(\boldsymbol{u}_i, \boldsymbol{u}_j) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^*) \tag{2.4}
\]
subject to $0 \le \alpha_i \le C\theta$, $0 \le \alpha_i^* \le C(1 - \theta)$ and $\sum_{i=1}^{n} x_{ik}(\alpha_i - \alpha_i^*) = 0$ for $k = 0, \cdots, d_x$. The optimal Lagrange multipliers $(\hat{\alpha}_i, \hat{\alpha}_i^*)$ can be obtained by solving the above problem under these constraints. Thus, the estimator of $\beta_k(\boldsymbol{u}_t)$ for $k = 0, \cdots, d_x$ is obtained as follows:
\[
\hat{\beta}_k(\boldsymbol{u}_t) = \sum_{i=1}^{n} x_{ik} K(\boldsymbol{u}_t, \boldsymbol{u}_i)(\hat{\alpha}_i - \hat{\alpha}_i^*) + \hat{b}_k. \tag{2.5}
\]
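From (2.5) we obtain the estimator of the $\theta$th quantile regression function given an input $(\boldsymbol{x}_\ell, \boldsymbol{u}_t)$ as follows:
\[
\hat{q}_\theta(\boldsymbol{x}_\ell, \boldsymbol{u}_t) = \sum_{k=0}^{d_x} x_{\ell k}\,\hat{\beta}_k(\boldsymbol{u}_t) = \sum_{k=0}^{d_x} x_{\ell k}\Big(\sum_{i=1}^{n} x_{ik} K(\boldsymbol{u}_t, \boldsymbol{u}_i)(\hat{\alpha}_i - \hat{\alpha}_i^*) + \hat{b}_k\Big). \tag{2.6}
\]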
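Given the multipliers and intercepts, prediction reduces to kernel-weighted sums. The following sketch evaluates (2.5) and (2.6) in plain Python. The Gaussian kernel, the function names, and all numerical values are assumptions chosen only to exercise the formulas; they are not fitted to any data and do not come from a solved dual problem.

```python
import math

def rbf_kernel(u, v, sigma=1.0):
    """Gaussian kernel exp(-||u - v||^2 / sigma^2) (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / sigma ** 2)

def beta_hat(k, u_t, X, U, alpha_diff, b, kernel=rbf_kernel):
    """Equation (2.5): sum_i x_ik K(u_t, u_i) (alpha_i - alpha_i*) + b_k."""
    return sum(X[i][k] * kernel(u_t, U[i]) * alpha_diff[i]
               for i in range(len(X))) + b[k]

def q_hat(x_new, u_t, X, U, alpha_diff, b, kernel=rbf_kernel):
    """Equation (2.6): sum_k x_new[k] * beta_hat_k(u_t)."""
    return sum(x_new[k] * beta_hat(k, u_t, X, U, alpha_diff, b, kernel)
               for k in range(len(x_new)))

# Hypothetical training inputs, multiplier differences and intercepts.
X = [[1.0, 0.5], [1.0, -1.0]]   # rows x_i with x_i0 = 1
U = [[0.0], [1.0]]              # smoothing variables u_i
alpha_diff = [0.3, -0.3]        # alpha_i - alpha_i*
b = [0.1, 0.2]                  # b_0, b_1
pred = q_hat([1.0, 2.0], [0.0], X, U, alpha_diff, b)
```

Note that each intercept $\hat{b}_k$ is added once per coefficient, inside the sum over $k$ and outside the sum over $i$.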
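Here $\hat{b}_k$ for $k = 0, \cdots, d_x$ is obtained via the Kuhn-Tucker conditions (Kuhn and Tucker, 1951) such as
\[
(\hat{b}_0, \hat{b}_1, \cdots, \hat{b}_{d_x})' = (\boldsymbol{X}_s'\boldsymbol{X}_s)^{-1}\boldsymbol{X}_s'\boldsymbol{Y}_s, \tag{2.7}
\]
where $\boldsymbol{X}_s$ is an $n_s \times d_x$ matrix of $\boldsymbol{x}_i'$ for $i \in I_s = \{i = 1, \cdots, n \mid 0 < \hat{\alpha}_i < C\theta \text{ or } 0 < \hat{\alpha}_i^* < C(1 - \theta)\}$, $\boldsymbol{Y}_s$ is an $n_s \times 1$ vector of $y_i - \sum_{j=1}^{n}\sum_{k=0}^{d_x} x_{ik} x_{jk} K(\boldsymbol{u}_i, \boldsymbol{u}_j)(\hat{\alpha}_j - \hat{\alpha}_j^*)$ for $i \in I_s$, and $n_s$ is the size of $I_s$.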
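The least squares step in (2.7) can be sketched without external libraries by forming the normal equations and solving them with Gaussian elimination. The helper names and the toy rows below are hypothetical; in practice the rows of `Xs` and the entries of `Ys` would come from the observations in $I_s$ after the dual problem has been solved.

```python
def solve_linear(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def intercepts(Xs, Ys):
    """Least squares solution (Xs' Xs)^{-1} Xs' Ys as in (2.7)."""
    p = len(Xs[0])
    XtX = [[sum(row[a] * row[c] for row in Xs) for c in range(p)]
           for a in range(p)]
    XtY = [sum(row[a] * yv for row, yv in zip(Xs, Ys)) for a in range(p)]
    return solve_linear(XtX, XtY)

# Hypothetical rows: Ys was built as 0.5 + 2.0 * x_1, so b = (0.5, 2.0).
Xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
Ys = [0.5, 2.5, 4.5]
b_hat = intercepts(Xs, Ys)
```

This sketch assumes that $\boldsymbol{X}_s'\boldsymbol{X}_s$ is nonsingular, which requires at least as many observations in $I_s$ as there are intercepts.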
The functional structure of the varying coefficient support vector quantile regression is characterized by the hyper-parameters ($C$ and the kernel parameters). To select the hyper-parameters we consider the cross validation (CV) function as follows:
\[
CV(\lambda) = \sum_{i=1}^{n} \rho_\theta\big(y_i - \hat{q}_\theta(\boldsymbol{x}_i, \boldsymbol{u}_i)^{(-i)}\big), \tag{2.8}
\]
where $\lambda$ is the set of hyper-parameters and $\hat{q}_\theta(\boldsymbol{x}_i, \boldsymbol{u}_i)^{(-i)}$ is the quantile regression function estimated without the $i$th observation. Since $\hat{q}_\theta(\boldsymbol{x}_i, \boldsymbol{u}_i)^{(-i)}$ for $i = 1, \cdots, n$ must be evaluated for each candidate set of hyper-parameters, using the CV function to select the hyper-parameters is computationally formidable.
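The cost of (2.8) is easy to see in code: every candidate $\lambda$ requires $n$ separate fits. The sketch below implements the leave-one-out loop with a generic `fit_predict` callback. As a stand-in estimator it uses an empirical quantile of the training responses, which is an assumption made purely for illustration and is not the support vector estimator of this section.

```python
def check_loss(r, theta):
    """rho_theta(r) = theta * r if r >= 0, else (theta - 1) * r."""
    return theta * r if r >= 0 else (theta - 1.0) * r

def loo_cv(y, theta, fit_predict):
    """CV score (2.8): sum of check losses of leave-one-out predictions.

    fit_predict(train_idx, i) must return the prediction for observation i
    from a model fitted on the observations in train_idx only.
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        train_idx = [j for j in range(n) if j != i]  # one refit per i
        total += check_loss(y[i] - fit_predict(train_idx, i), theta)
    return total

# Stand-in estimator (illustration only): an empirical quantile of the
# training responses, ignoring the inputs x and u.
y = [1.0, 2.0, 3.0, 4.0, 5.0]

def sample_quantile_predictor(train_idx, i, theta=0.5):
    vals = sorted(y[j] for j in train_idx)
    return vals[min(int(theta * len(vals)), len(vals) - 1)]

score = loo_cv(y, 0.5, sample_quantile_predictor)
```

With a kernel-based estimator each call to `fit_predict` would solve a quadratic program, so the loop has to be repeated $n$ times for every point of the hyper-parameter grid.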
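Yuan (2006) proposed the generalized approximate CV (GACV) function to select the set of hyper-parameters $\lambda$ as follows:
\[
GACV(\lambda) = \frac{\sum_{i=1}^{n} \rho_\theta(y_i - \hat{q}_\theta(\boldsymbol{x}_i, \boldsymbol{u}_i))}{n - \mathrm{trace}(H)}, \tag{2.9}
\]
where $H$ is the hat matrix such that $\hat{q}_\theta(\boldsymbol{x}, \boldsymbol{u}) = H\boldsymbol{y}$ with the $(i, j)$th element $h_{ij} = \partial \hat{q}_\theta(\boldsymbol{x}_i, \boldsymbol{u}_i) / \partial y_j$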