A study on a composite support vector quantile regression with varying coefficient model<sup>†</sup>


Insuk Sohn ¹ · Jooyong Shim ² · Kyungha Seok ³

¹ Statistics and Data Center, Samsung Medical Center

²,³ Department of Statistics, Inje University

Received 1 June 2018, revised 25 June 2018, accepted 29 June 2018

Abstract

Varying coefficient models are widely used to explore dynamic patterns of regression parameters and are among the regression models available for avoiding the curse of dimensionality. In this paper we propose a new regression estimation method, the varying coefficient composite support vector quantile regression, which combines the formulations of composite quantile regression and varying coefficient support vector quantile regression, a nonparametric quantile regression with varying regression quantiles. We also consider a cross validation method for choosing the optimal values of the hyperparameters, which affect the performance of the proposed method. Numerical studies with synthetic and real data are conducted to illustrate the performance of the proposed estimation of the regression functions.

Keywords: Composite quantile regression, cross validation function, quantile regression, support vector quantile regression, varying coefficient model.

1. Introduction

Koenker and Bassett (1978) introduced quantile regression (QR), which is known as a useful and robust statistical method for estimating and analyzing the relationships among the variables included in a model. Applications of QR in many different areas include medicine (Heagerty and Pepe, 1999), survival analysis (Koenker and Geling, 2001; Shim and Hwang, 2009), and growth charts (Wei and He, 2006).

Generally QR is less efficient than least squares estimation when the errors have a normal distribution. The composite QR (CQR) estimator for the classical linear model was proposed by Zou and Yuan (2008) to overcome this weakness of QR. The CQR estimator can be viewed as a compromise between a set of QR functions with different regression quantiles and a single summarized regression function. The loss of CQR can be regarded as a weighted sum of check functions with equal weights (Kai et al., 2010). They showed that the relative efficiency of the CQR estimator has a lower bound of 70% compared with the least squares estimator regardless of the error distribution, and proposed the local polynomial CQR estimators for estimating the regression function. They showed that the local polynomial CQR method can significantly improve on the estimation efficiency of the local least squares estimator for commonly used nonnormal error distributions.

† This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01056582, NRF-2017R1D1A1B03029792 and NRF-2017R1E1A1A01075541).

¹ Senior Researcher, Statistics and Data Center, Samsung Medical Center, Seoul 06351, Korea.

² Adjunct Professor, Institute of Statistical Information, Department of Statistics, Inje University, Gyungnam 50834, Korea.

³ Corresponding author: Professor, Institute of Statistical Information, Department of Statistics, Inje University, Gyungnam 50834, Korea. [email protected]

To address the curse of dimensionality problem in regression studies, the varying coefficient (VC) model was proposed by Hastie and Tibshirani (1993). It is well known that a general form of the VC model includes the additive model (Breiman and Friedman, 1985) as a special case. VC models inherit the simplicity and easy interpretation of classical linear models. Introductions to and various applications of VC models can be found in Hoover et al. (1998), Fan and Zhang (2013), Hwang et al. (2016), and Hwang and Shim (2017).

Support vector QR (SVQR) is obtained by applying SV regression with the check function used in QR. Li et al. (2007) derived a simple formula for the effective dimension of SVQR, which allows easy model selection. Shim et al. (2016) proposed an SVQR with VC (VCSVQR) and showed that it is an attractive approach for modelling with an input vector and regression quantiles that are functions of a smoothing vector.

Inspired by the attractive properties of CQR and VCSVQR, a better regression function estimate can be obtained by averaging multiple quantile functions (the $\frac{1}{J+1}$th, $\frac{2}{J+1}$th, $\cdots$, $\frac{J}{J+1}$th) estimated from $J$ VCSVQRs. In this paper we propose a new regression function estimation method, the varying coefficient composite support vector quantile regression (VCCSVQR), which combines CQR and VCSVQR in one formulation. The optimization problem of VCCSVQR is solved via quadratic programming, and the optimal values of the hyperparameters, which are the penalty parameter, the kernel parameters and the number of quantiles, are obtained by the k-fold cross validation function. In VCCSVQR, the basic idea of the kernel method is used so that the computation is carried out in the input space rather than in the reproducing kernel Hilbert space. This enables us to handle both nonlinear and linear regression functions. The experimental results from synthetic and real examples show the successful performance of the proposed estimators.

The rest of this paper is organized as follows. Section 2 reviews the VCSVQR of Shim et al. (2016). VCCSVQR is proposed in Section 3. Experiments with synthetic and real examples are given in Section 4. Section 5 contains conclusions.

2. Varying coefficient support vector quantile regression

In this section we present the reduced version of VCSVQR (Shim et al., 2016). Let the training dataset be $\{\boldsymbol{x}_i, \boldsymbol{u}_i, y_i\}_{i=1}^{n}$ with each input vector $\boldsymbol{x}_i \in R^{d}$, smoothing vector $\boldsymbol{u}_i \in R^{d}$ and output $y_i \in R$. We consider the varying coefficient quantile regression model in which the $\theta$th quantile regression function given $(\boldsymbol{x}_i, \boldsymbol{u}_i)$ is assumed to be linearly related to the input vector $\boldsymbol{x}_i$, whereas its regression quantile ($\boldsymbol{\beta}_\theta$) varies with the smoothing vector $\boldsymbol{u}_i$,

$$q_\theta(\boldsymbol{x}_i, \boldsymbol{u}_i) = \sum_{k=1}^{d+1} X_{ik}\,\beta_{k\theta}(\boldsymbol{u}_i) + b \quad \text{for } \theta \in (0, 1), \tag{2.1}$$

where $b$ is a bias and $X_{i1} = 1$, $X_{i,k+1} = x_{ik}$ for $k = 1, \cdots, d$.

In the varying coefficient quantile regression model (2.1) we assume that the regression quantile $\beta_{k\theta}(\boldsymbol{u}_i)$ is nonlinearly related to the smoothing vector $\boldsymbol{u}_i$ such that $\beta_{k\theta}(\boldsymbol{u}_i) = \boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i)$, where $\boldsymbol{\omega}_k$ is the $d_f \times 1$ weight vector corresponding to $\phi(\boldsymbol{u}_i)$. Here the feature mapping function $\phi(\cdot)$ maps the input space to the reproducing kernel Hilbert space. An inner product in the reproducing kernel Hilbert space has an equivalent kernel in the input space, $\phi(\boldsymbol{u}_i)'\phi(\boldsymbol{u}_j) = K(\boldsymbol{u}_i, \boldsymbol{u}_j)$ (Mercer, 1909).
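Because the feature map enters only through inner products, the quantity that a solver needs is the matrix with entries $\sum_{k} X_{ik} X_{jk} K(\boldsymbol{u}_i, \boldsymbol{u}_j)$. The following is a minimal sketch of how that matrix could be assembled. The function names, the toy data, and the Gaussian kernel scaling $\exp(-\|\boldsymbol{u}-\boldsymbol{v}\|^2/(2\sigma^2))$ are illustrative assumptions, not taken from the paper.

```python
import math


def gaussian_kernel(u, v, sigma2):
    """Gaussian kernel. The scaling exp(-||u - v||^2 / (2 * sigma2)) is an
    assumed parameterization; the paper only names the kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma2))


def design_row(x):
    """Return (1, x_1, ..., x_d), i.e. X_i1 = 1 and X_i,k+1 = x_ik."""
    return [1.0] + list(x)


def vc_gram_matrix(xs, us, sigma2):
    """G[i][j] = sum_k X_ik X_jk * K(u_i, u_j)."""
    rows = [design_row(x) for x in xs]
    n = len(rows)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            linear_part = sum(a * b for a, b in zip(rows[i], rows[j]))
            gram[i][j] = linear_part * gaussian_kernel(us[i], us[j], sigma2)
    return gram


# Toy inputs (hypothetical): three observations with d = 1.
xs = [[0.5], [-1.0], [2.0]]
us = [[0.0], [1.0], [0.0]]
G = vc_gram_matrix(xs, us, sigma2=1.0)
```

The matrix is the elementwise product of a linear Gram matrix in the inputs and a kernel Gram matrix in the smoothing vectors, so it is symmetric by construction.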

The $\theta$th quantile regression function can be defined as a function of any solution to the following optimization problem:

$$\min L = \frac{1}{2}\sum_{k=1}^{d+1}\|\boldsymbol{\omega}_k\|^2 + C\sum_{i=1}^{n}\rho_\theta\Big(y_i - \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) - b\Big),$$

where $\rho_\theta(r) = \theta r$ if $r > 0$ and $\rho_\theta(r) = (\theta - 1)r$ if $r \le 0$ for $\theta \in (0, 1)$, and $C > 0$ is a penalty parameter. We can express this optimization problem in the formulation for support vector quantile regression as follows:

$$\min L = \frac{1}{2}\sum_{k=1}^{d+1}\|\boldsymbol{\omega}_k\|^2 + C\theta\sum_{i=1}^{n}\xi_i + C(1-\theta)\sum_{i=1}^{n}\xi_i^* \tag{2.2}$$

subject to $y_i - \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) - b \le \xi_i$, $\; -y_i + \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b \le \xi_i^*$, $\; \xi_i^{(*)} \ge 0$, $\; i = 1, \cdots, n$.

A Lagrange function is constructed as follows:

$$
\begin{aligned}
L ={}& \frac{1}{2}\sum_{k=1}^{d+1}\|\boldsymbol{\omega}_k\|^2 + C\theta\sum_{i=1}^{n}\xi_i + C(1-\theta)\sum_{i=1}^{n}\xi_i^* - \sum_{i=1}^{n}\alpha_i\Big(\xi_i - y_i + \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b\Big) \\
& - \sum_{i=1}^{n}\alpha_i^*\Big(\xi_i^* + y_i - \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) - b\Big) - \sum_{i=1}^{n}\eta_i\xi_i - \sum_{i=1}^{n}\eta_i^*\xi_i^*,
\end{aligned}
\tag{2.3}
$$

where the non-negative constraints $\alpha_i^{(*)}, \eta_i^{(*)} \ge 0$ should be satisfied. Taking partial derivatives of equation (2.3) with respect to $(\boldsymbol{\omega}_k, \xi_i^{(*)}, b)$ and plugging the results into equation (2.3) leads to the following optimization problem:

$$\max\; -\frac{1}{2}\sum_{i,j=1}^{n}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\sum_{k=1}^{d+1}X_{ik}X_{jk}K(\boldsymbol{u}_i, \boldsymbol{u}_j) + \sum_{i=1}^{n}y_i(\alpha_i - \alpha_i^*) \tag{2.4}$$

subject to $0 \le \alpha_i \le C\theta$, $0 \le \alpha_i^* \le C(1-\theta)$, $\sum_{i=1}^{n}(\alpha_i - \alpha_i^*) = 0$ and $\sum_{i=1}^{n}x_{ik}(\alpha_i - \alpha_i^*) = 0$.

Solving the above optimization problem (2.4) results in the optimal Lagrange multipliers $(\hat{\alpha}_i, \hat{\alpha}_i^*)$. Thus, the estimated $\theta$th quantile regression function given $(\boldsymbol{x}_t, \boldsymbol{u}_t)$ is obtained as

$$\hat{q}_\theta(\boldsymbol{x}_t, \boldsymbol{u}_t) = \sum_{k=1}^{d+1}X_{tk}\,\hat{\beta}_{k\theta}(\boldsymbol{u}_t) + \hat{b} = \sum_{k=1}^{d+1}\sum_{i=1}^{n}X_{tk}X_{ik}K(\boldsymbol{u}_t, \boldsymbol{u}_i)(\hat{\alpha}_i - \hat{\alpha}_i^*) + \hat{b}. \tag{2.5}$$

Here $\hat{b}$ is obtained via the Kuhn-Tucker conditions (Kuhn and Tucker, 1951) as follows:

$$\hat{b} = \frac{1}{n_s}\sum_{i \in I_s}\Big(y_i - \sum_{k=1}^{d+1}\sum_{l=1}^{n}X_{ik}X_{lk}K(\boldsymbol{u}_i, \boldsymbol{u}_l)(\hat{\alpha}_l - \hat{\alpha}_l^*)\Big), \tag{2.6}$$

where $n_s$ is the size of $I_s = \{i = 1, \cdots, n \mid 0 < \hat{\alpha}_i < C\theta,\; 0 < \hat{\alpha}_i^* < C(1-\theta)\}$.

3. Varying coefficient composite support vector quantile regression

With a check function $\rho_\theta(\cdot)$, we consider the varying coefficient composite quantile regression whose objective function can be defined as follows:

$$
\begin{aligned}
& \min \sum_{j=1}^{J}\Big(\frac{1}{2}\sum_{k=1}^{d+1}\|\boldsymbol{\omega}_k\|^2 + C\sum_{i=1}^{n}\rho_{\theta_j}\big(y_i - q_{\theta_j}(\boldsymbol{x}_i, \boldsymbol{u}_i)\big)\Big) \\
& \quad = \frac{J}{2}\sum_{k=1}^{d+1}\|\boldsymbol{\omega}_k\|^2 + C\sum_{j=1}^{J}\sum_{i=1}^{n}\rho_{\theta_j}\Big(y_i - \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) - b_j\Big),
\end{aligned}
$$

where $\theta_j = \frac{j}{J+1}$. We can express the optimization problem of the varying coefficient composite quantile regression in the formulation for support vector quantile regression as follows:

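$$\min\; \frac{J}{2}\sum_{k=1}^{d+1}\|\boldsymbol{\omega}_k\|^2 + C\sum_{j=1}^{J}\sum_{i=1}^{n}\big(\theta_j\xi_{ij} + (1-\theta_j)\xi_{ij}^*\big) \tag{3.1}$$

subject to

$$y_i - \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) - b_j \le \xi_{ij}, \quad -y_i + \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b_j \le \xi_{ij}^*, \quad \xi_{ij}^{(*)} \ge 0,$$

where $C > 0$ is a penalty parameter.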
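The slack formulation (3.1) reproduces the composite check loss because, for a fixed residual $r$, the smallest slacks allowed by the constraints are $\xi = \max(r, 0)$ and $\xi^* = \max(-r, 0)$. The sketch below checks this numerically. All function names and the sample residuals are illustrative assumptions rather than part of the paper.

```python
def check_loss(r, theta):
    """rho_theta(r) = theta * r if r > 0, and (theta - 1) * r otherwise."""
    return theta * r if r > 0 else (theta - 1.0) * r


def slack_loss(r, theta):
    """theta * xi + (1 - theta) * xi_star with the smallest feasible slacks."""
    xi = max(r, 0.0)        # from  r <= xi   and xi  >= 0
    xi_star = max(-r, 0.0)  # from -r <= xi*  and xi* >= 0
    return theta * xi + (1.0 - theta) * xi_star


def quantile_levels(J):
    """Equally spaced levels theta_j = j / (J + 1), j = 1, ..., J."""
    return [j / (J + 1) for j in range(1, J + 1)]


def composite_loss(residuals_by_level, thetas):
    """Sum over j and i of rho_{theta_j}(r_ij), where r_ij is the residual
    of observation i from the theta_j-th quantile function."""
    return sum(
        check_loss(r, theta)
        for theta, residuals in zip(thetas, residuals_by_level)
        for r in residuals
    )


# Hypothetical residuals used only to compare the two forms of the loss.
sample_residuals = [-2.0, -0.3, 0.0, 0.7, 1.5]
max_gap = max(
    abs(check_loss(r, th) - slack_loss(r, th))
    for th in quantile_levels(3)
    for r in sample_residuals
)
```

Here `max_gap` measures the largest difference between the two forms over the sample values; it is zero up to rounding.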

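A Lagrange function is constructed as follows:

$$
\begin{aligned}
L ={}& \frac{J}{2}\sum_{k=1}^{d+1}\|\boldsymbol{\omega}_k\|^2 + C\sum_{j=1}^{J}\sum_{i=1}^{n}\big(\theta_j\xi_{ij} + (1-\theta_j)\xi_{ij}^*\big) - \sum_{j=1}^{J}\sum_{i=1}^{n}\alpha_{ij}\Big(\xi_{ij} - y_i + \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) + b_j\Big) \\
& - \sum_{j=1}^{J}\sum_{i=1}^{n}\alpha_{ij}^*\Big(\xi_{ij}^* + y_i - \sum_{k=1}^{d+1}X_{ik}\,\boldsymbol{\omega}_k'\phi(\boldsymbol{u}_i) - b_j\Big) - \sum_{j=1}^{J}\sum_{i=1}^{n}\eta_{ij}\xi_{ij} - \sum_{j=1}^{J}\sum_{i=1}^{n}\eta_{ij}^*\xi_{ij}^*,
\end{aligned}
\tag{3.2}
$$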

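where the non-negative constraints $\alpha_{ij}^{(*)}, \eta_{ij}^{(*)} \ge 0$ should be satisfied. Taking partial derivatives of equation (3.2) with respect to $(\boldsymbol{\omega}_k, \xi_{ij}^{(*)}, b_j)$ results in

$$\frac{\partial L}{\partial \boldsymbol{\omega}_k} = \boldsymbol{0} \;\rightarrow\; \boldsymbol{\omega}_k = \frac{1}{J}\sum_{j=1}^{J}\sum_{i=1}^{n}X_{ik}\,\phi(\boldsymbol{u}_i)(\alpha_{ij} - \alpha_{ij}^*), \quad k = 1, \cdots, d+1,$$

$$\frac{\partial L}{\partial \xi_{ij}} = 0 \;\rightarrow\; C\theta_j - \alpha_{ij} - \eta_{ij} = 0, \quad i = 1, \cdots, n, \; j = 1, \cdots, J,$$

$$\frac{\partial L}{\partial \xi_{ij}^*} = 0 \;\rightarrow\; C(1-\theta_j) - \alpha_{ij}^* - \eta_{ij}^* = 0, \quad i = 1, \cdots, n, \; j = 1, \cdots, J,$$

$$\frac{\partial L}{\partial b_j} = 0 \;\rightarrow\; \sum_{i=1}^{n}(\alpha_{ij} - \alpha_{ij}^*) = 0, \quad j = 1, \cdots, J.$$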
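The box constraints on the multipliers in the dual problem follow from the two slack conditions above combined with the non-negativity of the multipliers. Written out as a short worked step, using only the conditions just derived:

$$
\begin{aligned}
\eta_{ij} = C\theta_j - \alpha_{ij} \ge 0 \ \text{ and } \ \alpha_{ij} \ge 0 \quad &\Longrightarrow \quad 0 \le \alpha_{ij} \le C\theta_j, \\
\eta_{ij}^* = C(1-\theta_j) - \alpha_{ij}^* \ge 0 \ \text{ and } \ \alpha_{ij}^* \ge 0 \quad &\Longrightarrow \quad 0 \le \alpha_{ij}^* \le C(1-\theta_j).
\end{aligned}
$$

The same two conditions remove the slack variables from the Lagrange function: the coefficient of $\xi_{ij}$ is $C\theta_j - \alpha_{ij} - \eta_{ij}$ and the coefficient of $\xi_{ij}^*$ is $C(1-\theta_j) - \alpha_{ij}^* - \eta_{ij}^*$, and both are zero at a stationary point.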

Plugging the above results into equation (3.2), we have the following optimization problem:

$$\max\; -\frac{1}{2}\sum_{j=1}^{J}\sum_{i,l=1}^{n}(\alpha_{ij} - \alpha_{ij}^*)(\alpha_{lj} - \alpha_{lj}^*)\sum_{k=1}^{d+1}X_{ik}X_{lk}K(\boldsymbol{u}_i, \boldsymbol{u}_l) + \sum_{j=1}^{J}\sum_{i=1}^{n}y_i(\alpha_{ij} - \alpha_{ij}^*) \tag{3.3}$$

subject to $\sum_{i=1}^{n}(\alpha_{ij} - \alpha_{ij}^*) = 0$ for $j = 1, \cdots, J$, $\; 0 \le \alpha_{ij} \le \theta_j C$ and $0 \le \alpha_{ij}^* \le (1-\theta_j)C$.

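Solving the above optimization problem (3.3) results in the optimal Lagrange multipliers $\hat{\alpha}_{ij}$ and $\hat{\alpha}_{ij}^*$. Thus, the estimator of the regression function given $(\boldsymbol{x}_t, \boldsymbol{u}_t)$ is obtained as follows:

$$
\begin{aligned}
\hat{f}(\boldsymbol{x}_t, \boldsymbol{u}_t) &= \frac{1}{J}\sum_{j=1}^{J}\hat{q}_{\theta_j}(\boldsymbol{x}_t, \boldsymbol{u}_t) = \frac{1}{J}\sum_{j=1}^{J}\Big(\sum_{k=1}^{d+1}X_{tk}\,\hat{\beta}_{k\theta_j}(\boldsymbol{u}_t) + \hat{b}_j\Big) \\
&= \frac{1}{J}\sum_{j=1}^{J}\sum_{k=1}^{d+1}X_{tk}\Big(\sum_{i=1}^{n}X_{ik}K(\boldsymbol{u}_t, \boldsymbol{u}_i)(\hat{\alpha}_{ij} - \hat{\alpha}_{ij}^*)\Big) + \bar{\hat{b}} \\
&= \frac{1}{J}\sum_{j=1}^{J}\sum_{k=1}^{d+1}\sum_{i=1}^{n}X_{tk}X_{ik}K(\boldsymbol{u}_t, \boldsymbol{u}_i)(\hat{\alpha}_{ij} - \hat{\alpha}_{ij}^*) + \bar{\hat{b}},
\end{aligned}
\tag{3.4}
$$

where $\bar{\hat{b}} = \frac{1}{J}\sum_{j=1}^{J}\hat{b}_j$ and $\hat{b}_j$ is obtained via the Kuhn-Tucker conditions (Kuhn and Tucker, 1951) as

$$\hat{b}_j = \frac{1}{n_j}\sum_{i \in I_j}\Big(y_i - \sum_{k=1}^{d+1}\sum_{l=1}^{n}X_{ik}X_{lk}K(\boldsymbol{u}_i, \boldsymbol{u}_l)(\hat{\alpha}_{lj} - \hat{\alpha}_{lj}^*)\Big), \tag{3.5}$$

where $n_j$ is the size of $I_j = \{i = 1, \cdots, n \mid 0 < \hat{\alpha}_{ij} < C\theta_j,\; 0 < \hat{\alpha}_{ij}^* < C(1-\theta_j)\}$ for $j = 1, \cdots, J$.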
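Once the multipliers and biases are available, evaluating (3.4) at a new point is a weighted kernel sum. The sketch below shows one way to code that evaluation. It is an illustration only: the multiplier values are made up, the function names are hypothetical, and the Gaussian kernel scaling $\exp(-\|\boldsymbol{u}-\boldsymbol{v}\|^2/(2\sigma^2))$ is an assumed parameterization.

```python
import math


def gaussian_kernel(u, v, sigma2):
    # Assumed form exp(-||u - v||^2 / (2 * sigma2)); the text only names the kernel.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma2))


def vccsvqr_predict(x_t, u_t, xs, us, delta, biases, sigma2):
    """Evaluate the last line of (3.4) at the point (x_t, u_t).

    delta[i][j] holds alpha_hat_ij - alpha_hat*_ij for observation i and
    quantile level j; biases[j] holds b_hat_j.
    """
    J = len(biases)
    row_t = [1.0] + list(x_t)  # X_t = (1, x_t1, ..., x_td)
    total = 0.0
    for x_i, u_i, delta_i in zip(xs, us, delta):
        row_i = [1.0] + list(x_i)
        linear_part = sum(a * b for a, b in zip(row_t, row_i))  # sum_k X_tk X_ik
        total += linear_part * gaussian_kernel(u_t, u_i, sigma2) * sum(delta_i)
    return total / J + sum(biases) / J


# Hypothetical values, not fitted: n = 2 observations, d = 1, J = 2 levels.
xs = [[1.0], [-1.0]]
us = [[0.0], [1.0]]
delta = [[0.5, 0.25], [-0.5, -0.25]]  # each column sums to zero
biases = [0.1, 0.3]
prediction = vccsvqr_predict([1.0], [0.0], xs, us, delta, biases, sigma2=1.0)
```

With $J = 1$ the same function evaluates the single-quantile estimator (2.5).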

The functional structure of VCCSVQR is characterized by its hyperparameters, namely the penalty parameter ($C$), the kernel parameter ($\sigma^2$) and the number of quantiles ($J$). To select the optimal values of the hyperparameters of VCCSVQR we consider the leave-one-out cross validation (LOO-CV) function as follows:

$$CV(\lambda) = \sum_{i=1}^{n}\big(y_i - \hat{f}_\lambda(\boldsymbol{x}_i, \boldsymbol{u}_i)^{(-i)}\big)^2,$$

where $\lambda$ is a set of hyperparameters and $\hat{f}_\lambda(\boldsymbol{x}_i, \boldsymbol{u}_i)^{(-i)}$ is the regression function estimated without the $i$th observation using $\lambda$, while $\hat{f}(\boldsymbol{x}_i, \boldsymbol{u}_i)$ is the regression function estimated from the full data using $\lambda$. Since $\hat{f}_\lambda(\boldsymbol{x}_i, \boldsymbol{u}_i)^{(-i)}$ must be computed for $i = 1, \cdots, n$ for each candidate set of hyperparameters, selecting the optimal values of the hyperparameters by the CV function is computationally formidable. We therefore consider the k-fold cross validation (kCV) function as follows:

$$kCV(\lambda) = \sum_{i \in v_k}\big(y_i - \hat{f}_\lambda(\boldsymbol{x}_i, \boldsymbol{u}_i)^{(-k)}\big)^2, \tag{3.6}$$

where $\hat{f}_\lambda(\boldsymbol{x}_i, \boldsymbol{u}_i)^{(-k)}$ is the regression function estimated without the observations in the $k$th sub-dataset $v_k$ using $\lambda$.
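The k-fold criterion can be sketched as a generic routine that takes any fitting procedure. In the sketch below the squared errors are accumulated over all folds, which is an assumption about how the score is aggregated. The interface (`fit` returning a prediction function), the deterministic fold assignment, and the toy mean predictor are likewise illustrative and not the authors' implementation.

```python
def kfold_cv_score(inputs, y, fit, k=5):
    """Sum of squared held-out errors, accumulated over k folds.

    `fit(train_inputs, train_y)` must return a function mapping one input
    to a prediction. Fold f holds observations f, f + k, f + 2k, ...
    (an assumed, deterministic assignment).
    """
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        predict = fit([inputs[i] for i in train_idx], [y[i] for i in train_idx])
        total += sum((y[i] - predict(inputs[i])) ** 2 for i in held_out)
    return total


def fit_mean(train_inputs, train_y):
    """Toy stand-in for a real estimator: always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda _point: mean


score = kfold_cv_score([None] * 4, [1.0, 2.0, 3.0, 4.0], fit_mean, k=2)
```

Hyperparameter selection would then call this routine once per candidate $\lambda$, with `fit` closing over that candidate, and keep the candidate with the smallest score.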

4. Numerical studies

In this section, we illustrate the performance of the varying coefficient composite support vector quantile regression (VCCSVQR) using synthetic and real data for multiple nonlinear regression. In our numerical studies, we compare the proposed VCCSVQR with the kernel local polynomial smoothing (KLPS) method (Fan and Zhang, 2008), the local polynomial quantile regression with VCs (LPQRVC) ($\theta = 0.5$) (Cai and Xu, 2008), and VCSVQR ($\theta = 0.5$) (Shim et al., 2016). Throughout this paper, we use the Epanechnikov kernel function for KLPS and LPQRVC, and the Gaussian kernel function for VCSVQR and VCCSVQR. The hyperparameters considered are the penalty parameter ($C$), the kernel parameter ($\sigma^2$) and the number of quantiles ($J$). For hyperparameter selection we use the LOO-CV function for $\sigma^2$ of KLPS and LPQRVC, the generalized approximate CV function for $(C, \sigma^2)$ of VCSVQR, and the 5-fold CV function for $(C, \sigma^2, J)$ of the proposed method (VCCSVQR).

The performance of the estimates of $f(\boldsymbol{x}, u)$ is measured by the average of the mean squared errors and by their standard error.

For the first example, we generate the synthetic data $\{\boldsymbol{x}_i, u_i, y_i\}_{i=1}^{n}$ from the following model:

$$y_i = \beta_0(u_i) + \beta_1(u_i)x_{i1} + \beta_2(u_i)x_{i2} + e_i,$$

where $\beta_0(u_i) = \cos(\pi u_i)$, $\beta_1(u_i) = \sin(\sqrt{2}\pi u_i)$ and $\beta_2(u_i) = \cos(\sqrt{2}\pi u_i)$, the $u_i$ are i.i.d. $U(0, 2)$, and $x_{i1}, x_{i2}$ are i.i.d. $U(-1, 1)$. Figure 4.1 shows the plots of the smoothing variable $u$ equally spaced in $(0, 2)$ and the varying coefficients $\beta_k(u)$. Here we assume that the $e_i$ are i.i.d. from one of four types of error distributions: $N(0, 1)$, the double exponential distribution ($DE(0, 1)$), the $t_3$-distribution, and a mixture of normal distributions ($0.95N(0, 1) + 0.05N(0, 100)$). The mean squared error and the mean absolute error are used as the performance metrics:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\big(f(\boldsymbol{x}_i, \boldsymbol{u}_i) - \hat{f}(\boldsymbol{x}_i, \boldsymbol{u}_i)\big)^2, \qquad MAE = \frac{1}{n}\sum_{i=1}^{n}\big|f(\boldsymbol{x}_i, \boldsymbol{u}_i) - \hat{f}(\boldsymbol{x}_i, \boldsymbol{u}_i)\big|.$$
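A data-generating routine for this design, together with the two metrics, might look like the sketch below. Several choices are assumptions: $DE(0,1)$ is taken as a Laplace distribution with location 0 and scale 1, $N(0, 100)$ is read as variance 100 (standard deviation 10), the coefficient functions use $\sqrt{2}\,\pi u$, and all function names and the seed handling are illustrative.

```python
import math
import random


def draw_error(kind, rng):
    """Draw one error term from the named distribution."""
    if kind == "normal":
        return rng.gauss(0.0, 1.0)
    if kind == "double_exponential":
        # Difference of two Exp(1) draws is Laplace(0, 1) (assumed DE(0, 1)).
        return rng.expovariate(1.0) - rng.expovariate(1.0)
    if kind == "t3":
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
        return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3.0)
    if kind == "mixture":
        # 0.95 N(0, 1) + 0.05 N(0, 100), reading 100 as the variance.
        sd = 1.0 if rng.random() < 0.95 else 10.0
        return rng.gauss(0.0, sd)
    raise ValueError(f"unknown error distribution: {kind}")


def true_f(x1, x2, u):
    """beta_0(u) + beta_1(u) * x1 + beta_2(u) * x2 for the first example."""
    w = math.sqrt(2.0) * math.pi
    return math.cos(math.pi * u) + math.sin(w * u) * x1 + math.cos(w * u) * x2


def generate_example(n, kind, seed=0):
    """Return a list of tuples (x1, x2, u, f, y) with y = f + e."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.uniform(0.0, 2.0)
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        f = true_f(x1, x2, u)
        rows.append((x1, x2, u, f, f + draw_error(kind, rng)))
    return rows


def mse(f_true, f_hat):
    return sum((a - b) ** 2 for a, b in zip(f_true, f_hat)) / len(f_true)


def mae(f_true, f_hat):
    return sum(abs(a - b) for a, b in zip(f_true, f_hat)) / len(f_true)


rows = generate_example(100, "t3", seed=1)
```

Because the true regression function is known in the simulation, both metrics compare the estimate with $f$ rather than with the noisy response.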

For our experiment, we repeat the procedure 50 times, each with a training dataset of size n=100 and a test dataset of size n=50. Tables 4.1 and 4.2 show the results of the first example. Boldfaced values indicate the best performance for the given quantity. As seen in Tables 4.1 and 4.2, the proposed VCCSVQR outperforms KLPS, LPQRVC(0.5) and VCSVQR(0.5) on both training and test datasets under all four error distributions. This implies that the proposed VCCSVQR provides the best performance even for thick-tailed error distributions.

Figure 4.1 Plots of the smoothing variable and the varying coefficient in the first example

Table 4.1 Averages of mean squared errors from synthetic datasets. Standard errors are in parentheses.

| error distribution | data | KLPS | LPQRVC(0.5) | VCSVQR(0.5) | VCCSVQR |
|---|---|---|---|---|---|
| N(0,1) | training | 0.2240 (0.0139) | 0.3174 (0.0152) | 0.2931 (0.0183) | 0.1863 (0.0110) |
| N(0,1) | test | 0.3785 (0.0708) | 0.4977 (0.0469) | 0.4117 (0.0382) | 0.2516 (0.0247) |
| DE(0,1) | training | 0.3601 (0.0188) | 0.3418 (0.0157) | 0.2617 (0.0176) | 0.2410 (0.0150) |
| DE(0,1) | test | 0.4842 (0.0331) | 0.5672 (0.0689) | 0.3464 (0.0298) | 0.3353 (0.0348) |
| t(3) | training | 0.4842 (0.0400) | 0.4417 (0.0306) | 0.3757 (0.0268) | 0.3312 (0.0263) |
| t(3) | test | 0.6471 (0.0919) | 0.6701 (0.0783) | 0.4871 (0.0383) | 0.3862 (0.0335) |
| 0.95N(0,1)+0.05N(0,100) | training | 0.8354 (0.0595) | 0.3846 (0.0228) | 0.3134 (0.0198) | 0.2778 (0.0205) |
| 0.95N(0,1)+0.05N(0,100) | test | 1.1971 (0.1258) | 0.6176 (0.0624) | 0.4243 (0.0381) | 0.3910 (0.0388) |

Table 4.2 Averages of mean absolute errors from synthetic datasets. Standard errors are in parentheses.

| error distribution | data | KLPS | LPQRVC(0.5) | VCSVQR(0.5) | VCCSVQR |
|---|---|---|---|---|---|
| N(0,1) | training | 0.2240 (0.0139) | 0.3174 (0.0152) | 0.2931 (0.0183) | 0.1863 (0.0110) |
| N(0,1) | test | 0.3785 (0.0708) | 0.4977 (0.0469) | 0.4117 (0.0382) | 0.2516 (0.0247) |
| DE(0,1) | training | 0.3601 (0.0188) | 0.3418 (0.0157) | 0.2617 (0.0176) | 0.2410 (0.0150) |
| DE(0,1) | test | 0.4842 (0.0331) | 0.5672 (0.0689) | 0.3464 (0.0298) | 0.3353 (0.0348) |
| t(3) | training | 0.4842 (0.0400) | 0.4417 (0.0306) | 0.3757 (0.0268) | 0.3312 (0.0263) |
| t(3) | test | 0.6471 (0.0919) | 0.6701 (0.0783) | 0.4871 (0.0383) | 0.3862 (0.0335) |
| 0.95N(0,1)+0.05N(0,100) | training | 0.8354 (0.0595) | 0.3846 (0.0228) | 0.3134 (0.0198) | 0.2778 (0.0205) |
| 0.95N(0,1)+0.05N(0,100) | test | 1.1971 (0.1258) | 0.6176 (0.0624) | 0.4243 (0.0381) | 0.3910 (0.0388) |

For the second example, we use the wage dataset in Wooldridge (2003), where the response ($y$) is the logarithm of wages in dollars per hour, the smoothing variable ($u$) is the years of education, and the input variables ($x_1, x_2, x_3$) are gender (1=male), marital status (1=married) and years of labor force experience. We consider the varying coefficient regression model as follows:

$$y_i = \beta_0(u_i) + \beta_1(u_i)x_{i1} + \beta_2(u_i)x_{i2} + \beta_3(u_i)x_{i3} + e_i.$$

We randomly divide the wage dataset into a training dataset of size 300 and a test dataset of size 226. We repeat this procedure 50 times. The mean squared error and the mean absolute error are used as the performance metrics:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{f}(\boldsymbol{x}_i, \boldsymbol{u}_i)\big)^2, \qquad MAE = \frac{1}{n}\sum_{i=1}^{n}\big|y_i - \hat{f}(\boldsymbol{x}_i, \boldsymbol{u}_i)\big|.$$

Tables 4.3 and 4.4 show the results of the second example. As seen in Tables 4.3 and 4.4, the proposed VCCSVQR outperforms KLPS, LPQRVC(0.5), and VCSVQR(0.5) for training datasets and test datasets of the second example.

Table 4.3 Averages of mean squared errors from the wage datasets. Standard errors are in parentheses.

| data | KLPS | LPQRVC(0.5) | VCSVQR(0.5) | VCCSVQR |
|---|---|---|---|---|
| training | 0.1632 (0.0016) | 0.1687 (0.0017) | 0.1563 (0.0018) | 0.1476 (0.0017) |
| test | 0.2658 (0.0756) | 0.6568 (0.2437) | 0.2317 (0.0046) | 0.2123 (0.0031) |


Table 4.4 Averages of mean absolute errors from the wage datasets. Standard errors are in parentheses.

| data | KLPS | LPQRVC(0.5) | VCSVQR(0.5) | VCCSVQR |
|---|---|---|---|---|
| training | 0.2888 (0.0069) | 0.2846 (0.0056) | 0.2679 (0.0034) | 0.2729 (0.0046) |
| test | 0.3460 (0.0079) | 0.4693 (0.0656) | 0.3619 (0.0027) | 0.3520 (0.0024) |

5. Conclusions

In this paper we proposed a new regression estimation method, VCCSVQR, that combines the formulations of CQR and VCSVQR. In particular, the basic idea of the kernel method is used so that the computation is carried out in the input space rather than in the reproducing kernel Hilbert space. Numerical studies were conducted to illustrate the performance of the proposed estimators. Through the experiments, we showed that our estimators appear useful in estimating the regression function regardless of the error distribution. For future work, we will consider the estimation of VCCSVQR using the iteratively reweighted least squares procedure.

References

Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation (with discussion). Journal of the American Statistical Association, 80, 580-619.

Cai, Z. and Xu, X. (2008). Nonparametric quantile estimations for dynamic smooth coefficient models. Journal of the American Statistical Association, 103, 1595-1608.

Fan, J. and Zhang, W. (2008). Statistical methods with varying coefficient models. Statistics and Its Interface, 1, 179-195.

Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B, 55, 757-796.

Heagerty, P. and Pepe, M. (1999). Semiparametric estimation of regression quantiles with application to standardizing weight for height and age in U. S. children. Journal of the Royal Statistical Society: Series C, 48, 533-551.

Hoover, D. R., Rice, J. A., Wu, C. O. and Yang, L. P. (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika, 85, 809-822.

Hwang, C. and Shim, J. (2017). Feature selection in the semivarying coefficient LS-SVR. Journal of the Korean Data & Information Science Society, 28, 461-471.

Hwang, C., Bae, J. and Shim, J. (2016). Robust varying coefficient model using L1 penalized locally weighted regression. Journal of the Korean Data & Information Science Society, 27, 1059-1066.

Kai, B., Li, R. and Zou, H. (2010). Local composite quantile regression smoothing: An efficient and safe alternative to local polynomial regression. Journal of the Royal Statistical Society: Series B, 72, 49-69.

Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33-50.

Koenker, R. and Geling, R. (2001). Reappraising medfly longevity: A quantile regression survival analysis. Journal of the American Statistical Association, 96, 458-468.

Kuhn, H. and Tucker, A. (1951). Nonlinear programming. Proceedings of 2nd Berkeley Symposium, University of California Press, Berkeley.

Li, Y., Liu, Y. and Zhu, J. (2007). Quantile regression in reproducing kernel Hilbert space. Journal of the American Statistical Association, 102, 255-268.

Mercer, J. (1909). Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society A, 415-446.

Shim, J. and Hwang, C. (2009). Support vector censored quantile regression under random censoring. Computational Statistics and Data Analysis, 53, 912-919.

Shim, J., Hwang, C. and Seok, K. (2016). Support vector quantile regression with varying coefficients. Computational Statistics, 31, 1015-1050.

Wei, Y. and He, X. (2006). Conditional growth charts (with discussions). Annals of Statistics, 34, 2069-2097.

Wooldridge, J. M. (2003). Introductory econometrics: A modern approach, South-Western Cengage Learning, Mason.

Zou, H. and Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. Annals of Statistics, 36, 1108-1126.
