
Noncrossing varying coefficient support vector quantile regression †


Jooyong Shim 1 · Changha Hwang 2 · Insuk Sohn 3 · Kyungha Seok 4

1,4 Department of Statistics, Inje University

2 Department of Applied Statistics, Dankook University

3 Arontier

Received 12 July 2020, revised 12 August 2020, accepted 22 August 2020

Abstract

Quantile regression fits specified percentiles of the response, such as the 90th percentile, and can potentially describe the entire conditional distribution of the response. Sometimes quantile functions estimated at different quantiles can cross each other. Varying coefficient models are a useful extension of classical linear models. We propose a new noncrossing varying coefficient support vector quantile regression method based on a location-scale model. To choose the hyper-parameters we apply a model selection method that uses cross validation techniques. The proposed method provides a good solution for estimating noncrossing quantile regression functions when several quantiles are required. Real examples are provided to show the usefulness of the proposed method.

Keywords: Location-scale model, non-crossing quantile regression, quantile regression, support vector quantile regression, varying coefficient quantile regression.

† This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (NRF-2018R1D1A1B07042349, NRF-2017R1D1A1B03029792 and NRF-2017R1E1A1A01075541).

1 Adjunct Professor, Institute of Statistical Information, Department of Statistics, Inje University, Gyungnam 50834, Korea.

2 Professor, Department of Applied Statistics, Dankook University, Yongin, Gyeonggido 16890, Korea.

3 Chief research officer, Arontier, Seoul 06735, Korea.

4 Corresponding author: Professor, Institute of Statistical Information, Department of Statistics, Inje University, Gyungnam 50834, Korea. E-mail: [email protected]

1. Introduction

Quantile regression, which was introduced by Koenker and Bassett (1978), fits specified percentiles of the response, such as the 90th percentile, and can potentially describe the entire conditional distribution of the response. It has been used widely for estimating the quantiles of the conditional distribution of the response variable given the values of the input variables. Just as the classical linear regression model, which minimizes the sum of squared residuals, makes it possible to estimate various models for conditional mean functions, quantile regression methods allow us to estimate models for the conditional median function and the full range of other conditional quantile functions.

Cole (1990) introduced a parametric LMS method which is based on the Box-Cox power transformation (L), the mean or median function (M), and the coefficient of variation (S).

Koenker and Bassett (1978) used a nonparametric approach based on M-estimation, similar to least absolute deviation methods, which yields consistent estimates of the quantile regression function under general conditions without requiring that the form of the distribution of the output variable be specified. However, the main weakness is that separate specifications and estimates are needed for each quantile of interest. Although true quantile functions by definition do not intersect, the quantile functions estimated at different quantiles can intersect with each other unless special restrictions are imposed. He (1997) proposed the restricted regression quantile (RRQ), which is based on a location-scale model, to ensure that quantile regression functions do not cross. It can be used in a wide range of models, including linear heteroscedastic models and nonlinear quantile regression models. In RRQ, the non-crossing constraint is transformed into a positivity constraint. To carry out this transformation, some restrictions must be imposed on the conditional moment structure of the problem, which is not desirable from a nonparametric modeling viewpoint. Heagerty and Pepe (1999) modeled the location and scale as flexible regression spline functions and proposed a semi-parametric method that allows the error distribution to change as a function of the input variables. Their method combines the strengths of the parametric LMS method with the advantages of the nonparametric method of Koenker and Bassett (1978). Takeuchi (2004) and Takeuchi et al. (2006) imposed the non-crossing constraint as a simple linear constraint within the support vector machine (SVM; Vapnik, 1995) framework for non-crossing quantile regression.

This SVM approach shows good performance; see Hwang and Shim (2017a, 2017b) and Shim et al. (2016, 2017) for more information on SVM applications. However, the non-crossing quantile regression method through the SVM has the disadvantage of having to handle all pairs of adjacent conditional quantile functions when multiple quantiles are required. Shim et al. (2009) proposed a non-crossing quantile regression method using the doubly penalized kernel machine. They estimate both location and scale functions simultaneously from the basic heteroscedastic location-scale model.

Introduced by Hastie and Tibshirani (1993), varying coefficient models are a useful extension of classical linear models. They arise naturally when one wishes to examine how regression coefficients change over different groups characterized by certain covariates such as age.

In this paper, we propose a new support vector quantile regression method applying varying coefficient non-crossing quantile regression, which is based on a location-scale model and uses a step-wise strategy. We utilize the model selection method that uses the generalized approximate cross validation function for choosing the hyper-parameters, which are important to the performance of the proposed method. The proposed method provides a good solution to estimating noncrossing quantile regression functions when multiple quantiles for high dimensional data are required. Real examples reveal the usefulness of the proposed method.

The remainder of this paper is organized as follows. In Section 2 we present the varying coefficient support vector quantile regression. In Section 3 we state the proposed noncrossing varying coefficient support vector quantile regression. In Section 4 we perform numerical studies through two real examples. In Section 5 we give the conclusions.

2. Varying coefficient support vector quantile regression

In this section we briefly present the varying coefficient support vector quantile regression proposed by Shim et al. (2016). We denote the training data set by $\{(\mathbf{x}_i, \mathbf{u}_i, y_i)\}_{i=1}^{n}$, with each input vector $\mathbf{x}_i \in \mathbb{R}^{d_x}$ including a constant 1, and the output $y_i \in \mathbb{R}$, which is linearly related to the input vector $\mathbf{x}_i$ conditionally on the smoothing variable vector $\mathbf{u}_i \in \mathbb{R}^{d_u}$. We consider the varying coefficient quantile regression model as follows:

$$q_\theta(\mathbf{x}_i, \mathbf{u}_i) = \sum_{k=0}^{d_x} x_{ik}\,\beta_k(\mathbf{u}_i) \quad \text{for } \theta \in (0, 1), \tag{2.1}$$

where $x_{i0} = 1$. In the varying coefficient quantile regression model (2.1) we assume that $\beta_k(\mathbf{u}_i)$ is nonlinearly related to the smoothing variable vector $\mathbf{u}_i$ such that $\beta_k(\mathbf{u}_i) = \boldsymbol{\omega}_k' \phi(\mathbf{u}_i) + b_k$ for $k = 0, \cdots, d_x$, where $\boldsymbol{\omega}_k$ is the corresponding $d_f \times 1$ weight vector. Here the nonlinear feature mapping function $\phi: \mathbb{R}^{d_u} \to \mathbb{R}^{d_f}$ maps the input space to the feature space, where the feature dimension $d_f$ is defined implicitly. From Mercer (1909), we know that an inner product in the feature space has an equivalent kernel in the input space, $\phi(\mathbf{u}_i)' \phi(\mathbf{u}_j) = K(\mathbf{u}_i, \mathbf{u}_j)$.

Several choices of the kernel $K(\cdot, \cdot)$ are available.
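As an illustration of the kernel substitution, the following minimal Python sketch evaluates the two kernel families used later in Section 4 (a degree-2 polynomial kernel and an RBF kernel) and assembles the matrix with entries $\sum_k x_{ik} x_{jk} K(\mathbf{u}_i, \mathbf{u}_j)$ that the varying coefficient model works with. The function names, the polynomial form $(1 + u'v)^2$, the RBF parameterization $\exp(-\|u - v\|^2/\sigma^2)$ and the simulated data are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def poly_kernel(U, V, degree=2):
    """Polynomial kernel (1 + u'v)^degree; this exact form is an assumption."""
    return (1.0 + U @ V.T) ** degree

def rbf_kernel(U, V, sigma=1.0):
    """RBF kernel exp(-||u - v||^2 / sigma^2); the parameterization is an assumption."""
    sq = (U**2).sum(1)[:, None] + (V**2).sum(1)[None, :] - 2.0 * U @ V.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def vc_gram(X, U, kernel):
    """Matrix with (i, j) entry sum_k x_ik x_jk K(u_i, u_j) = (x_i' x_j) K(u_i, u_j)."""
    return (X @ X.T) * kernel(U, U)

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 1))                                   # smoothing variable
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])    # first column is x_i0 = 1
G = vc_gram(X, U, rbf_kernel)
```

Because `G` is the elementwise product of two positive semidefinite matrices, it is itself positive semidefinite, which keeps the kernelized optimization problem convex.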

We consider the nonlinear case, in which the $\theta$th quantile regression function given $(\mathbf{x}_i, \mathbf{u}_i)$ can be expressed as a nonlinear function of the smoothing variable vector $\mathbf{u}_i$ such that

$$q_\theta(\mathbf{x}_i, \mathbf{u}_i) = \sum_{k=0}^{d_x} x_{ik} \left( \boldsymbol{\omega}_k' \phi(\mathbf{u}_i) + b_k \right) \quad \text{for } \theta \in (0, 1).$$

The $\theta$th quantile regression function can be defined as a function of any solution to the following optimization problem:

$$\min L = \frac{1}{2} \sum_{k=0}^{d_x} \|\boldsymbol{\omega}_k\|^2 + C \sum_{i=1}^{n} \rho_\theta\!\left( y_i - \sum_{k=0}^{d_x} x_{ik} \left( \boldsymbol{\omega}_k' \phi(\mathbf{u}_i) + b_k \right) \right), \tag{2.2}$$

where $\rho_\theta(\cdot)$ is the check function and $C > 0$ is a penalty parameter which controls the trade-off between the smoothness and the goodness of fit of the estimator. We can express the optimization problem in the SVM formulation as follows:

$$\min L = \frac{1}{2} \sum_{k=0}^{d_x} \|\boldsymbol{\omega}_k\|^2 + C\theta \sum_{i=1}^{n} \xi_i + C(1 - \theta) \sum_{i=1}^{n} \xi_i^*$$

subject to

$$y_i - \sum_{k=0}^{d_x} x_{ik} \left( \boldsymbol{\omega}_k' \phi(\mathbf{u}_i) + b_k \right) \le \xi_i, \quad -y_i + \sum_{k=0}^{d_x} x_{ik} \left( \boldsymbol{\omega}_k' \phi(\mathbf{u}_i) + b_k \right) \le \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, \cdots, n.$$
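The link between the check function and the two slack variables can be verified numerically. The sketch below assumes the standard Koenker-Bassett definition $\rho_\theta(r) = r(\theta - I(r < 0))$ and compares it with $\theta \xi + (1 - \theta)\xi^*$, where $\xi = \max(r, 0)$ and $\xi^* = \max(-r, 0)$ are the smallest slacks satisfying the constraints for a residual $r$. The function names are hypothetical.

```python
import numpy as np

def check_loss(r, theta):
    """Check function rho_theta(r) = r * (theta - 1{r < 0})."""
    r = np.asarray(r, dtype=float)
    return r * (theta - (r < 0))

def slack_loss(r, theta):
    """Same loss written with the slacks of the SVM formulation."""
    r = np.asarray(r, dtype=float)
    xi = np.maximum(r, 0.0)         # smallest slack with  y - q <= xi
    xi_star = np.maximum(-r, 0.0)   # smallest slack with -y + q <= xi*
    return theta * xi + (1.0 - theta) * xi_star

r = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])   # illustrative residuals y - q
print(check_loss(r, 0.9))
print(slack_loss(r, 0.9))
```

Both calls print the same values, which is why minimizing (2.2) and minimizing the slack formulation lead to the same estimator.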


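We construct the Lagrange function as follows:

$$\begin{aligned}
L = {} & \frac{1}{2} \sum_{k=0}^{d_x} \|\boldsymbol{\omega}_k\|^2 + C\theta \sum_{i=1}^{n} \xi_i + C(1 - \theta) \sum_{i=1}^{n} \xi_i^* \\
& - \sum_{i=1}^{n} \alpha_i \left( \xi_i - y_i + \sum_{k=0}^{d_x} x_{ik} \left( \boldsymbol{\omega}_k' \phi(\mathbf{u}_i) + b_k \right) \right) \\
& - \sum_{i=1}^{n} \alpha_i^* \left( \xi_i^* + y_i - \sum_{k=0}^{d_x} x_{ik} \left( \boldsymbol{\omega}_k' \phi(\mathbf{u}_i) + b_k \right) \right) - \sum_{i=1}^{n} \left( \eta_i \xi_i + \eta_i^* \xi_i^* \right),
\end{aligned} \tag{2.3}$$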

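where $\alpha_i^{(*)}, \eta_i^{(*)} \ge 0$. Taking partial derivatives of equation (2.3) with regard to the primal variables $(\boldsymbol{\omega}_k, \xi_i^{(*)}, b_k)$, we have

$$\begin{aligned}
\frac{\partial L}{\partial \boldsymbol{\omega}_k} = \mathbf{0} \;&\rightarrow\; \boldsymbol{\omega}_k = \sum_{i=1}^{n} x_{ik}\, \phi(\mathbf{u}_i) (\alpha_i - \alpha_i^*), \quad k = 0, \cdots, d_x, \\
\frac{\partial L}{\partial \xi_i} = 0 \;&\rightarrow\; C\theta = \alpha_i + \eta_i, \quad i = 1, \cdots, n, \\
\frac{\partial L}{\partial \xi_i^*} = 0 \;&\rightarrow\; C(1 - \theta) = \alpha_i^* + \eta_i^*, \quad i = 1, \cdots, n, \\
\frac{\partial L}{\partial b_k} = 0 \;&\rightarrow\; \sum_{i=1}^{n} x_{ik} (\alpha_i - \alpha_i^*) = 0, \quad k = 0, \cdots, d_x.
\end{aligned}$$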

Plugging the above results into (2.3), we have the optimization problem as follows:

$$\max \; -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \sum_{k=0}^{d_x} x_{ik} x_{jk} K(\mathbf{u}_i, \mathbf{u}_j) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) \tag{2.4}$$

subject to $0 \le \alpha_i \le C\theta$, $0 \le \alpha_i^* \le C(1 - \theta)$ and $\sum_{i=1}^{n} x_{ik} (\alpha_i - \alpha_i^*) = 0$ for $k = 0, \cdots, d_x$. The optimal Lagrange multipliers $(\hat{\alpha}_i, \hat{\alpha}_i^*)$ can be obtained from the above problem with the constraints. Thus, the estimator of $\beta_k(\mathbf{u}_t)$ for $k = 0, \cdots, d_x$ is obtained as follows:

$$\hat{\beta}_k(\mathbf{u}_t) = \sum_{i=1}^{n} x_{ik} K(\mathbf{u}_t, \mathbf{u}_i) (\hat{\alpha}_i - \hat{\alpha}_i^*) + \hat{b}_k. \tag{2.5}$$

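From (2.5) we can obtain the $\theta$th quantile regression function estimator given an input $(\mathbf{x}_\ell, \mathbf{u}_t)$ as follows:

$$\hat{q}_\theta(\mathbf{x}_\ell, \mathbf{u}_t) = \sum_{k=0}^{d_x} x_{\ell k}\, \hat{\beta}_k(\mathbf{u}_t) = \sum_{k=0}^{d_x} x_{\ell k} \left( \sum_{i=1}^{n} x_{ik} K(\mathbf{u}_t, \mathbf{u}_i) (\hat{\alpha}_i - \hat{\alpha}_i^*) + \hat{b}_k \right). \tag{2.6}$$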

Here $\hat{b}_k$ for $k = 0, \cdots, d_x$ is obtained via the Kuhn-Tucker conditions (Kuhn and Tucker, 1951) such as

$$(\hat{b}_0, \hat{b}_1, \cdots, \hat{b}_{d_x})' = (X_s' X_s)^{-1} X_s' Y_s, \tag{2.7}$$

where $X_s$ is an $n_s \times d_x$ matrix of $\mathbf{x}_i'$ for $i \in I_s = \{ i = 1, \cdots, n \mid 0 < \alpha_i^{(*)} < C\theta,\ C(1 - \theta) \}$, $Y_s$ is an $n_s \times 1$ vector of $y_i - \sum_{j=1}^{n} \sum_{k=0}^{d_x} x_{ik} x_{jk} K(\mathbf{u}_i, \mathbf{u}_j)(\hat{\alpha}_j - \hat{\alpha}_j^*)$ for $i \in I_s$, and $n_s$ is the size of $I_s$.
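A rough numerical sketch of (2.4) to (2.7) is given below. It is not the authors' implementation: it assumes the reparameterization $\delta_i = \alpha_i - \alpha_i^*$ with $-C(1 - \theta) \le \delta_i \le C\theta$ (the dual depends on the multipliers only through this difference), swaps in SciPy's general-purpose SLSQP solver in place of a dedicated quadratic programming routine, and takes the points with $\delta_i$ strictly inside its bounds as the index set for the least squares step (2.7). The function names, the RBF kernel form and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def fit_vcsvqr(X, U, y, theta, C, kernel):
    """Solve the dual (2.4) in delta = alpha - alpha^* and recover b as in (2.7)."""
    n = len(y)
    G = (X @ X.T) * kernel(U, U)                 # sum_k x_ik x_jk K(u_i, u_j)
    obj = lambda d: 0.5 * d @ G @ d - y @ d      # negative of the dual objective
    grad = lambda d: G @ d - y
    cons = {"type": "eq", "fun": lambda d: X.T @ d, "jac": lambda d: X.T}
    bounds = [(-C * (1 - theta), C * theta)] * n
    res = minimize(obj, np.zeros(n), jac=grad, bounds=bounds, constraints=[cons],
                   method="SLSQP", options={"maxiter": 500, "ftol": 1e-10})
    delta = res.x
    tol = 1e-6 * C
    interior = (delta > -C * (1 - theta) + tol) & (delta < C * theta - tol)
    resid = y - G @ delta                        # y_i minus the kernel part of q_i
    b, *_ = np.linalg.lstsq(X[interior], resid[interior], rcond=None)
    return delta, b

def predict(Xnew, Unew, X, U, delta, b, kernel):
    """q_hat(x, u) = sum_i (x'x_i) K(u, u_i) delta_i + x'b, which is (2.6) regrouped."""
    return ((Xnew @ X.T) * kernel(Unew, U)) @ delta + Xnew @ b

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
n = 40
U = rng.uniform(0, 1, size=(n, 1))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.sin(2 * np.pi * U[:, 0]) + X[:, 1] * U[:, 0] + 0.2 * rng.normal(size=n)
rbf = lambda A, B: np.exp(-((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / 0.25)

theta, C = 0.5, 10.0
delta, b = fit_vcsvqr(X, U, y, theta, C, rbf)
q_hat = predict(X, U, X, U, delta, b, rbf)
```

For larger samples a dedicated quadratic programming solver would be a more natural choice than SLSQP; the sketch only aims to make the structure of the dual and of the intercept recovery concrete.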

The functional structure of the varying coefficient support vector quantile regression is characterized by the hyper-parameters ($C$ and the kernel parameters). To select the hyper-parameters we consider the cross validation (CV) function as follows:

$$CV(\lambda) = \sum_{i=1}^{n} \rho_\theta\!\left( y_i - \hat{q}_\theta(\mathbf{x}_i, \mathbf{u}_i)^{(-i)} \right), \tag{2.8}$$

where $\lambda$ is the set of hyper-parameters and $\hat{q}_\theta(\mathbf{x}_i, \mathbf{u}_i)^{(-i)}$ is the quantile regression function estimated without the $i$th observation. Since $\hat{q}_\theta(\mathbf{x}_i, \mathbf{u}_i)^{(-i)}$ for $i = 1, \cdots, n$ has to be evaluated for each candidate set, using the CV function to select the hyper-parameters is computationally formidable.

Yuan (2006) proposed the generalized approximate CV function to select the set of hyper-parameters $\lambda$ as follows:

$$GACV(\lambda) = \frac{\sum_{i=1}^{n} \rho_\theta\!\left( y_i - \hat{q}_\theta(\mathbf{x}_i, \mathbf{u}_i) \right)}{n - \mathrm{trace}(H)}, \tag{2.9}$$

where $H$ is the hat matrix such that $\hat{q}_\theta(\mathbf{x}, \mathbf{u}) = H\mathbf{y}$ with the $(i, j)$th element $h_{ij} = \partial \hat{q}_\theta(\mathbf{x}_i, \mathbf{u}_i) / \partial y_j$.

From Li et al. (2007), the trace of the hat matrix $H$ equals the size of the set $I_s$ used in (2.7).
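With trace($H$) replaced by the size of $I_s$, the criterion (2.9) reduces to a few lines of code. The sketch below is a minimal illustration; the function name `gacv`, the argument name `n_interp` (standing for the size of $I_s$) and the toy numbers are assumptions.

```python
import numpy as np

def gacv(y, q_hat, theta, n_interp):
    """GACV (2.9) with trace(H) taken to be n_interp, the size of I_s."""
    r = np.asarray(y, dtype=float) - np.asarray(q_hat, dtype=float)
    loss = np.sum(r * (theta - (r < 0)))     # sum of check losses
    return loss / (len(r) - n_interp)

# Toy values, for illustration only.
y = np.array([1.0, 2.0, 3.0, 4.0])
q_hat = np.array([1.0, 2.5, 2.0, 4.0])
score = gacv(y, q_hat, theta=0.5, n_interp=2)
print(score)
```

In a hyper-parameter search one would refit the model for each candidate $\lambda$, evaluate this score, and keep the candidate with the smallest value.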

3. Noncrossing varying coefficient support vector quantile regression

We want to estimate the quantile regression functions from the varying coefficient support vector quantile regression model at different probabilities in such a way that they do not cross for a given smoothing variable vector.

To motivate the restricted regression quantiles (He, 1997), we consider the heteroscedastic varying coefficient support vector regression model as follows:

$$y_i = \mathbf{x}_i' \boldsymbol{\beta}(\mathbf{u}_i) + s(\mathbf{u}_i)\, \epsilon_i. \tag{3.1}$$

For the identifiability of the restricted regression quantiles model we assume that $\epsilon_i$ has median 0, $|\epsilon_i|$ has median 1 and $s(\mathbf{u}_i) > 0$, as in He (1997).

The algorithm for training and model selection of the varying coefficient quantile regression model for non-crossing quantile regression is as follows:

(i) Obtain $\hat{\boldsymbol{\beta}}_{0.5}(\mathbf{u}_i)$ from (2.5) and (2.9) using $\{(\mathbf{x}_i, \mathbf{u}_i, y_i)\}_{i=1}^{n}$ to obtain $r_i = y_i - \mathbf{x}_i' \hat{\boldsymbol{\beta}}_{0.5}(\mathbf{u}_i)$ for $i = 1, \cdots, n$.

(ii) Obtain $\hat{s}(\mathbf{u}_i)$, the estimated median regression function of $|r_i|$ given $\mathbf{u}_i$, using $\{(|r_i|, \mathbf{u}_i)\}_{i=1}^{n}$ by linear quantile regression. This is the estimate of $s(\mathbf{u}_i)$ since the median of $|\epsilon_i|$ is assumed to be 1.

(iii) Obtain $\hat{\gamma}_\theta(\mathbf{u}_i)$, the estimated $\theta$th quantile regression function of $r_i$ given $\hat{s}(\mathbf{u}_i)$, using $\{(r_i, \hat{s}(\mathbf{u}_i))\}_{i=1}^{n}$ by linear quantile regression. For example, $\hat{\gamma}_\theta(\mathbf{u}_i) = \hat{s}(\mathbf{u}_i)\, \hat{b}_\theta$, where $\hat{b}_\theta$ is the $\theta$th regression quantile (the regression coefficient for the $\theta$th quantile regression).

Then the estimated noncrossing quantile regression function can be obtained as follows:

$$\hat{q}_\theta(\mathbf{x}_i, \mathbf{u}_i) = \mathbf{x}_i' \hat{\boldsymbol{\beta}}_{0.5}(\mathbf{u}_i) + \hat{\gamma}_\theta(\mathbf{u}_i). \tag{3.2}$$

Here the quantile regression functions of the $r_i$'s, $\hat{\gamma}_\theta(\mathbf{u}_i)$, do not cross since linear quantile regression is performed, which in turn implies that the quantile regression functions of $y$ given $(\mathbf{x}_i, \mathbf{u}_i)$, $\hat{q}_\theta(\mathbf{x}_i, \mathbf{u}_i)$, do not cross.
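Steps (i) to (iii) and equation (3.2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the median fit of step (i) is replaced by a plain linear median regression as a stand-in for VcSVQR, linear quantile regression is solved as a linear program with `scipy.optimize.linprog`, and step (iii) uses the no-intercept form $\hat{\gamma}_\theta = \hat{s}\,\hat{b}_\theta$ given as an example in the text. All function names and the simulated data are hypothetical. With $\hat{s} > 0$ and a single coefficient $\hat{b}_\theta$ that is nondecreasing in $\theta$, the resulting curves cannot cross.

```python
import numpy as np
from scipy.optimize import linprog

def linear_qr(Z, y, theta):
    """Linear quantile regression as an LP:
    min theta*sum(p) + (1-theta)*sum(m)  s.t.  Z b + p - m = y,  p, m >= 0."""
    n, p = Z.shape
    c = np.concatenate([np.zeros(p), theta * np.ones(n), (1 - theta) * np.ones(n)])
    A_eq = np.hstack([Z, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

def noncrossing_quantiles(median_fit, U, y, thetas):
    """median_fit holds x_i' beta_hat_0.5(u_i) from any median regression fit."""
    r = y - median_fit                                   # step (i): residuals
    Zu = np.column_stack([np.ones(len(y)), U])
    s_hat = Zu @ linear_qr(Zu, np.abs(r), 0.5)           # step (ii): scale estimate
    out = {}
    for th in thetas:
        b_th = linear_qr(s_hat[:, None], r, th)[0]       # step (iii): no intercept
        out[th] = median_fit + s_hat * b_th              # equation (3.2)
    return out, s_hat

# Simulated heteroscedastic data, for illustration only.
rng = np.random.default_rng(0)
n = 150
u = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), u])
y = 1 + 2 * u + (1 + u) * rng.normal(size=n)
median_fit = X @ linear_qr(X, y, 0.5)    # stand-in for the VcSVQR median fit
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
q, s_hat = noncrossing_quantiles(median_fit, u, y, thetas)
```

Only one median fit with the kernel machine is needed; every further quantile costs a one-parameter linear quantile regression, which is what makes the step-wise strategy attractive when many quantiles are required.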

4. Numerical studies

In this section, we illustrate the performance of the noncrossing varying coefficient quantile regression with the wage data in Wooldridge (2003) and the FEV (forced expiratory volume) data in Kahn (2005). A polynomial kernel with degree 2 and an RBF kernel are used for the wage data and the FEV data, respectively.

4.1. Wage data

We consider a subset of the wage data set studied in Wooldridge (2003), which consists of four variables collected regarding each of 526 working individuals for the year 1976. The response variable $y$ is the logarithm of wage in dollars per hour. Among major variables possibly affecting wages, we use years of education ($u$) as the smoothing variable, and indicator of gender ($x_1$), marital status ($x_2$), and years of potential labor force experience ($x_3$) as input variables. The input variables $x_1$ and $x_2$ are binary and indicate qualitative features of the individual. We define $x_1$ to be a binary variable taking the value one for males and zero for females. We also define $x_2$ to be one if the person is married and zero if the person is not married.

Correlation analysis shows that all variables $u$, $x_1$, $x_2$ and $x_3$ have positive correlation coefficients with $y$, which are 0.4311, 0.3737, 0.2707, and 0.1114, respectively. From the coefficients we can interpret that a married man with higher education and longer experience will have a higher chance of getting higher wages.

Figure 4.1 shows plots of the quantile regression functions estimated by the varying coefficient support vector quantile regression (VcSVQR) of the logarithm of wages versus years of education for male-married individuals (upper-left), male-not-married individuals (upper-right), female-married individuals (lower-left), and female-not-married individuals (lower-right), respectively. Figure 4.2 shows plots of the quantile regression functions estimated by the noncrossing varying coefficient support vector quantile regression (ncVcSVQR), where each quantile regression function is obtained by using the average potential labor force experience of each education year period. For example, the estimated quantile regression function for male-married individuals with 12 years of potential labor force experience and 6 years of education is obtained as $\hat{q}_\theta(u = 6, x_1 = 1, x_2 = 1, x_3 = 12)$.

Figure 4.1 The estimated quantile regression functions of the logarithm of wages by the varying coefficient quantile regression. The quantile regression functions for θ=0.1, 0.3, 0.5, 0.7, 0.9 are superimposed on the scatter plots.

From Figures 4.1 and 4.2 we can see that individuals with higher education have higher wages than individuals with lower education, married individuals have higher wages than not-married individuals, and males have higher wages than females. From Figure 4.1 (upper) and Figure 4.2 (upper) we can see that the 0.9th quantile regression functions look unreasonably high below 10 years of education when estimated by VcSVQR but not when estimated by ncVcSVQR. From Figure 4.1 (lower) and Figure 4.2 (lower) we can see that the quantile regression functions estimated by VcSVQR cross below 8 years of education, whereas those estimated by ncVcSVQR do not cross.

Figure 4.2 The estimated quantile regression functions of the logarithm of wages estimated by the noncrossing varying coefficient quantile regression. The quantile regression functions for θ=0.1, 0.3, 0.5, 0.7, 0.9 are superimposed on the scatter plots.

4.2. FEV data

We consider the FEV (forced expiratory volume) data set studied in Kahn (2005), which consists of four variables collected regarding each of 654 boys and girls. The response variable $y$ is the FEV value in liters. Among variables possibly affecting FEV, we use age in years ($u$) as the smoothing variable, and height in inches ($x_1$), indicator of gender ($x_2$), and indicator of non-smoking parents in childhood ($x_3$) as input variables. The input variables $x_2$ and $x_3$ are binary and indicate qualitative features of the individual. We define $x_2$ to be a binary variable taking the value one for boys and zero for girls. We also define $x_3$ to be a binary variable taking the value one for non-smoking parents in childhood (nonsmoke) and zero for at least one smoking parent in childhood (smoke).

Correlation analysis shows that all variables $u$, $x_1$, $x_2$ and $x_3$ have positive correlation coefficients with $y$, which are 0.7565, 0.8681, 0.2084, and 0.2454, respectively. From the coefficients we can interpret that tall, older boys of non-smoking parents in childhood will have a higher chance of having larger FEV values.

Figures 4.3 and 4.4 show plots of the quantile regression functions estimated by VcSVQR and ncVcSVQR of FEV values versus ages for boy-smoking mother (upper-left), boy-nonsmoke (upper-right), girl-smoke (lower-left), and girl-nonsmoke (lower-right), respectively, where each estimated quantile regression function is obtained by using the average height at each age.

Figure 4.3 The estimated quantile regression functions of FEV estimated by the varying coefficient quantile regression. The quantile regression functions for θ=0.1, 0.3, 0.5, 0.7, 0.9 are superimposed on the scatter plot.

From Figures 4.3 and 4.4 we can see that boys and girls of smoking mothers in childhood have lower FEV values than those of non-smoking mothers, and the young have lower FEV values than the old. We can also see that the quantile regression functions look unreasonably low above age 16 when estimated by VcSVQR but look reasonable when estimated by ncVcSVQR.

5. Conclusions

In this paper we have proposed a new non-crossing quantile regression method which uses the varying coefficient support vector quantile regression and the heteroscedastic location-scale model as its basic model. To show the effectiveness of the proposed method we have used two real examples. Through numerical studies, we found that the proposed method provides a satisfactory solution for estimating non-crossing quantile regression functions when multiple quantiles are required and captures the characteristics of the data well.

Figure 4.4 The estimated quantile regression functions of FEV estimated by the noncrossing varying coefficient quantile regression. The quantile regression functions for θ=0.1, 0.3, 0.5, 0.7, 0.9 are superimposed on the scatter plot.

References

Cawley, G. C., Talbot, N. L. C., Foxall, R. J., Dorling, S. R. and Mandic, D. P. (2004). Heteroscedastic kernel ridge regression. Neurocomputing, 57, 105-124.

Cole, T. J. (1990). The LMS method for constructing normalized growth standards. European Journal of Clinical Nutrition, 44, 45-60.

Hardle, W. (1989). Applied nonparametric regression, Cambridge University Press, Cambridge.

Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society B, 55, 757-796.

He, X. (1997). Quantile curves without crossing. The American Statistician, 51, 186-192.

Heagerty, P. J. and Pepe, M. S. (1999). Semiparametric estimation of regression quantiles with application to standardizing weight for height and age in US children. Applied Statistics, 48, 533-551.

Hwang, C. and Shim, J. (2017a). Geographically weighted least squares-support vector machine. Journal of Korean Data & Information Science Society, 28, 227-235.

Hwang, C. and Shim, J. (2017b). Feature selection in the semivarying coefficient LS-SVR. Journal of Korean Data & Information Science Society, 18, 461-471.

Kahn, M. (2005). An exhalent problem for teaching Statistics. The Journal of Statistical Education, 13, http://jse.amstat.org/v13n2/datasets.kahn.html

Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33-50.

Kuhn, H. W. and Tucker, A. W. (1951). Nonlinear programming. In Proceedings of 2nd Berkeley Symposium, Berkeley, University of California Press, 481-492.

Li, Y., Liu, Y. and Ji, Z. (2007). Quantile regression in reproducing kernel Hilbert spaces. Journal of the American Statistical Association, 102, 255-268.

Mercer, J. (1909). Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society A, 209, 415-446.

Shim, J., Hwang, C. and Seok, K. (2016). Support vector quantile regression with varying coefficients. Computational Statistics, 31, 1015-1030.

Shim, J., Seok, K. and Hwang, C. (2009). Non-crossing quantile regression via doubly penalized kernel machine. Computational Statistics, 24, 83-94.

Shim, J., Seok, K. and Hwang, C. (2017). Monotone support vector quantile regression. Communications in Statistics-Theory & Methods, 31, 5180-5193.

Suykens, J. A. K., Vandewalle, J. and De Moor, B. (2001). Optimal control by least squares support vector machines. Neural Networks, 14, 23-35.

Takeuchi, I. (2004). Non-crossing quantile regression curves by support vector and its efficient implementation. In Proceedings of 2004 IEEE IJCNN, 1, 401-406.

Takeuchi, I., Le, Q. V., Sears, T. D. and Smola, A. J. (2006). Nonparametric quantile estimation. Journal of Machine Learning Research, 7, 1231-1264.

Vapnik, V. (1995). The nature of statistical learning theory, Springer-Verlag, New York.

Wooldridge, J. M. (2012). Introductory econometrics: A modern approach, South-Western Cengage Learning, Mason.

Yuan, M. (2006). GACV for quantile smoothing splines. Computational Statistics and Data Analysis, 50, 813-829.
