On Approximate Prediction Intervals for Support Vector Machine Regression

(1)

2 002 , V ol. 13, N o.2 p p . 65～75

On A ppro x im at e P re dic t ion In t e rv al s for S u pport V e c t or M ac h in e R e g re s s ion

K y u n g h a S e ok ¹⁾ , Ch an g h a H w an g ^{2 )} , D aeh y e on Ch o ^{3 )} A b s tra c t

T h e support v ect or m ach in e (S V M ), fir st dev elop ed by V apn ik an d h is gr oup at A T &T Bell Lab or at orie s , is b ein g u sed a s a n ew t echn iqu e for r egr es sion an d cla s sificat ion pr oblem s . In t his paper w e pre sen t an approach t o est im at in g appr ox im at e pr edict ion in t erv als for S V M r egr es sion b a sed on post erior pr edictiv e den sities . F u rt h erm or e, t h e m et h od is illu str at ed w it h a dat a ex am ple.

1 . Intro du c tion

T h e supp ort v ect or m achin e (S V M ), fir st dev eloped b y V apnik an d his gr ou p at A T &T Bell Lab or at ories , is b ein g u sed a s a n ew t echn iqu e for r eg re s sion an d cla s sificat ion pr oblem s . S VM is g ainin g popularit y du e t o m any at t r act iv e feat ur es , an d pr om isin g em pirical perform an ce. S V M h a s b een su cces sfu lly applied t o a n um b er of r eal w orld pr oblem s su ch a s h an dw rit t en ch ar act er an d digit r ecog nition , face det ection , t ex t cat eg orizat ion an d obj ect det ect ion in m ach in e v ision . T h e aforem ent ion ed applicat ion s r elat e t o cla s sification pr oblem s . bu t S V M is also w idely applicable in regr es sion problem s . S V M w a s initially dev eloped t o s olv e cla s sificat ion pr oblem s , bu t r ecen tly it h a s b een ex t en ded t o th e dom ain of r eg r es sion pr ob lem s . H ow ev er , S V M cla s sificat ion can b e v iew ed a s a special ca se of S VM r eg r es sion . F or t ut orial in tr odu ct ion s an d ov erv iew s of recent dev elopm ent s of S VM regr es sion , see Gun n (1998 ), S m ola & S ch ölk opf (1998 ), an d 1. As s ociat e Pr ofes s or , Dat a Science Dept ., Inj e Univ er sity , Kyungnam , 621- 749, Kor ea .

E - m ail : skh @s t at .inj e.ac.kr

2. Dept . of St atistical Infor m ation , Cat holic Univ er sit y of Daegu , Kyungbuk , 712- 702, Kor ea .

3. Dept . of Dat a Science, Inj e Univ er sity , Kyungnam , 621- 749, Kor ea

(2)

V apn ik (1995, 1998).

S V M is b a s ed on t h e stru ctu r al risk m in im ization (S RM ) prin ciple, w h ich h a s b een sh ow n t o b e su perior t o tr adit ion al em pirical risk m inim izat ion (ERM ) prin ciple. S RM m in im izes an u pper b oun d on th e ex p ect ed risk un lik e ERM m inim izin g t h e err or on t h e t r ain in g dat a . By m inim izin g t his b ou n d , hig h g en er alization p erform an ce can b e achiev ed . B a sed on th is ob serv at ion w e b eliev e t h at S V M r egr es sion w ill perform b et t er w h en calculat in g pr ediction in t erv als th an ot h er m et h ods su ch a s n eu r al n et w ork s an d M A RS . A s w e dem on st r at e , pr edict ion int erv als for S VM r eg r es sion can b e sligh tly difficult t o obt ain . De V eaux el.

al (1998 ) com pu t ed pr edict ion int erv als for n eur al n et w ork s an d com par ed t h em w it h pr ediction int erv als b a s ed on M A RS . S ee Bish op (1995) for n eu ral n et w ork s an d M A RS .

Despit e it s su cces s es in m any real w orld application s , S V M depen d s only on a on e w eigh t solu tion r epre sen t ed in dir ect ly by t h e set of Lag r an g ian m u lt iplier s in m ak in g pr ediction s . H ow ev er , accor din g t o a Bay esian p er spectiv e, t h e w eigh t s of S V M , ev en aft er learn in g , still t ak e a cert ain post erior dist ribu t ion . T hu s , ut ilizin g j u st a on e w eig ht s olut ion a s r epr es ent at iv e n eg lect s post erior u n cert aint y in t h e w eig ht s . T his oft en lead s t o m or e ex t r em e predict ed ou tput s du rin g t estin g , an d in t u rn in dicat es an ov erly high confiden ce. T h e appr opriat e w ay t o h an dle t h e se w eig ht par am et er s is t o u tilize pr edict iv e post erior den sities oft en u s ed in Bay esian S t at istics . K w ok (1999) u sed t his idea t o g et t h e m oderat ed out pu t s for S VM cla s sificat ion un der t h e n am e of m ar gin alization r epr esen t ed by M acK ay (1992). In t his paper w e apply t his idea t o com pu tin g predict ion int erv als for S VM r eg r es sion . S ollich (2000) gu ar an t ee s th at w e can apply B ay esian m et h ods t o S VM .

T h e purpose of th is pap er is t o pre sen t a B ay esian appr oach t o estim atin g pr edict ion int erv als for S VM r egr es sion an d t o illu st rat e t h e m et h od w ith an ex am ple u sin g th e sam e r eal dat a a s in De V eau x el. al (1998 ). T h e r est of th is p aper is or g anized a s follow s . S ect ion 2 g iv es an ov erv iew of S VM regr es sion . S ection 3 briefly rev iew s t h e r elat ion sh ip b et w een th e lik elih ood prin ciple an d S V M r eg re s sion . S ect ion 4 discu s s es h ow t o com pu t e pr ediction in t erv als for S V M r eg r es sion , an d in S ect ion 5 th e m eth od is illu str at ed w it h a dat a ex am ple .

2 . S u pport V e c t or M ac hin e Re g re s s io n

Let th e tr ainin g dat a set D b e den ot ed b y { ( x

_i

, y ) , i = 1, . . . , n }, w ith

each input x

_i

R

^d

an d t h e out pu t y

_i

R . Ou r g oal is t o fin d a fun ction

f ( x ) t h at h a s at m ost dev iat ion from t h e act u ally obt ain ed t ar g et s y

i

' s for all

t h e t r ain in g dat a , an d at th e sam e tim e, is a s flat a s pos sible. F latn es s h er e

m ean s t h at w e s eek sm all w . W e fir st con sider t h e ca se of lin ear r eg r es sion .

T h en , w e t ak e t h e form

(3)

f ( x ) = w

^t

x + b w it h w R

^d

, b R

w h er e su per script t r epr es ent s t h e t r an spos e of a v ect or . On e w ay t o en sur e t his is t o m inim ize th e Eu clidean n orm | | w | |

²

. F orm ally w e can w rit e th is pr oblem a s a con v ex opt im ization pr ob lem by r equirin g :

m inim ize 1

2 | | w | |

²

,

subj ect t o y

_i

- w

^t

x

_i

- b an d w

^t

x

_i

+ b - y

_i

T h e u n derly in g a s sum pt ion h er e is t h at th e conv ex opt im izat ion pr oblem is fea sible. S om et im es , h ow ev er , th is m ay n ot b e t h e ca s e, or w e als o m ay w an t t o allow for s om e err or s . T o m ak e it fea sib le, w e in tr odu ce slack v ariables

_i

an d

*

i

. H en ce w e arriv e at t h e form ulat ion st at ed in V apn ik (1995, 1998).

m in im ize 1

2 | | w | |

²

+ C

n

i = 1

(

_i

+

^*_i

) , (1)

subj ect t o { ^y

ⁱ

^- ^w

t

x

_i

- b +

_i

w

^t

x

_i

+ b - y

_i

+

^*_i

i

,

^*i

0 T h e con st an t C > 0 det erm in es t h e t r ade off b et w een t h e flat n e s s of f an d th e am oun t up t o w hich dev iation s lar g er t h an ar e t olerat ed . H er e,

_i

an d

^*_i

are slack v ariables r epr es ent in g u pper an d low er con st r ain t s on t h e out pu t s . T h e form u lat ion ab ov e corr esp on ds t o dealin g w ith V apnik ' s - in sen sitiv e los s fu n ct ion des crib ed by

| | = { ⁰ | | - ^{if |} ot h erw is e ^|

T h e k ey idea is t o con st ru ct a Lag ran g e fu n ct ion . H en ce w e pr oceed a s follow s :

L = 1

2 | | w ||

²

+ C

n

i = 1

(

_i

+

^*_i

) -

n

i = 1 i

( +

_i

- y

_i

+ w

^t

x

_i

+ b )

-

n i = 1

*

i

( +

^*_i

+ y

_i

- w

^t

x

_i

- b ) -

n

i = 1

(

_i _i

+

^*_i ^*_i

) ( 2)

W e n otice t h at th e posit iv it y con st r ain t s

_i

,

^*_i

,

_i

,

^*_i

0 sh ould b e sat isfied .

A ft er t akin g p artial deriv ativ es of equ at ion (2) w it h r eg ar d t o th e prim al v ariables

(4)

( w , b,

_i

,

^*_i

) an d plu g g in g th em in t o (2), w e g et th e opt im ization pr oblem b elow .

m ax

, ^*

- 1

2

n

i , j = 1

(

_i

-

^*_i

)(

_j

-

_j^*

) x

^t_i

x

_j

-

n

i = 1

(

_i

+

^*_i

) +

n

i = 1

y

_i

(

_i

-

^*_i

) w it h con st r ain t s

n

i = 1

(

_i

-

^*_i

) = 0 an d

_i

,

^*_i

[ 0 , C ] .

S olv in g th e ab ov e equ ation w it h t h e se con str aint s det erm in es t h e Lag r an g e m u lt iplier s ,

i

,

^*i

, an d t h e opt im al r egr es sion fun ct ion is giv en b y

w =

n

i = 1

(

_i

- '

^*_i

) x

_i

, b = - 1

2 w

^t

[ x

_r

+ x

_s

] ,

w h er e x

_r

an d x

_s

ar e supp ort v ect or s . H er e, t h e su pport v ect or s ar e dat a p oint s w h er e ex a ctly on e of t h e Lag ran g e m u lt iplier s is g r eat er t h an zero. T h erefor e, t h e opt im al r egr es sion fun ct ion can b e r ew rit t en in t h e form of

f ( x ) =

n

i = 1

(

_i

-

^*_i

) x

_i^t

x + b .

A ctu ally , b can b e com pu t ed b y u sin g t h e K aru sh - Ku hn - T u ck er (K KT ) con dit ion s . S ee for det ails S m ola & S ch lk opf (1998 ).

W e n ow con sider th e ca se of n onlin ear S V M r eg r es sion . A n on lin ear m odel is u su ally r equired t o adequ at ely m odel dat a . In t his ca s e S VM r egr es sion fir st m ap s x fr om t h e in put space R

^d

t o z = (x ) in a hig h dim en sion al feat u re sp ace w h er e lin ear r eg re s sion perform ed . T h e k ern el appr oach is em ploy ed t o addre s s t h e cur se of dim en sion alit y . T h e n on lin ear S V M r egr es sion s olut ion , u sin g an - in s en sit iv e los s fun ction , is g iv en by

m ax

, ^*

- 1

2

n

i , j = 1

(

_i

-

^*_i

) (

_j

-

_j^*

)K ( x

_i

, x

_j

) -

n

i = 1

(

_i

+

^*_i

) +

n

i = 1

y

_i

(

_i

-

^*_i

)

w it h con st r ain t s

n

i = 1

(

_i

-

^*_i

) = 0 an d

_i

,

^*_i

[ 0 , C ] .

S olv in g t h e ab ov e equ ation w ith t h e se con st r ain t s det erm in es t h e Lagr an g e m u lt iplier s ,

_i

,

^*_i

, an d t h e opt im al r egr es sion fun ct ion is giv en b y

f ( x ) =

n

i = 1

(

_i

-

^*_i

)K ( x

_i

, x ) + b (3)

(5)

w h er e

b = - 1

2

n

i = 1

(

_i

-

^*_i

) [ K ( x

_r

, x

_i

) + K ( x

_s

, x

_i

) ] ,

T h e differ en ce t o t h e lin ear ca se is t h at w is n o lon g er ex plicit ly g iv en . H ow ev er , it is u niqu ely defin ed in t h e w eak sen se b y th e dot pr odu ct s . A ls o n ot e t h at in t h e n onlin ear set t in g , t h e opt im izat ion pr oblem corr espon d s t o fin din g t h e flatt est fu n ct ion in th e feat ur e space , n ot in th e in pu t space. In t his pap er w e only con sider t h e n on lin ear ca se. T hu s , for a t est v ect or x R

^m

, w e fir st n eed t o com pu t e

a ( x , w ) = w

^t

z + b =

n

i = 1

(

_i

-

^*_i

)K ( x

_i

, x ) + b,

w h er e

w

^t

z =

n

i = 1

(

_i

-

^*_i

)K ( x

_i

, x ) .

3 . Lik elih o o d P rin c iple an d S V M Re g re s s io n

In th is sect ion , w e describ e t h e r elat ion ship b et w een t h e lik elih ood prin ciple an d S V M r eg re s sion . In or der t o det erm in e w , w e m inim ize (1), w h ich , for a fix ed C, is t h e sam e a s m in im izin g

| | w | |

²

2 C +

n

i = 1

(

_i

+

^*_i

) . (4)

W e pu t = 1/ C . T h en , a m odel H , w ith a k - dim en sion al par am et er v ect or w , con sist s of it s fu n ct ion al form f , t h e distribut ion p ( D w , H ) , t h at t h e m odel m ak es ab ou t t h e dat a D , an d a prior par am et er dist ribu tion p ( w , H ) w it h a regu larization par am et er of . T h er efor e, w e h av e t h e post erior dist ribu t ion of w for a g iv en v alu e of by u sin g Bay e s ' rule :

p ( w D , , H ) ∝ p ( w , H ) p ( D w , H ) . (5)

N ow , con sider t h e follow in g pr ob ab ilit y m odel:

T h e prior ov er w is th e n orm al prior

(6)

p ( w , H ) ∝ exp ( -

2 | | w | |

²

) . T h e pr ob abilit y dist rib ut ion is giv en by

p ( y

_i

x

_{i ,}

w , H ) = 1

2 ( 1 + ) exp ( - |

_i

| ) .

N ot e th at p ( y

_i

x

_{i ,}

w , H ) is actu ally t h e den sit y m odel for an - in sen sitiv e los s fun ct ion . S ub stitu t in g t h es e pr ob abilities in t o (5 ) an d a s su m in g t h at t h e ob serv ation s ar e i.i.d ., w e ob t ain

- log p ( w D , , H ) =

2 | | w | |

²

+

n

i = 1

(

_i

+

^*_i

) -

n

i = 1

log p ( x

_i

) + con sta n t . (6 )

T h e la st t w o t erm s on t h e rig ht do n ot dep en d on w . H en ce, b y put tin g

= 1/ C , opt im izin g (1) can b e r eg ar ded a s fin din g th e m ax im um ap p os t eriori (M A P ) est im at e w

_{M P}

of w . M or eov er , t r adit ion al S V M regr es sion can b e con sider ed a s u sin g w

M P

a s th e s ole r epr esen t ativ e of t h e w h ole post erior dist ribu tion upon predict ion .

4 . P o s t e rior P re di ctiv e D i s trib uti on an d P re dic ti on In t erv al s

T o com put e pr edict ion int erv al for each out pu t y , w e n eed t o deriv e t h e p ost erior pr edict iv e dist ribu tion p ( y | D , x ) of y giv en t h e t r ain in g dat a s et D an d an in pu t v ect or x . W e fir st a s sum e th at th e post erior dist ribu t ion of w can b e approx im at ed by a sin g le n orm al dist ribu t ion at w

_{M P}

. S in ce a ( x , w ) = w

^t

z + b , th e post erior dist rib ut ion of a ( x , w ) w ill als o b e n orm al N ( a

_{M P ,}

s

²

) , w ith m ean a

_{M P}

( x ) = a ( x , w

_{M P}

) an d v arian ce

s

²

( x ) = z

^t

A

^{- 1}

z , (7)

w h er e A =

²

M =

²

(

2 | | w | |

²

+

n

i = 1

(

i

+ '

^*i

) ) is t h e H es sian . T hu s , w e can

g et th e post erior pr edictiv e den sit y giv en b elow

p ( y D , x ) = p ( y a , D , x ) p ( a D , x ) da

= 1

2 ( 1 + ) 2 s e

^{- |y - a |}

e

^{- ( a - a}^{M P}⁾

2/ 2 s²

d a

= 1

2 ( 1 + ) [ ^e

^{- y +} ⁺ ^s

3+ 2 aM P

2

( ^{y -} ^{- a} ^s

^{M P}

^{- s}

²

)

(7)

+ e

^{y +} ⁺

s²- 2 aM P

2

( ^{1 -} ( ^{y -} ^{- a} ^s

^{M P}

^{- s}

²

))

+ ( ^{y +} s ^{- a}

^{M P}

) ^- ( ^{y -} s ^{- a}

^{M P}

)] ^,

w h er e is th e distribut ion fun ct ion of a st an dar d n orm al distribut ion . A plot of t h e post erior predict iv e dist rib ut ion is sh ow n in F igu re 1. T his plot can b e m ade aft er w e t r ain S VM r eg r es sion for fix ed v alu es of C, , an d a k ern el param et er . If an int erv al sum m ary is desir ed , a cent r al in t erv al of post erior pr edictiv e pr ob ab ilit y , w hich corr espon d s , in t h e ca s e of a 100 (1- a )% in t erv al, t o t h e ran g e of v alu es ab ov e an d b elow w hich lies ex actly 100 (a/ 2 )% of t h e post erior pr edict iv e pr ob ab ilit y can b e calcu lat ed . S u ch in t erv al estim at es ar e r eferred t o a s pr edict ion int erv als .

N ow , it is left t o illu st r at e h ow t o com put e s

²

( x ) . T o com pu t e s

²

( x ) in (7 ), w e h av e t o det erm in e t h e H es sian A . W e alr eady kn ow t h at com put in g t h e H es sian m at rix for n eur al n et w ork is v ery com plicat ed . But , w e w ill see th at it is m u ch sim pler t o com pu t e t h e H es sian m at rix for S V M regr es sion . W e k n ow t h at

i ,

'

^*_i

m ea sur es t h e differ en ce b et w een y

_i

an d w

^t

z

_i

+ b. H er e, w e u s e z

_i

for b ot h lin ear an d n on lin ear ca ses . T hu s , w e h av e

i

= step ( y

i

- a

i

) ( y

i

- a

i

- )

*

i

= step ( a

_i

- y

_i

) ( a

_i

- y

_i

- )

(8)

F igu r e 1: P ost er pr edict iv e den sity an d S t an dar d n orm al den sit y

w h er e a

_i

= w

^t

z

_i

+ b an d step ( x ) is th e st ep fun ction . It sh ould b e n ot ed th at step ( x ) is n ot different iab le. T h u s , w e r eplace it by th e sig m oid fun ction ( x ) = 1/ (1 + e

^- ^x

) . S in ce a

_i

= z

_i

an d

²

a

_i

= O , w e obt ain

2

i

= r ( y

_i

- a

_i

) z

_i

z

^t_i

an d

² ^*_i

= r ( a

_i

- y

_i

) z

_i

z

_i^t

. H er e, O is th e zer o m at rix an d r ( x ) = ( x - ) ″( x ) + 2 ′( x ) . F in ally , w e h av e A = I + B w h er e

B =

n

i = 1

r

_i

z

_i

z

^t_i

an d r

_i

r ( | y

_i

- a

_i

|) .

W e n ow com put e t h e eig env alu es

_k

of B . F or t h e lin ear ca s e, w e put z

_i

= x

_i

.. T h en , com put in g eig en v alu es

_k

is st raig ht for w ar d. H ow ev er , w e n eed s om et hin g m or e for t h e n on lin ear ca s e. W e n ow w ill ex plain in det ail h ow t o com pu t e eig en v alu es

_k

for t h e n onlin ear ca s e. W e kn ow th e t r ain in g alg orith m dep en d s on s om e k ern el fu n ct ion . W e pu t z

_i

= ( x

_i

) for som e m appin g . Let

v

_k

b e t h e eig env ect or s of B . T h en w e h av e v

_k

=

n

i = 1

u

_{k i}

z

_i

for som e

con st ant s u

k i

an d

k

z '

j^t

v

k

= z

j^t

B v

k .

. T his lead s t o

k

K u

k =

K K u

k ,

, w h er e u

k

= ( u

k 1 ,

. . . , u

k n^t

) , K is t h e n × n m at rix w it h elem en t s z

^t_i

z

_j

= K ( x

_{i ,}

x

_j

) an d K is an ot h er n × n m at rix w it h elem en t s r

_i

z

^t_i

z

_j

= r

_i

K ( x

_{i ,}

x

_j

) . In g en er al, w e can a s sum e t h at K is in v ert ible.

H en ce, w e h av e

_k

u

_k

= K u

_{k ,}

an d obt ain t h e eig en v alu es

_k

of A a s

k

= +

_{k .}

U sin g t h es e re sult s , A can b e appr ox im at ed a s

A P A P

^t

, (8)

w h er e

P = ( v

_{1 ,}

. . . , v

_m

) = (

n

n = 1

u

_1i

z

_{i ,}

. . . ,

n

i = 1

u

_{m i}

z

_i

) , (9)

an d

= d iag ( +

1 ,

. . . , +

m

) (10)

H er e, th e v

k

' s ar e orth og on al an d m is t h e nu m b er of sign ificant eig env alu es in t h e n × n m at rix K .

Ut ilizin g (8 ), (9 ), (10) an d th e fa ct t h at P

^{- 1}

= P

^t

, s

²

( x ) in (7) can t h en b e com pu t ed a s

s

²

= z

^t

A

^{- 1}

z = z

^t

P

^{- 1}

P z = ( P

^t

z )

^t ^{- 1}

( P

^t

z )

(9)

=

n i = 1

1 +

_i

(

j = iⁿ

u

ij

K ( x

j ,

x ) )

²

⁽¹¹⁾

5 . D at a E x am ple

A s ex plain ed in S ection 1, S V M r egr es sion is b a sed on t h e S RM prin ciple, w h ich m in im izes an u pp er b oun d on t h e ex p ect ed risk , un lik e E RM w h ich m inim izes t h e err or on t h e t r ain in g dat a . By m in im izin g th is b ou n d , S V M r eg r es sion ach iev es high g en er alizat ion p erform an ce. T hu s , w e can gu es s th at S V M r egr es sion w ill perform b ett er t h an pr edict ion int erv als b a sed on ot h er m et h od s su ch a s n eur al n et w ork s an d M A RS .

In th is s ect ion w e w ill illu st r at e h ow t o com put e appr ox im at e pr ediction int erv als w it h t h e dat a s et in De V eaux e t al.(1993 ). F or t his dat a set De V eau x e t al. (1998 ) com pu t ed pr edict ion int erv als for n eu r al n et w ork s an d com par ed t h em w it h pr ediction in t erv als b a sed on M A R S/ GA M . T h ey sh ow ed t h at t h e M A RS/ GA M fit appear s t o b e a slight ly sm ooth er fit th an t h e n eu ral n et w ork m odel w it h corr espon din gly sligh tly sm ooth er predict ion in t erv als . F r om t w o figu re s for pr ediction int erv als in t his p aper an d De V eau x e t al. (1998) w e can pr esu m ably com p ar e th e p erform an ce s of su ch m et h od s . T his dat a s et is fr om a p oly m er pr oces s w it h 10 pr edict or v ariable s ( x

1 ,

. . . , x

10

) an d a sin gle r esp on se v ariable y . B ecau se t h e dat a are pr opriet ary , n o oth er in form ation is av ailable on t h e v ariables . T h e dat a set con sist of 61ob serv at ion s an d are av ailable v ia ftp at ft p .cis .u pen n .edu : pub/ un g ar/ ch em dat a .

F igu r e2: P r edict ion In t erv als for S V M Reg r es sion

S h a o (1999 ) dis cu s ses s ev er al m odel select ion crit eria an d fin ally ar gu es t h e m odel s election crit erion b a sed on V C - dim en sion w ork s b est . H ow ev er , in th is p aper w e b a sically u s e cr os s v alidat ion m et h od t o det erm in e m odel p ar am et er s sin ce it g en erally w ork s qu it e w ell. T h e Gau s sian k ern el K ( x , y ) = ex p {

- x - y

²

2

²

} is u s ed in t his ex perim ent for S V M r egr es sion . F or S VM

r eg r es sion C, an d sh ould b e pr e - sp ecified . H er e, w e ch ose C = 100, = 0.01

an d = 0.541. B a sed on ou r sim ulation stu dies , w e fou n d t h at S V M m odel is n ot

s en sit iv e t o ch oices of C an d t h at v ary a lit tle b it fr om our ch oices of C = 100

an d = 0.01. T h e im port an t par am et er t h at does r equir e carefu l con sideration is

t h e k ern el par am et er . T o det erm in e t h e k ern el par am et er , w e p erform ed 10- fold

cros s - v alidation s . In each ru n , 50 p oint s w er e u sed for tr ainin g an d t h e r em ainin g

11 w ere u s ed t o com pu t e t h e pr edict ion sum of - in sen sitiv e los ses . T h e b e st

r esult on t h e pr ediction su m of - in sen sitiv e los s es w a s obt ain ed u sin g = 0.541.

(10)

T w o st an dar d err or pr ediction in t erv als for = 5,000 ar e sh ow n in F igu r e 2.

B ecau se t h ere ar e m ult iple input s , w e h av e or dered t h e r espon s es in th is plot . A s

s een from F igu r e 2, t h e est im at es of t h e pr ediction v arian ce at alm ost ev ery p oint

s eem s t o b e st able an d r ea son ab le. F r om t w o fig ur es for predict ion int erv als in

t his pap er an d De V eaux e t. al (1998 ) w e can pr esum ably com par e th e

p erform an ce s of su ch m et h od s . T h e S VM fit appear s t o b e a m u ch sm oot h er fit

t h an M A R S/ GA M an d t h e n eu r al n et w ork fit in De V eau x e t. al (1998 ). In addition ,

S V M g iv es sligh tly sm ooth er pr ediction int erv als . T h e predict ion in t erv als for

S V M ar e som ew h at n arrow er com p ar ed w it h t h e oth er t w o m odels in De V eau x

e t. al (1998 ). W e ch eck ed th e effect of v ary in g in t h e sigm oid fu n ction ( x ) on

t h e av er ag e size of t h e pr edict ion in t erv als for S VM regr es sion . W h en = 1,000

t h e av er ag e size is 0.181 an d decr ea s es t o 0.081 w h en = 5,000 an d fu rth er

decr ea ses t o 0.057 w h en = 10,000. T h e av er ag e size of th e pr edict ion in t erv als

for t h e ot h er t w o m odels is v ery clos e, 0.235 for t h e n eu ral n et w ork an d 0.253 for

t h e GA M m odel. T h is ex perim en t in dicat es th at S VM com par es fav or ably w ith

ot h er m et h od s for r egr es sion m odellin g . T o con clu de, w e r ecom m en d S V M a s t h e

t ech niqu e for r egr es sion m odellin g .

(11)

R e f e re n c e s

1. Bish op , C.M . (1995 ). N eur al N et w ork s for P att ern R ecog nit ion , Ox for d.

2. DeV eau x , R ., S chum i, J ., S ch w ein sb er g , J ., S h ellin gt on , D . an d Un g ar , L.H . (1998 ). P r edict ion In t erv als for N eur al N et w ork s v ia N onlin ear Reg r es sion , T echn o- m et rics 40, 4, 273 - 282.

3. D eV eaux , R .D ., P sich og ios an d U n g ar , L .H . (1993 ). A com paris on of t w o n on - par am et ric est im at ion sch em es : M A RS an d N eur al N et w ork s , Com put er s in Ch em ical En gin eerin g , 17, 8, 819 - 837.

4. Gu nn , S . (1998 ). S u pp ort V ect or M ach in es for Cla s sificat ion an d R egr es sion , IS IS T echn ical Rep ort , U . of S out h am pt on .

5. K w ok , J .T . (1999 ). M oder at in g t h e Ou t put s of S u pport V ect or M ach in e Cla s sifier s , IEEE T r an s act ion s on N eur al N et w ork s 10, 5, 1018- 1031.

6. M a cK ay , D .J .C. (1992). B ay esian Int erpolation , N eu ral Com pu t at ion 4, 3, 415 - 447

7. S h ao, X . (1999 ). M odel S elect ion U sin g S t at ist ical Learnin g T h eory , P h . D . T h esis , U . of M inn esot a .

8. Smola, A.J. and Schölkopf, B. (1998). A T utorial on Support Vector Regression , N eur oCOLT 2 T echn ical Report , N eu roCOLT .

9. S ollich , P . (2000). P r ob abilistic M et h od s for S upport V ect or M ach in es , A dv an ces in N eur al Inform at ion P r oces sin g S y st em s 12, t o appear .

10. V apnik , V . (1995 ). T h e N atu r e of S t atist ical Learnin g T h eory , S prin g er . 11. V apnik , V . (1998 ). S t at ist ical Learnin g T h eory , S prin g er .

[ 2002년 6월 접수, 2002년 9월 채택 ]

On Approximate Prediction Intervals for Support Vector Machine Regression

2 002 , V ol. 13, N o.2 p p . 65～75