• 검색 결과가 없습니다.

Regression Quantile Estimations on Censored Survival Data

N/A
N/A
Protected

Academic year: 2021

Share "Regression Quantile Estimations on Censored Survival Data"

Copied!
8
0
0

로드 중.... (전체 텍스트 보기)

전체 글

(1)

2 002 , V ol. 13, N o.2 p p . 31~38

Re g re s s ion Qu antile E s tim ation s on Ce n s ore d S urv iv al D at a

Jo o Y on g S him 1)

A b s tract

In t h e ca se of m ult iple su rv iv al t im es w h ich m igh t b e cen sor ed at ea - ch cov ariat e v ect or , w e st u dy t h e r eg re s sion qu an tile est im at ion s in th is p aper . T h e est im ation s ar e b a s ed on t h e em pirical dist rib ut ion fun ct in s of t h e cen s or ed tim es an d t h e sam ple qu ant iles of t h e ob serv ed su rv iv al t im e s at each cov ariat e v ect or an d th e w eig ht ed lea st s qu ar e m et h od is applied for t h e estim at ion of th e r eg r es sion qu an tile . T h e est im at or s ar e sh ow n t o b e a sy m pt ot ically n orm ally dist ribu t ed u n der s om e regu larity con dition s .

K ey wor ds an d P h r ases : Cen sorin g , Cov ariat e, E m ipirical dist ribu tion fun ct ion , Qu an tile r eg r es sion m odel, S am ple qu an t ile, S u rv iv al dat a

1 . In tro du c tio n s

T he acceler at ed failure tim e m odel in th e surv iv al dat a an aly sis r egr es ses t he

log arithm s of th e surv iv al tim es on th e corr espon din g cov ariat e v ect or , w hich is

appealin g t o th e pr actition er s du e t o th e dir ect phy sical int erpret ation s . T he m ean of th e

log arithm s of surviv al tim es are relat ed t o th e cov ariat e in th e accelerat ed failure tim e

m odel, w hich cau ses difficulty t o est im at e th e m ean of surv iv al t im es or th e int er cept

param et er . Yin g (1995) pr oposed a m edian r egr es sion m odel a s an alt ern ativ e t o th e

m ean regres sion m odel for ex am inin g th e cov ariat e effect on th e surv iv al patt ern .

Y an g (1999 ) proposed a m edian r egres sion estim at or w hich are b ased on th e w eight ed

em pirical surv iv al an d h azar d fun ction s w ith out est im at in g th e distribution s of th e

cen sored tim es , an d sh ow ed t hat t he estim at or s ar e con sist ent an d a sym pt oticcally

1. Depar tm ent of St atist ics , Kyungpook Nation al Univ er sity , T aegu , 702- 701, Kor ea

(2)

distribut ed.

T he m edian is a sim ple an d m eanin gful m ea sure of th e cent er of th e thick t ailed distribution of th e surv iv al tim es , an d can be w ell estim at ed ev en for n ot t oo h eav y cen sorin g . But u su ally larg e or sm all qu ant ile depen ds on th e cov ariat e differ ently fr om th e m edian , w hich leads t o con sider th e qu ant ile regres sion appr oach . Koenker an d Ba s sett (1978 ) intr odu ced th e qu antile regres sion m odel, th at th e qu antiles of respon ses ar e lin early relat ed t o t he cov ariat e v ect or a s follow s

Q( | x i ) = x i ( ) for ( 0 , 1 ) ,

w her e Q( | x i ) is th e - th qu antile of th e r espon ses giv en t he cov ariat e v ect or x i . T h e estim at or of t he - th regres sion qu ant ile ( ) is defin ed as th e v alu e of m inim izin g th e obj ect iv e fun ction ,

R ( ) =

n

i = 1 ( y i - x i ) for ( 0 , 1 ) ,

ov er all in som e param et er space B ( ) w h ere ( ) is th e check fun ction defin ed a s

( r ) = rI ( r 0) + ( - 1) rI ( r < 0 ) ,

w her e I ( ) is t he in dicat or fun ct ion . T h e m edian estim at or is ea sily seen t o be a special ca se of = 1/ 2. P ow ell(1986 ) stu died th e cen sor ed qu ant ile regres sion , w h er e ob serv at ion s could n ot be ob serv ed b elow th e fix ed lev el 0 in th e r egr ession m odel.

T h e cen sor ed regres sion qu ant ile estim at or is defin ed a s th e v alue of m inim izin g t he obj ectiv e fun ct ion,

Q n ( , ) = 1 n

n

i = 1 ( y i - m ax (0 , x i ) ) .

Lin dgren (1997) su g g est ed estim atin g th e con dit ion al qu antiles n onpar am etrically w ith a local Kaplan - M eier estim at or s of th e surv iv al fun ction s for m ore g en eral cen sored ca se.

In this paper w e propose th e regres sion quantile estim at or s for th e surv iv al dat a w ith

m ult iple ob serv ation s at each cov ariat e v ect or , x 1 , , x k , un der th e presen ce of

cen sorin g s . Ba sed on th e sam ple qu antiles from m ultiple ob serv at ion s , th e r egr es sion

qu antile est im at or is obt ain ed by th e w eight ed lea st squ are m eth od. W e sh ow t hat th e

pr oposed estim at or s ar e asy m pt ot ically n orm ally dist ribut ed an d th ey perform w ell ev en

(3)

in t he h et er osceda stic error m odel v ia th e sim ulation stu dy .

2 . Re g re s s ion Qu antile E s tim ation s

Let T ij be th e surviv al tim e of th e j - th in div idu al correspon din g t o t he cov ariat e v ect or , x i or tran sform at ion on it , w h ere j = 1 , 2 , , n i , i = 1 , 2 , , k . Let x i b e t he th e as sociat ed cov ariat e v ect or w ith p +1 com pon ent s , w h ere th e fir st com pon ent is set t o 1. Let q i ( ) b e th e th et a - t h quantile of T ij giv en x i for in (0,1) t hen

q i ( ) = inf { t : P ( T ij t | x i ) } .

A s sum e th at t he q i ( ) is lin early relat ed t o th e cov ariat e v ect or x i as

q i ( ) = x i ( ) for i = 1, 2 , , k , (2.1)

w her e ( ) is a (p +1)- dim en sion al regres sion qu antile m odellin g th e cov ariat e effect on th e - th qu antiles of T ij .

T he ob serv ed surv iv al tim e is Y ij = m in ( T ij , C ij ) an d ij = I ( T ij C ij ) , w her e C ij is th e cen sor ed tim e correspon din g t o x i for j = 1 , 2 , , n i . C ij ' s are a s sum ed t o b e in depen dently distribut ed w ith unkn ow n distribution fun ction s G i ' s . S in ce T ij an d C ij ar e a s sum ed t o b e in depen dent giv en x i ,

P ( Y ij q i ( ) | x i ) = 1 - P ( T ij > q i ( ) | x i )P ( C ij > q i ( ) | x i ) 1 - ( 1 - ) ( 1 - G i ( q i ( ) ) ) .

(2.2)

T h e right - h an d side of (2.2) depen d s on an d G i ( q i ( ) ) , n ot on t he distribution of t he surv iv al tim es . Den ot e it by p i th en q i ( ) can b e set t o sat isfy

q i ( ) = inf {y : P ( Y ij y | x i ) p i },

t hat is, q i ( ) can be th e p i - th qu antile of Y ij giv en x i . F or a fix ed v alu e of p i

q i ( ) can b e estim at ed by th e p i - th sam ple qu antile of Y ij , w hich is th e v alue

of q i m inim izin g th e obj ectiv e fun ction ,

(4)

n

i

j = 1 p

i

( y ij - q i ) for i = 1 , 2 , , k .

Giv en x i , th e distribution fun ction G i of C ij can b e estim at ed by th e em pirical distribution fun ction ,

G i ( y ) = 1 m i

n

i

j = 1 I ( Y ij y | ij = 0 ) for i = 1 , 2 , , k (2.3)

w her e m i = n i -

n

i

j = 1 ij .

T h en for each i = 1 , 2 , , k , q i ( ) can b e estim at ed by it eration m eth od as follow s : i) S et q i ( ) t o th e initial v alu e q i ( ) ( 0 ) .

ii) p ( l ) i = 1 - ( 1 - ) ( 1 - G i ( q i ( ) ( l ) ) )

iii ) q i ( ) ( l + 1) is th e v alue of q i m inim izin g th e obj ectiv e fun ction ,

n

i

j = 1 p ( y ij - q i ) w ith p = p i ( l )

Un der th e as sum pt ion th at giv en x i t he distribut ion fun ct ion F i of Y ij is n ot flat in th e n eighb orh ood of F - 1 i ( p i ), q i ( ) is th e con sist ent estim at or of F i - 1 ( p i ) , w her e F - 1 i ( p i ) is th e p i - th qu ant ile of Y ij giv en x i an d p i = 1 - ( 1 -

) ( 1 - G i ( q i ( ) ) ) . Un der th e a s sum ption th at for each i, giv en x i , Y ij ' s are in depen dently distribut ed w ith distribution fun ction , F i , w h ose den sit y , f i , is continu ou s an d positiv e,

q i ( ) A N ( F - 1 i ( p i ) , p i ( 1 - p i ) n i f 2 i ( F i - 1 ( p i ) ) ) .

F or each fix ed y , - < y < , t he em pirical distribution fun ct ion G i ( y ) is th e con sist ent estim at or of G i ( y ) , w hich im plies th at p i is th e con sist ent estim at or of

p i = 1 - ( 1 - ) ( 1 - G i ( q i ( ) ) ) . T h e lim itin g distribution of q i ( ) can b e obt ain ed

a s

(5)

q i ( ) A N ( F - 1 i ( p i ) , p i ( 1 - p i ) n i f 2 i ( F i - 1 ( p i ) ) )

= A N ( q i ( ) , p i ( 1 - p i ) n i f 2 i ( q i ( ) ) )

(2.4 )

U sin g q i ( ) th e - t h r egr es sion qu antile ( ) of T ij can b e estim at ed by th e v alu e b m inim izin g th e obj ectiv e fun ction ,

k

i = 1 n i ( q i ( ) - x i b) 2 ,

w hich is ,

( ) = (

k

i = 1 n i x i ' x i ) - 1

k

i = 1 n i x i ' q i ( ) . (2.5)

Un der th e follow in g a s sum ption s :

a 1) F or each i , giv en x i , Y ij ' s are in depen dently dist ribut ed w ith distribution fun ction F i w hich is n ot flat in th e n eighborh ood of q i ( ) an d den sity f i is cont inu ou s an d posit iv e.

a2) x 1 , , x k span R p + 1 . a3) n i

N i a s N w h ere N =

k

i = 1 n i .

U sin g (2.4 ) an d (2.5) th e lim itin g distribution of th e est im at or of th e - t h regres sion qu antiles can b e obt ain ed as

( ) A N ( ( ) , 1

N D - 1 VD - 1 ) (2.6)

w her e

D =

k

i = 1 i x i ' x i an d V =

k i = 1

i p i ( 1 - p i )

f 2 i ( q i ( ) ) x i ' x i .

3 . N um eric al S tu die s

T o inv estig at e t he perform an ce of th e proposed estim at or s th e sim ulat ion st udy is con du ct ed. F or th e com parison , t he m edian estim at or s of Yin g et al.(1995 ) ar e obt ain ed.

Let T ij b e th e j - th surv iv al tim e corr espon din g t o th e cov ariat e v ect or x i for i

= 1 , 2 , 3 . A ssum e th at th e surv iv al t im es T ij ' s are fr om th e lin ear r egr es sion m odel:

(6)

T ij = x i + e ij for j = 1 , 2 , , n i , i = 1, 2 , 3 , (3.1) w her e err or t erm s e ij ' s follow a logistic distribution w it h a location param et er of 0 an d a scale param et er of x i , w hich leads

P ( T ij t | x i ) = P ( T 0

t - x i x i

) ,

w her e T 0 follow s a st an dar d logistic distribution , = ( 0 , 1 ) ' an d = ( 0 , .

1 ) ' T h e cen sored tim es C ij ' s are a s sum ed t o follow a epon ential distribution w it h a locat ion param et er c oi an d a scale par am et er x i a s

P ( C ij t | x i ) = P ( C 0 t - c oi

x i ) I ( t c oi ) ,

w her e C 0 follow s a st an dar d ex pon ential dist ribution an d c oi ' s are ch osen t o be 10%

of pr oportion of cen sorin g .

S et x i = ( 1 , i - 1) an d n 1 =150, n 2 =200, n 3 =250 th en th e regres sion qu antile of t he surv iv al tim es in th e m odel (2.1) is ,

( ) = + F 0 - 1 ( ) , ( 0 , 1) , (3.2)

w her e F 0 - 1 ( ) is t he - th quantile of T 0 , w hich is th e - t h qu antile of th e st an dard logistic dist ribution . F or each x i , i = 1, 2 , 3 , w ith = ( 1 , 1) ' an d =(0.5,0.5)' , 500 r an dom sam ples of { Y ij , ij , j = 1, 2 , , n i } ar e g en er at ed . T h en for = 0.25, 0.5, 0.75, tru e v alu es of r egres sion qu ant iles of th e surviv al tim es ar e obt ain ed by (3.2) a s, respectiv ely ,

( 0 . 25 ) = ( 0 .4507 0 .4507 ) , ( 0 . 5) = ( ) 1 . 0 1 . 0 , ( 0 . 75 ) = ( 1 .5493 1 .5493 ) .

F r om each sam ple of size 600, estim at or s of 2 com pon ent s of th e - th r egr es sion

qu antile, ( )an d ( ) , for = 0.25, 0.5, 0.75, are obt ain ed by (2.5) b a sed on th e

sam ple qu antiles of T ij estim at ed by th e it eration m eth od. T he estim at or s proposed by

Yin g e t a l . (1995 ), ( 0 .5 ) an d ( 0 . 5), are obt ain ed thr ou gh m inim ization of th e

obj ectiv e fun ction w hich m ay h av e m ultiple m inim a, t he sim ulat ed ann ealin g alg orithm

(7)

applied t o th e accelerat ed failure m odel by Lin an d Gey er (1992) is u sed for th e estim ation . Based on 500 sam ples th e em pirical m ean s an d m ean squ are error s of estim at or s are obt ain ed.

F rom T able 1 w e can see th at th e pr oposed estim at or s ( ) prov ides relativ ely accur at e result s ev en for t he h et eroscedastic error m odel. F or = 0.5 t he pr oposed estim at or s, 0 ( 0 .5 ) an d 1 ( 0 . 5) , h av e th e v alues of em pirical m ean s closer t o th e t ru e v alu es th an th e estim at or s of Yin g e t a l . (1995), Y 0 ( 0 .5 ) an d Y 1 ( 0 . 5). An d t he pr oposed estim at or s hav e sm aller v alues of m ean squ ar e error s th an th e estim at or s of Yin g e t a l . (1995 ).

T ab le 1 . M ean an d M S E of estim at or s of ( ) w it h 10% Cen sorin g

0.25 0.5 0.75

0 ( ) m ean 0.4481 0.9931 1.7074

m s e 0.1066 0.0934 0.5175

Y0 ( 0 . 5) m ean 1.0187

m s e 0.1304

1 ( ) m ean 0.4498 0.9975 1.6268

m s e 0.1283 0.1091 0.5802

Y 1 ( 0 . 5) m ean 0.9777

m s e 0.1291

4 . Rem ark s an d Con clu s ion s

Based on th e em pirical distribution fun ction of cen sored tim es an d th e sam ple qu antiles of th e ob serv ed surviv al tim es , th e est im at or s of r egr ession qu antiles are stu died in a cen sor ed qu antile regres sion m odel w ith m ultiple surv iv al tim es at each cov ariat e v ect or . T he qu antile r egr es sion appr oach det ect s th e rev er sal of tr eatm ent effect s w hich a proportion al h azard m odel does n ot perm it .

T he sim ulation stu dy sh ow s t hat th e pr oposed est im at or s perform w ell, ev en in th e

h et eroscedastic err or m odel, w h ere th e v arian ce of error s depen ds on t he cov ariat e

v ect or . An d a rath er ea sy num erical m eth od is r equired th an Yin g e t a l . (1995).

(8)

Re f e ren c e s

1. K oen k er , R . an d B a s sett , G. (1978). Regr es sion Qu an tile s , E con om e trica 46, 33 - 50.

2. Lin d, D . Y . an d Gey er , C. J . (1992). Com put at ion al M et h od s for S em ipar a - m et ric Lin ear Regr es sion W ith Cen sor ed D at a , J ournal of Com p u ta t ional an d Grap hical S ta tis t ics , 1, 77 - 90.

3. Lin dg r en , A . (1997 ). Qu ant ile Reg r es sion w it h Cen sored Dat a u sin g

Gen er alized L 1 M inim izat ion , Comp uta tional S ta tis tics & D a ta A naly s is , 23, 509 - 524.

4. P ow ell, J . (1986 ). Cen sor ed R egr es sion Qu ant iles . J ournal of E con om e trics , 32, 143 - 155.

5. Y an g , S . (1999). Cen sor ed M edian Reg r es sion U sin g W eig ht ed Em pirical S u rv iv al an d H azar d F un ct ion s , J ournal of the A m er ican S ta t is t ical A s s ocia tion, 94, 137 - 145.

6. Yin g , Z., Jun g , S . an d W ei, L . (1995 ). S urv iv al A n aly sis w it h Cen sor ed M edian R egr es sion M odels , J ournal of the A m er ican S ta t is t ical A ss ocia t ion, 90, 178 - 184.

[ 2002년 1월 접수, 2002년 5월 채택 ]

참조

관련 문서