• 검색 결과가 없습니다.

Bootstrap Confidence Intervals for Regression Coefficients under Censored Data

N/A
N/A
Protected

Academic year: 2021

Share "Bootstrap Confidence Intervals for Regression Coefficients under Censored Data"

Copied!
9
0
0

로드 중.... (전체 텍스트 보기)

전체 글

(1)

2 002 , V ol. 13, N o.2 p p 355~363

B o ot s t rap Con fide n c e In t e rv al s f or R e g re s s ion Co e f fic ie n t s u n de r Ce n s ore d

D at a

K il H o Ch o1 ) . S e o n g H w a Je on g2 )

A b s trac t

U sin g t h e Bu ckley - J am es m et h od, w e con st ru ct b oot st r ap confiden ce in t erv als for t h e r eg r es sion coefficien t s un der t h e cen sor ed dat a . A n d w e com p ar e th ese confiden ce in t erv als in t erm s of t h e cov er ag e pr ob abilit ie s an d th e ex p ect ed confiden ce in t erv al len g th s th r ou gh M ont e Carlo sim u lation .

1 . In tro du c ti on

In in du st rial life t est in g an d m edical follow - up stu dies , cen sorin g is com m on b ecau se of t im e lim it s an d oth er re st rict ion s on dat a collection .

W e con sider th e lin ear r eg r es sion m odel

Yi = 0 + 1X i + i ( i = 1 , , n )

w h er e t h e r espon s es Yi ar e right cen s or ed , an d th e err or s i ar e in dep en den tly an d iden t ically dist ribu t ed (i.i.d .) r an dom v ariable s w ith m ean zer o an d finit e v arian ce.

1. Pr ofes s or , Departm ent of St at istics , Kyungpook National Univ er sit y , T aegu , 702- 701, Kor ea

E - m ail : khcho@knu .ac.kr

2. Depar tm ent of St atist ics , Kyungpook Nation al Univ er sity , T aegu , 702- 701, Kor ea

(2)

If Yi is cen s or ed at t im e Ci, t h en th e cen sor ed r esp on se Zi can b e w rit t en by

Zi = Yi i + Ci( 1 - i) = m in ( Yi, Ci) ( i = 1 , , n ) , w h er e Ci , , Cn ar e cen sorin g v alu es an d i ar e t h e in dicat or v ariab les

i =

{

1 if Yi Ci ( un cen s o red ) 0 if Yi > Ci ( cen s ored )

M iller (1976) con sider ed th e m eth od s of e st im at in g t h e r eg r es sion coefficient s of a lin ear r eg r es sion m odel u n der t y pe I cen sor ed dat a . Bu ckley an d J am es (1979 ) su g g est ed t h e m eth od t o a ccom m odat e cen s or ed ob serv at ion s t o estim at e r egr es sion coefficient s .

Un der un cen s or ed dat a , th e lea st squ ar es e st im at or s of 0 an d 1 ar e t h e v alu es a an d b w h ich s at isfy t h e equ ation s

n

i = 1( Yi - a - b X i ) = 0

an d

n

i = 1( X i - X )( Yi - b X i ) = 0 .

T o accom m odat e cen s or ed ob s erv at ion s , let

Y*i = Yi i + E ( Yi Yi > Ci) ( 1 - i) i = 1 , , n .

T h en t h e est im at or of 1 by Bu ckley an d J am es m et h od is t h e v alu e b w h ich s at isfies t h e equ ation

n

i = 1( X i - X ) ( Y*i - b X i) = 0 . S in ce E ( Yi Yi > Ci) is

u nk n ow n , w e adopt a self- con sist en cy appr oach in Bu ckley an d J am es (1979 ).

Rit ov (1990) an d Currie (1996 ) inv estig at ed t h e a sy m pt ot ic pr opert ies of t h e Bu ck ley - J am es estim at or .

T h e b oot st r ap m eth od h a s b een in tr odu ced by Efron (1979 ) w h o g av e a s eries of ex am ples t o illu st r at e t h e v alidit y of t h e appr oach for a w ide cla s s of st atist ics . T h e accu r acy of t h e b oot st rap appr ox im ation t o t h e sam plin g dist rib ut ion of v ariou s st atist ics h a s b een in v est ig at ed by Bick el an d F r eedm an (1981). Efr on (1981 an d 1987 ) h a s dev elop ed m et h ods for con st ru ct in g appr ox im at e confiden ce in t erv als for a par am et er . T h es e int erv als ar e con stru ct ed from t h e b oot st r ap dist ribu tion of an est im at or .

In t his paper , w e con sider t h e con stru ction of b oot st rap con fiden ce in t erv als for t h e regr es sion coefficient s by t h e Bu ckley - J am es m eth od . A n d w e com par e t h e cov er ag e pr ob abilities an d th e ex pect ed confiden ce in t erv al len gt h s for t h e se

(3)

con fiden ce int erv als t hr ou gh M ont e Carlo sim u lation .

2 . B o ot s trap Con fide n c e In t erv al s

T h e b oot st rap m eth od is a u sefu l t ool for con st ru ct in g con fiden ce int erv al or b ein g applicable in situ at ion s w h er e ex plicit ly an aly tical solu tion can n ot b e fou n d or t h e e st im at or is m ath em atically com plicat ed . A ls o, t h e confiden ce in t erv als b y b oot str ap m et h od do n ot r equir e th eor et ical calcu lat ion .

Let X = ( X 1 ,X 2 , , X n) b e th e r an dom s am ple fr om an u nkn ow n dist ribu t ion F , an d ( X 1 ,X 2 , , X n) b e an estim at or of . T h e b oot str ap m eth od is ex t r em ely sim ple in prin ciple.

S t ep 1. Con st ru ct t h e sam ple prob abilit y dist ribu tion F .

S t ep 2. W it h F fix ed, dra w a r an dom sam ple of size n fr om F, say X *i F , i = 1 , 2 , , n .

Call t his "t h e b oot st r ap s am ple ",

X * = ( X *1, X *2, , X *n) .

S t ep 3. A ppr ox im at e th e sam plin g dist ribu tion of by t h e b oot str ap dist ribu tion of

* ( X *1 , X *2, , X *n) .

W e con sider Bu ck ley - J am es estim at or for a lin ear r eg r es sion m odel un der cen s or ed dat a by b oot st r ap r esam plin g m eth od .

T h e r esidu al, , is

= Y - X ( 1 , 2 , , n ).

T h en th e cent er ed r esidu als , i , ar e g iv en by

i = i - 1

n

n

j = 1 j , i = 1 , 2 , , n.

Let Fn b e t h e em pirical distribut ion of i. T h at is , Fn put s m a s s 1/ n at i an d x d Fn = 0 . Giv en Y , let *1, , *n b e con dit ion ally in depen dent r an dom v ariable s w it h com m on dist ribu t ion Fn. A n d let * b e t h e n - dim en sion al v ect or w h ose i- th com pon ent is *i. W e put

Y* = X + *.

(4)

In form ally , * is obt ain ed by re sam plin g t h e cen t er ed r esidu als . N ow , w e con st ru ct

b oot str ap s am ple ( Z*i , *i , X i) , w h er e

Z*i = m in ( Y*i , C*i) an d

*

i =

{

0 ,1 , ifif YY*i*i > CC*i*i .

T h en t h e est im at or s of r eg r es sion coefficien t s b y Bu ckley - J am es m et h od ar e g iv en by

1

* = { u Y*i ( X i- X ) + c Yi*( 1*) ( X i- X )}/ {i = 1n ( X i- X )2 }

an d

0

* = n- 1{ u Y*i + c Yi*( 1*)}- 1

* X ,

w h er e u an d c den ot e su m m at ion s ov er t h e un cen sor ed an d t h e cen s or ed v alu es only , r espect iv ely .

W e con st ru ct t h e b oot st r ap con fiden ce in t erv als of t h e r egr es sion coefficien t s u n der cen sored dat a a s t h e follow in g s ;

S t ep 1. Gen er at e b oot st rap sam ples ( Z* bi , * bi , X bi ) , i = 1 , , n , b = 1 , , B . S t ep 2. Calculat e th e b oot st rap est im at or s * b ( 0

* b, 1

* b) corr esp on din g t o each b oot str ap sam ples , ( Z* bi , * bi , X bi ) , i = 1 , , n ,

b = 1 , , B .

S t ep 3 . Con st ru ct t h e con fid en ce in t er v als of r eg r e s sion coefficien t s b y t h r ee b oot st r ap m et h od s t h at ar e p er cen t ile , b ia s corr ect ed (B C ) p er cen t ile , an d b ia s cor r ect ed an d a cceler at ed (B Ca ) p er cen t ile .

P er cent ile con fiden ce int erv al is con st ru ct ed by u sin g th e em pirical distribut ion fu n ct ion of b oot st r appin g estim ation s . Let G b e th e em pirical dist ribu tion fu n ction of t h e b oot str ap dist ribu tion of *. If t h e b oot str ap dist ribu tion can b e obt ain ed by M ont e Carlo m et h od , th en G is est im at ed from t h e b oot st r ap replicat ion s

* 1, , * B a s

(5)

G ( t) = 1 B

B

b = 1I ( * b t )

w h er e t is a r eal v alu e an d B is a r eplication nu m b er of b oot str ap . Let G- 1 ( ) b e t h e 100 per cen tile of *,

G- 1( ) = {t : G ( t ) }.

T h er efor e, G- 1( ) is th e B t h v alu e in t h e or dered list of * b an d G- 1( 1 - ) is t h e B ( 1 - ) t h v alu e in t h e or der ed list of * b. T h en , 100 ( 1 - ) % p er cent ile con fiden ce int erv al of regr es sion coefficient is

( G- 1( 2 ) , G- 1( 1 -

2 )).

Bia s corr ect ed confiden ce int erv al is t h e m et h od t o con stru ct int erv al by corr ect in g t h e bia s of t h e b oot st r ap distribut ion . Bia s corr ect ion is ob t ain ed by cent erin g t h e st an dar d n orm al cu m ulat iv e distribut ion fu n ct ion (cdf). T h e bia s corr ect ion z0 is giv en by

z0 = - 1(G ( )) = - 1[B1

B

b = 1I( * b )]

w h er e - 1 is t h e inv er s e fu n ction of th e st an dar d n orm al cdf. T h en 100 ( 1 - ) % b ia s corr ect ed p er cen tile confiden ce int erv al is

( G- 1( 1) , G- 1( 2))

w h er e 1 = (2 z0 + z / 2) an d 2 = (2 z0 + z1 - / 2).

If G ( ) = 0 . 5 , t h at is , if h alf of th e b oot st r ap distribut ion of * is les s t h an th e ob serv ed v alu e , t h en z0= 0 an d t h e p er cen tile m et h od an d t h e B C m et h od a gr ee.

Bia s corr ect ed an d a cceler at ed (BCa ) m et h od h a s m ore sat isfa ct ory cov er ag e pr ob ab ilit y th an percent ile m et h od . W e com pu t e t h e bia s corr ect ion z0 in BC

m et h od an d acceler at ion a giv en b y

(6)

a =

n

i = 1( ( ) - ( i ))3

6 {i = 1n ( ( ) - ( i ))2}2 / 3

w h er e

( ) = 1

n

n

i = 1 ( i ) ,

an d ( i) is t h e est im at or defin ded on th e sub s am ple w h ich arises w h en t h e i

t h ob serv ation h a s b een delet ed . T h en 100 ( 1 - ) % B Ca con fiden ce in t erv al of r eg r es sion coefficien t is

( G- 1( 3) , G- 1( 4) )

w h er e

3=

(

z0 + 1 - a ( zz0+ z0+ z/ 2 / 2)

)

an d

4=

(

z0 + 1 - a ( zz0+ z0+ z1 - 1 -/ 2 / 2)

)

.

If z0 an d a ar e zer o t h en 3 = z / 2 an d 4 = z1 - / 2. T hu s t h e percent ile m et h od an d th e BCa m eth od agr ee .

3 . E x am ple an d S im u l atio n S tu dy

A s t h e dat a for th e ex am ple, w e u s e S t anford h eart t ran splan t dat a w hich w a s g iv en by M iller (1976). T h e dat e for t h e st art of t h e tr an splan t at ion pr ogr am at S t an for d w a s on Oct ob er 1 in 1967, an d t h e cut - off dat e for th is an aly sis w a s on A pril 1 in 1974. Durin g t his t im e in t erv al, 69 pat ien t s r eceiv ed h eart t r an splant s , an d t h e len g t h s of th eir surv iv al, in day s , aft er tr an splan t at ion ar e con sider ed a s r espon se v ariable s . A n d ag es of patient s at t ran splan t ar e con sider ed a s an ex plan at ory v ariable. In accor dan ce w it h Bu ckley - J am es m et h od w e t ak e t h e r espon se v ariables t o log surv iv al tim es t o t h e b a s e 10.

N ow , w e com pu t e th e b oot str ap regr es sion coefficient s for h eart tr an splan t dat a

(7)

t hr ou g h t h e 1,000 b oot st r ap r eplication s , an d con st ru ct confiden ce in t erv als of th em b y t h e p er cen tile, BC an d B Ca m et h od s . T h e re sult s ar e list ed in T able 3.1.

T ab l e 3 .1 Reg re s sion coefficien t s an d con fiden ce int erv als

Regr es sion coefficient s 0 = 3.5962 1 = - 0.0278

P er cen tile int erv al ( 1.1575 , 4.9726 ) ( - 0.0761 , - 0.0009 ) BC in t erv al ( 2.1690 , 5.6167 ) ( - 0.0635 , - 0.0025 ) B Ca int erv al ( 2.1032 , 5.2167 ) ( - 0.0635 , - 0.0025 )

A lso, w e inv e st ig at e t h e b oot str ap e st im at es an d t h e b oot st r ap confiden ce int erv als th rou g h th e M ont e Carlo sim u lat ion . W e a s su m e t h at a s a lifetim e dist ribu tion , W eibull dist ribu t ion (W eib ) w it h p ar am et er s = 1 . 0 an d = 0 . 8 (decr ea sin g failu r e r at e ), = 1 . 0 (con st an t failur e r at e ) an d = 1 . 5 (in cr ea sin g failu r e r at e ) ar e con sider ed . A n d a s cen sorin g dist rib ut ion , ex pon en t ial (Ex p ) dist ribu tion w ith par am et er s w hich t h e cen sorin g r at es ar e appr ox im at ely 10 % an d 30 % , r esp ect iv ely , ar e con sidered. T h e M on t e Carlo sim ulat ion s ar e p erform ed for all com bin at ion s of lifet im e distribut ion s an d cen sorin g dist ribu tion s w it h sam ple sizes 40 an d 60. T h e sim ulat ion ex perim ent is p erform ed 1,000 r eplicat ion s an d 1,000 b oot st r ap r eplicat ion s for each ca se. T h e crit eria u sed t o com par e th e con fiden ce int erv als ar e cov er ag e pr ob ability an d th e ex pect ed len g th of con fiden ce int erv al. W e u se 0.90 a s a n om in al pr ob abilit y . T h e r esu lt s of th ese sim u lation s ar e g iv en in T able 3.2. W e do n ot list th e r esu lt s for s am ple size 60 b ecau s e of sim ilarity a s s am ple size 40. F r om t his t able , w e can ob serv e t h e follow in g fact s ; (1) T h e cov er ag e pr ob ab ilit ies of b oot str ap con fiden ce int erv als ar e n early n om in al cov er ag e 0.90.

(2) T h e cov er ag e pr ob ab ilit ies of all th e con fiden ce int erv als decr ea s e a s cen sorin g r at e in crea ses .

(3 ) T h e cov er ag e pr ob abilities of all th e con fiden ce in t erv als in cr ea se a s n in crea ses .

(4 ) BCa m et h od am on g t h e b oot str ap con fiden ce in t erv als h a s t h e sh ort est ex pect ed con fiden ce len gt h .

T h er efor e, w e can kn ow t h at th e b oot str ap m et h od is v ery effect iv e for confiden ce int erv als of r eg re s sion coefficien t s u n der t h e cen sored dat a .

(8)

T able 3 .2 Boot strap confiden ce int erv als

M et h od s

Cen s orin g

r at e 10% 30%

0 1 0 1

W eib (1.0, 0.8 )

P ercent ile Len g th 16.941 0.358 9.414 0.393

Cov er ag e 0.883 0.887 0.869 0.883

BC Len g th 13.081 0.355 10.180 0.362

Cov er ag e 0.891 0.897 0.880 0.894

BCa Len g th 11.197 0.337 9.341 0.351

Cov er ag e 0.900 0.901 0.900 0.906

M et h ods

Cen s orin g

r at e 10% 30%

0 1 0 1

W eib (1.0, 1.0)

P er cent ile Len g th 15.974 0.351 9.638 0.217

Cov er ag e 0.879 0.888 0.873 0.887

B C Len g th 15.537 0.395 12.417 0.228

Cov er ag e 0.899 0.897 0.897 0.895

BCa Len g th 10.192 0.365 9.340 0.161

Cov er ag e 0.901 0.901 0.899 0.902

M et h ods

Cen s orin g

r at e 10% 30%

0 1 0 1

W eib (1.0, 1.5 )

P er cent ile Len g th 15.968 0.351 11.794 0.335

Cov er ag e 0.891 0.894 0.880 0.890

B C Len g th 13.614 0.338 14.301 0.341

Cov er ag e 0.894 0.895 0.892 0.897

BCa Len g th 13.964 0.337 10.162 0.306

Cov er ag e 0.895 0.901 0.896 0.902

(9)

R e f ere n c e s

1. Bick el, P . an d F r eedm an , D . (1981). S om e a sy m pt otic t h eory for t h e b oot st r ap . A nnals of S ta tis t ics , 9, 1196 - 1217

2. Bu ckley , J . an d J am es , I. R . (1979). Lin ear r egr es sion w it h cen sor ed dat a . B iom e tr ica, 66, 429 - 436.

3. Currie, I. D . (1996 ). A n ot e on Bu ckley - J am es estim at or s for cen sor ed dat a . B iom e tr ick a, 83, 912- 915

4. Efron , B . (1979 ). Boot st r ap m et h ods ; A n oth er look at th e j ackk nife. A nnals of S ta tis tics , 7, 1- 26.

5. Efron , B . (1981). N onp ar am etric st an dar d error s an d confiden ce in t erv al.

Canad ian J ournal of S ta t is t ics, 9, 139 - 172.

6. Efron , B . (1987 ). Bet t er b oot str ap con fiden ce int erv al, J ournal of th e A m er ican S ta t is tical A s s ocia t ion, 82, 171- 185.

7. F r eedm an , D . (1981). Boot st r appin g r egr es sion m odel. A nnals of S ta t is tics , 9, 1218 - 1228.

8. M iller , R . G. (1976 ), Lea st s qu ar es r eg r es sion w it h cen sor ed dat a , B iom e tr ika, 63, 449 - 464.

9. Rit ov , Y . (1990), E st im at ion in a lin ear r egr es sion m odel w it h cen sor ed dat a . A nnals of S ta t is tics , 18, 303- 328.

[ 2002년 9월 접수, 2002년 10월 채택 ]

참조

관련 문서

• Connect to information (products: information server; data pub-lisher). • Understand information (data architect,

The result shows that multimedia materials are effective tools for raising students' satisfaction, interests, confidence, participation willingness, and

peking ,there are statistics cdiffences of confidence, company's welfare in co's image ,& co's propaganda in sexual, onlt confidence & co's welfare in

For a participation item, yoga participants had a high response in appearances, exercise confidence, and health; dance sports participants, in endurance;

3.2 Homogeneous Linear ODEs with Constant Coefficients.. 

2.2 Homogeneous Linear ODEs with Constant Coefficients 2.3 Differential

Table 5.4 Classified Train data in Decision Tree 81 Table 5.5 Classified Train data in Random Forest 83 Table 5.6 Result of Logistic Regression 85 Table 5.7 Classified Train

The measured values for benzene in DMF were compared with the The measured values for benzene in DMF were compared with the reference data, and