2 002 , V ol. 13, N o.2 p p 355~363
B o ot s t rap Con fide n c e In t e rv al s f or R e g re s s ion Co e f fic ie n t s u n de r Ce n s ore d
D at a
K il H o Ch o1 ) . S e o n g H w a Je on g2 )
A b s trac t
U sin g t h e Bu ckley - J am es m et h od, w e con st ru ct b oot st r ap confiden ce in t erv als for t h e r eg r es sion coefficien t s un der t h e cen sor ed dat a . A n d w e com p ar e th ese confiden ce in t erv als in t erm s of t h e cov er ag e pr ob abilit ie s an d th e ex p ect ed confiden ce in t erv al len g th s th r ou gh M ont e Carlo sim u lation .
1 . In tro du c ti on
In in du st rial life t est in g an d m edical follow - up stu dies , cen sorin g is com m on b ecau se of t im e lim it s an d oth er re st rict ion s on dat a collection .
W e con sider th e lin ear r eg r es sion m odel
Yi = 0 + 1X i + i ( i = 1 , , n )
w h er e t h e r espon s es Yi ar e right cen s or ed , an d th e err or s i ar e in dep en den tly an d iden t ically dist ribu t ed (i.i.d .) r an dom v ariable s w ith m ean zer o an d finit e v arian ce.
1. Pr ofes s or , Departm ent of St at istics , Kyungpook National Univ er sit y , T aegu , 702- 701, Kor ea
E - m ail : khcho@knu .ac.kr
2. Depar tm ent of St atist ics , Kyungpook Nation al Univ er sity , T aegu , 702- 701, Kor ea
If Yi is cen s or ed at t im e Ci, t h en th e cen sor ed r esp on se Zi can b e w rit t en by
Zi = Yi i + Ci( 1 - i) = m in ( Yi, Ci) ( i = 1 , , n ) , w h er e Ci , , Cn ar e cen sorin g v alu es an d i ar e t h e in dicat or v ariab les
i =
{
1 if Yi Ci ( un cen s o red ) 0 if Yi > Ci ( cen s ored )M iller (1976) con sider ed th e m eth od s of e st im at in g t h e r eg r es sion coefficient s of a lin ear r eg r es sion m odel u n der t y pe I cen sor ed dat a . Bu ckley an d J am es (1979 ) su g g est ed t h e m eth od t o a ccom m odat e cen s or ed ob serv at ion s t o estim at e r egr es sion coefficient s .
Un der un cen s or ed dat a , th e lea st squ ar es e st im at or s of 0 an d 1 ar e t h e v alu es a an d b w h ich s at isfy t h e equ ation s
n
i = 1( Yi - a - b X i ) = 0
an d
n
i = 1( X i - X )( Yi - b X i ) = 0 .
T o accom m odat e cen s or ed ob s erv at ion s , let
Y*i = Yi i + E ( Yi Yi > Ci) ( 1 - i) i = 1 , , n .
T h en t h e est im at or of 1 by Bu ckley an d J am es m et h od is t h e v alu e b w h ich s at isfies t h e equ ation
n
i = 1( X i - X ) ( Y*i - b X i) = 0 . S in ce E ( Yi Yi > Ci) is
u nk n ow n , w e adopt a self- con sist en cy appr oach in Bu ckley an d J am es (1979 ).
Rit ov (1990) an d Currie (1996 ) inv estig at ed t h e a sy m pt ot ic pr opert ies of t h e Bu ck ley - J am es estim at or .
T h e b oot st r ap m eth od h a s b een in tr odu ced by Efron (1979 ) w h o g av e a s eries of ex am ples t o illu st r at e t h e v alidit y of t h e appr oach for a w ide cla s s of st atist ics . T h e accu r acy of t h e b oot st rap appr ox im ation t o t h e sam plin g dist rib ut ion of v ariou s st atist ics h a s b een in v est ig at ed by Bick el an d F r eedm an (1981). Efr on (1981 an d 1987 ) h a s dev elop ed m et h ods for con st ru ct in g appr ox im at e confiden ce in t erv als for a par am et er . T h es e int erv als ar e con stru ct ed from t h e b oot st r ap dist ribu tion of an est im at or .
In t his paper , w e con sider t h e con stru ction of b oot st rap con fiden ce in t erv als for t h e regr es sion coefficient s by t h e Bu ckley - J am es m eth od . A n d w e com par e t h e cov er ag e pr ob abilities an d th e ex pect ed confiden ce in t erv al len gt h s for t h e se
con fiden ce int erv als t hr ou gh M ont e Carlo sim u lation .
2 . B o ot s trap Con fide n c e In t erv al s
T h e b oot st rap m eth od is a u sefu l t ool for con st ru ct in g con fiden ce int erv al or b ein g applicable in situ at ion s w h er e ex plicit ly an aly tical solu tion can n ot b e fou n d or t h e e st im at or is m ath em atically com plicat ed . A ls o, t h e confiden ce in t erv als b y b oot str ap m et h od do n ot r equir e th eor et ical calcu lat ion .
Let X = ( X 1 ,X 2 , , X n) b e th e r an dom s am ple fr om an u nkn ow n dist ribu t ion F , an d ( X 1 ,X 2 , , X n) b e an estim at or of . T h e b oot str ap m eth od is ex t r em ely sim ple in prin ciple.
S t ep 1. Con st ru ct t h e sam ple prob abilit y dist ribu tion F .
S t ep 2. W it h F fix ed, dra w a r an dom sam ple of size n fr om F, say X *i F , i = 1 , 2 , , n .
Call t his "t h e b oot st r ap s am ple ",
X * = ( X *1, X *2, , X *n) .
S t ep 3. A ppr ox im at e th e sam plin g dist ribu tion of by t h e b oot str ap dist ribu tion of
* ( X *1 , X *2, , X *n) .
W e con sider Bu ck ley - J am es estim at or for a lin ear r eg r es sion m odel un der cen s or ed dat a by b oot st r ap r esam plin g m eth od .
T h e r esidu al, , is
= Y - X ( 1 , 2 , , n ).
T h en th e cent er ed r esidu als , i , ar e g iv en by
i = i - 1
n
n
j = 1 j , i = 1 , 2 , , n.
Let Fn b e t h e em pirical distribut ion of i. T h at is , Fn put s m a s s 1/ n at i an d x d Fn = 0 . Giv en Y , let *1, , *n b e con dit ion ally in depen dent r an dom v ariable s w it h com m on dist ribu t ion Fn. A n d let * b e t h e n - dim en sion al v ect or w h ose i- th com pon ent is *i. W e put
Y* = X + *.
In form ally , * is obt ain ed by re sam plin g t h e cen t er ed r esidu als . N ow , w e con st ru ct
b oot str ap s am ple ( Z*i , *i , X i) , w h er e
Z*i = m in ( Y*i , C*i) an d
*
i =
{
0 ,1 , ifif YY*i*i > CC*i*i .T h en t h e est im at or s of r eg r es sion coefficien t s b y Bu ckley - J am es m et h od ar e g iv en by
1
* = { u Y*i ( X i- X ) + c Yi*( 1*) ( X i- X )}/ {i = 1n ( X i- X )2 }
an d
0
* = n- 1{ u Y*i + c Yi*( 1*)}- 1
* X ,
w h er e u an d c den ot e su m m at ion s ov er t h e un cen sor ed an d t h e cen s or ed v alu es only , r espect iv ely .
W e con st ru ct t h e b oot st r ap con fiden ce in t erv als of t h e r egr es sion coefficien t s u n der cen sored dat a a s t h e follow in g s ;
S t ep 1. Gen er at e b oot st rap sam ples ( Z* bi , * bi , X bi ) , i = 1 , , n , b = 1 , , B . S t ep 2. Calculat e th e b oot st rap est im at or s * b ( 0
* b, 1
* b) corr esp on din g t o each b oot str ap sam ples , ( Z* bi , * bi , X bi ) , i = 1 , , n ,
b = 1 , , B .
S t ep 3 . Con st ru ct t h e con fid en ce in t er v als of r eg r e s sion coefficien t s b y t h r ee b oot st r ap m et h od s t h at ar e p er cen t ile , b ia s corr ect ed (B C ) p er cen t ile , an d b ia s cor r ect ed an d a cceler at ed (B Ca ) p er cen t ile .
P er cent ile con fiden ce int erv al is con st ru ct ed by u sin g th e em pirical distribut ion fu n ct ion of b oot st r appin g estim ation s . Let G b e th e em pirical dist ribu tion fu n ction of t h e b oot str ap dist ribu tion of *. If t h e b oot str ap dist ribu tion can b e obt ain ed by M ont e Carlo m et h od , th en G is est im at ed from t h e b oot st r ap replicat ion s
* 1, , * B a s
G ( t) = 1 B
B
b = 1I ( * b t )
w h er e t is a r eal v alu e an d B is a r eplication nu m b er of b oot str ap . Let G- 1 ( ) b e t h e 100 per cen tile of *,
G- 1( ) = {t : G ( t ) }.
T h er efor e, G- 1( ) is th e B t h v alu e in t h e or dered list of * b an d G- 1( 1 - ) is t h e B ( 1 - ) t h v alu e in t h e or der ed list of * b. T h en , 100 ( 1 - ) % p er cent ile con fiden ce int erv al of regr es sion coefficient is
( G- 1( 2 ) , G- 1( 1 -
2 )).
Bia s corr ect ed confiden ce int erv al is t h e m et h od t o con stru ct int erv al by corr ect in g t h e bia s of t h e b oot st r ap distribut ion . Bia s corr ect ion is ob t ain ed by cent erin g t h e st an dar d n orm al cu m ulat iv e distribut ion fu n ct ion (cdf). T h e bia s corr ect ion z0 is giv en by
z0 = - 1(G ( )) = - 1[B1
B
b = 1I( * b )]
w h er e - 1 is t h e inv er s e fu n ction of th e st an dar d n orm al cdf. T h en 100 ( 1 - ) % b ia s corr ect ed p er cen tile confiden ce int erv al is
( G- 1( 1) , G- 1( 2))
w h er e 1 = (2 z0 + z / 2) an d 2 = (2 z0 + z1 - / 2).
If G ( ) = 0 . 5 , t h at is , if h alf of th e b oot st r ap distribut ion of * is les s t h an th e ob serv ed v alu e , t h en z0= 0 an d t h e p er cen tile m et h od an d t h e B C m et h od a gr ee.
Bia s corr ect ed an d a cceler at ed (BCa ) m et h od h a s m ore sat isfa ct ory cov er ag e pr ob ab ilit y th an percent ile m et h od . W e com pu t e t h e bia s corr ect ion z0 in BC
m et h od an d acceler at ion a giv en b y
a =
n
i = 1( ( ) - ( i ))3
6 {i = 1n ( ( ) - ( i ))2}2 / 3
w h er e
( ) = 1
n
n
i = 1 ( i ) ,
an d ( i) is t h e est im at or defin ded on th e sub s am ple w h ich arises w h en t h e i
t h ob serv ation h a s b een delet ed . T h en 100 ( 1 - ) % B Ca con fiden ce in t erv al of r eg r es sion coefficien t is
( G- 1( 3) , G- 1( 4) )
w h er e
3=
(
z0 + 1 - a ( zz0+ z0+ z/ 2 / 2))
an d
4=
(
z0 + 1 - a ( zz0+ z0+ z1 - 1 -/ 2 / 2))
.If z0 an d a ar e zer o t h en 3 = z / 2 an d 4 = z1 - / 2. T hu s t h e percent ile m et h od an d th e BCa m eth od agr ee .
3 . E x am ple an d S im u l atio n S tu dy
A s t h e dat a for th e ex am ple, w e u s e S t anford h eart t ran splan t dat a w hich w a s g iv en by M iller (1976). T h e dat e for t h e st art of t h e tr an splan t at ion pr ogr am at S t an for d w a s on Oct ob er 1 in 1967, an d t h e cut - off dat e for th is an aly sis w a s on A pril 1 in 1974. Durin g t his t im e in t erv al, 69 pat ien t s r eceiv ed h eart t r an splant s , an d t h e len g t h s of th eir surv iv al, in day s , aft er tr an splan t at ion ar e con sider ed a s r espon se v ariable s . A n d ag es of patient s at t ran splan t ar e con sider ed a s an ex plan at ory v ariable. In accor dan ce w it h Bu ckley - J am es m et h od w e t ak e t h e r espon se v ariables t o log surv iv al tim es t o t h e b a s e 10.
N ow , w e com pu t e th e b oot str ap regr es sion coefficient s for h eart tr an splan t dat a
t hr ou g h t h e 1,000 b oot st r ap r eplication s , an d con st ru ct confiden ce in t erv als of th em b y t h e p er cen tile, BC an d B Ca m et h od s . T h e re sult s ar e list ed in T able 3.1.
T ab l e 3 .1 Reg re s sion coefficien t s an d con fiden ce int erv als
Regr es sion coefficient s 0 = 3.5962 1 = - 0.0278
P er cen tile int erv al ( 1.1575 , 4.9726 ) ( - 0.0761 , - 0.0009 ) BC in t erv al ( 2.1690 , 5.6167 ) ( - 0.0635 , - 0.0025 ) B Ca int erv al ( 2.1032 , 5.2167 ) ( - 0.0635 , - 0.0025 )
A lso, w e inv e st ig at e t h e b oot str ap e st im at es an d t h e b oot st r ap confiden ce int erv als th rou g h th e M ont e Carlo sim u lat ion . W e a s su m e t h at a s a lifetim e dist ribu tion , W eibull dist ribu t ion (W eib ) w it h p ar am et er s = 1 . 0 an d = 0 . 8 (decr ea sin g failu r e r at e ), = 1 . 0 (con st an t failur e r at e ) an d = 1 . 5 (in cr ea sin g failu r e r at e ) ar e con sider ed . A n d a s cen sorin g dist rib ut ion , ex pon en t ial (Ex p ) dist ribu tion w ith par am et er s w hich t h e cen sorin g r at es ar e appr ox im at ely 10 % an d 30 % , r esp ect iv ely , ar e con sidered. T h e M on t e Carlo sim ulat ion s ar e p erform ed for all com bin at ion s of lifet im e distribut ion s an d cen sorin g dist ribu tion s w it h sam ple sizes 40 an d 60. T h e sim ulat ion ex perim ent is p erform ed 1,000 r eplicat ion s an d 1,000 b oot st r ap r eplicat ion s for each ca se. T h e crit eria u sed t o com par e th e con fiden ce int erv als ar e cov er ag e pr ob ability an d th e ex pect ed len g th of con fiden ce int erv al. W e u se 0.90 a s a n om in al pr ob abilit y . T h e r esu lt s of th ese sim u lation s ar e g iv en in T able 3.2. W e do n ot list th e r esu lt s for s am ple size 60 b ecau s e of sim ilarity a s s am ple size 40. F r om t his t able , w e can ob serv e t h e follow in g fact s ; (1) T h e cov er ag e pr ob ab ilit ies of b oot str ap con fiden ce int erv als ar e n early n om in al cov er ag e 0.90.
(2) T h e cov er ag e pr ob ab ilit ies of all th e con fiden ce int erv als decr ea s e a s cen sorin g r at e in crea ses .
(3 ) T h e cov er ag e pr ob abilities of all th e con fiden ce in t erv als in cr ea se a s n in crea ses .
(4 ) BCa m et h od am on g t h e b oot str ap con fiden ce in t erv als h a s t h e sh ort est ex pect ed con fiden ce len gt h .
T h er efor e, w e can kn ow t h at th e b oot str ap m et h od is v ery effect iv e for confiden ce int erv als of r eg re s sion coefficien t s u n der t h e cen sored dat a .
T able 3 .2 Boot strap confiden ce int erv als
M et h od s
Cen s orin g
r at e 10% 30%
0 1 0 1
W eib (1.0, 0.8 )
P ercent ile Len g th 16.941 0.358 9.414 0.393
Cov er ag e 0.883 0.887 0.869 0.883
BC Len g th 13.081 0.355 10.180 0.362
Cov er ag e 0.891 0.897 0.880 0.894
BCa Len g th 11.197 0.337 9.341 0.351
Cov er ag e 0.900 0.901 0.900 0.906
M et h ods
Cen s orin g
r at e 10% 30%
0 1 0 1
W eib (1.0, 1.0)
P er cent ile Len g th 15.974 0.351 9.638 0.217
Cov er ag e 0.879 0.888 0.873 0.887
B C Len g th 15.537 0.395 12.417 0.228
Cov er ag e 0.899 0.897 0.897 0.895
BCa Len g th 10.192 0.365 9.340 0.161
Cov er ag e 0.901 0.901 0.899 0.902
M et h ods
Cen s orin g
r at e 10% 30%
0 1 0 1
W eib (1.0, 1.5 )
P er cent ile Len g th 15.968 0.351 11.794 0.335
Cov er ag e 0.891 0.894 0.880 0.890
B C Len g th 13.614 0.338 14.301 0.341
Cov er ag e 0.894 0.895 0.892 0.897
BCa Len g th 13.964 0.337 10.162 0.306
Cov er ag e 0.895 0.901 0.896 0.902
R e f ere n c e s
1. Bick el, P . an d F r eedm an , D . (1981). S om e a sy m pt otic t h eory for t h e b oot st r ap . A nnals of S ta tis t ics , 9, 1196 - 1217
2. Bu ckley , J . an d J am es , I. R . (1979). Lin ear r egr es sion w it h cen sor ed dat a . B iom e tr ica, 66, 429 - 436.
3. Currie, I. D . (1996 ). A n ot e on Bu ckley - J am es estim at or s for cen sor ed dat a . B iom e tr ick a, 83, 912- 915
4. Efron , B . (1979 ). Boot st r ap m et h ods ; A n oth er look at th e j ackk nife. A nnals of S ta tis tics , 7, 1- 26.
5. Efron , B . (1981). N onp ar am etric st an dar d error s an d confiden ce in t erv al.
Canad ian J ournal of S ta t is t ics, 9, 139 - 172.
6. Efron , B . (1987 ). Bet t er b oot str ap con fiden ce int erv al, J ournal of th e A m er ican S ta t is tical A s s ocia t ion, 82, 171- 185.
7. F r eedm an , D . (1981). Boot st r appin g r egr es sion m odel. A nnals of S ta t is tics , 9, 1218 - 1228.
8. M iller , R . G. (1976 ), Lea st s qu ar es r eg r es sion w it h cen sor ed dat a , B iom e tr ika, 63, 449 - 464.
9. Rit ov , Y . (1990), E st im at ion in a lin ear r egr es sion m odel w it h cen sor ed dat a . A nnals of S ta t is tics , 18, 303- 328.
[ 2002년 9월 접수, 2002년 10월 채택 ]