
2002, Vol. 13, No. 2, pp. 271~284

Robust inference for linear regression model based on weighted least squares 1)

Jin-Pyo Park 2)

Abstract

In this paper we consider robust inference for the parameters of the linear regression model based on weighted least squares. First, we consider the sequential test of multiple outliers. Next, we suggest a way to assign a weight to each observation $(x_i, y_i)$ and recommend a robust inference procedure for the linear model. Finally, to check the performance of the confidence interval for the slope under the proposed method, we conducted a Monte Carlo simulation and present some numerical results and examples.

Key Words and Phrases: Least median of squares, Outliers test, Weighted least squares

1. INTRODUCTION

We consider robust inference for the parameters of the linear regression model

$$y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{ip}\beta_p + e_i, \quad i = 1, 2, \ldots, n \tag{1}$$

where the errors $e_i$ are assumed to be normally distributed with mean zero and variance $\sigma^2$. The least squares estimate $\hat\beta$ of $\beta$ and its standard error $SE(\hat\beta)$ are sensitive to outliers. Hence inferences for the parameters of the linear regression model using $\hat\beta$ and $SE(\hat\beta)$ are affected by outliers. To remedy this problem, many statistical methods have been developed.

In this paper, we consider a tool for identifying and testing outliers in the linear regression model. This tool is based on the ratio of a robust scale estimate and a non-robust scale estimate. We then propose a forward sequential procedure for identifying the outliers. Next, we consider a method to assign a weight to each observation $(x_i, y_i)$, using a nonincreasing function of the test statistic as the weight. Finally, we apply a weighted least squares analysis to introduce robust inference for the linear regression model. The weighted least squares has a high breakdown point and is efficient in a statistical sense. Hence, inferences for the parameters of the linear regression model using weighted least squares are not affected by outliers.

The remainder of the paper is organized as follows. In Section 2 we introduce the tool for identifying and testing outliers in the linear regression model, suggest a method to assign a weight to each observation $(x_i, y_i)$, and propose the robust inference for the parameters of the linear regression model. In Section 3 we examine the coverage and median length of the confidence interval for the slope by means of Monte Carlo simulation. In Section 4 we apply the proposed method to several real data sets to check its performance. Section 5 contains some concluding remarks.

1) Research funded by Kyungnam University, 2002.
2) Professor, Division of Information & Communication Engineering, Kyungnam University.

2. The Robust inference for the parameters of the linear regression model

We suggest the weighted least squares regression based on the sequential outliers test proposed by Jinpyo Park and Heechang Park (2001). First we recall the definition of the sequential outliers test. The test statistic is defined as follows. Least median of squares (LMS), proposed by Rousseeuw (1984), minimizes the median of the squared residuals. LMS regression has a very high breakdown point of almost 50%. The least median of squares estimator $\hat\beta_{LMS}$ is given by

$$\underset{\hat\beta_J}{\text{Minimize}} \; \operatorname{med}_i \, r_i^2 \tag{2}$$

where $r_i = y_i - x_i \hat\beta_J$, $\hat\beta_J = (X_J^T X_J)^{-1} X_J^T Y_J$, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, and $J = \{i_1, i_2, \ldots, i_p\}$ is a subset of $\{1, 2, \ldots, n\}$ containing $p$ indices. The residual is given by

$$r_i^{LMS} = y_i - x_i \hat\beta_{LMS}. \tag{3}$$

The initial scale estimate $s_0$ for the least median of squares regression is given by

$$s_0 = 1.4826 \left(1 + \frac{5}{n - p - 1}\right) \sqrt{\operatorname{med}_i \, (r_i^{LMS})^2}. \tag{4}$$

The initial scale estimate is then used to determine a weight $w_i$ for the $i$th observation, namely

$$w_i = \begin{cases} 1 & \text{if } c \le r_i^{LMS}/s_0 \le d \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

where $[c, d]$ is the inner fence of the box plot of $r_i^{LMS}/s_0$.

By means of these weights, the final scale estimate $s$ for the least median of squares regression is given by

$$s = \sqrt{\frac{\sum_{i=1}^{n} w_i (r_i^{LMS})^2}{\sum_{i=1}^{n} w_i - p - 1}}. \tag{6}$$

$s$ also has a breakdown point of 0.5, the highest possible value.
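Equations (2) to (6) can be illustrated with a minimal Python sketch. This is not the author's implementation: the paper does not say how the minimization in (2) is carried out, so the sketch approximates it by scanning random elemental subsets. The names `lms_fit` and `robust_scale`, the number of subsets, and the 1.5 IQR rule used for the inner fences are assumptions made for illustration. The design matrix `X` is assumed to contain every fitted column (including a column of ones if an intercept is wanted), while `p` in `robust_scale` is the number of explanatory variables.

```python
import numpy as np

def lms_fit(X, y, n_subsets=3000, seed=0):
    """Approximate the LMS estimate of eq. (2) via random elemental subsets (assumed search strategy)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_subsets):
        J = rng.choice(n, size=k, replace=False)   # subset J of k indices
        try:
            beta = np.linalg.solve(X[J], y[J])     # exact fit through the k points
        except np.linalg.LinAlgError:
            continue                               # singular subset, skip it
        crit = np.median((y - X @ beta) ** 2)      # median of squared residuals
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta

def robust_scale(r, p):
    """Return s0 (eq. 4), the 0/1 weights (eq. 5) and the final scale s (eq. 6)."""
    n = r.size
    s0 = 1.4826 * (1 + 5 / (n - p - 1)) * np.sqrt(np.median(r ** 2))
    z = r / s0
    q1, q3 = np.percentile(z, [25, 75])
    c, d = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)   # inner fences (assumed 1.5 IQR rule)
    w = ((z >= c) & (z <= d)).astype(float)
    s = np.sqrt(np.sum(w * r ** 2) / (np.sum(w) - p - 1))
    return s0, w, s
```

An exact search over all subsets is feasible only for small $n$ and $p$; the random search above trades exactness for speed, and its result depends on the seed.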

By contrast, the least squares estimator $\hat\beta_{LS}$ minimizes

$$\sum_{i=1}^{n} r_i^2. \tag{7}$$

The breakdown point of the least squares estimator is 0. The residual is given by

$$r_i^{LS} = y_i - x_i \hat\beta_{LS}. \tag{8}$$

It is well known that outliers can have an extreme effect on the least squares estimator.

The scale estimate for the least squares regression is given by

$$\hat\sigma = \sqrt{\frac{\sum_{i=1}^{n} (r_i^{LS})^2}{n - p - 1}}. \tag{9}$$

The test statistic for testing the outliers is defined as

$$R = \hat\sigma / s. \tag{10}$$
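The ratio in (10) is simple to compute once both fits are available. The sketch below is illustrative only: `ls_residuals` and `scale_ratio` are hypothetical names, the robust scale `s` of (6) is assumed to have been computed separately, and `p` is again the number of explanatory variables.

```python
import numpy as np

def ls_residuals(X, y):
    """Ordinary least squares residuals, as in eq. (8)."""
    beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ beta_ls

def scale_ratio(r_ls, s, p):
    """Test statistic R = sigma_hat / s of eq. (10), with sigma_hat from eq. (9)."""
    n = r_ls.size
    sigma_hat = np.sqrt(np.sum(r_ls ** 2) / (n - p - 1))
    return sigma_hat / s
```

Because outliers inflate $\hat\sigma$ but leave $s$ nearly unchanged, large values of the ratio point towards contamination.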

It tests the following hypotheses:

$$\begin{aligned} H_0 &: \text{no outlier in the data } (x_{i1}, x_{i2}, \ldots, x_{ip}, y_i), \; i = 1, 2, \ldots, n \\ H_1 &: \text{some outliers in the data } (x_{i1}, x_{i2}, \ldots, x_{ip}, y_i), \; i = 1, 2, \ldots, n. \end{aligned} \tag{11}$$

The null hypothesis is rejected for large $R$. However, if the null hypothesis is rejected, there is no indication of how many or which points are outliers. To solve this problem, we apply the test repeatedly in a forward sequential procedure to identify the outliers. If the test rejects the null hypothesis, then the point with the largest

$$D = \left| \operatorname{sort}(r_i^{LMS}) - \operatorname{Med}(r_i^{LMS}) \right|$$

is defined as an outlier, where $\operatorname{sort}(r_i^{LMS})$ denotes the sorted values of $r_i^{LMS}$ and $\operatorname{Med}(r_i^{LMS})$ is the median of $r_i^{LMS}$. The observation detected as an outlier is removed and the test is applied again to the $n - 1$ remaining observations. The procedure is repeated and stops when the test is no longer significant.
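The forward sequential procedure can be sketched as a short loop. The code below is a hedged illustration rather than the author's program: `forward_outlier_search` is a hypothetical name, `statistic` is assumed to be a user-supplied function returning the pair (R, LMS residuals) for the current data, `critical_value` is assumed to return the threshold for the current sample size (for example from Table 1), and the guard that stops at two remaining points is an arbitrary safety choice.

```python
from statistics import median

def forward_outlier_search(data, statistic, critical_value):
    """Return the original indices of the detected outliers, in order of removal."""
    remaining = list(range(len(data)))
    outliers = []
    while len(remaining) > 2:                      # arbitrary safety guard (assumption)
        subset = [data[i] for i in remaining]
        R, resid = statistic(subset)               # test statistic and LMS residuals
        if R <= critical_value(len(subset)):
            break                                  # test no longer significant: stop
        med = median(resid)
        # remove the point whose residual lies farthest from the median residual
        k = max(range(len(resid)), key=lambda j: abs(resid[j] - med))
        outliers.append(remaining.pop(k))
    return outliers
```

Passing the statistic and the critical values as functions keeps the loop independent of how the LMS fit and the Monte Carlo thresholds are obtained.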

The critical values for the test (approximated by Monte Carlo simulation using 1000 replicates) are presented in Table 1.

Table 1. Critical values for the proposed test, by sample size, number of explanatory variables (1 to 4) and significance level (0.1, 0.05, 0.01)

| Sample size | 1 var, 0.1 | 1 var, 0.05 | 1 var, 0.01 | 2 var, 0.1 | 2 var, 0.05 | 2 var, 0.01 | 3 var, 0.1 | 3 var, 0.05 | 3 var, 0.01 | 4 var, 0.1 | 4 var, 0.05 | 4 var, 0.01 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1.725 | 1.894 | 2.072 | 2.107 | 2.223 | 2.386 | 2.469 | 2.622 | 2.756 | 2.807 | 2.895 | 2.992 |
| 20 | 1.484 | 1.637 | 1.849 | 1.850 | 1.978 | 2.084 | 2.121 | 2.246 | 2.334 | 2.323 | 2.407 | 2.580 |
| 25 | 1.493 | 1.605 | 1.759 | 1.682 | 1.793 | 1.853 | 1.950 | 2.044 | 2.200 | 2.164 | 2.282 | 2.388 |
| 30 | 1.461 | 1.570 | 1.717 | 1.552 | 1.638 | 1.752 | 1.824 | 1.921 | 2.065 | 1.982 | 2.150 | 2.333 |
| 35 | 1.395 | 1.475 | 1.623 | 1.496 | 1.578 | 1.688 | 1.650 | 1.793 | 1.925 | 1.786 | 1.910 | 2.103 |
| 40 | 1.326 | 1.403 | 1.493 | 1.417 | 1.487 | 1.580 | 1.573 | 1.666 | 1.774 | 1.654 | 1.769 | 1.882 |
| 45 | 1.276 | 1.337 | 1.435 | 1.393 | 1.473 | 1.570 | 1.456 | 1.548 | 1.655 | 1.575 | 1.688 | 1.812 |
| 50 | 1.266 | 1.338 | 1.403 | 1.351 | 1.425 | 1.515 | 1.471 | 1.492 | 1.575 | 1.466 | 1.540 | 1.631 |

Next we suggest a way to assign a weight $w_i$ to each observation $(x_i, y_i)$. For this purpose, we can use several types of functions of the test statistic $R$. The first kind of weight function that we consider here is of the form

$$w(R_i) = \begin{cases} 1 & \text{if } R_i \le c_1 \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

where $c_1$ is the critical value of the test statistic at significance level 0.1. This weight function, yielding only binary weights, produces a clear distinction between accepted and rejected points. However, it is a radical choice, so we also introduce a weight function that is less extreme. It adds a linear part that smooths the transition from weight 1 to weight 0. In that way, extreme outliers disappear entirely and intermediate cases are gradually down-weighted. The general formula is

$$w(R_i) = \begin{cases} 1 & \text{if } R_i \le c_1 \\ \dfrac{c_2 - R_i}{c_2 - c_1} & \text{if } c_1 < R_i \le c_2 \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

where $c_2$ is the critical value of the test statistic at significance level 0.01.
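The two weight functions (12) and (13) translate directly into code. The following is a small sketch with hypothetical function names; the treatment of the boundary cases (weight 1 at exactly $c_1$, weight 0 just above $c_2$) is an assumption, since the inequality signs are not recoverable from the source.

```python
def hard_weight(R, c1):
    """Binary weight of eq. (12): keep the point only if R does not exceed c1."""
    return 1.0 if R <= c1 else 0.0

def smooth_weight(R, c1, c2):
    """Weight of eq. (13): linear transition from 1 at c1 down to 0 at c2."""
    if R <= c1:
        return 1.0
    if R <= c2:
        return (c2 - R) / (c2 - c1)
    return 0.0
```

With the values reported in Table 5 for the contaminated pilot-plant data, `smooth_weight(11.703, 1.484, 1.849)` evaluates to 0 and `smooth_weight(0.941, 1.495, 1.858)` evaluates to 1, in agreement with the weight column of that table.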

In either case, we then apply the weighted least squares defined by

$$\text{Minimize} \; \sum_{i=1}^{n} w(R_i) \, r_i^2. \tag{14}$$

The weighted least squares estimator is given by

$$\hat\beta^* = (X^T W^T W X)^{-1} X^T W^T W Y \tag{15}$$

where $W = \operatorname{diag}(w_1^{1/2}, w_2^{1/2}, \ldots, w_n^{1/2})$.

Let $W^T W = \operatorname{diag}(w_1, w_2, \ldots, w_n) = V$; then

$$\hat\beta^* = (X^T V X)^{-1} X^T V Y, \qquad E(\hat\beta^*) = \beta, \qquad \operatorname{Var}(\hat\beta^*) = (X^T V X)^{-1} \sigma^2. \tag{16}$$
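A short check of the variance formula may be useful here. The derivation below is a sketch that treats the weights as fixed constants and assumes $\operatorname{Var}(Y) = \sigma^2 I$; both are simplifying assumptions, since in the procedure the weights depend on the data.

$$
\operatorname{Var}(\hat\beta^*) = (X^T V X)^{-1} X^T V \, \operatorname{Var}(Y) \, V X (X^T V X)^{-1} = \sigma^2 (X^T V X)^{-1} X^T V^2 X (X^T V X)^{-1}.
$$

For the binary weights of (12), $V^2 = V$, so the expression reduces to $(X^T V X)^{-1} \sigma^2$. For the fractional weights of (13), $V^2 \ne V$ in general and the reduction is not exact.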

The standard error of the $i$-th weighted least squares estimator is given by

$$\sqrt{\sigma^2 (X^T V X)^{-1}_{ii}} \tag{17}$$

where the unknown $\sigma^2$ is estimated by

$$(s^*)^2 = \frac{\sum_{i=1}^{n} w(R_i) \, r_i^2}{\sum_{i=1}^{n} w(R_i) - p}.$$
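The estimator, the variance estimate and the standard errors can be computed together. The sketch below is a minimal illustration assuming NumPy; `weighted_least_squares` is a hypothetical name, and `p` is taken to be the number of columns of `X`, which is an assumption about how the paper counts parameters.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Return (beta, standard errors, s*^2) for weights w, following eqs. (16) and (17)."""
    V = np.diag(w)
    XtVX = X.T @ V @ X
    beta = np.linalg.solve(XtVX, X.T @ V @ y)          # (X'VX)^{-1} X'VY
    resid = y - X @ beta
    p = X.shape[1]                                      # assumed parameter count
    s2 = np.sum(w * resid ** 2) / (np.sum(w) - p)       # estimate of sigma^2
    se = np.sqrt(s2 * np.diag(np.linalg.inv(XtVX)))     # eq. (17) with sigma^2 replaced by s*^2
    return beta, se, s2
```

Observations with weight 0 drop out of both the fit and the degrees of freedom, which is how detected outliers lose their influence on the inference.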

To discuss robust inference for the linear regression model, we assume that the errors are independently and normally distributed with mean zero and variance $\sigma^2$. Under these conditions, it is well known that

$$\frac{\hat\beta_i^* - \beta_i}{\sqrt{(s^*)^2 (X^T V X)^{-1}_{ii}}}, \quad i = 1, 2, \ldots, p \tag{18}$$

has a Student $t$-distribution with $\sum_{i=1}^{n} w(R_i) - p$ degrees of freedom. Let us denote the $1 - \alpha/2$ quantile of this distribution by $t_{\sum_{i=1}^{n} w(R_i) - p, \, 1 - \alpha/2}$. Then a $(1 - \alpha)100\%$ confidence interval for $\beta_i$ is given by

$$\left[ \hat\beta_i^* - t_{\sum_{i=1}^{n} w(R_i) - p, \, 1 - \alpha/2} \sqrt{(s^*)^2 (X^T V X)^{-1}_{ii}}, \;\; \hat\beta_i^* + t_{\sum_{i=1}^{n} w(R_i) - p, \, 1 - \alpha/2} \sqrt{(s^*)^2 (X^T V X)^{-1}_{ii}} \right], \quad i = 1, 2, \ldots, p. \tag{19}$$

To test the hypothesis

$$H_0 : \beta_i = 0 \quad \text{versus} \quad H_1 : \beta_i \ne 0, \tag{20}$$

we can use the following test statistic:

$$\frac{\hat\beta_i^*}{\sqrt{(s^*)^2 (X^T V X)^{-1}_{ii}}}, \quad i = 1, 2, \ldots, p. \tag{21}$$

The robust coefficient of determination $R_r^2$ is calculated as follows:

$$R_r^2 = 1 - \frac{\sum_{i=1}^{n} w(R_i) (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} w(R_i) (y_i - \bar{y})^2} \quad \text{in a model with constant term} \tag{22}$$

and

$$R_r^2 = 1 - \frac{\sum_{i=1}^{n} w(R_i) (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} w(R_i) \, y_i^2} \quad \text{in a model without constant term.} \tag{23}$$

Because the weighted least squares regression based on the sequential outliers test statistic $R$ has a high breakdown point, inferences for the parameters of the linear regression model using weighted least squares are not affected by outliers.

3. Simulation and its Results

We focus on the confidence interval for the slope parameter, and consider its coverage and median length. First, we generate samples in the following situation,

$$y_i = x_{i1} + x_{i2} + \cdots + x_{ip} + e_i, \quad i = 1, 2, \ldots, n \tag{24}$$

in which $e_i \sim N(0, 1)$ and the explanatory variables are generated as $x_{ij} \sim N(0, 100)$, for $j = 1, 2, \ldots, p$. Second, $(1 - \epsilon)100\%$ of the samples are generated as in the first situation and the remaining $\epsilon 100\%$ are generated as $e_i \sim N(0, 1)$ and $x_i \sim N(\mu, 100)$. Finally, $(1 - \epsilon)100\%$ of the samples are generated as in the first situation and the remaining $\epsilon 100\%$ are generated as $e_i \sim N(\mu, 1)$ and $x_i \sim N(0, 100)$.

To check the performance of the confidence interval for the slope using equation (19), we conducted a Monte Carlo simulation using 1000 replicates of the following sampling situations: sample sizes $n = 20, 40, 60, 80, 100$; $\mu = 10, 50, 90$; $p = 1$; $\epsilon = 0.05$ and $\epsilon = 0.1$.
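The sampling scheme can be sketched as follows. This is an illustration under stated assumptions, not the author's simulation code: `generate_sample` and its parameter names `eps` (contamination fraction) and `mu` (location shift) are hypothetical, $N(0, 100)$ is read as variance 100 (standard deviation 10), the contaminated cases are placed at the end of the sample, and $y$ is computed from model (24) with the generated $x$ and $e$, since the text does not say otherwise.

```python
import numpy as np

def generate_sample(n, p=1, eps=0.0, mu=0.0, kind="x", rng=None):
    """Draw one sample from model (24), contaminating a fraction eps in x or in the error."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.normal(0.0, 10.0, size=(n, p))      # x_ij ~ N(0, 100), read as variance 100
    e = rng.normal(0.0, 1.0, size=n)            # e_i ~ N(0, 1)
    m = int(round(eps * n))                     # number of contaminated cases
    if m > 0:
        if kind == "x":
            X[-m:] = rng.normal(mu, 10.0, size=(m, p))   # x_i ~ N(mu, 100)
        elif kind == "y":
            e[-m:] = rng.normal(mu, 1.0, size=m)         # e_i ~ N(mu, 1)
        else:
            raise ValueError("kind must be 'x' or 'y'")
    y = X.sum(axis=1) + e                       # all slopes equal to 1
    return X, y
```

A coverage study would call this function once per replicate, compute the interval (19) for the slope, and record whether it contains the true value 1.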

The mean coverage and median length (in parentheses) of the confidence interval for the slope based on the least squares method are presented in Table 2. The nominal confidence level in all cases is 0.95.

Table 2. Mean coverage and median length of the confidence interval for the slope $\beta = 1$ based on least squares

| % of contamination | Sample size | Mild contamination ($\mu = 20$) | Medium contamination ($\mu = 50$) | Strong contamination ($\mu = 90$) |
|---|---|---|---|---|
| 5% | 20 | 0.75 (0.41) | 0.51 (0.784) | 0.47 (1.231) |
| 5% | 40 | 0.71 (0.40) | 0.47 (0.641) | 0.33 (0.951) |
| 5% | 60 | 0.65 (0.34) | 0.32 (0.534) | 0.26 (0.823) |
| 5% | 80 | 0.61 (0.32) | 0.29 (0.423) | 0.23 (0.741) |
| 5% | 100 | 0.47 (0.29) | 0.27 (0.341) | 0.21 (0.647) |
| 10% | 20 | 0.65 (0.35) | 0.42 (0.35) | 0.30 (0.92) |
| 10% | 40 | 0.47 (0.24) | 0.32 (0.25) | 0.25 (0.83) |
| 10% | 60 | 0.27 (0.21) | 0.21 (0.20) | 0.11 (0.76) |
| 10% | 80 | 0.15 (0.19) | 0.10 (0.18) | 0.07 (0.43) |
| 10% | 100 | 0.13 (0.17) | 0.09 (0.16) | 0.05 (0.28) |

The computed coverages are below the nominal level, especially for larger sample sizes. This indicates that the low coverage levels are due to the outliers in the data.

To overcome this problem, we use the weighted least squares based on the sequential outliers test statistic $R$. The mean coverage and median length (in parentheses) of the confidence interval for the slope based on the weighted least squares method are presented in Table 3. The nominal confidence level in all cases is 0.95.

Table 3. Mean coverage and median length of the confidence interval for the slope $\beta = 1$ based on weighted least squares

| % of contamination | Sample size | Mild contamination ($\mu = 20$) | Medium contamination ($\mu = 50$) | Strong contamination ($\mu = 90$) |
|---|---|---|---|---|
| 5% | 20 | 0.96 (0.098) | 0.96 (0.097) | 0.93 (0.112) |
| 5% | 40 | 0.96 (0.096) | 0.96 (0.096) | 0.94 (0.0991) |
| 5% | 60 | 0.95 (0.094) | 0.94 (0.095) | 0.94 (0.098) |
| 5% | 80 | 0.94 (0.093) | 0.94 (0.094) | 0.95 (0.095) |
| 5% | 100 | 0.94 (0.092) | 0.94 (0.093) | 0.95 (0.094) |
| 10% | 20 | 0.97 (0.135) | 0.98 (0.137) | 0.92 (0.192) |
| 10% | 40 | 0.95 (0.124) | 0.97 (0.125) | 0.94 (0.183) |
| 10% | 60 | 0.93 (0.121) | 0.98 (0.120) | 0.95 (0.176) |
| 10% | 80 | 0.93 (0.119) | 0.97 (0.18) | 0.96 (0.413) |
| 10% | 100 | 0.92 (0.117) | 0.95 (0.16) | 0.96 (0.128) |

Notice that the coverages are all at least 92% and several exceed the nominal 95% level. In addition, the median length of the confidence interval in Table 3 is shorter than that in Table 2.

4. Numerical Results

In this section, the proposed method is applied to several data sets to check its performance.

Example 1 (Pilot-Plant Data)

These data come from Daniel and Wood (1971). Rousseeuw and Leroy (1987) used these data to illustrate the need for robust regression techniques. Suppose now that one of the observations has been wrongly recorded. For example, the x-value of the sixth observation has been recorded as 370 instead of 37. This error produces an outlier in the space of the independent variable. The data appear in Table 4. The results for the proposed method are in Table 5.

In Table 5, the test is highly significant for observation 6, which was wrongly recorded. When the test is applied to the remaining 19 observations, the null hypothesis is not rejected. In this example, all $w(R_i)$ are equal to 1, except for case 6. The inference results by means of least squares (LS) and weighted least squares (WLS) for the Pilot-Plant data with the outlier are presented in Table 6. The scatterplot for the pilot-plant data without outliers suggests a strong statistical relationship between the response and the explanatory variable. However, Table 6 shows that the LS slope does not differ significantly from zero and the $R^2$ corresponding to LS is 0.141, whereas the WLS slope differs significantly from zero and the $R^2$ corresponding to WLS is 0.994. Moreover, the confidence interval for LS is longer than that for WLS. This result demonstrates that WLS is unaffected by the outlier while LS is affected. Therefore the inference using the weighted least squares method based on the function of the sequential test statistic is robust.

Figure 1. Scatterplot for Pilot-Plant data without outliers

Table 4. Pilot-Plant data set with outlier

| Index | Extraction (x) | Titration (y) | Index | Extraction (x) | Titration (y) |
|---|---|---|---|---|---|
| 1 | 123 | 76 | 11 | 138 | 82 |
| 2 | 109 | 70 | 12 | 105 | 68 |
| 3 | 62 | 55 | 13 | 159 | 88 |
| 4 | 104 | 71 | 14 | 75 | 58 |
| 5 | 57 | 55 | 15 | 88 | 64 |
| 6 | 370 (37) | 48 | 16 | 164 | 88 |
| 7 | 44 | 50 | 17 | 169 | 89 |
| 8 | 100 | 66 | 18 | 167 | 88 |
| 9 | 16 | 41 | 19 | 149 | 84 |
| 10 | 28 | 43 | 20 | 167 | 88 |

*(37) is the original value in the pilot-plant data set.

Table 5. The proposed method applied to the contaminated pilot-plant data

| Sample size | Observation selected | Proposed test statistic | Critical value (0.01) | Critical value (0.05) | Critical value (0.1) | Weight |
|---|---|---|---|---|---|---|
| 20 | 6 | 11.703 | 1.849 | 1.637 | 1.484 | 0 |
| 19 | 11 | 0.941 | 1.858 | 1.651 | 1.495 | 1 |

Table 6. The inference results by means of LS and WLS for Pilot-Plant data with outlier

| Method | Variable | Coefficient | Standard error | T-value | p-value | 95% CI lower | 95% CI upper | Length | Coefficient of determination |
|---|---|---|---|---|---|---|---|---|---|
| LS | x1 | 0.081 | 0.047 | 1.719 | 0.103 | -0.018 | 0.179 | 0.197 | 0.141 |
| WLS | x1 | 0.323 | 0.006 | 54.214 | 0.000 | 0.310 | 0.335 | 0.025 | 0.994 |
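As a quick arithmetic check of equation (19) against Table 6, the sketch below rebuilds the WLS interval from the tabulated coefficient and standard error. The function name `confidence_interval` is hypothetical. The value 2.101 is the 0.975 quantile of Student's t with 18 degrees of freedom, which assumes 19 unit weights and $p = 1$.

```python
def confidence_interval(estimate, std_error, t_quantile):
    """Two-sided interval: estimate plus or minus t_quantile times the standard error."""
    half_width = t_quantile * std_error
    return estimate - half_width, estimate + half_width

# WLS slope and standard error as printed in Table 6 (rounded to three decimals)
lo, hi = confidence_interval(0.323, 0.006, 2.101)
print(round(lo, 3), round(hi, 3))
```

The interval comes out at roughly (0.310, 0.336), which agrees with the tabulated (0.310, 0.335) up to the rounding of the inputs.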

Example 2 (Stackloss Data)

The second example comes from Brownlee (1965). We have selected this example because it is a set of real data that has been examined by many statisticians. Most concluded that observations 1, 3, 4, and 21 were outliers, and some reported that observation 2 was also an outlier. The data are shown in Table 7. The results for the proposed method appear in Table 8. In Table 8, all $w(R_i)$ are equal to 1, except for observations 4, 21, 1, 3 and 2. The inference results by means of LS and WLS for the stackloss data are presented in Table 9.

Table 7. Stackloss data

| Index | Rate (x1) | Temperature (x2) | Acid concentration (x3) | Stackloss (y) | Index | Rate (x1) | Temperature (x2) | Acid concentration (x3) | Stackloss (y) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 80 | 27 | 89 | 42 | 12 | 58 | 17 | 88 | 13 |
| 2 | 80 | 27 | 88 | 37 | 13 | 58 | 18 | 82 | 11 |
| 3 | 75 | 25 | 90 | 37 | 14 | 58 | 19 | 93 | 12 |
| 4 | 62 | 24 | 87 | 28 | 15 | 50 | 18 | 89 | 8 |
| 5 | 62 | 22 | 87 | 18 | 16 | 50 | 18 | 86 | 7 |
| 6 | 62 | 23 | 87 | 18 | 17 | 50 | 19 | 72 | 8 |
| 7 | 62 | 24 | 93 | 19 | 18 | 50 | 19 | 79 | 8 |
| 8 | 62 | 24 | 93 | 20 | 19 | 50 | 20 | 80 | 9 |
| 9 | 58 | 23 | 87 | 15 | 20 | 56 | 20 | 82 | 15 |
| 10 | 58 | 18 | 80 | 14 | 21 | 70 | 20 | 91 | 15 |
| 11 | 58 | 18 | 89 | 14 | | | | | |

Table 8. The proposed method applied to the stackloss data

| Sample size | Observation selected | Proposed test statistic | Critical value (0.01) | Critical value (0.05) | Critical value (0.10) | Weight |
|---|---|---|---|---|---|---|
| 21 | 4 | 2.685 | 2.304 | 2.204 | 2.101 | 0 |
| 20 | 21 | 2.895 | 2.334 | 2.246 | 2.121 | 0 |
| 19 | 1 | 2.367 | 2.359 | 2.296 | 2.231 | 0 |
| 18 | 3 | 2.9067 | 2.384 | 2.346 | 2.321 | 0 |
| 17 | 2 | 2.610 | 2.421 | 2.396 | 2.384 | 0 |
| 16 | 13 | 2.326 | 2.634 | 2.583 | 2.421 | 1 |

Table 9. The inference results by means of LS and WLS for stackloss data

| Method | Variable | Coefficient | Standard error | T-value | p-value | 95% CI lower | 95% CI upper | Length | Coefficient of determination |
|---|---|---|---|---|---|---|---|---|---|
| LS | x1 | 0.716 | 0.135 | 5.307 | 0.000 | 0.431 | 1.000 | 0.569 | 0.914 |
| LS | x2 | 1.295 | 0.368 | 3.520 | 0.003 | 0.519 | 2.072 | 1.553 | |
| LS | x3 | -0.152 | 0.156 | -0.973 | 0.344 | -0.482 | 0.178 | 0.66 | |
| WLS | x1 | 0.686 | 0.088 | 7.834 | 0.000 | 0.495 | 0.877 | 0.382 | 0.942 |
| WLS | x2 | 0.567 | 0.153 | 3.702 | 0.003 | 0.233 | 0.901 | 0.668 | |
| WLS | x3 | -0.017 | 0.063 | -0.273 | 0.789 | -0.155 | 0.120 | 0.275 | |

In Table 9, the confidence interval of each regression coefficient is longer for LS than for WLS. The significance of the regression coefficients also turns out to be different in the LS fit and the WLS fit.

Example 3

These raw data came from Draper and Smith (1966) and were used to determine the influence of anatomical factors on wood specific gravity. Rousseeuw and Leroy (1987) used a contaminated version of these data to compare various diagnostics. The contaminated data contain outliers that are not outlying in any of the individual variables. The contaminated data are shown in Table 10. The results for the proposed method appear in Table 11. In Table 11, all $w(R_i)$ are equal to 1, except for observations 19, 6, 8, and 4. The inference results by means of LS and WLS for the modified data on wood specific gravity are presented in Table 12.

Table 10. Contaminated Data on Wood Specific Gravity

Index x1 x2 x3 x4 x5 y

1 0.5730 0.1059 0.4650 0.5380 0.8410 0.5340

2 0.6510 0.1356 0.5270 0.5450 0.8870 0.5350

3 0.6060 0.1273 0.4940 0.5210 0.9200 0.5700

4 0.4370 0.1591 0.4460 0.4230 0.9920 0.4500

5 0.5470 0.1135 0.5310 0.5190 0.9150 0.5480

6 0.4440 0.1628 0.4290 0.4110 0.9840 0.4310

7 0.4890 0.1231 0.5620 0.4550 0.8240 0.4810

8 0.4130 0.1673 0.4180 0.4300 0.9780 0.4230

9 0.5360 0.1182 0.5920 0.4640 0.8540 0.4750

10 0.6850 0.1564 0.6310 0.5640 0.9140 0.4860

11 0.6640 0.1588 0.5060 0.4810 0.8670 0.5540

12 0.7030 0.1335 0.5190 0.4840 0.8120 0.5190

13 0.6530 0.1395 0.6250 0.5190 0.8920 0.4290

14 0.5860 0.1114 0.5050 0.5650 0.8890 0.5170

15 0.5340 0.1143 0.5210 0.5700 0.8890 0.5020

16 0.5230 0.1320 0.5050 0.6120 0.9190 0.5080

17 0.5800 0.1249 0.5460 0.6080 0.9540 0.5200

18 0.4480 0.1028 0.5220 0.5340 0.9180 0.5060

19 0.4170 0.1687 0.4050 0.4150 0.9810 0.4010

20 0.5280 0.1057 0.4240 0.5660 0.9090 0.5680

Table 11. The proposed method applied to modified data on wood specific gravity

| Sample size | Observation selected | Scale ratio statistic | Critical value (0.01) | Critical value (0.05) | Critical value (0.10) | Weight |
|---|---|---|---|---|---|---|
| 20 | 19 | 1.783 | 1.484 | 1.415 | 1.365 | 0 |
| 19 | 6 | 1.948 | 1.518 | 1.445 | 1.395 | 0 |
| 18 | 8 | 2.068 | 1.547 | 1.472 | 1.412 | 0 |
| 17 | 4 | 2.635 | 1.577 | 1.492 | 1.433 | 0 |
| 16 | 5 | 1.227 | 1.671 | 1.522 | 1.463 | 1 |

Table 12. The inference results by means of LS and WLS for modified data on wood specific gravity

| Method | Variable | Coefficient | Standard error | T-value | p-value | 95% CI lower | 95% CI upper | Length | Coefficient of determination |
|---|---|---|---|---|---|---|---|---|---|
| LS | x1 | 0.441 | 0.117 | 3.770 | 0.002 | 0.190 | 0.691 | 0.501 | 0.808 |
| LS | x2 | -1.475 | 0.487 | -3.029 | 0.009 | -2.519 | -0.431 | 2.088 | |
| LS | x3 | -0.261 | 0.112 | -2.332 | 0.035 | -0.501 | -0.021 | 0.48 | |
| LS | x4 | 0.021 | 0.161 | 0.129 | 0.899 | -0.325 | 0.366 | 0.691 | |
| LS | x5 | 0.171 | 0.203 | 0.840 | 0.415 | -0.265 | 0.607 | 0.872 | |
| WLS | x1 | 0.217 | 0.042 | 5.162 | 0.000 | 0.124 | 0.311 | 0.187 | 0.958 |
| WLS | x2 | -0.085 | 0.198 | -0.430 | 0.676 | -0.526 | 0.356 | 0.882 | |
| WLS | x3 | -0.564 | 0.043 | -12.975 | 0.000 | -0.661 | -0.467 | 0.194 | |
| WLS | x4 | -0.400 | 0.065 | -6.118 | 0.000 | -0.546 | -0.255 | 0.291 | |
| WLS | x5 | 0.607 | 0.079 | 7.730 | 0.000 | 0.432 | 0.783 | 0.351 | |

In Table 12, the significance of the regression coefficients turns out to be different in the LS fit and the WLS fit. The variables x4 and x5 have LS regression coefficients that are not significantly different from zero at significance level 0.05.

Only the variable x2 has a WLS regression coefficient that is not significantly different from zero at significance level 0.05. Draper and Smith concluded that x2 could be removed from the model. The fact that the LS regression coefficient of x2 is significantly different from zero at significance level 0.01 is caused only by the outliers. The confidence interval of each regression coefficient is again longer for LS than for WLS.

These examples demonstrate that WLS is unaffected by masking and swamping effects.

5. Concluding Remarks

It is well known that outliers can have an extreme effect on least squares estimation. Hence, it is important to detect the outliers and to decide how to deal with them once detected.

In this paper, we proposed a tool to identify outliers and a method to deal with the detected outliers. To detect the outliers, we suggested the forward sequential test. To deal with the detected outliers, we recommended the weighted least squares method based on a function of the sequential test statistic.

The Monte Carlo results and numerical examples showed that the weighted least squares method based on a function of the sequential test statistic was not affected by masking and swamping effects. These results suggest that the newly proposed tool provides a conservative and fairly powerful method for the analysis of data from the linear regression model.

References

1. Brownlee, K. A. (1965). Statistical Theory and Methodology in Science and Engineering, 2nd ed., John Wiley & Sons, New York.

2. Daniel, C., and Wood, F. S. (1971). Fitting Equations to Data, John Wiley & Sons, New York.

3. Draper, N. R., and Smith, H. (1966). Applied Regression Analysis, John Wiley & Sons, New York.

4. Jinpyo Park and Heechang Park (2001). The sequential testing of multiple outliers in linear regression, The Korean Communication in Statistics, Vol. 8, No. 2, 337-346.

5. Rousseeuw, P. J. (1984). Least median of squares regression, J. Am. Stat. Assoc., 79, 871-884.

6. Rousseeuw, P. J., and Leroy, A. M. (1987). Robust Regression and Outlier Detection, John Wiley & Sons, New York.

[ Received September 2002, accepted October 2002 ]
