
A new test statistic to assess the goodness of fit of exponential distribution under progressive censoring †


Saemi Yun 1 · Kyeongjun Lee 2

1,2 Division of Mathematics and Big Data Science, Daegu University

Received 20 June 2019, revised 28 June 2019, accepted 28 June 2019

Abstract

Examining how well an assumed distribution fits the data of a sample is an important problem that has to be addressed prior to any inferential process. In this paper, a new goodness-of-fit test for the exponential distribution based on progressively censored data is proposed. Using Monte Carlo simulation studies, we observe that the proposed test for exponentiality is consistent and quite powerful in comparison with existing goodness-of-fit tests based on progressively censored data. The new test statistic is also applied to a real data set, and the results show that it performs well.

Keywords: Exponential distribution, Lorenz curve, order statistics, progressive censoring.

1. Introduction

One of the most interesting problems in statistics is finding a distribution that fits a given set of data. In other words, we wish to test whether a specific distribution is consistent with the given data. Most goodness-of-fit tests are based on the distance between the empirical distribution function (EDF) and the theoretical distribution function over the interval (0, 1); the null hypothesis is rejected if the distance is too large in some metric.

In reliability and life-testing studies, the failure times of the items on test are often not all observed. Reducing the cost and duration of such tests is an important motivation for censoring. Among the censoring methods, progressive censoring has become quite popular in reliability and life-testing studies.

† This work was supported by Daegu University Undergraduate Research Program, 2019.

1 Graduate student, Division of Mathematics and Big Data Science, Daegu University, Gyeongsan 38453, Korea.

2 Corresponding author: Associate professor, Division of Mathematics and Big Data Science, Daegu University, Gyeongsan 38453, Korea. E-mail: indra [email protected]

Progressive censoring arises in a reliability or life-testing experiment as follows. Immediately after the first observed failure time, $R_1$ surviving items are removed from the test at random. Similarly, immediately after the second observed failure time, $R_2$ surviving items are removed from the test at random. This process continues until, immediately after the $m$th observed failure time, all the remaining $R_m = n - R_1 - \cdots - R_{m-1} - m$ items are removed from the test. It is assumed that the removals of still-operating units are carried out at observed failure times and that the progressive censoring scheme is known in advance. The $m$ ordered observed failure times, which we denote by $X_{1:m:n}, X_{2:m:n}, \ldots, X_{m:m:n}$, are referred to as progressively censored order statistics.
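The removal process described above can be simulated directly. The following is a minimal sketch, not the authors' code: the function name `progressive_type2_sample` and the `draw` callback are illustrative assumptions, only the Python standard library is used, and the scheme shown is the one from the real-data example in Section 3.1.

```python
import random

def progressive_type2_sample(draw, n, scheme, rng):
    """Simulate one progressively Type II censored sample.

    draw   : function rng -> one lifetime (hypothetical callback)
    n      : number of items placed on test
    scheme : removals (R_1, ..., R_m), with R_1 + ... + R_m + m == n
    """
    m = len(scheme)
    if sum(scheme) + m != n:
        raise ValueError("scheme must satisfy R_1 + ... + R_m + m = n")
    alive = [draw(rng) for _ in range(n)]   # latent lifetimes of all items
    observed = []
    for r in scheme:
        x = min(alive)                      # next observed failure
        alive.remove(x)
        observed.append(x)
        for _ in range(r):                  # withdraw r survivors at random
            alive.pop(rng.randrange(len(alive)))
    return observed

rng = random.Random(2019)
scheme = (0, 0, 3, 0, 3, 0, 0, 5)           # m = 8 failures, n = 19 items
sample = progressive_type2_sample(lambda g: g.expovariate(1.0), 19, scheme, rng)
print(sample)
```

Because every withdrawn item is still operating at the time of removal, the returned failure times are automatically in increasing order.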

When the observed failure times are progressively censored, the goodness-of-fit tests for complete data can no longer be used. For this reason, goodness-of-fit testing under progressive censoring has received the attention of numerous authors.

Balakrishnan et al. (2002) suggested a goodness-of-fit test for the exponential distribution (ED) based on spacings from progressively censored data. Wang (2008) proposed another goodness-of-fit test for the exponential distribution under progressively censored data. Pakyari and Balakrishnan (2012) proposed a modification of the EDF goodness-of-fit statistics for progressively censored data. Pakyari and Balakrishnan (2013) employed a modification of the classical EDF test statistics based on order statistics, making them suitable for progressively censored data. Lee (2017) proposed a goodness-of-fit test for the Gumbel distribution based on the generalized Lorenz curve. Yun and Lee (2018) proposed goodness-of-fit tests for progressively Type II censored data from a Gumbel distribution. Yun and Lee (2018) proposed goodness-of-fit tests for the exponential distribution based on multiply progressive censored data. Recently, Lee and Lee (2019) proposed goodness-of-fit tests for location-scale distributions under progressively censored data.

In this paper, we propose goodness-of-fit test statistics and a graphical method based on the Lorenz curve for progressively censored data from an exponential distribution. The rest of this paper is organized as follows.

The proposed goodness-of-fit test procedures are explained in detail in Section 2, where we also propose a new graphical method that uses the generalized Lorenz curve. In Section 3, a real data set is analyzed, the power of the proposed tests is assessed through Monte Carlo simulations, and that power is compared with the power of the test proposed earlier by Wang (2008). Section 4 concludes the paper.

2. Proposed tests

We want to test whether the progressively censored data come from an exponential distribution. Suppose we are interested in a goodness-of-fit test for $H_0: f_0 = \frac{1}{\sigma}\exp\left(-\frac{x}{\sigma}\right)$ versus $H_1: f_0 \neq \frac{1}{\sigma}\exp\left(-\frac{x}{\sigma}\right)$, where $f_0$ is the null density function and $\sigma$ is unknown.

The test statistic proposed by Wang (2008) is

$$\chi^2 = 2\sum_{i=1}^{m-1} \log\frac{S_1 + S_2 + \cdots + S_m}{S_1 + S_2 + \cdots + S_i}.$$

The null distribution of the test statistic $\chi^2$ is the chi-squared distribution with $2m - 2$ degrees of freedom. The power function of the test is given by

$$P\left(\chi^2 > \chi^2_{\alpha/2}(2m-2) \mid H_1\right) + P\left(\chi^2 < \chi^2_{1-\alpha/2}(2m-2) \mid H_1\right),$$

where $\chi^2_{\alpha}(\nu)$ is the upper $\alpha$ critical value of the chi-squared distribution with $\nu$ degrees of freedom.
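As a concrete check of the statistic above, the sketch below evaluates it on the breakdown-time data analysed in Section 3.1. The spacings $S_i$ are not defined in this section; the code assumes the usual normalized spacings $S_i = (n - R_1 - \cdots - R_{i-1} - i + 1)(X_{i:m:n} - X_{i-1:m:n})$ with $X_{0:m:n} = 0$, and the function name `wang_chi2` is hypothetical. Under that assumption the result agrees with the value $\chi^2 = 16.4308$ quoted in Section 3.1.

```python
import math

def wang_chi2(x, scheme):
    """Compute 2 * sum_{i=1}^{m-1} log(T_m / T_i), where T_i = S_1 + ... + S_i."""
    m = len(x)
    n = m + sum(scheme)
    at_risk = n          # items on test just before the next failure
    prev = 0.0
    partial = []         # running totals T_1, ..., T_m
    total = 0.0
    for xi, ri in zip(x, scheme):
        total += at_risk * (xi - prev)   # normalized spacing S_i (assumed form)
        partial.append(total)
        at_risk -= 1 + ri                # one failure plus R_i withdrawals
        prev = xi
    return 2.0 * sum(math.log(partial[-1] / t) for t in partial[:-1])

x = [0.18999, 0.77997, 0.95993, 1.30996, 2.77986, 4.84962, 6.49999, 7.35000]
scheme = [0, 0, 3, 0, 3, 0, 0, 5]
chi2 = wang_chi2(x, scheme)
print(round(chi2, 4), "on", 2 * len(x) - 2, "degrees of freedom")
```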

The Lorenz curve, introduced by Lorenz (1905), provides a means to visually assess the inequality in an income distribution and thus to compare the income inequality between two distributions.

Let $F$ denote the CDF of an income or wealth distribution. For a given percentile $p$, let

$$F^{-1}(p) = \inf_{y}\{y \mid F(y) \ge p\}, \quad 0 \le p \le 1,$$

denote the inverse CDF corresponding to $F$. Then the Lorenz curve corresponding to $F$ is given by

$$L(p) = \frac{1}{\mu}\int_{0}^{F^{-1}(p)} x\,dF(x), \quad 0 \le p \le 1,$$

where $\mu$ denotes the mean of the distribution with CDF $F$.
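For the exponential distribution studied in this paper the Lorenz curve has a closed form. As a worked illustration (a standard calculation, not part of the original derivation), take $F(x) = 1 - \exp(-x/\sigma)$, so that $F^{-1}(p) = -\sigma\log(1-p)$ and $\mu = \sigma$, and substitute $u = x/\sigma$:

```latex
\begin{aligned}
L(p) &= \frac{1}{\sigma}\int_{0}^{-\sigma\log(1-p)} \frac{x}{\sigma}\,e^{-x/\sigma}\,dx
      = \int_{0}^{-\log(1-p)} u\,e^{-u}\,du
      = \Bigl[-(1+u)\,e^{-u}\Bigr]_{0}^{-\log(1-p)} \\
     &= 1-(1-p)\bigl(1-\log(1-p)\bigr)
      = p+(1-p)\log(1-p), \qquad 0\le p\le 1 .
\end{aligned}
```

The scale parameter $\sigma$ cancels, so every exponential distribution shares this curve; for example, $L(0.5) = 0.5 + 0.5\log 0.5 \approx 0.153$.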

Let $X_{1:m:n} < X_{2:m:n} < \cdots < X_{m:m:n}$ be the progressively censored data with progressive censoring scheme $R = (R_1, R_2, \ldots, R_m)$ from an exponential distribution with probability density function (PDF) and cumulative distribution function (CDF)

$$f(x;\sigma) = \frac{1}{\sigma}\exp\left(-\frac{x}{\sigma}\right), \quad F(x) = 1 - \exp\left(-\frac{x}{\sigma}\right), \tag{2.1}$$

where $\sigma$ is the unknown scale parameter.

If $U_{i:m:n} = F(X_{i:m:n};\sigma)$, then $p_{i:m:n} = E(U_{i:m:n})$ denotes the expected value of the $i$th progressively censored order statistic from the standard uniform distribution, which is given by

$$p_{i:m:n} = 1 - \prod_{j=m-i+1}^{m}\left(\frac{j + R_{m-j+1} + \cdots + R_m}{j + 1 + R_{m-j+1} + \cdots + R_m}\right).$$
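The product formula for $p_{i:m:n}$ is straightforward to evaluate numerically. The sketch below is illustrative only; the function name `expected_uniform_os` is an assumption, and the scheme is passed as a plain Python sequence $(R_1, \ldots, R_m)$.

```python
def expected_uniform_os(scheme):
    """Return [p_{1:m:n}, ..., p_{m:m:n}] for removals scheme = (R_1, ..., R_m)."""
    m = len(scheme)
    p = []
    for i in range(1, m + 1):
        prod = 1.0
        for j in range(m - i + 1, m + 1):
            tail = sum(scheme[m - j:])       # R_{m-j+1} + ... + R_m
            prod *= (j + tail) / (j + 1 + tail)
        p.append(1.0 - prod)
    return p

# Complete sample (no removals): the formula reduces to i / (m + 1).
p_complete = expected_uniform_os([0, 0, 0, 0])
# Scheme of the real-data example in Section 3.1 (m = 8, n = 19).
p_example = expected_uniform_os([0, 0, 3, 0, 3, 0, 0, 5])
print(p_complete)
print(p_example)
```

With no removals the values are the familiar expectations $i/(m+1)$ of uniform order statistics, which gives a quick sanity check on the indexing.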

The Lorenz curve assumes that $X$ is a non-negative quantity such as wealth or income, and it is not a location-scale invariant statistic. To address this, the first ordered observation is subtracted from every ordered progressively censored observation, and the differences are then summed. The modified Lorenz curve $mL(p_{j:m:n})$ is obtained as

$$mL(p_{j:m:n}) = \frac{\sum_{i=1}^{j}\left(X_{i:m:n} - X_{1:m:n}\right)}{\sum_{i=1}^{m}\left(X_{i:m:n} - X_{1:m:n}\right)} - p_{j:m:n} + 1. \tag{2.2}$$

Figure 2.1 shows the modified Lorenz curves obtained from the percentile points of the chi-squared distribution with 3 degrees of freedom, the Beta distribution with shape parameters 2 and 0.5, the Weibull distribution with shape parameter 2 and scale parameter 1, and the gamma distribution with shape parameter 2.

Figure 2.1 Modified Lorenz curve for various distributions

Using $mL(p_{j:m:n})$, the quantity $nL(p_{j:m:n})$ is obtained as

$$nL(p_{j:m:n}) = \frac{mL(p_{j:m:n})}{mL_F(p_{j:m:n})}, \tag{2.3}$$

where

$$mL_F(p_{j:m:n}) = \frac{\sum_{i=1}^{j}\left(F^{-1}(p_{i:m:n}) - F^{-1}(p_{1:m:n})\right)}{\sum_{i=1}^{m}\left(F^{-1}(p_{i:m:n}) - F^{-1}(p_{1:m:n})\right)} - p_{j:m:n} + 1.$$

Now, we propose a family of test statistics based on $nL(p_{j:m:n})$:

$$L(k) = \max_{1 \le j \le m}\left|1 - nL(p_{j:m:n})^{k}\right|, \tag{2.4}$$

where $-\infty < k < \infty$.

If the data follow an exponential distribution exactly, we expect the $L(k)$ test statistic to be zero. Consequently, large values of $L(k)$ lead to the rejection of the null hypothesis: we reject the null hypothesis if $L(k)$ exceeds the corresponding upper-tail null critical value. Because the null distribution of $L(k)$ is not analytically tractable, the critical values are not available explicitly, and the percentile points need to be determined through Monte Carlo simulations.

Next, we propose a new plot method based on $nL(p_{j:m:n})$:

$$L_k(p_{j:m:n}) = \left|1 - nL(p_{j:m:n})^{k}\right|. \tag{2.5}$$

If the data follow an exponential distribution exactly, $nL(p_{j:m:n})$ equals 1 and $L_k(p_{j:m:n})$ coincides with the x-axis. We therefore check whether the data follow an exponential distribution by how far $L_k(p_{j:m:n})$ departs from the x-axis.
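Equations (2.2) to (2.5) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names and the `subtract_first` flag are hypothetical, and $\sigma = 1$ is used for $F^{-1}$ because the scale cancels in the ratios. With `subtract_first=True` the code follows (2.2) as printed; `subtract_first=False` uses plain partial sums $\sum_{i \le j} X_{i:m:n}$, a variant included only because it appears to reproduce the $L(k)$ values reported for the real data in Section 3.1. The data and scheme below are those of Section 3.1.

```python
import math

def expected_uniform_os(scheme):
    """p_{i:m:n}, i = 1..m, for removals scheme = (R_1, ..., R_m)."""
    m = len(scheme)
    p = []
    for i in range(1, m + 1):
        prod = 1.0
        for j in range(m - i + 1, m + 1):
            tail = sum(scheme[m - j:])       # R_{m-j+1} + ... + R_m
            prod *= (j + tail) / (j + 1 + tail)
        p.append(1.0 - prod)
    return p

def modified_lorenz(values, p, subtract_first=True):
    """mL(p_j): cumulative share of `values`, minus p_j, plus 1."""
    base = values[0] if subtract_first else 0.0
    shifted = [v - base for v in values]
    total = sum(shifted)
    out, running = [], 0.0
    for s, pj in zip(shifted, p):
        running += s
        out.append(running / total - pj + 1.0)
    return out

def lk_curve(x, scheme, k, subtract_first=True):
    """Points L_k(p_j) = |1 - nL(p_j)^k| of (2.5) under an exponential null."""
    p = expected_uniform_os(scheme)
    quantiles = [-math.log(1.0 - pj) for pj in p]    # F^{-1}(p_j) with sigma = 1
    ml = modified_lorenz(x, p, subtract_first)
    ml_f = modified_lorenz(quantiles, p, subtract_first)
    return [abs(1.0 - (a / b) ** k) for a, b in zip(ml, ml_f)]

def lk_statistic(x, scheme, k, subtract_first=True):
    """Test statistic L(k) of (2.4): the largest point of the L_k curve."""
    return max(lk_curve(x, scheme, k, subtract_first))

x = [0.18999, 0.77997, 0.95993, 1.30996, 2.77986, 4.84962, 6.49999, 7.35000]
scheme = [0, 0, 3, 0, 3, 0, 0, 5]
for k in (0.5, 1, 3, 5):
    print(k, lk_statistic(x, scheme, k), lk_statistic(x, scheme, k, subtract_first=False))
```

Each call to `lk_curve` gives the heights plotted against $p_{j:m:n}$ in the graphical method, and `lk_statistic` is their maximum.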


To examine the shapes of $L_k(p_{j:m:n})$, we consider the exponential, chi-squared, Beta, Weibull, and gamma distributions. First, we generate 50 data points from $X_{j:m:n} = F^{-1}(p_{j:m:n})$ for the exponential distribution, the chi-squared distribution with 3 degrees of freedom, the Beta distribution with shape parameters 2 and 0.5, the Weibull distribution with shape parameter 2 and scale parameter 1, and the gamma distribution with shape parameter 2. Next, we draw $L_k(p_{j:m:n})$. The results are shown in Figure 2.2.

Figure 2.2 New plot method for various k

3. Illustrative examples and simulation results

3.1. Real example

A progressively Type II censored sample generated from the log-times to breakdown data on insulating fluid tested at 34 kV by Viveros and Balakrishnan (1994) is used to illustrate the test statistics discussed earlier. The observations in the original time scale and the progressive censoring scheme are reported below.

i           1        2        3        4        5        6        7        8
x_{i:m:n}   0.18999  0.77997  0.95993  1.30996  2.77986  4.84962  6.49999  7.35000
R_i         0        0        3        0        3        0        0        5

In this example, the test statistic is L(0.5) = 0.04439718 and the critical value is 0.08398146. For k = 1, the test statistic is L(1) = 0.08682326 and the critical value is 0.1638556. For k = 3, the test statistic is L(3) = 0.2385094 and the critical value is 0.4621096. For k = 5, the test statistic is L(5) = 0.3649993 and the critical value is 0.7599239. Since none of the statistics exceeds its critical value, we fail to reject the null hypothesis that the sample comes from an exponential distribution. This is consistent with the conclusion of Wang (2008) ($\chi^2 = 16.4308$, p-value = 0.2877805).

The graphical test based on $L_k(p_{j:m:n})$ confirms this conclusion. The curves $L_k(p_{j:m:n})$ are shown in Figure 3.1. Figure 3.1 shows that $L_k(p_{j:m:n})$ of the sample stays close to the x-axis. Thus, we can judge that the sample comes from an exponential distribution.

Figure 3.1 New plot method for example

3.2. Simulation results

We assess the power of the proposed tests by comparing the simulated power values with those of Wang's (2008) $\chi^2$ test. For comparative purposes, the 27 censoring schemes used by Balakrishnan et al. (2004) are considered. To determine the critical region of $L(k)$, Monte Carlo simulations are employed. First, progressive Type II censored samples for the 27 censoring schemes are generated from the standard exponential distribution, and $L(k)$ is calculated for each sample. After this procedure is repeated 10000 times, the upper-tail 5 percent significance points of $L(k)$ for the 27 censoring schemes are obtained; they are given in Table 3.1.

For the 27 censoring schemes, a Monte Carlo simulation study is conducted to determine the power under different alternatives (chi-squared distribution with 3 degrees of freedom, Beta distribution with shape parameters 2 and 0.5, Weibull distribution with shape parameter 2 and scale parameter 1, gamma distribution with shape parameter 2). For each alternative, 10000 samples were generated, and the power was estimated by the proportion of samples falling into the critical region. The power values are presented in Tables 3.2 and 3.3.
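The simulation design described above can be sketched as follows. This is a rough illustration under several assumptions, not the code behind Tables 3.1 to 3.3: the 27 schemes are not listed in this text, so the scheme of Section 3.1 is used instead; the number of replications is reduced to 2000 for speed; $mL$ is computed as printed in (2.2); the critical value is the empirical upper 5 percent point, following the rejection rule of Section 2; and all function names are hypothetical.

```python
import math
import random

def expected_uniform_os(scheme):
    """p_{i:m:n}, i = 1..m, for removals scheme = (R_1, ..., R_m)."""
    m = len(scheme)
    p = []
    for i in range(1, m + 1):
        prod = 1.0
        for j in range(m - i + 1, m + 1):
            tail = sum(scheme[m - j:])
            prod *= (j + tail) / (j + 1 + tail)
        p.append(1.0 - prod)
    return p

def lk_statistic(x, p, k):
    """L(k) = max_j |1 - nL(p_j)^k| with mL as printed in (2.2)."""
    def ml(values):
        shifted = [v - values[0] for v in values]
        total, running, out = sum(shifted), 0.0, []
        for s, pj in zip(shifted, p):
            running += s
            out.append(running / total - pj + 1.0)
        return out
    q = [-math.log(1.0 - pj) for pj in p]            # exponential quantiles
    return max(abs(1.0 - (a / b) ** k) for a, b in zip(ml(x), ml(q)))

def progressive_sample(draw, scheme, rng):
    """Simulate the life test: fail the smallest survivor, then withdraw R_i."""
    n = len(scheme) + sum(scheme)
    alive = [draw(rng) for _ in range(n)]
    out = []
    for r in scheme:
        x = min(alive)
        alive.remove(x)
        out.append(x)
        for _ in range(r):
            alive.pop(rng.randrange(len(alive)))
    return out

def simulate(draw, scheme, k, reps, rng):
    """Sorted Monte Carlo replicates of L(k) under the lifetime law `draw`."""
    p = expected_uniform_os(scheme)
    return sorted(lk_statistic(progressive_sample(draw, scheme, rng), p, k)
                  for _ in range(reps))

rng = random.Random(1)
scheme = (0, 0, 3, 0, 3, 0, 0, 5)
k, reps = 5, 2000
null_stats = simulate(lambda g: g.expovariate(1.0), scheme, k, reps, rng)
critical = null_stats[int(0.95 * reps)]              # empirical upper 5% point
# Alternative: Weibull with scale 1 and shape 2.
alt_stats = simulate(lambda g: g.weibullvariate(1.0, 2.0), scheme, k, reps, rng)
power = sum(s > critical for s in alt_stats) / reps
print("critical value:", critical, "estimated power:", power)
```

Simulating the experiment directly, rather than using a specialised generation algorithm, keeps the sketch valid for any lifetime distribution that `draw` can sample from.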

From Tables 3.2 and 3.3, the $L(5)$ statistic always performs at least as well as the other $L(k)$ statistics considered. It can also be observed from Tables 3.2 and 3.3 that the $L(5)$ statistic possesses better power than the $\chi^2$ statistic in a number of situations. In fact, the $L(5)$ statistic is found to be better than the $\chi^2$ statistic in 70 out of 108 situations. In particular, when the data are generated from the Beta distribution with shape parameters 2 and 0.5, the $L(5)$ statistic always performs at least as well as the $\chi^2$ statistic.

For all methods, the power generally increases as m increases for fixed n, and as n increases for fixed m.

Table 3.1 Critical values for the L(k)

Scheme L(0.5) L(1) L(3) L(5)

1 0.143079 0.266852 0.671243 1.087092

2 0.072496 0.141270 0.395856 0.641027

3 0.079290 0.154751 0.433601 0.704268

4 0.120598 0.229512 0.599247 0.955347

5 0.067406 0.132390 0.377673 0.619667

6 0.124318 0.234631 0.607838 0.976204

7 0.111435 0.212469 0.554088 0.849928

8 0.073323 0.143600 0.405448 0.654505

9 0.112417 0.213782 0.555042 0.851565

10 0.138888 0.258853 0.645298 1.011927

11 0.060111 0.117958 0.337082 0.542758

12 0.070653 0.139582 0.400857 0.656983

13 0.100318 0.192796 0.511275 0.806705

14 0.050388 0.099382 0.288264 0.472866

15 0.072641 0.142604 0.401037 0.653272

16 0.083529 0.161258 0.441465 0.698651

17 0.050594 0.100309 0.293629 0.483890

18 0.057677 0.114169 0.326928 0.544765

19 0.101212 0.194352 0.513646 0.810162

20 0.045259 0.089779 0.261101 0.427278

21 0.056312 0.111385 0.321810 0.530519

22 0.073274 0.142259 0.396271 0.628845

23 0.040874 0.080654 0.237193 0.391961

24 0.064548 0.125868 0.350774 0.562492

25 0.067484 0.131601 0.365324 0.582109

26 0.043985 0.086969 0.257695 0.424913

27 0.049225 0.097709 0.287991 0.473014


Table 3.2 Comparison of the powers in chi-squared and Beta distribution

        chi-squared (3)                        Beta (2, 0.5)
Scheme  χ²  L(0.5)  L(1)  L(3)  L(5)           χ²  L(0.5)  L(1)  L(3)  L(5)

1 0.1450 0.0079 0.0130 0.0813 0.1570 0.9750 0.6173 0.8434 0.9907 0.9970

2 0.0965 0.0155 0.0254 0.0775 0.1392 0.6804 0.2537 0.3414 0.5774 0.7003

3 0.1097 0.0170 0.0279 0.0911 0.1515 0.8390 0.4489 0.5494 0.7777 0.8622

4 0.1730 0.0101 0.0208 0.1095 0.1868 0.9973 0.9793 0.9952 0.9998 0.9999

5 0.1365 0.0390 0.0544 0.1218 0.1824 0.9508 0.7854 0.8333 0.9283 0.9577

6 0.1566 0.0080 0.0193 0.1127 0.1932 0.9854 0.9381 0.9873 0.9996 1.0000

7 0.1930 0.0114 0.0247 0.1207 0.2198 0.9997 0.9982 0.9999 1.0000 1.0000

8 0.1704 0.0408 0.0591 0.1459 0.2186 0.9971 0.9829 0.9887 0.9975 0.9990

9 0.1929 0.0120 0.0267 0.1292 0.2308 0.9993 0.9991 1.0000 1.0000 1.0000

10 0.2305 0.0057 0.0108 0.0891 0.1709 0.9947 0.8109 0.9466 0.9979 0.9996

11 0.1276 0.0291 0.0401 0.0925 0.1506 0.6914 0.3013 0.3641 0.5558 0.6944

12 0.1566 0.0451 0.0597 0.1296 0.1974 0.8721 0.6220 0.6852 0.8375 0.8984

13 0.3056 0.0204 0.0380 0.1456 0.2397 1.0000 1.0000 1.0000 1.0000 1.0000

14 0.2324 0.0912 0.1130 0.1836 0.2472 0.9950 0.9514 0.9619 0.9822 0.9957

15 0.2746 0.0578 0.0810 0.1850 0.2656 0.9994 0.9993 0.9997 1.0000 1.0000

16 0.3868 0.0507 0.0774 0.2022 0.3045 1.0000 1.0000 1.0000 1.0000 1.0000

17 0.3436 0.1566 0.1793 0.2684 0.3353 1.0000 1.0000 1.0000 1.0000 1.0000

18 0.3614 0.1376 0.1599 0.2669 0.3363 1.0000 1.0000 1.0000 1.0000 1.0000

19 0.3673 0.0190 0.0372 0.1469 0.2362 1.0000 1.0000 1.0000 1.0000 1.0000

20 0.2479 0.1027 0.1180 0.1870 0.2528 0.9822 0.8697 0.8905 0.9377 0.9905

21 0.0805 0.0920 0.1111 0.1886 0.2554 0.9961 0.9919 0.9947 0.9985 0.9994

22 0.5183 0.0918 0.1303 0.2553 0.3641 1.0000 1.0000 1.0000 1.0000 1.0000

23 0.4637 0.2399 0.2672 0.3459 0.4142 1.0000 1.0000 1.0000 1.0000 1.0000

24 0.5178 0.1088 0.1444 0.2762 0.3839 1.0000 1.0000 1.0000 1.0000 1.0000

25 0.5801 0.1249 0.1627 0.3123 0.4247 1.0000 1.0000 1.0000 1.0000 1.0000

26 0.5503 0.2901 0.3198 0.4077 0.4784 1.0000 1.0000 1.0000 1.0000 1.0000

27 0.5627 0.2667 0.2946 0.3898 0.4705 1.0000 1.0000 1.0000 1.0000 1.0000


Table 3.3 Comparison of the powers in Weibull and gamma distribution

        Weibull (2, 1)                         gamma (2, 1)
Scheme  χ²  L(0.5)  L(1)  L(3)  L(5)           χ²  L(0.5)  L(1)  L(3)  L(5)

1 0.7070 0.0134 0.0767 0.4702 0.6402 0.3540 0.0027 0.0162 0.1733 0.2893

2 0.3767 0.0760 0.1249 0.2993 0.4279 0.2127 0.0334 0.0567 0.1745 0.2644

3 0.4787 0.1092 0.1711 0.3850 0.5213 0.2554 0.0404 0.0657 0.1980 0.3004

4 0.8352 0.1467 0.2862 0.6853 0.8186 0.4377 0.0196 0.0533 0.2474 0.3710

5 0.6251 0.2947 0.3603 0.5507 0.6621 0.3358 0.1091 0.1457 0.2870 0.3841

6 0.7596 0.0953 0.2322 0.6630 0.8093 0.3975 0.0158 0.0508 0.2573 0.3913

7 0.9043 0.2886 0.4596 0.8187 0.9186 0.4962 0.0341 0.0813 0.3008 0.4685

8 0.8295 0.4692 0.5508 0.7523 0.8434 0.4434 0.1395 0.1842 0.3555 0.4766

9 0.9047 0.3177 0.4972 0.8486 0.9340 0.4994 0.0402 0.0934 0.3257 0.4969

10 0.8824 0.0235 0.1149 0.5632 0.7361 0.5658 0.0036 0.0209 0.1926 0.3269

11 0.4682 0.1444 0.1919 0.3526 0.4784 0.3046 0.0744 0.1048 0.2233 0.3283

12 0.6128 0.2704 0.3306 0.5231 0.6392 0.3894 0.1243 0.1618 0.3135 0.4204

13 0.9811 0.5339 0.6742 0.9159 0.9645 0.7340 0.0806 0.1397 0.3746 0.5256

14 0.8765 0.6068 0.6590 0.7781 0.8422 0.5938 0.2942 0.3338 0.4691 0.5646

15 0.9496 0.6740 0.7490 0.9041 0.9509 0.6855 0.2200 0.2799 0.4842 0.6101

16 0.9982 0.9001 0.9422 0.9881 0.9961 0.8439 0.2245 0.3087 0.5514 0.6826

17 0.9896 0.9184 0.9330 0.9649 0.9802 0.7923 0.4857 0.5267 0.6496 0.7304

18 0.9947 0.9317 0.9467 0.9790 0.9878 0.8131 0.4491 0.4911 0.6540 0.7235

19 0.9906 0.5196 0.6639 0.9139 0.9638 0.8109 0.0763 0.1370 0.3726 0.5247

20 0.8562 0.5774 0.6173 0.7373 0.8069 0.6268 0.3272 0.3617 0.4885 0.5788

21 0.7403 0.6768 0.7211 0.8390 0.8904 0.3077 0.2909 0.3320 0.4789 0.5723

22 0.9998 0.9836 0.9915 0.9987 0.9998 0.9470 0.3925 0.4766 0.6870 0.9595

23 0.9979 0.9754 0.9813 0.9901 0.9984 0.9128 0.6844 0.7191 0.7971 0.9174

24 0.9997 0.9804 0.9887 0.9977 0.9998 0.9495 0.4605 0.5412 0.7436 0.9545

25 1.0000 0.9980 0.9989 0.9998 1.0000 0.9697 0.5242 0.5984 0.7795 0.9708

26 1.0000 0.9973 0.9980 0.9992 1.0000 0.9608 0.7677 0.7960 0.8596 0.9610

27 1.0000 0.9985 0.9989 0.9996 1.0000 0.9651 0.7408 0.7685 0.8471 0.9657


4. Conclusions

One of the most interesting problems in statistics is finding a distribution that fits a given set of data. However, when the observed failure times are progressively censored, the goodness-of-fit tests for complete data can no longer be used. For this reason, we propose goodness-of-fit test statistics and a graphical method based on the Lorenz curve for progressively censored data from an exponential distribution. For all proposed statistics, the power generally increases as m increases for fixed n, and as n increases for fixed m. The $L(5)$ statistic always performs at least as well as the other $L(k)$ statistics considered. The $L(5)$ statistic also possesses better power than the $\chi^2$ statistic in a number of situations. In particular, when the data are generated from the Beta distribution with shape parameters 2 and 0.5, the $L(5)$ statistic always performs at least as well as the $\chi^2$ statistic.

References

Balakrishnan, N., Ng, H. K. T. and Kannan, N. (2002). Goodness-of-fit tests and model validity, Birkhauser, Boston.

Lee, K. (2017). Goodness-of-fit test for the gumbel distribution based on the generalized Lorenz curve. Journal of the Korean Data & Information Science Society, 28, 733-742.

Lee, W. and Lee, K. (2019). Goodness-of-fit tests based on generalized Lorenz curve for progressively Type II censored data from a location-scale distributions. Communications for Statistical Applications and Methods, 26, 191-203.

Lorenz, M. O. (1905). Methods of measuring the concentration of wealth. Publications of the American Statistical Association, 9, 209-219.

Pakyari, R. and Balakrishnan, N. (2012). A general purpose approximate goodness-of-fit for progressively Type II censored data. IEEE Transactions on Reliability, 61, 238-244.

Pakyari, R. and Balakrishnan, N. (2013). Goodness-of-fit tests for progressively Type II censored data from location-scale distribution. Journal of Statistical Computation and Simulation, 83, 167-178.

Viveros, R. and Balakrishnan, N. (1994). Interval estimation of parameters of life from progressively censored data. Technometrics, 36, 84-91.

Wang, B. (2008). Goodness-of-fit test for the exponential distribution based on progressively Type II censored sample. Journal of Statistical Computation and Simulation, 78, 125-132.

Yun, H. and Lee, K. (2018). Goodness of fit tests for the exponential distribution based on multiply progressive censored data. Journal of the Korean Data Analysis Society, 20, 2813-2827.

Yun, N. and Lee, K. (2018). Goodness of fit tests for progressively type II censored data from a Gumbel distribution. Journal of the Korean Data & Information Science Society, 29, 59-69.
