Statistical properties on generalized half-logistic distribution †
Suk-Bok Kang 1 · Yongku Kim 2 · Jung-In Seo 3
1 Department of Statistics, Yeungnam University
2 Department of Statistics, Kyungpook National University
3 Department of Big Data, Daejeon University
Received 5 February 2020, revised 18 February 2020, accepted 18 February 2020
Abstract
This paper addresses statistical properties of the generalized half-logistic distribution. We first establish the relations between the generalized half-logistic distribution and some well-known probability distributions, and then derive the moments of a random variable from the generalized half-logistic distribution. In addition, the maximum likelihood method is developed to provide the maximum likelihood estimators and approximate confidence intervals for unknown parameters. Note that the shape parameter can be the parameter of interest because the generalized half-logistic distribution has different shapes according to the value of the shape parameter. Therefore, we provide an unbiased estimator and an exact confidence interval based on a pivotal quantity for the shape parameter of the generalized half-logistic distribution, which is more efficient than the maximum likelihood estimation. Finally, we show that the generalized half-logistic distribution is available as a failure time model through real data analysis.
Keywords: Fisher information, generalized half-logistic distribution, pivotal quantity.
† This research was supported by the Daejeon University Research Grants (2019).
1 Professor, Department of Statistics, Yeungnam University, Gyeongsan 38541, Korea.
2 Associate professor, Department of Statistics, Kyungpook National University, Daegu 41566, Korea.
3 Corresponding author: Assistant professor, Department of Big Data, Daejeon University, Daejeon 34520, Korea. E-mail: [email protected]
1. Introduction
In many applications such as reliability and life testing, the exponential distribution (ED) is one of the most popular distributions because of its very simple mathematical expressions.
Baklizi (2005) developed the preliminary test estimator based on the maximum likelihood estimator (MLE) of the scale parameter by assuming a prior guess of this parameter in the ED. Wang (2006) derived unbiased estimators of the mean life and the failure rate at a design stress based on the failure censored step-stress accelerated life-testing data. Wu et al. (2007) constructed a prediction interval for the future observation in the two-parameter ED based on multiply type-II censored samples. Yun and Lee (2019) proposed a new goodness-of-fit test for progressively censored data from the ED. Shin and Lee (2019) provided a maximum product spacing estimation method for the scale parameter of the ED under the multiply progressive censoring scheme. However, the ED does not always fit failure time data well because its failure rate function is constant. The probability density function (pdf) of the half-logistic distribution (HLD) is similar in shape to that of the ED, whereas its failure rate function is monotonically increasing (see Balakrishnan and Wong (1991)).
For this reason, the HLD has been studied by several authors. Balakrishnan and Puthenpura (1986) provided the best linear unbiased estimators (BLUEs) of unknown parameters of the HLD using linear functions of order statistics. Kang et al. (2008) derived the MLE and approximate MLEs (AMLEs) of the scale parameter in the HLD under the progressive type-II censoring scheme. Kang et al. (2009) proposed a double hybrid censoring scheme and derived AMLEs of the scale parameter in the HLD under the proposed censoring scheme.
Recently, Seo and Kang (2015a) proposed a new, more efficient method than the maximum likelihood method for estimating the scale parameter of the HLD under the progressive type-II censoring scheme. In addition, Seo et al. (2017) proposed robust methods for the estimation of unknown parameters in a two-parameter distribution with a bathtub-shaped failure rate function under the same censoring scheme. Asgharzadeh et al. (2015) constructed exact confidence intervals (CIs) and exact joint confidence regions for unknown parameters in a two-parameter bathtub-shaped lifetime distribution based on record values. Seo and Kim (2015) provided various ways of checking the sensitive effects on the target posterior distribution of the parameter of interest by providing the full conditional distribution of the nuisance parameter.
In this paper, we provide some statistical properties of a generalized HLD (GHLD), such as its relationships to other probability distributions and its moments, which to the best of our knowledge have not been studied in the literature. We also derive an unbiased estimator and an exact CI for the shape parameter, as well as the MLEs of the unknown parameters and their approximate CIs in the GHLD. The cumulative distribution function (cdf) and the pdf of a random variable X with the GHLD are given by
$$F(x) = 1 - \left( \frac{2e^{-\theta x}}{1 + e^{-\theta x}} \right)^{\lambda}$$
and
$$f(x) = \theta\lambda \left( \frac{2e^{-\theta x}}{1 + e^{-\theta x}} \right)^{\lambda} \frac{1}{1 + e^{-\theta x}}, \quad x > 0,\ \theta > 0,\ \lambda > 0, \tag{1.1}$$
where θ is a reciprocal scale parameter and λ is a shape parameter. When λ = 1, this distribution is the HLD. Kim et al. (2011) provided Bayesian estimation methods for the shape parameter and the reliability function in the GHLD under the progressive type-II censoring scheme.
Seo et al. (2012) provided an entropy estimation method for upper record values from the GHLD. Seo and Kang (2014a) derived the entropy of the GHLD based on type-II censored samples. Seo and Kang (2014b) provided Bayesian estimation and prediction methods of lower record values arising from the exponentiated HLD (EHLD). Kang et al. (2014) developed the maximum likelihood estimation and Bayesian estimation methods for unknown parameters of the EHLD based on type-II hybrid censored samples. Seo and Kang (2015b) derived moment estimators and MLEs of unknown parameters in the EHLD, and provided approximate CIs based on MLEs by deriving an exact expression of Fisher information for unknown parameters in the distribution. Chaturvedi et al. (2016) discussed estimation and testing methods for the reliability function of the GHLD.
Section 2 introduces some theorems for relations of the GHLD to well-known probability distributions, and derives some properties such as moments and entropy of the GHLD.
Section 3 provides MLEs of unknown parameters and their approximate CIs in the GHLD.
Section 4 assesses the validity of the proposed methods, and Section 5 concludes the paper.
2. Properties
2.1. Related distributions
Suppose X is a random variable with the generalized half-logistic pdf given in (1.1). Then we can derive some relations between some well-known distributions and the GHLD.
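Theorem 2.1 Let −∞ < α < β < ∞. If
$$Y = \alpha + (\beta - \alpha) \left( \frac{2e^{-\theta X}}{1 + e^{-\theta X}} \right)^{\lambda},$$
then Y has a uniform distribution on (α, β).
Proof : By the probability integral transformation, F(X) is uniformly distributed on (0, 1), so $1 - F(X) = \left[ 2e^{-\theta X}/(1 + e^{-\theta X}) \right]^{\lambda}$ is also uniformly distributed on (0, 1). Therefore, $Y = \alpha + (\beta - \alpha)(1 - F(X))$ has the uniform distribution on (α, β).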
Theorem 2.2 If
$$Y = \log \frac{1 + e^{-\theta X}}{2e^{-\theta X}},$$
then Y has an ED with mean 1/λ.
Proof : Let $y = \log\{(1 + e^{-\theta x})/(2e^{-\theta x})\}$. Then $x = \log(2e^{y} - 1)/\theta$ and the Jacobian of the transformation is
$$J = \frac{dx}{dy} = \frac{2e^{y}}{\theta (2e^{y} - 1)}.$$
Therefore, the density function of Y is
$$g(y) = \lambda e^{-\lambda y}, \quad y > 0,$$
which is the pdf of the ED with mean 1/λ.
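As an informal numerical check of Theorem 2.2, the sketch below draws GHLD variates by inverting the cdf in (1.1) and applies the transformation of the theorem. The function names, the default parameter values, and the seed are illustrative assumptions, not part of the paper; only the Python standard library is used.

```python
import math
import random


def ghld_sample(theta, lam, rng):
    """Draw one GHLD(theta, lam) variate by inverting the cdf in (1.1)."""
    u = 1.0 - rng.random()        # uniform on (0, 1]; plays the role of 1 - F(X)
    g = u ** (1.0 / lam)          # g = 2e^{-theta x} / (1 + e^{-theta x})
    return math.log((2.0 - g) / g) / theta


def to_exponential(x, theta):
    """Theorem 2.2 transform, written as theta*x + log(1 + e^{-theta x}) - log 2."""
    a = theta * x
    return a + math.log1p(math.exp(-a)) - math.log(2.0)


def check_theorem_2_2(theta=0.5, lam=2.0, n=100_000, seed=1):
    """Return the sample mean and variance of Y; expected near 1/lam and 1/lam**2."""
    rng = random.Random(seed)
    ys = [to_exponential(ghld_sample(theta, lam, rng), theta) for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, var
```

Calling `check_theorem_2_2()` should return values close to 1/λ and 1/λ², the mean and variance of the ED in the theorem; the same sampler can be reused to check the other transformations in this subsection.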
Theorem 2.3 Let α be a scale parameter. If
$$Y = \alpha\, \frac{1 + e^{-\theta X}}{2e^{-\theta X}},$$
then Y has a type-I Pareto distribution with the parameters α and λ.
Proof : Let $y = \alpha (1 + e^{-\theta x})/(2e^{-\theta x})$. Then $x = \log(2y/\alpha - 1)/\theta$ and the Jacobian of the transformation is
$$J = \frac{2}{\theta (2y - \alpha)}.$$
Therefore, the density function of Y is
$$g(y) = \lambda \alpha^{\lambda} y^{-\lambda - 1}, \quad y \ge \alpha,$$
which is the pdf of the type-I Pareto distribution with the parameters α and λ.
Theorem 2.4 Let α be a shape parameter and β be a scale parameter. If
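$$Y = \beta \left[ \lambda \log \frac{1 + e^{-\theta X}}{2e^{-\theta X}} \right]^{1/\alpha},$$
then Y has a Weibull distribution with the parameters α and β.
Proof : Let $y = \beta \left[ \lambda \log\{(1 + e^{-\theta x})/(2e^{-\theta x})\} \right]^{1/\alpha}$. Then $x = \log\left[ 2\exp\left( (y/\beta)^{\alpha}/\lambda \right) - 1 \right]/\theta$ and the Jacobian of the transformation is
$$J = \frac{(y/\beta)^{\alpha}\, 2\alpha \exp\left[ \frac{1}{\lambda} (y/\beta)^{\alpha} \right]}{\theta\lambda y \left\{ 2\exp\left[ \frac{1}{\lambda} (y/\beta)^{\alpha} \right] - 1 \right\}}.$$
Therefore, the density function of Y is
$$g(y) = \frac{\alpha}{\beta} \left( \frac{y}{\beta} \right)^{\alpha - 1} \exp\left[ -\left( \frac{y}{\beta} \right)^{\alpha} \right], \quad y > 0,$$
which is the pdf of the Weibull distribution with the parameters α and β.
Theorem 2.5 Let µ be a location parameter and σ be a scale parameter. If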
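$$Y = \mu - \sigma \log\left[ \left( \frac{1 + e^{-\theta X}}{2e^{-\theta X}} \right)^{\lambda} - 1 \right],$$
then Y has a logistic distribution with the parameters µ and σ.
Proof : Let $y = \mu - \sigma \log\left[ \{(1 + e^{-\theta x})/(2e^{-\theta x})\}^{\lambda} - 1 \right]$. Then
$$x = \frac{1}{\theta} \log\left\{ 2\left[ \exp\left( -\frac{y - \mu}{\sigma} \right) + 1 \right]^{1/\lambda} - 1 \right\}$$
and the Jacobian of the transformation is
$$J = -\frac{2\left[ \exp\left( -\frac{y - \mu}{\sigma} \right) + 1 \right]^{1/\lambda - 1} \exp\left( -\frac{y - \mu}{\sigma} \right)}{\theta\lambda\sigma \left\{ 2\left[ \exp\left( -\frac{y - \mu}{\sigma} \right) + 1 \right]^{1/\lambda} - 1 \right\}}.$$
Therefore, the density function of Y is
$$g(y) = \frac{\exp\left( -\frac{y - \mu}{\sigma} \right)}{\sigma \left[ 1 + \exp\left( -\frac{y - \mu}{\sigma} \right) \right]^{2}}, \quad -\infty < y < \infty,$$
which is the pdf of the logistic distribution with the parameters µ and σ.
Theorem 2.6 Let µ be a location parameter and σ be a scale parameter. If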
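$$Y = \mu - \sigma \log\left[ \lambda \log \frac{1 + e^{-\theta X}}{2e^{-\theta X}} \right],$$
then Y has a Gumbel distribution with the parameters µ and σ.
Proof : Let $y = \mu - \sigma \log\left[ \lambda \log\{(1 + e^{-\theta x})/(2e^{-\theta x})\} \right]$. Then
$$x = \frac{1}{\theta} \log\left\{ 2\exp\left[ \frac{1}{\lambda} \exp\left( -\frac{y - \mu}{\sigma} \right) \right] - 1 \right\}$$
and the Jacobian of the transformation is
$$J = -\frac{2\exp\left[ \frac{1}{\lambda} \exp\left( -\frac{y - \mu}{\sigma} \right) \right] \exp\left( -\frac{y - \mu}{\sigma} \right)}{\theta\lambda\sigma \left\{ 2\exp\left[ \frac{1}{\lambda} \exp\left( -\frac{y - \mu}{\sigma} \right) \right] - 1 \right\}}.$$
Therefore, the density function of Y is
$$g(y) = \frac{1}{\sigma} \exp\left[ -\frac{y - \mu}{\sigma} - \exp\left( -\frac{y - \mu}{\sigma} \right) \right], \quad -\infty < y < \infty,$$
which is the pdf of the Gumbel distribution with the parameters µ and σ.
2.2. Moments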
This subsection provides the first and second moments of a random variable from the GHLD by using some series expansions. Similarly, the entropy is also provided.
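Theorem 2.7 The first and second moments of a random variable X from the pdf given in (1.1) are
$$E(X) = \frac{1}{\theta} \left[ \frac{1}{\lambda} - \lambda h_1(\lambda) + \log 2 \right] \tag{2.1}$$
and
$$E(X^2) = \frac{1}{\theta^2} \left[ \frac{2}{\lambda^2} + \frac{2\log 2}{\lambda} + 2\lambda \left( h_2(\lambda) - h_3(\lambda) \right) + (\log 2)^2 \right], \tag{2.2}$$
respectively, where
$$h_1(\lambda) = \sum_{j=1}^{\infty} \frac{2^{-j}}{j(\lambda + j)}, \tag{2.3}$$
$$h_2(\lambda) = \sum_{j=1}^{\infty} \frac{2^{-(j+1)}}{(j + 1)(\lambda + j + 1)} \sum_{i=1}^{j} \frac{1}{i},$$
$$h_3(\lambda) = \sum_{j=1}^{\infty} \frac{2^{-j}}{j(\lambda + j)} \left[ \frac{1}{\lambda + j} + \log 2 \right].$$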
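Proof : Let $u = \left[ 2e^{-\theta x}/(1 + e^{-\theta x}) \right]^{\lambda}$. Then
$$E(X) = \theta\lambda \int_0^{\infty} x \left( \frac{2e^{-\theta x}}{1 + e^{-\theta x}} \right)^{\lambda} \frac{1}{1 + e^{-\theta x}}\, dx$$
$$= -\frac{1}{\theta} \int_0^1 \log\left( \frac{u^{1/\lambda}}{2 - u^{1/\lambda}} \right) du$$
$$= \frac{1}{\theta} \left[ \frac{1}{\lambda} + \int_0^1 \log\left( 1 - \frac{1}{2} u^{1/\lambda} \right) du + \log 2 \right]$$
$$= \frac{1}{\theta} \left[ \frac{1}{\lambda} - \lambda \sum_{j=1}^{\infty} \frac{2^{-j}}{j(\lambda + j)} + \log 2 \right]$$
by using
$$\log(1 - z) = -\sum_{j=1}^{\infty} \frac{z^{j}}{j} \quad \text{for } |z| < 1. \tag{2.4}$$
Similarly,
$$E(X^2) = \theta\lambda \int_0^{\infty} x^2 \left( \frac{2e^{-\theta x}}{1 + e^{-\theta x}} \right)^{\lambda} \frac{1}{1 + e^{-\theta x}}\, dx = \frac{1}{\theta^2} \int_0^1 \left[ \log\left( \frac{u^{1/\lambda}}{2 - u^{1/\lambda}} \right) \right]^2 du.$$
Here, the integrand is decomposed as
$$\left[ \log\left( \frac{u^{1/\lambda}}{2 - u^{1/\lambda}} \right) \right]^2 = \left( \frac{\log u}{\lambda} \right)^2 + \left[ \log\left( 1 - \frac{1}{2} u^{1/\lambda} \right) + \log 2 \right]^2 - \frac{2\log u}{\lambda} \left[ \log\left( 1 - \frac{1}{2} u^{1/\lambda} \right) + \log 2 \right]. \tag{2.5}$$
Then, by using the series expansions (2.4) and
$$[\log(1 - z)]^2 = 2 \sum_{j=1}^{\infty} \frac{z^{j+1}}{j + 1} \sum_{i=1}^{j} \frac{1}{i} \quad \text{for } z^2 < 1, \tag{2.6}$$
the second moment of X is derived as given in (2.2).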
For θ = 1 and λ = 0.5(0.5)4, the mean and the variance are calculated from (2.1) and (2.2), and they are given in Table 2.1. Note that the mean and the variance of the GHLD decrease as the shape parameter λ increases.
Table 2.1 Mean and Variance for various shape parameters
λ        0.5      1        1.5      2        2.5      3        3.5      4
E(X)     2.49290  1.38630  0.98581  0.77260  0.63829  0.54519  0.47656  0.42371
Var(X)   4.59230  1.36804  0.69872  0.43765  0.30487  0.22670  0.17622  0.14145
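The series in (2.1) to (2.3) converge geometrically, so they are easy to evaluate numerically. The following is a minimal sketch of that computation; the function names and the truncation at 200 terms are illustrative assumptions rather than the authors' implementation.

```python
import math


def h1(lam, terms=200):
    """Series (2.3), truncated after `terms` terms."""
    return sum(2.0 ** -j / (j * (lam + j)) for j in range(1, terms + 1))


def h2(lam, terms=200):
    """Series h_2 of Theorem 2.7; `harmonic` accumulates sum_{i<=j} 1/i."""
    total, harmonic = 0.0, 0.0
    for j in range(1, terms + 1):
        harmonic += 1.0 / j
        total += 2.0 ** -(j + 1) * harmonic / ((j + 1) * (lam + j + 1))
    return total


def h3(lam, terms=200):
    """Series h_3 of Theorem 2.7."""
    log2 = math.log(2.0)
    return sum(2.0 ** -j / (j * (lam + j)) * (1.0 / (lam + j) + log2)
               for j in range(1, terms + 1))


def ghld_mean(theta, lam):
    """First moment, equation (2.1)."""
    return (1.0 / lam - lam * h1(lam) + math.log(2.0)) / theta


def ghld_second_moment(theta, lam):
    """Second moment, equation (2.2)."""
    log2 = math.log(2.0)
    return (2.0 / lam ** 2 + 2.0 * log2 / lam
            + 2.0 * lam * (h2(lam) - h3(lam)) + log2 ** 2) / theta ** 2


def ghld_variance(theta, lam):
    """Variance obtained from (2.1) and (2.2)."""
    return ghld_second_moment(theta, lam) - ghld_mean(theta, lam) ** 2
```

With θ = 1, these functions should agree with Table 2.1 to roughly four decimal places. For λ = 1 they can also be compared with the known half-logistic values, mean 2 log 2 and variance π²/3 − (2 log 2)².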
Let X be any random variable with a cdf F(x) and a pdf f(x). Then we can define the differential entropy of X based on Cover and Thomas (2005) as
$$H(f) = -\int_{-\infty}^{\infty} f(x) \log f(x)\, dx.$$
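Theorem 2.8 The differential entropy of a random variable X from the pdf given in (1.1) is
$$H(f) = 1 - \log\theta - \log\lambda + \lambda h_1(\lambda),$$
where $h_1(\lambda)$ is given in (2.3).
Proof :
$$H(f) = -E(\log f(X)) = -\log\theta - \log\lambda + E\left[ \lambda \log \frac{1 + e^{-\theta X}}{2e^{-\theta X}} \right] + E\left[ \log\left( 1 + e^{-\theta X} \right) \right].$$
By Theorem 2.2,
$$E\left[ \log \frac{1 + e^{-\theta X}}{2e^{-\theta X}} \right] = \frac{1}{\lambda}.$$
In addition, putting $u = \left[ 2e^{-\theta x}/(1 + e^{-\theta x}) \right]^{\lambda}$ and using (2.4), we have
$$E\left[ \log\left( 1 + e^{-\theta X} \right) \right] = \theta\lambda \int_0^{\infty} \log\left( 1 + e^{-\theta x} \right) \left( \frac{2e^{-\theta x}}{1 + e^{-\theta x}} \right)^{\lambda} \frac{1}{1 + e^{-\theta x}}\, dx$$
$$= -\int_0^1 \log\left( 1 - \frac{1}{2} u^{1/\lambda} \right) du = \lambda \sum_{j=1}^{\infty} \frac{2^{-j}}{j(j + \lambda)}.$$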
For θ = 1 and λ = 0.5(0.5)4, the values of the entropy in Theorem 2.8 are reported in Table 2.2, which shows that the entropy of the GHLD decreases as the shape parameter λ increases.
Table 2.2 Entropy of the standard GHLD for various shape parameters
λ 0.5 1 1.5 2 2.5 3 3.5 4
H(f ) 1.89340 1.30685 0.96855 0.72741 0.53859 0.38269 0.24956 0.13316
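The closed form in Theorem 2.8 can be cross-checked against a direct numerical integration of −E(log f(X)). The sketch below does both; it assumes the substitution u = 1 − F(x), under which f(x) = θλu(1 − u^{1/λ}/2), and uses a plain midpoint rule. Function names, the number of series terms, and the step count are illustrative assumptions.

```python
import math


def h1(lam, terms=200):
    """Series (2.3), truncated after `terms` terms."""
    return sum(2.0 ** -j / (j * (lam + j)) for j in range(1, terms + 1))


def ghld_entropy(theta, lam):
    """Closed form of Theorem 2.8."""
    return 1.0 - math.log(theta) - math.log(lam) + lam * h1(lam)


def ghld_entropy_numeric(theta, lam, steps=200_000):
    """Midpoint-rule value of -int_0^1 log f(x(u)) du, where u = 1 - F(x)."""
    width = 1.0 / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * width
        g = u ** (1.0 / lam)                       # g = 2e^{-theta x}/(1 + e^{-theta x})
        log_f = math.log(theta * lam) + math.log(u) + math.log(1.0 - 0.5 * g)
        total -= log_f * width
    return total
```

For θ = 1 the closed form should match Table 2.2 to about five decimal places, and the numerical value should agree with it up to the discretization error of the midpoint rule.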
3. Estimation
This section develops the maximum likelihood estimation method to provide the MLEs and corresponding approximate CIs for unknown parameters. In addition, an unbiased estimator and an exact CI based on a pivotal quantity are provided for the parameter of interest λ.
3.1. Maximum likelihood estimation
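Suppose that $X_1, \ldots, X_n$ are independent and identically distributed from the GHLD with parameters (θ, λ). Then the corresponding likelihood function is
$$L(\theta, \lambda) = \theta^{n} \lambda^{n} \prod_{i=1}^{n} \left( \frac{2e^{-\theta x_i}}{1 + e^{-\theta x_i}} \right)^{\lambda} \frac{1}{1 + e^{-\theta x_i}}.$$
Hence we can obtain the log-likelihood function as follows:
$$\log L(\theta, \lambda) = n\log\theta + n\log\lambda + \lambda \sum_{i=1}^{n} \log\left( \frac{2e^{-\theta x_i}}{1 + e^{-\theta x_i}} \right) - \sum_{i=1}^{n} \log\left( 1 + e^{-\theta x_i} \right). \tag{3.1}$$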
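From (3.1), we give the likelihood equations for θ and λ by
$$\frac{\partial}{\partial\theta} \log L(\theta, \lambda) = \frac{n}{\theta} - (\lambda + 1) \sum_{i=1}^{n} \frac{x_i}{1 + e^{-\theta x_i}} + \sum_{i=1}^{n} x_i = 0 \tag{3.2}$$
and
$$\frac{\partial}{\partial\lambda} \log L(\theta, \lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} \log\left( \frac{1 + e^{-\theta x_i}}{2e^{-\theta x_i}} \right) = 0. \tag{3.3}$$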
Assuming that θ is known, we can obtain the MLE of λ as
$$\hat{\lambda} = \frac{n}{T_1(\theta)},$$
where
$$T_1(\theta) = \sum_{i=1}^{n} \log\left( \frac{1 + e^{-\theta X_i}}{2e^{-\theta X_i}} \right).$$
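Note that because $T_1(\theta)$ is a sum of n independent ED variables with mean 1/λ by Theorem 2.2, it has the gamma distribution with parameters (n, λ), that is, shape n and rate λ. Consequently, the MLE $\hat{\lambda}$ has the inverse gamma distribution with parameters (n, λn). Therefore, the MLE $\hat{\lambda}$ has the following expectation and variance for n > 2:
$$E(\hat{\lambda}) = \frac{\lambda n}{n - 1} \quad \text{and} \quad Var(\hat{\lambda}) = \frac{(\lambda n)^2}{(n - 1)^2 (n - 2)}.$$
Hence, the bias and mean squared error (MSE) of the MLE $\hat{\lambda}$ are given by
$$Bias(\hat{\lambda}) = \frac{\lambda}{n - 1} \quad \text{and} \quad MSE(\hat{\lambda}) = \frac{\lambda^2 (n + 2)}{(n - 1)(n - 2)}.$$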
If both parameters are unknown, then we can obtain the MLEs of θ and λ simultaneously by solving equations (3.2) and (3.3) with the Newton-Raphson method. The MLE of θ is denoted by $\hat{\theta}$.
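The likelihood equations can also be solved with a one-dimensional root search. The sketch below is not the Newton-Raphson iteration mentioned in the text: it substitutes the solution of (3.3), λ = n/T₁(θ), into (3.2) and finds the root of the resulting profile score in θ by bisection. The function names and the bracket [1e-4, 1] are illustrative assumptions, and the data are the failure times analyzed in Section 4.

```python
import math

# Failure times (in minutes) from the application in Section 4.
DATA = [12.3, 21.8, 24.4, 28.6, 43.2, 46.9, 70.7, 75.3, 95.5, 98.1, 138.6, 151.9]


def t1(theta, xs):
    """T_1(theta); each term equals theta*x + log(1 + e^{-theta x}) - log 2."""
    return sum(theta * x + math.log1p(math.exp(-theta * x)) - math.log(2.0) for x in xs)


def profile_score(theta, xs):
    """Left-hand side of (3.2) with lambda replaced by n / T_1(theta) from (3.3)."""
    n = len(xs)
    lam = n / t1(theta, xs)
    weighted = sum(x / (1.0 + math.exp(-theta * x)) for x in xs)
    return n / theta + sum(xs) - (lam + 1.0) * weighted


def ghld_mle(xs, lo=1e-4, hi=1.0, iters=200):
    """Bisection on the profile score; returns (theta_hat, lambda_hat)."""
    if profile_score(lo, xs) <= 0.0 or profile_score(hi, xs) >= 0.0:
        raise ValueError("bracket [lo, hi] does not contain a sign change")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if profile_score(mid, xs) > 0.0:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies to the left of mid
    theta_hat = 0.5 * (lo + hi)
    return theta_hat, len(xs) / t1(theta_hat, xs)
```

Applied to `DATA`, `ghld_mle` should return estimates close to the MLEs reported in Section 4. The bracket may need adjusting for data on a very different scale.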
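Under certain conditions (Casella and Berger (2002, p.516)), the Fisher information matrix for (θ, λ) is given by
$$I(\theta, \lambda) = \begin{pmatrix} I_{11}(\theta, \lambda) & I_{12}(\theta, \lambda) \\ I_{21}(\theta, \lambda) & I_{22}(\theta, \lambda) \end{pmatrix} = \begin{pmatrix} E\left[ -\frac{\partial^2}{\partial\theta^2} \log L(\theta, \lambda) \right] & E\left[ -\frac{\partial^2}{\partial\theta\,\partial\lambda} \log L(\theta, \lambda) \right] \\ E\left[ -\frac{\partial^2}{\partial\lambda\,\partial\theta} \log L(\theta, \lambda) \right] & E\left[ -\frac{\partial^2}{\partial\lambda^2} \log L(\theta, \lambda) \right] \end{pmatrix}.$$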
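From (3.1), we have
$$-\frac{\partial^2}{\partial\theta^2} \log L(\theta, \lambda) = \frac{n}{\theta^2} + (\lambda + 1) \sum_{i=1}^{n} \frac{x_i^2 e^{-\theta x_i}}{(1 + e^{-\theta x_i})^2},$$
$$-\frac{\partial^2}{\partial\theta\,\partial\lambda} \log L(\theta, \lambda) = \sum_{i=1}^{n} \frac{x_i}{1 + e^{-\theta x_i}},$$
$$-\frac{\partial^2}{\partial\lambda^2} \log L(\theta, \lambda) = \frac{n}{\lambda^2}.$$
Putting $u = \left[ 2e^{-\theta x}/(1 + e^{-\theta x}) \right]^{\lambda}$,
$$E\left[ \frac{X^2 e^{-\theta X}}{(1 + e^{-\theta X})^2} \right] = \theta\lambda \int_0^{\infty} \frac{x^2 e^{-\theta x}}{(1 + e^{-\theta x})^3} \left( \frac{2e^{-\theta x}}{1 + e^{-\theta x}} \right)^{\lambda} dx$$
$$= \frac{1}{\theta^2} \int_0^1 \left( \frac{1}{2} u^{1/\lambda} - \frac{1}{4} u^{2/\lambda} \right) \left[ \log\left( \frac{u^{1/\lambda}}{2 - u^{1/\lambda}} \right) \right]^2 du.$$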
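We can decompose the integrand $[\log(\cdot)]^2$ as given in (2.5). Then we obtain the above expectation by using series expansions (2.4) and (2.6) as
$$E\left[ \frac{X^2 e^{-\theta X}}{(1 + e^{-\theta X})^2} \right] = \frac{\lambda}{\theta^2} \left\{ \frac{1}{(\lambda + 1)^3} - \frac{1}{2(\lambda + 2)^3} + \left[ \frac{(\lambda + 3)^2 - 2}{2(\lambda + 1)^2 (\lambda + 2)^2} - h_4(\lambda) \right] \log 2 + \frac{\lambda + 3}{4(\lambda + 1)(\lambda + 2)} (\log 2)^2 + h_5(\lambda) + h_6(\lambda) \right\}, \tag{3.4}$$
where
$$h_4(\lambda) = \sum_{j=1}^{\infty} \frac{2^{-(j+1)} (\lambda + j + 3)}{j(\lambda + j + 1)(\lambda + j + 2)},$$
$$h_5(\lambda) = \sum_{j=1}^{\infty} \frac{2^{-(j+1)} \left[ 2 - (\lambda + j + 3)^2 \right]}{j(\lambda + j + 1)^2 (\lambda + j + 2)^2},$$
$$h_6(\lambda) = \sum_{j=1}^{\infty} \frac{2^{-(j+2)} (\lambda + j + 4)}{(j + 1)(\lambda + j + 2)(\lambda + j + 3)} \sum_{i=1}^{j} \frac{1}{i}.$$
Similarly,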
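$$E\left[ \frac{X}{1 + e^{-\theta X}} \right] = \theta\lambda \int_0^{\infty} \frac{x}{(1 + e^{-\theta x})^2} \left( \frac{2e^{-\theta x}}{1 + e^{-\theta x}} \right)^{\lambda} dx$$
$$= -\frac{1}{\theta} \int_0^1 \left( 1 - \frac{1}{2} u^{1/\lambda} \right) \log\left( \frac{u^{1/\lambda}}{2 - u^{1/\lambda}} \right) du$$
$$= \frac{1}{\theta} \left\{ \frac{1}{\lambda} + \frac{\lambda + 2}{2(\lambda + 1)} \log 2 - \lambda \left[ \frac{1}{2(\lambda + 1)^2} + h_7(\lambda) \right] \right\}, \tag{3.5}$$
where
$$h_7(\lambda) = \sum_{j=1}^{\infty} \frac{2^{-(j+1)} (\lambda + j + 2)}{j(\lambda + j)(\lambda + j + 1)}.$$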
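By using (3.4) and (3.5), we obtain the Fisher information matrix for (θ, λ) as
$$I(\theta, \lambda) = n \begin{pmatrix} \dfrac{Q_1(\lambda)}{\theta^2} & \dfrac{Q_2(\lambda)}{\theta} \\ \dfrac{Q_2(\lambda)}{\theta} & \dfrac{1}{\lambda^2} \end{pmatrix}, \tag{3.6}$$
where
$$Q_1(\lambda) = 1 + \lambda(\lambda + 1) \left\{ \frac{1}{(\lambda + 1)^3} - \frac{1}{2(\lambda + 2)^3} + \left[ \frac{(\lambda + 3)^2 - 2}{2(\lambda + 1)^2 (\lambda + 2)^2} - h_4(\lambda) \right] \log 2 + \frac{\lambda + 3}{4(\lambda + 1)(\lambda + 2)} (\log 2)^2 + h_5(\lambda) + h_6(\lambda) \right\},$$
$$Q_2(\lambda) = \frac{1}{\lambda} + \frac{\lambda + 2}{2(\lambda + 1)} \log 2 - \lambda \left[ \frac{1}{2(\lambda + 1)^2} + h_7(\lambda) \right].$$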
Now we can obtain the asymptotic variance-covariance matrix of the MLEs by inverting the Fisher information matrix (3.6). Then, by the asymptotic normality of the MLE, the approximate 100(1 − α)% CIs of the parameters θ and λ based on the MLEs $\hat{\theta}$ and $\hat{\lambda}$ are given by
$$\hat{\theta} \pm z_{\alpha/2} \sqrt{Var(\hat{\theta})} \quad \text{and} \quad \hat{\lambda} \pm z_{\alpha/2} \sqrt{Var(\hat{\lambda})},$$
where $z_{\alpha/2}$ denotes the upper α/2 point of the standard normal distribution, and $Var(\hat{\theta})$ and $Var(\hat{\lambda})$ are the diagonal elements of the asymptotic variance-covariance matrix of the MLEs.
Note that the disadvantage of the approximate CI based on the MLE is that the lower limit of the interval may be negative, even though the parameter is greater than 0. The following subsection provides another estimation method for λ, which is simpler and more efficient than the maximum likelihood estimation method. In addition, it can overcome the problem with the approximate CI based on the MLE.
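To make the approximate CIs concrete, the following sketch evaluates Q₁(λ) and Q₂(λ) from (3.6) by truncating the series h₄ to h₇, inverts the 2×2 Fisher information matrix in closed form, and returns the two intervals. The function names, the 200-term truncation, and the default z value (95% level) are illustrative assumptions; the unknown parameters in (3.6) are replaced by their estimates.

```python
import math

LOG2 = math.log(2.0)


def _series(term, terms=200):
    """Sum term(j) for j = 1..terms; the summands decay like 2^{-j}."""
    return sum(term(j) for j in range(1, terms + 1))


def h4(lam):
    return _series(lambda j: 2.0 ** -(j + 1) * (lam + j + 3)
                   / (j * (lam + j + 1) * (lam + j + 2)))


def h5(lam):
    return _series(lambda j: 2.0 ** -(j + 1) * (2.0 - (lam + j + 3) ** 2)
                   / (j * (lam + j + 1) ** 2 * (lam + j + 2) ** 2))


def h6(lam, terms=200):
    total, harmonic = 0.0, 0.0
    for j in range(1, terms + 1):
        harmonic += 1.0 / j                       # sum_{i<=j} 1/i
        total += (2.0 ** -(j + 2) * (lam + j + 4) * harmonic
                  / ((j + 1) * (lam + j + 2) * (lam + j + 3)))
    return total


def h7(lam):
    return _series(lambda j: 2.0 ** -(j + 1) * (lam + j + 2)
                   / (j * (lam + j) * (lam + j + 1)))


def q1(lam):
    braces = (1.0 / (lam + 1) ** 3 - 1.0 / (2.0 * (lam + 2) ** 3)
              + (((lam + 3) ** 2 - 2.0) / (2.0 * (lam + 1) ** 2 * (lam + 2) ** 2)
                 - h4(lam)) * LOG2
              + (lam + 3) / (4.0 * (lam + 1) * (lam + 2)) * LOG2 ** 2
              + h5(lam) + h6(lam))
    return 1.0 + lam * (lam + 1) * braces


def q2(lam):
    return (1.0 / lam + (lam + 2) / (2.0 * (lam + 1)) * LOG2
            - lam * (1.0 / (2.0 * (lam + 1) ** 2) + h7(lam)))


def approx_cis(theta_hat, lam_hat, n, z=1.959964):
    """Approximate CIs from the inverse of (3.6), evaluated at the estimates."""
    i11 = n * q1(lam_hat) / theta_hat ** 2
    i12 = n * q2(lam_hat) / theta_hat
    i22 = n / lam_hat ** 2
    det = i11 * i22 - i12 ** 2
    if det <= 0.0:
        raise ValueError("Fisher information matrix is not positive definite")
    se_theta = math.sqrt(i22 / det)               # sqrt of [I^{-1}]_{11}
    se_lam = math.sqrt(i11 / det)                 # sqrt of [I^{-1}]_{22}
    return ((theta_hat - z * se_theta, theta_hat + z * se_theta),
            (lam_hat - z * se_lam, lam_hat + z * se_lam))
```

As a sanity check for the HLD case λ = 1, `q1(1.0)` should equal (π² + 3)/9, the Fisher information of the logistic scale family, and `q2(1.0)` should equal 1/2 + log 2. Nothing in `approx_cis` prevents a negative lower limit, which is the drawback noted above.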
3.2. Estimation based on pivotal quantity
Theorem 3.1 Suppose that the parameter θ is known. Then the estimator
$$\tilde{\lambda} = \frac{n - 1}{T_1(\theta)} \tag{3.7}$$
has an inverse gamma distribution with parameters (n, λ(n − 1)) and is an unbiased estimator of λ.
Proof : As mentioned earlier, $T_1(\theta)$ has the gamma distribution with parameters (n, λ). Therefore, $\tilde{\lambda}$ has the inverse gamma distribution with parameters (n, λ(n − 1)), and hence $E(\tilde{\lambda}) = \lambda(n - 1)/(n - 1) = \lambda$.
This completes the proof.
Note that, from the distribution of the unbiased estimator $\tilde{\lambda}$, we can also obtain its MSE, given by
$$MSE(\tilde{\lambda}) = \frac{\lambda^2}{n - 2},$$
which is lower than $MSE(\hat{\lambda})$. Therefore, we recommend the use of the unbiased estimator $\tilde{\lambda}$ when θ is known.
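The MSE comparison can be illustrated with the closed forms above and a small Monte Carlo experiment. In the sketch below, T₁(θ) is simulated directly from its gamma distribution (shape n, rate λ) rather than from GHLD samples; the function names, sample size, replication count, and seed are illustrative assumptions.

```python
import random


def mse_mle(lam, n):
    """MSE of the MLE n / T_1(theta), valid for n > 2."""
    return lam ** 2 * (n + 2) / ((n - 1) * (n - 2))


def mse_unbiased(lam, n):
    """MSE of the unbiased estimator (3.7), valid for n > 2."""
    return lam ** 2 / (n - 2)


def simulate(lam=1.0, n=12, reps=50_000, seed=7):
    """Monte Carlo mean of the unbiased estimator and MSEs of both estimators."""
    rng = random.Random(seed)
    sum_unb = sq_err_mle = sq_err_unb = 0.0
    for _ in range(reps):
        t1 = rng.gammavariate(n, 1.0 / lam)   # gammavariate takes (shape, scale)
        lam_mle = n / t1
        lam_unb = (n - 1) / t1
        sum_unb += lam_unb
        sq_err_mle += (lam_mle - lam) ** 2
        sq_err_unb += (lam_unb - lam) ** 2
    return sum_unb / reps, sq_err_mle / reps, sq_err_unb / reps
```

The ratio `mse_mle(lam, n) / mse_unbiased(lam, n)` equals (n + 2)/(n − 1), so the gain from the unbiased estimator is largest for small samples.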
By using a pivotal quantity, we can construct an exact CI for λ. The following theorem provides a CI for λ based on a pivotal quantity.
Theorem 3.2 Suppose that the parameter θ is known. Then, for any 0 < α < 1, an exact 100(1 − α)% CI based on the pivotal quantity W for λ is
$$\left( \frac{\chi^2_{1-\alpha/2,\,2n}}{2T_1(\theta)},\ \frac{\chi^2_{\alpha/2,\,2n}}{2T_1(\theta)} \right), \tag{3.8}$$
where $\chi^2_{\alpha,\,n}$ is the upper α percentile of the $\chi^2$ distribution with n degrees of freedom.
Proof : Because $T_1(\theta)$ has the gamma distribution with parameters (n, λ), we can find a pivotal quantity
$$W = 2\lambda T_1(\theta)$$
that has the $\chi^2$ distribution with 2n degrees of freedom. By using this pivotal quantity, we have
$$1 - \alpha = P\left[ \chi^2_{1-\alpha/2,\,2n} < 2\lambda T_1(\theta) < \chi^2_{\alpha/2,\,2n} \right].$$
This completes the proof.
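The interval (3.8) only needs χ² percentiles with an even number of degrees of freedom, for which the cdf has a finite-sum closed form. The sketch below uses that closed form and a bisection search instead of a statistics library; the function names and the default α are illustrative assumptions, and `t1_value` stands for an observed value of T₁(θ).

```python
import math


def chi2_cdf_even(x, df):
    """cdf of the chi-square distribution with even df = 2n:
    1 - exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_upper_point(alpha, df):
    """Value c with P(chi2_df > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf_even(hi, df) > alpha:    # expand until the tail is small enough
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_ci_lambda(t1_value, n, alpha=0.05):
    """Exact 100(1 - alpha)% CI (3.8) for lambda when theta is known."""
    lower = chi2_upper_point(1.0 - alpha / 2.0, 2 * n) / (2.0 * t1_value)
    upper = chi2_upper_point(alpha / 2.0, 2 * n) / (2.0 * t1_value)
    return lower, upper
```

Because both limits are positive multiples of 1/T₁(θ), the interval always lies inside (0, ∞), in contrast to the approximate CI based on the MLE.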
Note that the unbiased estimator (3.7) and the exact CI (3.8) depend on the other parameter θ. Therefore, if θ is unknown, then the MLE $\hat{\theta}$ can be substituted for θ.
4. Application
This section provides an example to illustrate the proposed method. We consider the real data in Lawless (1982), which represent the failure times (in minutes) for a specific type of electrical insulation material that was subjected to a continuously increasing voltage stress.
The data are as follows:
12.3 21.8 24.4 28.6 43.2 46.9 70.7 75.3 95.5 98.1 138.6 151.9
Balakrishnan and Chan (1992) verified, using the quantile-quantile (Q-Q) plot, that the scaled HLD fits the data extremely well. In addition, Balakrishnan and Puthenpura (1986) demonstrated, through the Kolmogorov-Smirnov test, that the two-parameter HLD fits the data better than the two-parameter ED when the parameters of both distributions are estimated by the BLUEs.
To compare with the HLD, we first calculate the MLEs of the unknown parameters of the GHLD to be $\hat{\theta} = 0.02593$ and $\hat{\lambda} = 0.77643$ ($\tilde{\lambda} = 0.71173$). For the HLD, we also obtain the MLE $\hat{\theta} = 0.02109$. Next, we use three well-known tests: the Anderson-Darling ($A^2$) test, the Cramér-von Mises ($W^2$) test, and the Kolmogorov-Smirnov ($D$) test. The values of the test statistics are given in Table 4.1, which shows that the GHLD has smaller test statistic values than the HLD for all three tests. To further verify the fit of the GHLD, we show the probability-probability (P-P) plot for the fitted GHLD in Figure 4.1. The value of the correlation coefficient in the P-P plot is 0.99193. From this result, we can conclude that the GHLD fits the data very well. Finally, we report the results of the remaining estimations in Table 4.2. We replaced the lower bounds of the approximate CIs based on the MLEs by zero because they take negative values, whereas the bounds based on the pivotal quantity W take positive values. In addition, the CI based on the pivotal quantity W is much shorter than the CI based on the MLE $\hat{\lambda}$. Therefore, it is more appropriate to use the CI based on the pivotal quantity W than that based on the MLE $\hat{\lambda}$.
Table 4.1 Values of three test statistics
Model   A²       W²       D
HLD     0.32875  0.04407  0.14258
GHLD    0.31518  0.04189  0.13794
[Figure 4.1: P-P plot for the fitted GHLD; axes: empirical cdf and fitted generalized half-logistic cdf, both on the range 0 to 1]