A software reliability model with a fault detection rate function of the generalized exponential distribution †


Kwang Yoon Song ¹ · In Hong Chang ²

¹,² Department of Computer Science and Statistics, Chosun University

Received 26 February 2020, revised 10 March 2020, accepted 10 March 2020

Abstract

A software reliability model is built from the number of software failures and the time intervals between failures, and it predicts future failures by estimating reliability measures such as the time between failures, software reliability, and failure rate. In general, the performance of a software system degrades and its failure rate increases as its service life increases. Efforts have therefore been made to extend the exponential distribution, which assumes a constant failure rate. In this paper, we propose a new software reliability model, based on a non-homogeneous Poisson process (NHPP), with a fault detection rate function of the generalized exponential distribution. We examine the goodness-of-fit of the new NHPP software reliability model and of other existing models using two data sets. The comparative results show that the proposed model fits both data sets better than the existing models on most of the comparison criteria.

Keywords: Fault detection rate, generalized exponential distribution, non-homogeneous Poisson process, software reliability.

1. Introduction

The main goal of software development is to improve the reliability and stability of a software system. The importance of software reliability models has been recognized since the 1970s, when their development began (Song and Chang, 2017). A software reliability model is the most representative method for predicting the reliability of software. It is built from the number of software failures and the time intervals between failures, and it predicts future failures by estimating reliability measures such as the time between failures, software reliability, and failure rate. Goel and Okumoto (1979) first presented a software reliability model considering an exponential curve based on the non-homogeneous

† This study was supported by research funds from Chosun University, 2019.

¹ Researcher (Post-Doc), Department of Computer Science and Statistics, Chosun University, Gwangju 61452, Korea.

² Corresponding author: Professor, Computer Science and Statistics, Chosun University, Gwangju 61452, Korea. E-mail: [email protected]

curves, or a mixture of exponential and S-shaped curves (Yamada et al., 1983; Ohba, 1984; Pham, 2006). The exponential distribution makes reliability measures relatively easy to calculate and plays an important role in reliability engineering theory, for example in product reliability prediction and reliability test design. In general, the performance of parts and systems degrades and their failure rate increases as their service life increases. Efforts have therefore been made to extend the exponential distribution, which assumes a constant failure rate. Gupta and Kundu (1999) proposed the generalized exponential (GE) distribution, whose probability density function (PDF) and cumulative distribution function (CDF) are

$$f(t) = \alpha b e^{-bt}\left(1 - e^{-bt}\right)^{\alpha - 1}, \qquad F(t) = \left(1 - e^{-bt}\right)^{\alpha}. \tag{1.1}$$

The two parameters of the GE distribution are the shape parameter (α > 0) and the scale parameter (b > 0), as in the gamma and Weibull distributions. The exponential distribution is the particular case of the GE distribution with α = 1. Gupta and Kundu (1999) also investigated some of its properties. The GE density function varies significantly with the shape parameter α. The hazard function is non-decreasing if α > 1, non-increasing if α < 1, and constant if α = 1. The GE distribution has many properties that are quite similar to those of the gamma distribution, but it has explicit expressions for the distribution and survival functions, like the Weibull distribution. Some researchers have also provided mathematical properties of exponentiated distributions (Nadarajah and Kotz, 2006; Nassar and Eissa, 2003; Pak et al., 2018; Lee et al., 2018). Like the gamma and Weibull distributions, the GE distribution extends the exponential distribution, but in a different way. It can therefore be used as an alternative to the Weibull and gamma distributions, and in some situations it may fit better than the other two, although this cannot be guaranteed. It is expected that the GE distribution should also enjoy those properties (Barreto-Souza et al., 2010). In this paper, we discuss a new software reliability model with a fault detection rate function of the generalized exponential distribution.
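The behaviour of the GE hazard function described above can be checked numerically. The following is a minimal sketch, not part of the original study: the function names `ge_pdf`, `ge_cdf` and `ge_hazard` and the parameter values in the usage section are illustrative assumptions. It evaluates (1.1) and the ratio f(t) / (1 − F(t)) for shape values below, at, and above 1.

```python
import math


def ge_pdf(t, alpha, b):
    """GE density f(t) = alpha*b*exp(-b*t)*(1 - exp(-b*t))**(alpha - 1), for t > 0."""
    return alpha * b * math.exp(-b * t) * (1.0 - math.exp(-b * t)) ** (alpha - 1.0)


def ge_cdf(t, alpha, b):
    """GE distribution function F(t) = (1 - exp(-b*t))**alpha."""
    return (1.0 - math.exp(-b * t)) ** alpha


def ge_hazard(t, alpha, b):
    """Hazard function f(t) / (1 - F(t))."""
    return ge_pdf(t, alpha, b) / (1.0 - ge_cdf(t, alpha, b))


if __name__ == "__main__":
    # Illustrative values only: scale b = 1 and three shape parameters.
    times = [0.5, 1.0, 2.0, 4.0]
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, [round(ge_hazard(t, alpha, 1.0), 4) for t in times])
```

For α = 1 the printed hazard stays at b, while the α < 1 and α > 1 rows decrease and increase with t, respectively, which matches the monotonicity stated in the text.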

In Section 2, we propose a fault detection rate function of the GE distribution and a new mean value function of the NHPP software reliability model. Section 3 discusses the criteria for model comparison and selection and presents the model analysis and results. Finally, Section 4 presents the conclusions and remarks.

2. A software reliability model

2.1. A general NHPP software reliability model

A counting process {N(t), t ≥ 0} is said to be a non-homogeneous Poisson process (NHPP) with intensity function λ(t) if N(t) follows a Poisson distribution with mean value function m(t), i.e.,

$$P\{N(t) = n\} = \frac{\{m(t)\}^{n}}{n!}\, e^{-m(t)}, \qquad n = 0, 1, 2, \ldots.$$

The mean value function m(t), which is the expected cumulative number of faults detected by time t, can be expressed as

$$m(t) = \int_{0}^{t} \lambda(s)\,ds,$$

where λ(t) represents the fault intensity function.
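As a small illustration of these two definitions, the sketch below computes the Poisson probability P{N(t) = n} for a given value of m(t) and approximates m(t) from an intensity function by numerical integration. The helper names and the example intensity (an exponential-type intensity with hypothetical values a = 100 and b = 0.1) are assumptions made for the example only.

```python
import math


def nhpp_pmf(n, mean_value):
    """P{N(t) = n} when the mean value function equals mean_value at time t."""
    return math.exp(-mean_value) * mean_value ** n / math.factorial(n)


def mean_value_from_intensity(intensity, t, steps=10_000):
    """Approximate m(t), the integral of the intensity over [0, t], by the midpoint rule."""
    width = t / steps
    return sum(intensity((k + 0.5) * width) for k in range(steps)) * width


if __name__ == "__main__":
    # Hypothetical intensity lambda(s) = a*b*exp(-b*s) with a = 100, b = 0.1.
    m10 = mean_value_from_intensity(lambda s: 100 * 0.1 * math.exp(-0.1 * s), 10.0)
    print(m10)                 # close to 100 * (1 - exp(-1))
    print(nhpp_pmf(60, m10))   # probability of exactly 60 faults by t = 10
```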

Many existing NHPP software reliability models obtain the mean value function m(t) from the following general differential equation (Pham et al., 1999):

$$\frac{dm(t)}{dt} = h(t)\,[a(t) - m(t)]. \tag{2.1}$$

The solution of (2.1) is

$$m(t) = e^{-H(t)}\left[m_0 + \int_{t_0}^{t} a(\tau)\,h(\tau)\,e^{H(\tau)}\,d\tau\right], \tag{2.2}$$

where $H(t) = \int_{t_0}^{t} h(s)\,ds$ and m(t₀) = m₀ is the initial condition of (2.1), with t₀ representing the starting time of the testing process. Different choices of the fault content function a(t) of the software and of the fault detection rate function h(t), which reflect various assumptions about the software testing process, can be substituted into the differential equation (2.1) to obtain the mean value function m(t).
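For readers who want the intermediate step, (2.2) can be recovered from (2.1) with the integrating factor $e^{H(t)}$. The lines below are a sketch of that standard argument, using only $H'(t) = h(t)$ and $H(t_0) = 0$; they are not reproduced from the cited reference.

```latex
\begin{align*}
\frac{d}{dt}\left[e^{H(t)}\,m(t)\right]
  &= e^{H(t)}\left[\frac{dm(t)}{dt} + h(t)\,m(t)\right]
   = a(t)\,h(t)\,e^{H(t)}, \\
e^{H(t)}\,m(t) - e^{H(t_0)}\,m(t_0)
  &= \int_{t_0}^{t} a(\tau)\,h(\tau)\,e^{H(\tau)}\,d\tau, \\
m(t)
  &= e^{-H(t)}\left[m_0 + \int_{t_0}^{t} a(\tau)\,h(\tau)\,e^{H(\tau)}\,d\tau\right].
\end{align*}
```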

2.2. A new NHPP software reliability model

In this paper, a new NHPP software reliability model with a fault detection rate function of the generalized exponential distribution is presented. We apply (2.1) and (2.2), which underlie existing NHPP software reliability models (Goel and Okumoto, 1979), and add the following assumptions:

• The initial condition of the mean value function m(t) is m(0) = 0;

• a(t) = a is the expected number of faults that exist in the software before testing;

• The fault detection rate can be expressed by h(t).

Under these assumptions, the differential equation and its solution become

$$\frac{dm(t)}{dt} = h(t)\,[a - m(t)],$$

$$m(t) = a\left[1 - e^{-\int_{0}^{t} h(s)\,ds}\right]. \tag{2.3}$$

We assume that the fault detection rate function is the hazard rate function of the generalized exponential distribution proposed by Gupta and Kundu (1999):

$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{\alpha b e^{-bt}\left(1 - e^{-bt}\right)^{\alpha - 1}}{1 - \left(1 - e^{-bt}\right)^{\alpha}},$$

where b is the fault detection rate and α is the shape factor. Figure 2.1 shows the graph of the fault detection rate function h(t).

Figure 2.1 Fault detection rate function

Substituting h(t) into (2.3) gives a new mean value function m(t) for the NHPP software reliability model with a fault detection rate function of the generalized exponential distribution, which can be used to determine the expected number of software failures detected by time t:

$$m(t) = a\left[1 - e^{-\left(\alpha b t - \ln\left(e^{\alpha b t} - \left(e^{bt} - 1\right)^{\alpha}\right)\right)}\right].$$
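As a reader's check, offered here as a sketch and not as part of the authors' derivation, the expression for m(t) can be simplified. Since h(t) is the hazard rate of the GE distribution and F(0) = 0, its integral is $-\ln[1 - F(t)]$, and the exponential term collapses to the GE survival function:

```latex
\begin{align*}
\int_{0}^{t} h(s)\,ds
  &= -\ln\left[1 - F(t)\right]
   = -\ln\left[1 - \left(1 - e^{-bt}\right)^{\alpha}\right], \\
e^{-\left(\alpha b t - \ln\left(e^{\alpha b t} - (e^{bt} - 1)^{\alpha}\right)\right)}
  &= e^{-\alpha b t}\left[e^{\alpha b t} - \left(e^{bt} - 1\right)^{\alpha}\right]
   = 1 - \left(1 - e^{-bt}\right)^{\alpha}, \\
m(t)
  &= a\left(1 - e^{-bt}\right)^{\alpha} = a\,F(t).
\end{align*}
```

In this form, m(0) = 0 and m(t) approaches a as t grows, in line with the assumptions listed in this subsection.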

3. Numerical examples

We estimate the parameters of the NHPP software reliability models using Matlab and R programs based on the least squares estimation (LSE) method, and we evaluate the goodness-of-fit of all models in Table 3.1 using well-known criteria. Table 3.1 summarizes the well-known software reliability models considered.

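Table 3.1 Software reliability models

| No. | Model | m(t) |
|---|---|---|
| 1 | Goel-Okumoto (1979) | $m(t) = a\left(1 - e^{-bt}\right)$ |
| 2 | Yamada et al. (1983) | $m(t) = a\left(1 - (1 + bt)e^{-bt}\right)$ |
| 3 | Ohba (1984) | $m(t) = \dfrac{a\left(1 - e^{-bt}\right)}{1 + \beta e^{-bt}}$ |
| 4 | Yamada et al. (1986) | $m(t) = a\left(1 - e^{-r\alpha\left(1 - e^{-\beta t}\right)}\right)$ |
| 5 | Yamada et al. (1986) | $m(t) = a\left(1 - e^{-r\alpha\left(1 - e^{-\beta t^{2}/2}\right)}\right)$ |
| 6 | Hossain-Dahiya (1993) | $m(t) = \log\left[\dfrac{e^{a} - c}{e^{a e^{-bt}} - c}\right]$ |
| 7 | Pham (2006) | $m(t) = a\left(1 - e^{-bt}\right)\left[1 + (b + d)t + bdt^{2}\right]$ |
| 8 | Pham (2007) | $m(t) = m_0\left(\dfrac{\gamma t + 1}{\gamma t_0 + 1}\right)e^{-\gamma(t - t_0)} + \alpha(\gamma t + 1)\left[\gamma t - 1 + (1 - \gamma t_0)e^{-\gamma(t - t_0)}\right]$ |
| 9 | Proposed Model | $m(t) = a\left[1 - e^{-\left(\alpha b t - \ln\left(e^{\alpha b t} - (e^{bt} - 1)^{\alpha}\right)\right)}\right]$ |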
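To make the least squares step concrete, here is a minimal sketch for the proposed model (No. 9). It is not the authors' Matlab or R routine: a coarse grid search over (b, α) stands in for a nonlinear optimizer, the data are synthetic, and the names `proposed_mvf` and `fit_lse_grid` are hypothetical. The sketch uses the algebraically equivalent form m(t) = a(1 − e^{−bt})^α, and for each (b, α) it takes the closed-form least squares value of a, which is possible because m(t) is linear in a.

```python
import math


def proposed_mvf(t, a, b, alpha):
    # Algebraically equal to
    # a*[1 - exp(-(alpha*b*t - ln(exp(alpha*b*t) - (exp(b*t) - 1)**alpha)))].
    return a * (1.0 - math.exp(-b * t)) ** alpha


def fit_lse_grid(times, counts, b_grid, alpha_grid):
    """Return (sse, a, b, alpha) minimising the sum of squared errors on the grid."""
    best = None
    for b in b_grid:
        for alpha in alpha_grid:
            g = [(1.0 - math.exp(-b * t)) ** alpha for t in times]
            # m(t) = a*g(t) is linear in a, so the optimal a has a closed form.
            a = sum(y * gi for y, gi in zip(counts, g)) / sum(gi * gi for gi in g)
            sse = sum((a * gi - y) ** 2 for y, gi in zip(counts, g))
            if best is None or sse < best[0]:
                best = (sse, a, b, alpha)
    return best


if __name__ == "__main__":
    # Synthetic, noise-free data from assumed parameters a=100, b=0.3, alpha=2.
    times = list(range(1, 16))
    counts = [proposed_mvf(t, 100.0, 0.3, 2.0) for t in times]
    b_grid = [0.05 * k for k in range(1, 21)]
    alpha_grid = [0.5 + 0.25 * k for k in range(11)]
    print(fit_lse_grid(times, counts, b_grid, alpha_grid))
```

A grid search is used only because it needs no external library; with real failure data, a finer grid or a dedicated nonlinear least squares solver would be the more usual choice.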

3.1. Criteria for model comparison

Eight common criteria are used to assess goodness-of-fit and to compare the proposed model with the existing models listed in Table 3.1: the mean squared error (MSE), the root mean squared error (RMSE), the R-square (R²), the adjusted R-square (Adj R²), the sum of absolute errors (SAE), the mean absolute error (MAE), the variation, and the root mean square prediction error (RMSPE). The closer the values of the six criteria MSE, RMSE, SAE, MAE, variation and RMSPE are to zero, the better the model fits. R² and Adj R², on the other hand, should be close to 1. These criteria are defined in Table 3.2. Here, y_i is the total number of failures observed at time t_i, m is the number of unknown parameters in the model, and m(t_i) is the estimated cumulative number of failures at t_i for i = 1, 2, ..., n.

The Bias is given by

$$\mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}\left(m(t_i) - y_i\right).$$

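Table 3.2 List of criteria

| No. | Criteria | Formula |
|---|---|---|
| 1 | MSE | $\dfrac{\sum_{i=1}^{n}\left(m(t_i) - y_i\right)^{2}}{n - m}$ |
| 2 | RMSE | $\sqrt{\dfrac{\sum_{i=1}^{n}\left(m(t_i) - y_i\right)^{2}}{n - m}}$ |
| 3 | R² | $1 - \dfrac{\sum_{i=1}^{n}\left(m(t_i) - y_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$ |
| 4 | Adj R² | $1 - \dfrac{\left(1 - R^{2}\right)(n - 1)}{n - m - 1}$ |
| 5 | SAE | $\sum_{i=1}^{n}\lvert m(t_i) - y_i\rvert$ |
| 6 | MAE | $\dfrac{\sum_{i=1}^{n}\lvert m(t_i) - y_i\rvert}{n - m}$ |
| 7 | Variation | $\sqrt{\dfrac{\sum_{i=1}^{n}\left(y_i - m(t_i) - \mathrm{Bias}\right)^{2}}{n - 1}}$ |
| 8 | RMSPE | $\sqrt{\mathrm{Variation}^{2} + \mathrm{Bias}^{2}}$ |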
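The criteria in Table 3.2 translate directly into code. The following sketch assumes the observed cumulative counts y_i and the fitted values m(t_i) are supplied as equal-length lists; the function name `fit_criteria`, the dictionary keys, and the toy numbers in the usage section are illustrative choices, not taken from the paper.

```python
import math


def fit_criteria(observed, fitted, n_params):
    """Compute the eight criteria of Table 3.2 for one fitted model."""
    n = len(observed)
    residuals = [m_i - y_i for m_i, y_i in zip(fitted, observed)]  # m(t_i) - y_i
    sse = sum(r * r for r in residuals)
    mean_y = sum(observed) / n
    sst = sum((y_i - mean_y) ** 2 for y_i in observed)
    bias = sum(residuals) / n
    mse = sse / (n - n_params)
    r2 = 1.0 - sse / sst
    sae = sum(abs(r) for r in residuals)
    # y_i - m(t_i) - Bias is the negated residual minus the bias.
    variation = math.sqrt(sum((-r - bias) ** 2 for r in residuals) / (n - 1))
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": r2,
        "AdjR2": 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1),
        "SAE": sae,
        "MAE": sae / (n - n_params),
        "Variation": variation,
        "RMSPE": math.sqrt(variation ** 2 + bias ** 2),
    }


if __name__ == "__main__":
    # Toy numbers for illustration only, with two model parameters.
    print(fit_criteria([1, 2, 3, 4, 5], [1.5, 2.0, 2.5, 4.0, 5.5], n_params=2))
```

Note that the function requires n > m + 1 so that every denominator in Table 3.2 is positive.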

3.2.1. Dataset 1

Dataset 1 was reported by Pham (2006). The system records the cumulative number of faults each week. In Dataset 1, the week index runs from week 1 to week 17, and there are 144 cumulative failures in 17 weeks. Detailed information can be found in Pham (2006). We obtained the estimated parameters and the eight criteria for all nine models at t = 1, 2, ..., 17 from Dataset 1; Table 3.3 shows the results. For the proposed model, the values of MSE, RMSE, variation and RMSPE are 33.4759, 5.7858, 5.4128 and 5.4130, respectively, which are the lowest among all models, and the values of R² and Adj R² are 0.9819 and 0.9777, which are the highest. Its SAE and MAE, 77.3684 and 5.5263, are the second lowest, after those of Model 2 (76.5464 and 5.1031). Model 2 has the second lowest MSE, RMSE, variation and RMSPE (33.9219, 5.8243, 5.6470 and 5.6494) and the second highest R² and Adj R² (0.9803 and 0.9775). Overall, the proposed model fits Dataset 1 better than the other models on six of the eight criteria. Figure 3.1 shows the mean value functions of all models for Dataset 1.

Table 3.3 Model parameter estimation and comparison criteria from Dataset 1

| No. | Estimated parameters | MSE | RMSE | R² | Adj R² | SAE | MAE | Variation | RMSPE |
|---|---|---|---|---|---|---|---|---|---|
| 1 | $\hat{a}$=154.2059, $\hat{b}$=0.14087 | 55.3181 | 7.4376 | 0.9679 | 0.9634 | 100.3857 | 6.6924 | 7.3878 | 7.4453 |
| 2 | $\hat{a}$=133.9184, $\hat{b}$=0.3997 | 33.9219 | 5.8243 | 0.9803 | 0.9775 | 76.5464 | 5.1031 | 5.6470 | 5.6494 |
| 3 | $\hat{a}$=134.2495, $\hat{b}$=0.3529, $\hat{\beta}$=2.4824 | 42.9436 | 6.5531 | 0.9768 | 0.9714 | 87.0914 | 6.2208 | 6.1633 | 6.1737 |
| 4 | $\hat{a}$=156.6194, $\hat{\alpha}$=8.1097, $\hat{\beta}$=0.00524, $\hat{\gamma}$=3.2762 | 64.4332 | 8.0270 | 0.9676 | 0.9569 | 100.3787 | 7.7214 | 7.3974 | 7.4474 |
| 5 | $\hat{a}$=138.1284, $\hat{\alpha}$=2.4399, $\hat{\beta}$=0.0294, $\hat{\gamma}$=1.17296 | 50.2130 | 7.0861 | 0.9748 | 0.9664 | 82.7549 | 6.3658 | 6.5079 | 6.5452 |
| 6 | $\hat{a}$=154.2059, $\hat{b}$=0.1408, $\hat{c}$=0.00083 | 59.2694 | 7.6987 | 0.9679 | 0.9605 | 100.3857 | 7.1704 | 7.3878 | 7.4453 |
| 7 | $\hat{a}$=16.657, $\hat{b}$=0.533, $\hat{c}$=1E-11 | 385.8002 | 19.6418 | 0.7913 | 0.7432 | 247.2932 | 17.6638 | 20.7402 | 21.4290 |
| 8 | $\hat{\alpha}$=54817.84, $\hat{\gamma}$=0.0035, $\hat{t}_0$=21.2367, $\hat{m}_0$=2221.5751 | 665.8495 | 25.8041 | 0.6656 | 0.5541 | 316.7825 | 24.3679 | 23.2784 | 23.2843 |
| 9 | $\hat{a}$=136.7776, $\hat{b}$=0.2716, $\hat{\alpha}$=1.8357 | 33.4759 | 5.7858 | 0.9819 | 0.9777 | 77.3684 | 5.5263 | 5.4128 | 5.4130 |
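As a quick consistency check on how the Table 3.3 columns relate to one another, the row for the proposed model (No. 9, with n = 17 observations and m = 3 parameters) can be reproduced from the definitions of MAE and RMSE in Table 3.2. This is a reader's arithmetic check using only the tabulated values:

```latex
\begin{align*}
n - m &= 17 - 3 = 14, \\
\mathrm{MAE} &= \frac{\mathrm{SAE}}{n - m} = \frac{77.3684}{14} \approx 5.5263, \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} = \sqrt{33.4759} \approx 5.7858.
\end{align*}
```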

Figure 3.1 The mean value functions for all models based on Dataset 1

3.2.2. Dataset 2

Dataset 2 was reported by Musa et al. (2006). The system records the cumulative number of faults each week. In Dataset 2, the week index runs from week 1 to week 14, and there are 38 cumulative failures in 14 weeks. Detailed information can be found in Musa et al. (2006).

We obtained the estimated parameters and the eight criteria for all nine models at t = 1, 2, ..., 14 from Dataset 2; Table 3.4 shows the results. For the proposed model, the values of MSE, RMSE, SAE, MAE, variation and RMSPE are 3.5951, 1.8961, 17.7190, 1.6108, 1.7472 and 1.7482, respectively, which are the lowest among all models, and the values of R² and Adj R² are 0.9716 and 0.9631, which are the highest. Model 1 has the second lowest MSE, RMSE and MAE (3.6343, 1.9064 and 1.7031) and the second highest Adj R² (0.9630). Model 4 has the second lowest SAE, variation and RMSPE (19.6361, 1.7794 and 1.7794) and the second highest R² (0.9704). Overall, the proposed model fits Dataset 2 better than the other models on all eight criteria.

Figure 3.2 shows the mean value functions of all models for Dataset 2.

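Table 3.4 Model parameter estimation and comparison criteria from Dataset 2

| No. | Estimated parameters | MSE | RMSE | R² | Adj R² | SAE | MAE | Variation | RMSPE |
|---|---|---|---|---|---|---|---|---|---|
| 1 | $\hat{a}$=46.1404, $\hat{b}$=0.1182 | 3.6343 | 1.9064 | 0.9687 | 0.9630 | 20.4374 | 1.7031 | 1.8327 | 1.8330 |
| 2 | $\hat{a}$=35.9507, $\hat{b}$=0.3988 | 9.1885 | 3.0312 | 0.9208 | 0.9064 | 32.6041 | 2.7170 | 3.0081 | 3.0371 |
| 3 | $\hat{a}$=46.1403, $\hat{b}$=0.1182, $\hat{\beta}$=5.93E-09 | 3.9647 | 1.9912 | 0.9687 | 0.9593 | 20.4375 | 1.8580 | 1.8327 | 1.8330 |
| 4 | $\hat{a}$=67.4549, $\hat{\alpha}$=0.0183, $\hat{\beta}$=0.0574, $\hat{\gamma}$=80.1118 | 4.1162 | 2.0288 | 0.9704 | 0.9573 | 19.6361 | 1.9636 | 1.7794 | 1.7794 |
| 5 | $\hat{a}$=38.1089, $\hat{\alpha}$=1.6323, $\hat{\beta}$=0.0364, $\hat{\gamma}$=1.4194 | 16.0972 | 4.0121 | 0.8844 | 0.8331 | 39.5895 | 3.9590 | 3.6707 | 3.7164 |
| 6 | $\hat{a}$=46.1403, $\hat{b}$=0.1182, $\hat{c}$=0.00022 | 3.9647 | 1.9912 | 0.9687 | 0.9593 | 20.4375 | 1.8580 | 1.8327 | 1.8330 |
| 7 | $\hat{a}$=3.2868, $\hat{b}$=0.8524, $\hat{c}$=1E-11 | 13.0533 | 3.6129 | 0.8969 | 0.8660 | 36.9137 | 3.3558 | 3.8336 | 3.9783 |
| 8 | $\hat{\alpha}$=1833.358, $\hat{\gamma}$=0.0119, $\hat{t}_0$=28.7597, $\hat{m}_0$=143.1294 | 26.2542 | 5.1239 | 0.8115 | 0.7277 | 43.5907 | 4.3591 | 4.4940 | 4.4941 |
| 9 | $\hat{a}$=59.1283, $\hat{b}$=0.0608, $\hat{\alpha}$=0.7941 | 3.5951 | 1.8961 | 0.9716 | 0.9631 | 17.7190 | 1.6108 | 1.7472 | 1.7482 |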

Figure 3.2 The mean value functions for all models based on Dataset 2

4. Conclusions and remarks

Generally, the performance of a software system degrades and its failure rate increases as its service life increases. The exponential distribution makes reliability measures relatively easy to calculate and plays an important role in reliability engineering theory, for example in product reliability prediction and reliability test design. Efforts have therefore been made to extend the exponential distribution to accommodate various failure rates. In this paper, we discussed a new NHPP software reliability model with a fault detection rate function of the generalized exponential distribution. Tables 3.3 and 3.4 summarize the parameter estimates obtained with the LSE technique for all nine models in Table 3.1, together with the values of the eight criteria (MSE, RMSE, R², Adj R², SAE, MAE, variation and RMSPE) for the two data sets. As Tables 3.3 and 3.4 show, the proposed model has the lowest MSE, RMSE, variation and RMSPE and the highest R² and Adj R² of all models on both data sets; its SAE and MAE are the lowest on Dataset 2 and the second lowest, after Model 2, on Dataset 1. Overall, the proposed model fits both data sets better than the other models on these criteria. Future work will involve broader validation of this conclusion based on recent data sets. In addition, a sensitivity analysis will investigate the effect of each parameter of the proposed model on the mean value function.

References

Barreto-Souza, W., Santos, A. H. and Cordeiro, G. M. (2010). The beta generalized exponential distribution. Journal of Statistical Computation and Simulation, 80, 159-172.

Goel, A. L. and Okumoto, K. (1979). Time-dependent error-detection rate model for software reliability and other performance measures. IEEE Transactions on Reliability, 28, 206-211.

Gupta, R. D. and Kundu, D. (1999). Theory & methods: Generalized exponential distributions. Australian & New Zealand Journal of Statistics, 41, 173-188.

Hossain, S. A. and Dahiya, R. C. (1993). Estimating the parameters of a non-homogeneous Poisson-process model for software reliability. IEEE Transactions on Reliability, 42, 604-612.

Lee, J., Lee, W. D. and Kang, S. G. (2018). Likelihood-based inference for the ratio of shape parameters of generalized exponential distributions. Journal of the Korean Data & Information Science Society, 29, 795-806.

Musa, J. D., Iannino, K. and Okumoto, K. (2006). Software reliability: Measurement, prediction, application, McGraw-Hill, New York.

Nadarajah, S. and Kotz, S. (2006). The exponentiated type distributions. Acta Applicandae Mathematica, 92, 97-111.

Nassar, M. M. and Eissa, F. H. (2003). On the exponentiated Weibull distribution. Communications in Statistics-Theory and Methods, 32, 1317-1336.

Ohba, M. (1984). Inflection S-shaped software reliability growth models. Stochastic Models in Reliability Theory, 235, 144-162.

Pak, H. K., Kang, S. G. and Lee, W. D. (2018). Confidence intervals for the stress-strength reliability of the generalized exponential distribution. Journal of the Korean Data & Information Science Society, 29, 1309-1318.

Pham, H. (2007). An imperfect-debugging fault-detection dependent-parameter software. International Journal of Automation and Computing, 4, 325.

Pham, H., Nordmann, L. and Zhang, X. (1999). A general imperfect software debugging model with S-shaped fault detection rate. IEEE Transactions on Reliability, 48, 169-175.

Song, K. Y. and Chang, I. H. (2017). The optimal release time using software reliability model with a Burr type III fault detection rate function. Journal of The Korean Data Analysis Society, 19, 577-586.

Yamada, S., Ohba, M. and Osaki, S. (1983). S-shaped reliability growth modeling for software error detection. IEEE Transactions on Reliability, 32, 475-484.
