Run-length distribution for variance control charts with runs rules using finite Markov chain imbedding
Hyeonggyu Kim¹ · Gyo-Young Cho²
¹ ² Department of Statistics, Kyungpook National University
Received 25 June 2019, revised 9 July 2019, accepted 11 July 2019
Abstract
We propose a method for obtaining the run-length distribution of Shewhart control charts with supplementary runs rules, using finite Markov chain imbedding. The method has several advantages: it is easy and fast to compute, it gives the exact run-length distribution, and it can also be applied to other areas such as reliability. In this paper, we calculate run-length probabilities for $S^2$ control charts with supplementary runs rules as the variance changes. When the variance increases, the average run length (ARL) decreases rapidly. Also, for a given increase in variance, the ARL is smaller when the degrees of freedom are larger.
Keywords: Markov chain imbedding, probability distribution, quartiles, run length, variance.
1. Introduction
Control charts are statistical procedures designed to detect changes in product quality quickly.
The Shewhart control chart was introduced by Shewhart in 1924 for monitoring processes. It has been studied and developed ever since and is now an important quality-control tool in manufacturing. The chart consists of a center line at the target value together with lower and upper control limits; a point falling outside the control limits indicates that the process may be out of control.
Shewhart charts are good at detecting medium and large shifts but are not effective for small shifts. Woodall and Montgomery (1999) addressed this shortcoming.
Two approaches have been taken. One is to use charts designed to detect small shifts, such as the EWMA and CUSUM charts; the other is to apply runs rules to the Shewhart chart.
First, when the process shift is small, the EWMA and CUSUM control charts perform better than the Shewhart control chart. However, these charts are usually more complex to interpret, characterize and plot.
¹ Graduate student, Department of Statistics, Kyungpook National University, Daegu, 41566, Korea.
² Corresponding author: Professor, Department of Statistics, Kyungpook National University, 80 Daehakro, Bukgu, Daegu, 41566, Korea. E-mail: [email protected]
Second, the performance of the Shewhart chart can be enhanced by adding supplementary runs rules, which react more sensitively to small shifts in the mean. The Western Electric Company proposed such runs rules in 1956, and we use these rules in this paper. Although these rules are widely used, few studies have examined the properties of charts that use them. Champ and Woodall (1987) used an algorithm to obtain the exact run-length distribution, and hence the time until a signal is generated, for charts combining two or more runs rules; such combinations are called supplementary runs rules. Bi et al. (2016) studied run-related probability functions and their application to start-up demonstration tests.
In this paper, we propose a new method, based on finite Markov chain imbedding, for calculating the exact run-length distribution for a variety of runs rules. The method gives the same results as Champ and Woodall (1987) and Shmueli and Cohen (2003). We report ARL values and quartiles of the run length.
2. Finite Markov chain imbedding
2.1. Run
The term run is used in statistics and probability, and problems involving runs have attracted the attention of statisticians and probabilists from the beginning. Balakrishnan and Koutras (2002) describe how runs are defined and used. Traditionally, in a sequence of Bernoulli trials, a run is a sequence of consecutive identical outcomes.
For example, the binary sequence 111100111 consists of a run of four 1's, then a run of two 0's, and finally a run of three 1's. Therefore, the sequence has three runs.
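The definition can be checked mechanically by grouping consecutive equal symbols. A minimal Python sketch (the helper name `runs` is our own, not from the paper):

```python
from itertools import groupby


def runs(sequence):
    """Return (symbol, length) for each maximal block of identical symbols."""
    return [(symbol, len(list(block))) for symbol, block in groupby(sequence)]


example = runs("111100111")
print(example)       # [('1', 4), ('0', 2), ('1', 3)]
print(len(example))  # 3
```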
2.2. Finite Markov chain imbedding
Fu (1985) studied the finite Markov chain imbedding (FMCI) method for obtaining such distributions, and the method was later formalized by Fu and Koutras (1994). FMCI has since been applied in many fields to find the approximate or exact distributions of runs and patterns.
Let $\Gamma_n = \{0, 1, \ldots, n\}$ be an index set, and let $\Omega = \{a_1, a_2, \ldots, a_m\}$ be a finite state space.
A nonnegative integer-valued random variable $X_n(\Lambda)$ can be imbedded into a finite Markov chain if the following three conditions hold. First, there is a finite Markov chain $\{Y_t : t \in \Gamma_n\}$ defined on the finite state space $\Omega$ with initial probability vector $\xi_0$. Second, there is a finite partition $\{C_x : x = 0, 1, \ldots, l_n\}$ of the state space $\Omega$. Third, for all $x = 0, 1, \ldots, l_n$,
$$P(X_n(\Lambda) = x) = P(Y_n \in C_x \mid \xi_0). \qquad (2.1)$$
Let $\{M_t\}_{t=1}^{n}$ be the sequence of $m \times m$ transition probability matrices of the finite Markov chain $\{Y_t\}$ defined on $\Omega$ with initial distribution $\xi_0 = (P(Y_0 = a_1), P(Y_0 = a_2), \ldots, P(Y_0 = a_m))$. Fu and Koutras (1994) proposed the following theorem for calculating the probability. If $X_n(\Lambda)$ is finite Markov chain imbeddable, then
$$P(X_n(\Lambda) = x) = \xi_0 \left( \prod_{t=1}^{n} M_t \right) U'(C_x), \qquad (2.2)$$
where $\xi_0$ is the initial probability vector, $M_t$, $t = 1, \ldots, n$, are the transition probability matrices of the imbedded Markov chain, $U(C_x) = \sum_{r : a_r \in C_x} e_r$, and $e_r$ is the $1 \times m$ unit row vector corresponding to state $a_r$.
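Equation (2.2) is a vector-matrix product and is easy to evaluate numerically. The sketch below assumes NumPy; the helper names are hypothetical, and the toy chain (where $Y_t$ counts the successes in the first $t$ Bernoulli trials and $C_x = \{x\}$) is our own illustration rather than an example from the paper, chosen because (2.2) must then reproduce the binomial probabilities.

```python
from math import comb

import numpy as np


def fmci_probability(xi0, transition_matrices, c_x):
    """Evaluate P(X_n = x) = xi0 (M_1 M_2 ... M_n) U'(C_x), as in (2.2).

    c_x holds the indices r of the states a_r that make up the block C_x.
    """
    row = np.asarray(xi0, dtype=float)
    for M in transition_matrices:        # the M_t may differ from step to step
        row = row @ np.asarray(M, dtype=float)
    u = np.zeros(row.size)
    u[list(c_x)] = 1.0                   # U(C_x): sum of unit vectors e_r over C_x
    return float(row @ u)


def success_count_matrix(n, p):
    """Toy imbedding (illustrative): Y_t = number of successes in t trials."""
    M = np.zeros((n + 1, n + 1))
    for i in range(n):
        M[i, i] = 1.0 - p                # failure: count unchanged
        M[i, i + 1] = p                  # success: count goes up by one
    M[n, n] = 1.0                        # count n can only stay at n
    return M


n, p = 5, 0.3
xi0 = np.eye(n + 1)[0]                   # Y_0 = 0 with probability one
Ms = [success_count_matrix(n, p)] * n
for x in range(n + 1):
    exact = comb(n, x) * p**x * (1 - p) ** (n - x)
    print(x, fmci_probability(xi0, Ms, [x]), exact)
```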
2.3. Waiting-time distribution
Consider the simple pattern $\Lambda$ of $r$ consecutive successes, and let the random variable $W(\Lambda)$ be the waiting time until $\Lambda$ occurs for the first time, i.e.
$$W(\Lambda) = \inf\{n : X_{n-r+1} = X_{n-r+2} = \cdots = X_n = S\}. \qquad (2.3)$$
For example, for $r = 4$, $W(\Lambda) = 7$ means that the pattern SSSS occurs for the first time at the seventh trial, as in FSFSSSS. The distribution of $W(\Lambda)$ for Bernoulli trials is often referred to as the geometric distribution of order $r$.
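As a concrete reading of (2.3), the waiting time can be located in an observed sequence by tracking the length of the current run of successes. A short sketch (the function name is hypothetical):

```python
def waiting_time(trials, r, success="S"):
    """1-based index n at which r consecutive successes first occur, as in (2.3).

    Returns None if the pattern never occurs in the observed trials.
    """
    current_run = 0
    for n, outcome in enumerate(trials, start=1):
        current_run = current_run + 1 if outcome == success else 0
        if current_run >= r:
            return n
    return None


print(waiting_time("FSFSSSS", 4))  # 7
print(waiting_time("SSSFSSS", 4))  # None
```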
Fu and Lou (2003) derived the distribution and the mean of the waiting time as follows.
Theorem. Let $\{X_t\}$ be a sequence of Bernoulli trials with success probability $p$ and failure probability $q = 1 - p$, and let $r \ge 1$ be the length of the pattern. Then the distribution of $W(\Lambda)$ is
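$$P(W(\Lambda) = n) = \xi N^{n-1}(\Lambda)\,(I - N(\Lambda))\,\mathbf{1}', \qquad (2.4)$$
where $\xi = (1, 0, \ldots, 0)$ is a $1 \times r$ row vector, $\mathbf{1} = (1, 1, \ldots, 1)$ is a $1 \times r$ row vector, and $N(\Lambda)$ is the $r \times r$ essential transition probability submatrix of the $(r+1) \times (r+1)$ matrix below. Its rows and columns are indexed by the states $0, 1, \ldots, r-1, \alpha$, where state $i$ records the length of the current run of successes and $\alpha$ is the absorbing state entered when the pattern occurs:
$$M(\Lambda) = \begin{pmatrix}
q & p & 0 & \cdots & 0 & 0 \\
q & 0 & p & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
q & 0 & 0 & \cdots & p & 0 \\
q & 0 & 0 & \cdots & 0 & p \\
0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix}
= \begin{pmatrix} N(\Lambda) & C \\ \mathbf{0} & 1 \end{pmatrix}. \qquad (2.5)$$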
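Equation (2.4) can be evaluated by repeatedly multiplying the row vector $\xi$ by $N(\Lambda)$. The sketch below assumes NumPy and builds $N(\Lambda)$ from (2.5); the function names are ours. For $r = 4$ and $p = 0.5$ the first nonzero probability is $P(W = 4) = p^4$, as expected.

```python
import numpy as np


def essential_matrix(r, p):
    """N(Lambda): the r x r block of (2.5) over the transient states 0, ..., r-1."""
    q = 1.0 - p
    N = np.zeros((r, r))
    N[:, 0] = q                  # a failure sends every state back to 0
    for i in range(r - 1):
        N[i, i + 1] = p          # a success extends the current run by one
    return N                     # a success from state r-1 leaves for alpha


def waiting_time_pmf(r, p, n_max):
    """P(W(Lambda) = n) for n = 1, ..., n_max via (2.4)."""
    N = essential_matrix(r, p)
    exit_prob = (np.eye(r) - N) @ np.ones(r)   # (I - N) 1'
    probs, row = [], np.eye(r)[0]              # row starts at xi = (1, 0, ..., 0)
    for _ in range(n_max):
        probs.append(float(row @ exit_prob))   # xi N^{n-1} (I - N) 1'
        row = row @ N
    return probs


pmf = waiting_time_pmf(r=4, p=0.5, n_max=1000)
print(pmf[:6])      # [0.0, 0.0, 0.0, 0.0625, 0.03125, 0.03125]
print(sum(pmf))     # very close to 1
```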
The mean of $W(\Lambda)$ is
$$E\,W(\Lambda) = \xi\,(I - N(\Lambda))^{-1}\,\mathbf{1}'. \qquad (2.6)$$
In the control-chart setting, the mean waiting time until a signal is the average run length (ARL), which is used to evaluate the performance of the Shewhart chart.
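The mean (2.6) needs only one linear solve. A sketch assuming NumPy (function name hypothetical) that checks the result against the known closed form $E\,W(\Lambda) = (1 - p^r)/(q p^r)$ for the geometric distribution of order $r$. The case $r = 1$ with $p = P(S) = 0.0027$ is the Shewhart chart using only $R_1$, whose ARL is $1/0.0027 \approx 370.4$.

```python
import numpy as np


def mean_waiting_time(r, p):
    """E W(Lambda) = xi (I - N)^{-1} 1' from (2.6) for r consecutive successes."""
    q = 1.0 - p
    N = np.zeros((r, r))
    N[:, 0] = q                  # failure: back to state 0
    for i in range(r - 1):
        N[i, i + 1] = p          # success: run length + 1
    xi = np.eye(r)[0]
    # Solve (I - N) v = 1' rather than forming the inverse explicitly.
    v = np.linalg.solve(np.eye(r) - N, np.ones(r))
    return float(xi @ v)


r, p = 4, 0.5
print(mean_waiting_time(r, p), (1 - p**r) / ((1 - p) * p**r))  # both 30 (up to rounding)

# Shewhart chart with R1 only: pattern of length 1 with p = P(S).
print(mean_waiting_time(1, 0.0027))  # about 370.37
```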
3. Distribution of run-length
3.1. Runs rules and notation
The Western Electric Handbook (1956) suggested the following rules for detecting non-random patterns on control charts:
$R_1$: a single point falls beyond the 3-sigma control limits;
$R_2$: two out of three consecutive points fall beyond the 2-sigma limits on the same side of the center line;
$R_3$: four out of five consecutive points fall beyond the 1-sigma limits on the same side of the center line;
$R_4$: eight consecutive points fall on one side of the center line.
The runs rules divide the control chart into areas to which the four rules are applied. The chart is split into seven areas, $S$, $A_2$, $B_2$, $C_2$, $C_1$, $B_1$ and $A_1$, as shown in Figure 3.1.
Figure 3.1 Seven areas of control chart
We use a notation for runs and scans rules similar to that of Champ and Woodall (1987). The rule $T(k, m, a)$ signals if $k$ of the last $m$ standardized points fall in area $a$. If $k = m$ the rule is called a run, and if $k < m$ it is called a scan (Balakrishnan and Koutras, 2002). Thus the usual Shewhart chart is denoted by $T(1, 1, S)$, and the four rules above are written as follows:
$R_1 = \{T(1, 1, S)\}$,
$R_2 = \{T(2, 3, A_2), T(2, 3, A_1)\}$,
$R_3 = \{T(4, 5, A_2 \cup B_2), T(4, 5, B_1 \cup A_1)\}$,
$R_4 = \{T(8, 8, A_2 \cup B_2 \cup C_2), T(8, 8, C_1 \cup B_1 \cup A_1)\}$.
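The notation $T(k, m, a)$ translates directly into a check on the most recent $m$ points. The sketch below is illustrative only: the zone labels follow Figure 3.1, the function name is hypothetical, and near the start of the chart the window simply holds fewer than $m$ points.

```python
def first_signal(zones, rules):
    """Return the 1-based index of the first signal, or None.

    zones: one zone label per plotted point (e.g. "C1", "A2", "S").
    rules: list of (k, m, area), area being a set of zone labels; the rule
           T(k, m, area) signals when at least k of the last m points lie in area.
    """
    for t in range(1, len(zones) + 1):
        for k, m, area in rules:
            window = zones[max(0, t - m):t]    # the last m points (fewer at the start)
            if sum(z in area for z in window) >= k:
                return t
    return None


R1 = [(1, 1, {"S"})]
R2 = [(2, 3, {"A2"}), (2, 3, {"A1"})]
R3 = [(4, 5, {"A2", "B2"}), (4, 5, {"B1", "A1"})]

points = ["C1", "A1", "B1", "A1", "C2"]
print(first_signal(points, R1))       # None
print(first_signal(points, R1 + R2))  # 4
```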
Supplementary runs rules are formed by combining Rule 1, Rule 2, Rule 3 and Rule 4. Historically, runs were developed for Bernoulli trials; the waiting-time distribution for Bernoulli trials, the geometric distribution of order $r$, was studied by Aki (1985) and Hirano (1986).
3.2. Run-length distribution for control chart with runs rules
We consider the $S^2$ control chart with runs rules for monitoring the process variance $\sigma^2$. If the production process is in control, the control statistic for monitoring $\sigma^2$ has a chi-square distribution with $n-1$ degrees of freedom:
$$\frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2(n-1). \qquad (3.1)$$
If the process variance has changed from $\sigma_0^2$ to $\sigma_1^2$, the control statistic satisfies
$$\frac{(n-1)S^2}{\sigma_0^2} = \frac{\sigma_1^2}{\sigma_0^2} \cdot \frac{(n-1)S^2}{\sigma_1^2} \sim \frac{\sigma_1^2}{\sigma_0^2}\,\chi^2(n-1). \qquad (3.2)$$
Using (3.2), we examine how the ARL changes when the variance shifts.
The chart area is divided into four zones, C, B, A and S. When the process is in control, the probabilities that the control statistic falls in zones C, B, A and S are 0.6827, 0.2718, 0.0428 and 0.0027, respectively.
Figure 3.2 Four areas of control chart
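The paper does not state how the zone boundaries of the $S^2$ chart are set. One construction consistent with the in-control probabilities above is to use probability limits at the $\chi^2(n-1)$ quantiles with cumulative probabilities 0.6827, 0.9545 and 0.9973; under (3.2), a shift $r = \sigma_1/\sigma_0$ then multiplies the statistic by $r^2$. The sketch below implements that assumption with Python's standard library only (the chi-square CDF via the series for the regularized incomplete gamma function, inverted by bisection). It is one possible construction, not necessarily the one behind Tables 3.1 to 3.4; in practice `scipy.stats.chi2` could replace the two helper functions.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 0
    while term > 1e-17 * total and k < 10_000:
        k += 1
        term *= z / (s + k)
        total += term
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total


def chi2_quantile(prob, df):
    """Invert chi2_cdf by bisection (adequate for a sketch)."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# ASSUMPTION: zone boundaries are chi-square quantiles at these cumulative
# in-control probabilities (0.9545 = 0.6827 + 0.2718, 0.9973 = 0.9545 + 0.0428).
CUMULATIVE = (0.6827, 0.9545, 0.9973)


def zone_probabilities(df, ratio):
    """P(C), P(B), P(A), P(S) when sigma_1 / sigma_0 = ratio, using (3.2)."""
    bounds = [chi2_quantile(c, df) for c in CUMULATIVE]
    F = [chi2_cdf(b / ratio**2, df) for b in bounds]  # statistic = ratio^2 * chi2(df)
    return F[0], F[1] - F[0], F[2] - F[1], 1.0 - F[2]


print(zone_probabilities(3, 1.0))  # approximately (0.6827, 0.2718, 0.0428, 0.0027)
print(zone_probabilities(3, 2.5))  # more mass in A and S after the shift
```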
We consider the following runs rules:
$R_1 = \{T(1, 1, S)\}$, $R_2 = \{T(2, 3, A)\}$, $R_3 = \{T(4, 5, B \cup A)\}$.
In addition, we use the rules proposed by Duncan:
$R_4 = \{T(2, 2, A)\}$, $R_5 = \{T(5, 5, B \cup A)\}$.
We use Rule 1 through Rule 5. Rule 1 is the default rule, and each of Rules 2 to 5 is combined with it to form a supplementary runs rule. Let $p_i$ denote the probability that a point falls in area $i$.
If $R_2$ is added to $R_1$, the following states are possible: $S$, $BS_1S_1$, $S_1BS_1$, $CS_1S_1$, $S_1CS_1$ and $S_1S_1$, where $S_1 = A$. When the process is in control, the probabilities are
$p_a = P(A) = 0.0428$, $p_b = P(B) = 0.2718$, $p_c = P(C) = 0.6827$, $\delta = P(S) = 0.0027$.
If $R_3$ is added to $R_1$, the following states are possible: $S$, $CS_1S_1S_1S_1$, $S_1CS_1S_1S_1$, $S_1S_1CS_1S_1$, $S_1S_1S_1CS_1$ and $S_1S_1S_1S_1$, where $S_1 = A \cup B$. When the process is in control, the probabilities are
$p_a = P(S_1) = 0.3146$, $p_b = P(C) = 0.6827$, $\delta = 0.0027$.
If $R_4$ is added to $R_1$, the following states are possible: $S$ and $S_1S_1$, where $S_1 = A$. When the process is in control, the probabilities are
$p_a = P(S_1) = P(A) = 0.0428$, $p_b = P(B) = 0.2718$, $p_c = P(C) = 0.6827$, $\delta = 0.0027$.
If $R_5$ is added to $R_1$, the following states are possible: $S$ and $S_1S_1S_1S_1S_1$, where $S_1 = A \cup B$. When the process is in control, the probabilities are
$p_a = P(S_1) = 0.3146$, $p_b = P(C) = 0.6827$, $\delta = 0.0027$.
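Combining these pieces, the ARL of $R_1$ together with one rule $T(k, m, S_1)$ follows from (2.6) once the transient block $N$ is built. The sketch below (NumPy assumed, names hypothetical) uses our own state construction, the hit/miss pattern of the last $m-1$ points, rather than the state lists above, and assumes no hits before the first point. With the in-control probabilities above it gives ARLs of about 166.56, 224.38 and 207.52 for $R_1$ combined with $R_2$, $R_4$ and $R_5$, within about 0.01 of the values in Tables 3.1, 3.3 and 3.4.

```python
from itertools import product

import numpy as np


def runs_rule_chain(k, m, p_hit, p_miss):
    """Transient block N for R1 combined with one rule T(k, m, hit area).

    A state is the hit (1) / miss (0) pattern of the last m-1 points.  A new
    point is a hit with prob p_hit, a miss with prob p_miss, and falls in S
    (immediate R1 signal) otherwise.  Moves completing k hits among the last
    m points are absorbing and are therefore left out of N.
    """
    states = list(product((0, 1), repeat=m - 1))
    index = {s: i for i, s in enumerate(states)}
    N = np.zeros((len(states), len(states)))
    for s in states:
        for outcome, prob in ((1, p_hit), (0, p_miss)):
            window = s + (outcome,)
            if sum(window) >= k:
                continue                      # T(k, m, .) signals: absorbed
            N[index[s], index[window[1:]]] += prob
    start = index[(0,) * (m - 1)]             # assume no hits before the first point
    return N, start


def arl(k, m, p_hit, p_miss):
    """ARL = xi (I - N)^{-1} 1', as in (2.6)."""
    N, start = runs_rule_chain(k, m, p_hit, p_miss)
    xi = np.eye(len(N))[start]
    v = np.linalg.solve(np.eye(len(N)) - N, np.ones(len(N)))
    return float(xi @ v)


# In-control probabilities from the text (delta = P(S) = 0.0027).
print(arl(2, 3, 0.0428, 0.9545))  # R1 + R2, S1 = A
print(arl(2, 2, 0.0428, 0.9545))  # R1 + R4, S1 = A
print(arl(5, 5, 0.3146, 0.6827))  # R1 + R5, S1 = A u B
```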
Figure 3.3 Combination of Rule1 and Rule2 (or Rule1 and Rule4)
Tables 3.1 to 3.4 show the ARLs and quartiles of the run length for the $S^2$ control charts with supplementary runs rules as the variance increases, for 3 and 5 degrees of freedom. The ARL is usually used as a representative value. However, the run-length distribution is skewed, so the ARL alone is not a good representative value, and we therefore also report the quartiles. This is possible because the finite Markov chain imbedding method gives the exact run-length distribution. In the tables, $r = \sigma_1/\sigma_0$.
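Because FMCI yields the whole distribution, the quartiles follow from the survival function $P(RL > n) = \xi N^{n}\mathbf{1}'$. The sketch below (NumPy assumed, names hypothetical) takes each quartile as the smallest $n$ with $P(RL \le n)$ at least the target level; the paper does not say which quantile convention it uses, so values may differ by one from the tables. It is checked on $R_1$ alone, where the run length is geometric with parameter $\delta$.

```python
import numpy as np


def run_length_quartiles(N, xi, n_max=100_000):
    """Q1, median, Q3 of the run length RL of an absorbing chain.

    P(RL > n) = xi N^n 1', so P(RL <= n) = 1 - xi N^n 1'.  Each quartile is
    the smallest n with P(RL <= n) >= level (one common convention).
    """
    levels = [0.25, 0.5, 0.75]
    found = []
    row = np.asarray(xi, dtype=float)
    ones = np.ones(len(row))
    for n in range(1, n_max + 1):
        row = row @ N
        cdf = 1.0 - float(row @ ones)
        while len(found) < 3 and cdf >= levels[len(found)]:
            found.append(n)
        if len(found) == 3:
            return tuple(found)
    raise ValueError("n_max too small")


# R1 alone: RL is geometric with success probability delta = 0.0027.
delta = 0.0027
print(run_length_quartiles(np.array([[1 - delta]]), np.array([1.0])))

# R1 + R4 (T(2, 2, A)): states "last point not in A" and "last point in A".
p_a, p_n = 0.0428, 0.9545
N_r4 = np.array([[p_n, p_a],
                 [p_n, 0.0]])
print(run_length_quartiles(N_r4, np.array([1.0, 0.0])))
```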
Figure 3.4 Combination of Rule1 and Rule3 (or Rule1 and Rule5)
Table 3.1 ARL and quartiles of run length with $R_1$ and $R_2$
             df = 3                              df = 5
r     ARL     Q1   Med  Q3           r     ARL     Q1   Med  Q3
1.00  166.56  49   116  230          1.00  166.56  49   116  230
1.43  41.53   13   29   57           1.43  29.06   9    21   40
2.50  5.87    3    4    8            2.50  3.51    2    3    4
5.00  1.57    1    1    2            5.00  1.17    1    1    1
Table 3.2 ARL and quartiles of run length with $R_1$ and $R_3$
             df = 3                              df = 5
r     ARL     Q1   Med  Q3           r     ARL     Q1   Med  Q3
1.00  53.28   17   38   73           1.00  53.28   17   38   73
1.43  13.89   6    10   18           1.43  10.91   5    8    14
2.50  4.31    4    4    5            2.50  3.49    3    4    4
5.00  1.84    1    1    2            5.00  1.2     1    1    1
Table 3.3 ARL and quartiles of run length with $R_1$ and $R_4$
             df = 3                              df = 5
r     ARL     Q1   Med  Q3           r     ARL     Q1   Med  Q3
1.00  224.39  65   156  311          1.00  224.39  65   156  311
1.43  57.37   17   40   79           1.25  76.12   23   53   105
2.50  7.20    3    5    10           2.50  3.99    2    3    5
5.00  1.58    1    1    2            5.00  1.17    1    1    1
Figures 3.5 to 3.12 show the probability and cumulative distributions of the run length for the $S^2$ control charts with supplementary runs rules as the variance increases, for 3 and 5 degrees of freedom. At the first point at which an added runs rule can signal, the run-length probability jumps sharply. This jump is called a spike, and it marks the starting position of the runs rule; spikes were also used by Shmueli and Cohen (2003). For example, Figure 3.7 shows the combination of Rule 1 and Rule 3, where the probability jumps at $n = 5$. In the figures, $r = \sigma_1/\sigma_0$.
Table 3.4 ARL and quartiles of run length with $R_1$ and $R_5$
             df = 3                              df = 5
r     ARL     Q1   Med  Q3           r     ARL     Q1   Med  Q3
1.00  207.52  61   144  287          1.00  207.52  61   144  287
1.43  38.58   13   28   52           1.43  26.91   10   20   36
2.50  6.18    5    5    8            2.50  4.35    3    5    5
5.00  1.89    1    1    2            5.00  1.2     1    1    1
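The spike can be traced with the same machinery. Under the zero-start assumption of our sketches, $R_1$ with $R_3 = T(4, 5, B \cup A)$ can first signal through $R_3$ at $n = 4$ (four points in $B \cup A$), and at $n = 5$ through the four sequences with exactly one point in $C$ among the first four points and a point in $B \cup A$ at $n = 5$. Hence $P(RL = 5) = [(1-\delta)^4 - p_a^4]\,\delta + 4p_a^4 p_b$ with $p_a = P(B \cup A)$ and $p_b = P(C)$ as in Section 3.2. The sketch below (NumPy assumed, names hypothetical, our own state construction) computes the first few probabilities by (2.4) and shows the jump at $n = 5$.

```python
from itertools import product

import numpy as np


def runs_rule_chain(k, m, p_hit, p_miss):
    """Transient block N for R1 plus T(k, m, hit area); state = last m-1 hit/miss flags."""
    states = list(product((0, 1), repeat=m - 1))
    index = {s: i for i, s in enumerate(states)}
    N = np.zeros((len(states), len(states)))
    for s in states:
        for outcome, prob in ((1, p_hit), (0, p_miss)):
            window = s + (outcome,)
            if sum(window) < k:            # no T(k, m, .) signal: stay transient
                N[index[s], index[window[1:]]] += prob
    return N, index[(0,) * (m - 1)]        # start with no earlier hits


def run_length_pmf(N, start, n_max):
    """P(RL = n), n = 1..n_max, via xi N^{n-1} (I - N) 1' as in (2.4)."""
    exit_prob = (np.eye(len(N)) - N) @ np.ones(len(N))
    row = np.eye(len(N))[start]
    pmf = []
    for _ in range(n_max):
        pmf.append(float(row @ exit_prob))
        row = row @ N
    return pmf


p_a, p_b, delta = 0.3146, 0.6827, 0.0027   # in-control values from the text
N, start = runs_rule_chain(4, 5, p_a, p_b)
pmf = run_length_pmf(N, start, 10)
for n, prob in enumerate(pmf, start=1):
    print(n, round(prob, 5))
```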
[Figure: run-length probability P[RL=n] and cumulative probability P[RL<n] plotted against n from 0 to 50; panels r=1 (ARL=166.56), r=1.43 (ARL=41.53), r=2.5 (ARL=5.87) and r=5 (ARL=1.57)]
Figure 3.5 Run-length probability distribution and cumulative distribution (Rule1 and Rule2, df=3)
[Figure: run-length probability P[RL=n] and cumulative probability P[RL<n] plotted against n from 0 to 50; panels r=1 (ARL=166.56), r=1.43 (ARL=29.06), r=2.5 (ARL=3.51) and r=5 (ARL=1.17)]
Figure 3.6 Run-length probability distribution and cumulative distribution (Rule1 and Rule2, df=5)
[Figure: run-length probability P[RL=n] and cumulative probability P[RL<n] plotted against n from 0 to 50; panels r=1 (ARL=53.28), r=1.43 (ARL=13.89), r=2.5 (ARL=4.31) and r=5 (ARL=1.84)]
Figure 3.7 Run-length probability distribution and cumulative distribution (Rule1 and Rule3, df=3)
[Figure: run-length probability P[RL=n] and cumulative probability P[RL<n] plotted against n from 0 to 50; panels r=1 (ARL=53.28), r=1.43 (ARL=10.91), r=2.5 (ARL=3.49) and r=5 (ARL=1.2)]
Figure 3.8 Run-length probability distribution and cumulative distribution (Rule1 and Rule3, df=5)
[Figure: run-length probability P[RL=n] and cumulative probability P[RL<n] plotted against n from 0 to 50; panels r=1 (ARL=224.39), r=1.43 (ARL=57.37), r=2.5 (ARL=7.2) and r=5 (ARL=1.58)]
Figure 3.9 Run-length probability distribution and cumulative distribution (Rule1 and Rule4, df=3)
[Figure: run-length probability P[RL=n] and cumulative probability P[RL<n] plotted against n from 0 to 50; panels r=1 (ARL=224.39), r=1.43 (ARL=39.74), r=2.5 (ARL=3.99) and r=5 (ARL=1.17)]
Figure 3.10 Run-length probability distribution and cumulative distribution (Rule1 and Rule4, df=5)
[Figure: run-length probability P[RL=n] and cumulative probability P[RL<n] plotted against n from 0 to 50; panels r=1 (ARL=207.52), r=1.43 (ARL=38.58), r=2.5 (ARL=6.18) and r=5 (ARL=1.89)]
Figure 3.11 Run-length probability distribution and cumulative distribution (Rule1 and Rule5, df=3)
[Figure: run-length probability P[RL=n] and cumulative probability P[RL<n] plotted against n from 0 to 50; panels r=1 (ARL=207.52), r=1.43 (ARL=26.91), r=2.5 (ARL=4.35) and r=5 (ARL=1.2)]