Bootstrap confidence intervals for classification error rate in circular models when a block of observations is missing


Hie-Choon Chung ¹ · Chien-Pai Han ²

1 Department of e-business, Gwangju University

2 Department of Mathematics, University of Texas at Arlington

Received 9 May 2009, revised 17 July 2009, accepted 20 July 2009

Abstract

In discriminant analysis, we consider a special missing-data pattern in which a block of observations is missing. We assume that the two populations are equally likely and that the costs of misclassification are equal. In this situation, we consider bootstrap confidence intervals for the error rate in circular models, both when the covariance matrices are equal and when they are not equal.

Keywords: Block of missing observations, bootstrap confidence interval, circular model, error rate, linear combination classification statistic, Monte Carlo study.

1. Introduction

In discriminant analysis the problem is to classify a $p \times 1$ observation $X$ of unknown origin into one of several distinct populations using an appropriate classification rule. In this paper it is assumed that there are two distinct populations, both multivariate normal. We also assume that the two populations are equally likely and that the costs of misclassification are equal. The classification rule depends on whether or not the training samples contain missing values. Assuming that the covariance matrices are circular, we make an appropriate transformation which reduces the circular matrices to canonical form. The discriminant function is given for multivariate normal populations with different circular covariance matrices, and the linear combination statistic is used when a block of observations is missing. We consider bootstrap confidence intervals for the error rate in circular models, both when the covariance matrices are equal and when they are not equal.

¹ Corresponding author: Associate professor, Department of e-business, Gwangju University, Gwangju 503-703, Korea. E-mail: [email protected]

² Professor, Department of Mathematics, University of Texas at Arlington, TX 76019-0408, U.S.A.


2. Discriminant analysis in circular models

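Let $X$, a $p \times 1$ vector, be an observation which is known to have come from one of two multivariate normal populations. Denote the $i$th population by $\pi_i$, which is $N(\mu_i, \Sigma_i)$ for $i = 1, 2$. We assume that $\Sigma_i$ is positive definite and circular, i.e. $\Sigma_i$ is of the form

$$
\sigma_i^2
\begin{pmatrix}
1 & \rho_1 & \rho_2 & \rho_3 & \cdots & \rho_2 & \rho_1 \\
\rho_1 & 1 & \rho_1 & \rho_2 & \cdots & \rho_3 & \rho_2 \\
\rho_2 & \rho_1 & 1 & \rho_1 & \cdots & \rho_4 & \rho_3 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
\rho_1 & \rho_2 & \rho_3 & \rho_4 & \cdots & \rho_1 & 1
\end{pmatrix}.
\tag{2.1}
$$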

It is given in Wise (1955) that the matrix in (2.1) can be transformed into canonical form.

Thus there exists an orthogonal matrix L with (m,n)th element

$$
l_{mn} = p^{-1/2} \left[ \cos \frac{2\pi}{p}(m-1)(n-1) + \sin \frac{2\pi}{p}(m-1)(n-1) \right]
$$

such that $L' \Sigma_i L = \mathrm{diag}(\sigma_{i1}^2, \sigma_{i2}^2, \ldots, \sigma_{ip}^2)$. Since $L$ is independent of the elements in $\Sigma_1$ and $\Sigma_2$, the discriminant function is equivalent to that when the covariance matrices are diagonal.

This is true because the discriminant function derived by the likelihood ratio procedure is invariant under any linear transformation.
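The diagonalisation can be checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation: the helper names (`circular_cov`, `wise_matrix`, `matmul`) and the values $p = 5$, $\sigma^2 = 2$, $\rho_1 = 0.5$, $\rho_2 = 0.2$ are illustrative assumptions. It builds a circular covariance matrix and the matrix $L$ defined above, then forms $L' \Sigma L$.

```python
import math

def circular_cov(sigma2, rhos, p):
    """p x p circular covariance: entry (m, n) depends only on the circular lag."""
    def corr(lag):
        d = min(lag, p - lag)              # circular distance between positions
        return 1.0 if d == 0 else rhos[d - 1]
    return [[sigma2 * corr(abs(m - n)) for n in range(p)] for m in range(p)]

def wise_matrix(p):
    """Matrix with elements p^{-1/2}(cos t + sin t), t = 2*pi*(m-1)(n-1)/p.

    Indices here are 0-based, so m*n plays the role of (m-1)(n-1).
    """
    return [[(math.cos(2 * math.pi * m * n / p) + math.sin(2 * math.pi * m * n / p))
             / math.sqrt(p) for n in range(p)] for m in range(p)]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

p = 5
sigma = circular_cov(2.0, [0.5, 0.2], p)   # illustrative parameter values
L = wise_matrix(p)
Lt = [list(row) for row in zip(*L)]        # transpose of L
canonical = matmul(matmul(Lt, sigma), L)   # L' Sigma L

# Largest off-diagonal entry in absolute value; should be numerically zero.
off_diag = max(abs(canonical[i][j]) for i in range(p) for j in range(p) if i != j)
```

The diagonal of `canonical` then holds the canonical variances $\sigma_{i1}^2, \ldots, \sigma_{ip}^2$, and `off_diag` measures the departure from exact diagonality caused by floating-point rounding.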

2.1. Discriminant function with complete data

Since the circular matrix can be transformed into canonical form, we may let $\Sigma_i = \mathrm{diag}(\sigma_{i1}^2, \sigma_{i2}^2, \ldots, \sigma_{ip}^2)$, $i = 1, 2$. Han (1970) derived the discriminant function by using the likelihood ratio procedure. It is proportional to

$$
(X - \mu_2)' \Sigma_2^{-1} (X - \mu_2) - (X - \mu_1)' \Sigma_1^{-1} (X - \mu_1).
$$

Substituting µ i and Σ i we obtain, apart from a constant,

$$
V = \sum_{j=1}^{p} \left( \frac{1}{\sigma_{2j}^2} - \frac{1}{\sigma_{1j}^2} \right)
\left( x_j - \frac{\mu_{2j}/\sigma_{2j}^2 - \mu_{1j}/\sigma_{1j}^2}{1/\sigma_{2j}^2 - 1/\sigma_{1j}^2} \right)^2,
\tag{2.2}
$$

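where $x_j$ and $\mu_{ij}$ are the $j$th components of $X$ and $\mu_i$, $i = 1, 2$, respectively. We classify $X$ into $\pi_1$ if $V > k$ and into $\pi_2$ if $V \le k$ for some suitable choice of the constant $k$. To find the distribution of $V$, we shall assume that $\sigma_{1j}^2 > \sigma_{2j}^2$ for all $j$, or equivalently that $\Sigma_1 - \Sigma_2$ is positive definite. Hence $1/\sigma_{2j}^2 - 1/\sigma_{1j}^2 > 0$. Let

$$
Z_j = \sqrt{\frac{1}{\sigma_{2j}^2} - \frac{1}{\sigma_{1j}^2}}
\left( x_j - \frac{\mu_{2j}/\sigma_{2j}^2 - \mu_{1j}/\sigma_{1j}^2}{1/\sigma_{2j}^2 - 1/\sigma_{1j}^2} \right).
$$

Then $V = \sum_{j=1}^{p} Z_j^2$. When $X$ comes from $\pi_i$, $i = 1$ or $2$, the $Z_j$ are independently distributed as $N(\zeta_{ij}, \tau_{ij}^2)$, where

$$
\zeta_{ij} = \sqrt{\frac{1}{\sigma_{2j}^2} - \frac{1}{\sigma_{1j}^2}}
\left( \mu_{ij} - \frac{\mu_{2j}/\sigma_{2j}^2 - \mu_{1j}/\sigma_{1j}^2}{1/\sigma_{2j}^2 - 1/\sigma_{1j}^2} \right),
\qquad
\tau_{ij}^2 = \sigma_{ij}^2 \left( \frac{1}{\sigma_{2j}^2} - \frac{1}{\sigma_{1j}^2} \right).
$$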
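As an illustration, the statistic $V$ of (2.2) can be evaluated directly as a sum of squared $Z_j$. The sketch below assumes diagonal (canonical-form) covariances passed as lists of variances; the function name `discriminant_v` and its argument names are hypothetical.

```python
import math

def discriminant_v(x, mu1, mu2, var1, var2):
    """V of (2.2) for diagonal covariances; assumes var1[j] > var2[j] for all j."""
    v = 0.0
    for xj, m1, m2, s1, s2 in zip(x, mu1, mu2, var1, var2):
        a = 1.0 / s2 - 1.0 / s1            # positive under the variance assumption
        centre = (m2 / s2 - m1 / s1) / a   # value subtracted from x_j inside the square
        z = math.sqrt(a) * (xj - centre)   # Z_j
        v += z * z
    return v
```

An observation would then be classified into $\pi_1$ when `discriminant_v(...)` exceeds the chosen cut-off $k$, and into $\pi_2$ otherwise.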

Therefore $V$ is distributed as the sum $\sum_j \tau_{ij}^2 \, \chi'^2_1(\delta_{ij}^2)$, where $\chi'^2_1(\delta_{ij}^2)$ is a non-central $\chi^2$ variable with 1 degree of freedom and non-centrality parameter $\delta_{ij}^2 = \zeta_{ij}^2 / \tau_{ij}^2$. It is not easy to obtain the distribution in closed form. Patnaik (1949) considered a $\chi^2$ approximation to the distribution of the sum by fitting the first two moments. Thus the distribution may be approximated by $c_\alpha \chi^2_{\nu_\alpha}$, where

$$
c_\alpha = \frac{\sum_j \tau_{ij}^4 + 2 \sum_j \tau_{ij}^2 \zeta_{ij}^2}{\sum_j \tau_{ij}^2 + \sum_j \zeta_{ij}^2},
\qquad
\nu_\alpha = \frac{1}{c_\alpha} \left( \sum_j \tau_{ij}^2 + \sum_j \zeta_{ij}^2 \right).
$$
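The two-moment fit can be written in a few lines. This is a minimal sketch under the assumption that the squared quantities $\tau_{ij}^2$ and $\zeta_{ij}^2$ for a fixed population $i$ are supplied as lists; the function name `patnaik` is hypothetical.

```python
def patnaik(tau2, zeta2):
    """Fit c * chi^2_nu to sum_j tau2[j] * chi'^2_1(zeta2[j] / tau2[j]).

    The mean of the sum is sum(tau2) + sum(zeta2), and its variance is
    2 * sum(tau2^2) + 4 * sum(tau2 * zeta2); c and nu match both moments.
    """
    mean = sum(tau2) + sum(zeta2)
    half_var = sum(t * t for t in tau2) + 2.0 * sum(t * z for t, z in zip(tau2, zeta2))
    c = half_var / mean
    nu = mean / c
    return c, nu
```

When all $\zeta_{ij}^2$ are zero and all $\tau_{ij}^2$ equal one, the sum is an exact central $\chi^2_p$ variable and the fit returns $c = 1$, $\nu = p$.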

2.2. Discriminant function with incomplete data

In this paper we consider a special pattern which contains a block of missing observations in circular models. Instead of estimating the parameters, we construct two different discriminant functions from the complete data and incomplete data, respectively, and then a linear combination of these two discriminant functions is used to obtain the classification rule.

When the populations are multivariate normal with equal covariance matrix, that is, $\pi_i : N(\mu_i, \Sigma)$, Chung and Han (2000) derived the linear combination statistic when a block of observations is missing.

Let us partition the $p \times 1$ observation $X$ as follows:

$$
X = \begin{pmatrix} Y \\ Z \end{pmatrix},
$$

where $Y$ is a $k \times 1$ vector and $Z$ is a $(p - k) \times 1$ vector ($1 \le k < p$). Suppose random samples of sizes $m_i$, containing no missing values,

$$
X_{ij} = \begin{pmatrix} Y_{ij} \\ Z_{ij} \end{pmatrix}, \qquad i = 1, 2; \; j = 1, 2, \ldots, m_i,
$$

are available from

$$
N_p(\mu_i, \Sigma) = N_p \left( \begin{pmatrix} \mu_{yi} \\ \mu_{zi} \end{pmatrix},
\begin{pmatrix} \Sigma_{yy} & \Sigma_{yz} \\ \Sigma_{zy} & \Sigma_{zz} \end{pmatrix} \right),
$$

and random samples of sizes $n_i - m_i$, which contain only the first $k$ components $Y_{ij}$ ($k \times 1$), $i = 1, 2$; $j = m_i + 1, \ldots, n_i$, are available from $N_k(\mu_{yi}, \Sigma_{yy})$. We denote by $X_{ij}$, $i = 1, 2$; $j = 1, \ldots, m_i$, the complete observations, and by $Y_{ij}$, $i = 1, 2$; $j = 1, \ldots, n_i$, the incomplete observations. Hence the data have the special pattern of missing values in which a block of variables is missing on $n_i - m_i$ observations and the remaining observations are all complete.


We can construct two linear discriminant functions. The first linear discriminant function is based on the complete observations $X_{ij}$, $i = 1, 2$; $j = 1, \ldots, m_i$. We have

$$
W_x = (\bar{X}_1 - \bar{X}_2)' S_{xx}^{-1} \left[ X - \frac{1}{2} (\bar{X}_1 + \bar{X}_2) \right],
$$

where

$$
\bar{X}_i = \frac{1}{m_i} \sum_{j=1}^{m_i} X_{ij} = \begin{pmatrix} \bar{Y}_{i1} \\ \bar{Z}_i \end{pmatrix},
\quad
\bar{Y}_{i1} = \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij},
\quad
\bar{Z}_i = \frac{1}{m_i} \sum_{j=1}^{m_i} Z_{ij},
\quad i = 1, 2,
$$

$$
S_{xx} = \sum_{i=1}^{2} \sum_{j=1}^{m_i} (X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)' / \nu_x,
\qquad \nu_x = m_1 + m_2 - 2.
$$

The second linear discriminant function is based on the incomplete observations $Y_{ij}$ ($k \times 1$), $i = 1, 2$; $j = 1, 2, \ldots, n_i$. We have

$$
W_y = (\bar{Y}_1 - \bar{Y}_2)' S_{yy}^{-1} \left[ Y - \frac{1}{2} (\bar{Y}_1 + \bar{Y}_2) \right],
$$

where

$$
\bar{Y}_i = \frac{1}{n_i} \left[ m_i \bar{Y}_{i1} + (n_i - m_i) \bar{Y}_{i2} \right],
\qquad
\bar{Y}_{i2} = \frac{1}{n_i - m_i} \sum_{j=m_i+1}^{n_i} Y_{ij},
$$

$$
S_{yy} = \sum_{i=1}^{2} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)(Y_{ij} - \bar{Y}_i)' / \nu_y,
\qquad \nu_y = n_1 + n_2 - 2.
$$

Now we combine the two linear discriminant functions and construct the classification rule which is a linear combination of W x and W y , namely

$$
W_c = c W_x + (1 - c) W_y, \qquad 0 \le c \le 1.
\tag{2.3}
$$

Now we consider the linear combination statistic when the populations are multivariate normal with unequal covariance matrices. When a block of observations is missing in circular models, the discriminant function is

$$
W_c = c W_x + (1 - c) W_y, \qquad 0 \le c \le 1,
$$

$$
W_x = (X - \bar{X}_2)' S_2^{-1} (X - \bar{X}_2) - (X - \bar{X}_1)' S_1^{-1} (X - \bar{X}_1),
$$

$$
W_y = (Y - \bar{Y}_2)' S_{y2}^{-1} (Y - \bar{Y}_2) - (Y - \bar{Y}_1)' S_{y1}^{-1} (Y - \bar{Y}_1),
$$

where

$$
c = \frac{\left( \dfrac{1}{m_1} + \dfrac{1}{m_2} \right)^{-1} D^2}
{\left( \dfrac{1}{m_1} + \dfrac{1}{m_2} \right)^{-1} D^2 + \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)^{-1} D_y^2},
$$

$$
D^2 = (\bar{X}_1 - \bar{X}_2)' S^{-1} (\bar{X}_1 - \bar{X}_2),
\qquad
D_y^2 = (\bar{Y}_1 - \bar{Y}_2)' S_y^{-1} (\bar{Y}_1 - \bar{Y}_2),
$$

$$
S = \frac{S_1}{m_1} + \frac{S_2}{m_2},
\qquad
S_i = \frac{1}{m_i - 1} \sum_{j=1}^{m_i} (X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)',
$$

$$
S_y = \frac{S_{y1}}{n_1} + \frac{S_{y2}}{n_2},
\qquad
S_{yi} = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)(Y_{ij} - \bar{Y}_i)'.
$$
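For illustration, the weight $c$ depends on the data only through $D^2$, $D_y^2$ and the sample sizes. A minimal sketch, assuming these two distances have already been computed (the function name `combination_weight` is hypothetical):

```python
def combination_weight(d2, d2_y, m1, m2, n1, n2):
    """Weight c placed on W_x in W_c = c*W_x + (1 - c)*W_y."""
    wx = d2 / (1.0 / m1 + 1.0 / m2)      # (1/m1 + 1/m2)^{-1} D^2
    wy = d2_y / (1.0 / n1 + 1.0 / n2)    # (1/n1 + 1/n2)^{-1} D_y^2
    return wx / (wx + wy)
```

The weight lies between 0 and 1, and it moves towards 1 as the complete-data distance $D^2$ dominates $D_y^2$.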

Substituting $\bar{X}_i$, $S_i$, $\bar{Y}_i$ and $S_{yi}$ we obtain, apart from a constant,

$$
V_x = \sum_{j=1}^{p} \left( \frac{1}{s_{2j}^2} - \frac{1}{s_{1j}^2} \right)
\left( x_j - \frac{\bar{x}_{2j}/s_{2j}^2 - \bar{x}_{1j}/s_{1j}^2}{1/s_{2j}^2 - 1/s_{1j}^2} \right)^2
\quad \text{for } W_x,
$$

$$
V_y = \sum_{j=1}^{k} \left( \frac{1}{s_{y2j}^2} - \frac{1}{s_{y1j}^2} \right)
\left( y_j - \frac{\bar{y}_{2j}/s_{y2j}^2 - \bar{y}_{1j}/s_{y1j}^2}{1/s_{y2j}^2 - 1/s_{y1j}^2} \right)^2
\quad \text{for } W_y.
$$

Then $W_c = c V_x + (1 - c) V_y$, $0 \le c \le 1$.

The unconditional cdfs of $V_x$ and $V_y$ are given in Han (1970). The distribution of $W_c$ is very complicated.

Now we consider the conditional distribution given the sample statistics. Given the sample statistics, $V_x$ is distributed as the sum $\sum_j \tau_{ij}^{*2} \, \chi'^2_1(\delta_{ij}^{*2})$, where $\chi'^2_1(\delta_{ij}^{*2})$ is a non-central $\chi^2$ variable with 1 degree of freedom and non-centrality parameter

$$
\delta_{ij}^{*2} = \zeta_{ij}^{*2} / \tau_{ij}^{*2},
$$

where

$$
\zeta_{ij}^{*} = \sqrt{\frac{1}{s_{2j}^2} - \frac{1}{s_{1j}^2}}
\left( \mu_{ij} - \frac{\bar{x}_{2j}/s_{2j}^2 - \bar{x}_{1j}/s_{1j}^2}{1/s_{2j}^2 - 1/s_{1j}^2} \right),
\qquad
\tau_{ij}^{*2} = \sigma_{ij}^2 \left( \frac{1}{s_{2j}^2} - \frac{1}{s_{1j}^2} \right).
$$

Also, given the sample statistics, $V_y$ is distributed as the sum $\sum_j \tau_{yij}^{*2} \, \chi'^2_1(\delta_{yij}^{*2})$, where $\chi'^2_1(\delta_{yij}^{*2})$ has non-centrality parameter $\delta_{yij}^{*2} = \zeta_{yij}^{*2} / \tau_{yij}^{*2}$, with

$$
\zeta_{yij}^{*} = \sqrt{\frac{1}{s_{y2j}^2} - \frac{1}{s_{y1j}^2}}
\left( \mu_{yij} - \frac{\bar{y}_{2j}/s_{y2j}^2 - \bar{y}_{1j}/s_{y1j}^2}{1/s_{y2j}^2 - 1/s_{y1j}^2} \right),
\qquad
\tau_{yij}^{*2} = \sigma_{yij}^2 \left( \frac{1}{s_{y2j}^2} - \frac{1}{s_{y1j}^2} \right).
$$

The conditional probabilities of misclassifying an observation $X$ from $\pi_1$ and from $\pi_2$ by $W_c$ are given by

$$
\beta_{1n}^{*} = \Pr(W_c < k \mid \bar{x}_{ij}, \bar{y}_{ij}, s_{ij}^2, s_{yij}^2;\; x, y \in \pi_1)
= \Pr \left\{ c \sum_{j=1}^{p} \tau_{1j}^{*2} \chi'^2_1(\delta_{1j}^{*2}) + (1 - c) \sum_{j=1}^{k} \tau_{y1j}^{*2} \chi'^2_1(\delta_{y1j}^{*2}) < k \right\}.
$$

Similarly,

$$
\beta_{2n}^{*} = \Pr(W_c \ge k \mid \bar{x}_{ij}, \bar{y}_{ij}, s_{ij}^2, s_{yij}^2;\; x, y \in \pi_2)
= \Pr \left\{ c \sum_{j=1}^{p} \tau_{2j}^{*2} \chi'^2_1(\delta_{2j}^{*2}) + (1 - c) \sum_{j=1}^{k} \tau_{y2j}^{*2} \chi'^2_1(\delta_{y2j}^{*2}) \ge k \right\}.
$$

Hence the conditional error rate, with equal prior probabilities, is defined as

$$
\beta_n^{*} = \frac{1}{2} (\beta_{1n}^{*} + \beta_{2n}^{*}).
\tag{2.4}
$$

We use Patnaik's method to approximate each sum by a constant multiple of a central chi-square distribution. The distribution of $V_x$ may be approximated by $c_x \chi^2_{\nu_x}$, where

$$
c_x = \frac{\sum_j \tau_{ij}^{*4} + 2 \sum_j \tau_{ij}^{*2} \zeta_{ij}^{*2}}{\sum_j \tau_{ij}^{*2} + \sum_j \zeta_{ij}^{*2}},
\qquad
\nu_x = \frac{1}{c_x} \left( \sum_j \tau_{ij}^{*2} + \sum_j \zeta_{ij}^{*2} \right).
$$

Also, the distribution of $V_y$ may be approximated by $c_y \chi^2_{\nu_y}$, where

$$
c_y = \frac{\sum_j \tau_{yij}^{*4} + 2 \sum_j \tau_{yij}^{*2} \zeta_{yij}^{*2}}{\sum_j \tau_{yij}^{*2} + \sum_j \zeta_{yij}^{*2}},
\qquad
\nu_y = \frac{1}{c_y} \left( \sum_j \tau_{yij}^{*2} + \sum_j \zeta_{yij}^{*2} \right).
$$

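Under this approximation, the conditional probabilities of misclassifying an observation $X$ from $\pi_1$ and from $\pi_2$ by $W_c$ are given by

$$
\beta_{1c}^{*} = \Pr(W_c < k \mid \bar{x}_{ij}, \bar{y}_{ij}, s_{ij}^2, s_{yij}^2;\; x, y \in \pi_1)
= \Pr \left\{ c \cdot c_x \chi^2_{\nu_x} + (1 - c) c_y \chi^2_{\nu_y} < k \right\},
$$

$$
\beta_{2c}^{*} = \Pr(W_c \ge k \mid \bar{x}_{ij}, \bar{y}_{ij}, s_{ij}^2, s_{yij}^2;\; x, y \in \pi_2)
= \Pr \left\{ c \cdot c_x \chi^2_{\nu_x} + (1 - c) c_y \chi^2_{\nu_y} \ge k \right\},
$$

where $c_x$, $\nu_x$, $c_y$ and $\nu_y$ are evaluated for the population from which $X$ comes. Hence the conditional error rate, with equal prior probabilities, is defined as

$$
\beta_c^{*} = \frac{1}{2} (\beta_{1c}^{*} + \beta_{2c}^{*}).
\tag{2.5}
$$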
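The probabilities entering (2.5) involve a weighted sum of two independent central chi-square variables, which has no simple closed form. One way to evaluate them numerically is plain Monte Carlo simulation. The sketch below is an illustration rather than the authors' computation: the function names, the number of draws and the seed are assumptions, and the fitted constants for each population are passed in as tuples.

```python
import random

def chi2(nu, rng):
    """Central chi-square draw with (possibly fractional) nu degrees of freedom."""
    return rng.gammavariate(nu / 2.0, 2.0)   # Gamma(shape nu/2, scale 2)

def prob_below(c, cx, nux, cy, nuy, k, draws=20000, seed=0):
    """Monte Carlo estimate of Pr{ c*cx*chi2(nux) + (1-c)*cy*chi2(nuy) < k }."""
    rng = random.Random(seed)
    hits = sum(c * cx * chi2(nux, rng) + (1.0 - c) * cy * chi2(nuy, rng) < k
               for _ in range(draws))
    return hits / draws

def conditional_error_rate(c, k, fit1, fit2, **kw):
    """fit_i = (cx, nux, cy, nuy) computed with X assumed to come from pi_i."""
    beta1 = prob_below(c, *fit1, k, **kw)         # pi_1 observation assigned to pi_2
    beta2 = 1.0 - prob_below(c, *fit2, k, **kw)   # pi_2 observation assigned to pi_1
    return 0.5 * (beta1 + beta2)
```

The degrees of freedom produced by the two-moment fit are generally not integers, which is why the draws are taken from the equivalent gamma distribution.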

3. Bootstrap confidence interval when training samples do not contain missing values

Let the populations be multivariate normal with equal covariance matrix, that is, $\pi_i : N(\mu_i, \Sigma)$, $i = 1, 2$.

We now consider the bootstrap confidence interval for the conditional error rate, which is defined as $\alpha = \Phi(-\Delta/2)$, where $\Delta^2 = (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2)$, when the training samples contain no missing values. The bootstrap method is a resampling technique using Monte Carlo simulation (Efron, 1982). In our situation, independent random samples of sizes $n_1$ and $n_2$ are drawn with replacement from the two training samples, respectively. An estimator $\hat{\alpha}^*$ of $\alpha$ based on the bootstrap sample is obtained from $\hat{\alpha} = \Phi(-D/2)$, where $D^2 = (\bar{X}_1 - \bar{X}_2)' S^{-1} (\bar{X}_1 - \bar{X}_2)$ is the sample estimate of the Mahalanobis squared distance $\Delta^2$. This process is repeated independently a large number $B$ of times. The bootstrap confidence interval for $\alpha$ can then be obtained from the $B$ values of $\hat{\alpha}^*$. Let $\hat{\alpha}^*_{(i)}$ denote the $i$th ordered value, so that

$$
\hat{\alpha}^*_{(1)} \le \hat{\alpha}^*_{(2)} \le \cdots \le \hat{\alpha}^*_{(B)}.
$$
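The resampling step can be sketched as follows. This is a minimal illustration under a simplifying assumption that is not stated in the text: the data are taken to be already in canonical form, so that $S$ is replaced by a pooled diagonal matrix of per-variable variances. The function names and the default $B = 1000$ are also assumptions.

```python
import random
from statistics import NormalDist, mean

def alpha_hat(sample1, sample2):
    """Phi(-D/2), with D^2 from a pooled diagonal covariance (assumed canonical form)."""
    n1, n2, p = len(sample1), len(sample2), len(sample1[0])
    d2 = 0.0
    for j in range(p):
        c1 = [row[j] for row in sample1]
        c2 = [row[j] for row in sample2]
        m1, m2 = mean(c1), mean(c2)
        ss = sum((v - m1) ** 2 for v in c1) + sum((v - m2) ** 2 for v in c2)
        d2 += (m1 - m2) ** 2 / (ss / (n1 + n2 - 2))   # pooled variance of variable j
    return NormalDist().cdf(-0.5 * d2 ** 0.5)

def bootstrap_alphas(sample1, sample2, B=1000, seed=0):
    """Sorted bootstrap estimates alpha*_(1) <= ... <= alpha*_(B)."""
    rng = random.Random(seed)
    out = []
    for _ in range(B):
        b1 = rng.choices(sample1, k=len(sample1))   # resample rows with replacement
        b2 = rng.choices(sample2, k=len(sample2))
        out.append(alpha_hat(b1, b2))
    return sorted(out)
```

Each training sample is passed as a list of rows, one row per observation, and the two samples are resampled independently so that the group sizes are preserved.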


There are several methods to construct the bootstrap confidence interval. We will consider the percentile method, the bias-corrected percentile method, and the accelerated bias-corrected percentile method (Efron, 1982, 1987; Buckland, 1983, 1984, 1985; Hall, 1986a, 1986b; Hinkley, 1988; DiCiccio and Romano, 1988; among others).

These three types of 100(1 − 2η)% confidence interval are presented as follows:

Percentile method. The confidence interval is given by $(\hat{\alpha}^*_{(r)}, \hat{\alpha}^*_{(s)})$, where $r = (B + 1)\eta$ and $s = (B + 1)(1 - \eta)$, both rounded to the nearest integer, subject to $r + s = B + 1$.
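A minimal sketch of the percentile rule, assuming the bootstrap estimates are already sorted in increasing order (the function name `percentile_interval` is hypothetical, and the clamp of $r$ to at least 1 is an added safeguard):

```python
import math

def percentile_interval(sorted_boot, eta):
    """100(1 - 2*eta)% percentile interval from sorted bootstrap estimates."""
    B = len(sorted_boot)
    r = max(math.floor((B + 1) * eta + 0.5), 1)   # nearest integer, at least 1
    s = B + 1 - r                                  # enforces r + s = B + 1
    return sorted_boot[r - 1], sorted_boot[s - 1]  # ranks are 1-based in the text
```

With $B = 999$ and $\eta = 0.025$ this selects the 25th and 975th ordered values.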

Bias-corrected percentile method. Suppose $\hat{\alpha}^*_{(q)} < \hat{\alpha} < \hat{\alpha}^*_{(q+1)}$, where $\hat{\alpha}$ is calculated from the original samples. That is, $q$ of the $B$ bootstrap estimates for $\alpha$ are smaller than $\hat{\alpha}$. Define

$$
z_0 = \Phi^{-1}(q/B), \qquad \eta_{BL} = \Phi(2 z_0 - z_\eta), \qquad \eta_{BR} = \Phi(2 z_0 + z_\eta),
$$

where $\Phi(z_\eta) = 1 - \eta$ and $\Phi$ denotes the cumulative standard normal distribution function. Then the confidence interval is given by $(\hat{\alpha}^*_{(j)}, \hat{\alpha}^*_{(k)})$, where $j = (B + 1)\eta_{BL}$ and $k = (B + 1)\eta_{BR}$.
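The bias-corrected rule can be sketched with the standard library's normal distribution. The function name `bc_interval`, the rounding of ranks to the nearest integer, and the clamping of $q$ and of the ranks are assumptions added so that the example always returns a valid interval.

```python
from statistics import NormalDist

def bc_interval(sorted_boot, alpha_hat, eta):
    """Bias-corrected percentile interval from sorted bootstrap estimates."""
    nd = NormalDist()
    B = len(sorted_boot)
    q = sum(1 for v in sorted_boot if v < alpha_hat)
    q = min(max(q, 1), B - 1)            # assumption: keep q/B strictly inside (0, 1)
    z0 = nd.inv_cdf(q / B)
    z_eta = nd.inv_cdf(1.0 - eta)        # Phi(z_eta) = 1 - eta
    eta_bl = nd.cdf(2.0 * z0 - z_eta)
    eta_br = nd.cdf(2.0 * z0 + z_eta)

    def rank(level):
        # nearest-integer rank, clamped to 1..B (the clamping is an assumption)
        return min(max(int((B + 1) * level + 0.5), 1), B)

    return sorted_boot[rank(eta_bl) - 1], sorted_boot[rank(eta_br) - 1]
```

When about half of the bootstrap estimates fall below $\hat{\alpha}$, $z_0$ is close to zero and the interval is close to the plain percentile interval; otherwise both end points shift in the direction of the correction.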

Accelerated bias-corrected percentile method. Define

$$
\eta_{AL} = \Phi \left( z_0 + \frac{z_0 - z_\eta}{1 - a (z_0 - z_\eta)} \right),
\qquad
\eta_{AR} = \Phi \left( z_0 + \frac{z_0 + z_\eta}{1 - a (z_0 + z_\eta)} \right),
$$

where

$$
a = \frac{1}{6} \left\{ \sum_{i=1}^{B} (\hat{\alpha}^*_i - \bar{\hat{\alpha}}^*)^3 \Big/ \left[ \sum_{i=1}^{B} (\hat{\alpha}^*_i - \bar{\hat{\alpha}}^*)^2 \right]^{3/2} \right\},
$$

which is called the acceleration constant, and $\bar{\hat{\alpha}}^*$ is the mean of the $B$ bootstrap estimates $\hat{\alpha}^*_i$, $i = 1, \ldots, B$.

Then the confidence interval is given by $(\hat{\alpha}^*_{(u)}, \hat{\alpha}^*_{(v)})$, where $u = (B + 1)\eta_{AL}$ and $v = (B + 1)\eta_{AR}$.

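Note that $\eta_{AR}$ and $\eta_{AL}$ become $\eta_{BR}$ and $\eta_{BL}$ if $a$ equals 0. If $z_0$ is zero, then $\eta_{BR}$ and $\eta_{BL}$ become $1 - \eta$ and $\eta$, respectively, which gives the percentile method.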
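The acceleration constant and the adjusted levels translate directly into code. In this sketch the function names `acceleration` and `bca_levels` are hypothetical, and $z_0$ is assumed to have been computed beforehand as $\Phi^{-1}(q/B)$.

```python
from statistics import NormalDist, mean

def acceleration(boot):
    """Acceleration constant a computed from the B bootstrap estimates."""
    m = mean(boot)
    num = sum((v - m) ** 3 for v in boot)
    den = sum((v - m) ** 2 for v in boot) ** 1.5
    return num / (6.0 * den)

def bca_levels(z0, a, eta):
    """Adjusted levels (eta_AL, eta_AR) for the accelerated bias-corrected method."""
    nd = NormalDist()
    z_eta = nd.inv_cdf(1.0 - eta)
    lo = nd.cdf(z0 + (z0 - z_eta) / (1.0 - a * (z0 - z_eta)))
    hi = nd.cdf(z0 + (z0 + z_eta) / (1.0 - a * (z0 + z_eta)))
    return lo, hi
```

The interval end points are then the ordered bootstrap values with ranks $(B + 1)\eta_{AL}$ and $(B + 1)\eta_{AR}$; setting `a = 0` reproduces the bias-corrected levels.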

We evaluate the bootstrap confidence intervals for the conditional error rate (Chung and Han, 2000) when the training samples come from circular models in which the two populations have equal covariance matrices.

4. Bootstrap confidence interval when training samples contain missing values

First, when the covariance matrices are equal, we consider the bootstrap confidence intervals for the conditional error rates using $W_c$ (Chung and Han, 2000) in circular models.

We will also consider the bootstrap confidence intervals for the conditional error rates $\beta_n^*$ and $\beta_c^*$ in (2.4) and (2.5) using $W_c$. The conditional error rate can be estimated by substituting the estimates $\hat{\sigma}_{ij}^2$, $\hat{\mu}_{ij}$, $\hat{\sigma}_{yij}^2$ and $\hat{\mu}_{yij}$ for $\sigma_{ij}^2$, $\mu_{ij}$, $\sigma_{yij}^2$ and $\mu_{yij}$ in (2.4) and (2.5), respectively. Let $\hat{\mu}_i = [\bar{Y}^{(i)}, \bar{Z}^{(i)}]'$ and $\hat{\mu}_{yi} = \bar{Y}^{(i)}$ be the estimates of $\mu_i$ and $\mu_{yi}$. For the variances, let

$$
\hat{\sigma}_{cij}^2 = \sum_{l=1}^{m_i} (x_{ijl} - \bar{x}_{ij})^2 / (m_i - 1), \qquad i = 1, 2; \; j = 1, 2, \ldots, p,
$$

be the estimates from the complete observations of sizes $m_i$. Also let

$$
\hat{\sigma}_{Iij}^2 = \sum_{l=m_i+1}^{n_i} (y_{ijl} - \bar{y}_{ij})^2 / (n_i - m_i), \qquad i = 1, 2; \; j = 1, 2, \ldots, k,
$$

be the estimates from the incomplete observations of sizes $n_i - m_i$, using only the $Y$ observations. Then for $\sigma_{ij}^2$ we suggest the combined estimates

$$
\hat{\sigma}_{ij}^2 = \frac{m_i \hat{\sigma}_{cij}^2}{n_i} + \frac{(n_i - m_i) \hat{\sigma}_{Iij}^2}{n_i}, \qquad i = 1, 2; \; j = 1, 2, \ldots, k,
$$

and $\hat{\sigma}_{ij}^2 = \hat{\sigma}_{cij}^2$, $i = 1, 2$; $j = k + 1, \ldots, p$. For $\sigma_{yij}^2$, we use $\hat{\sigma}_{ij}^2$, $i = 1, 2$; $j = 1, 2, \ldots, k$.
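The combined variance estimates are a sample-size-weighted average for the first $k$ variables and the complete-case estimate for the rest. A minimal sketch for one population, assuming the per-variable estimates $\hat{\sigma}_{cij}^2$ and $\hat{\sigma}_{Iij}^2$ are already available as lists (the function names are hypothetical):

```python
def combined_variance(var_complete, var_incomplete, m, n):
    """m*var_c/n + (n - m)*var_I/n for one of the first k variables."""
    return (m * var_complete + (n - m) * var_incomplete) / n

def variance_estimates(var_c, var_i, m, n, k):
    """var_c has length p (complete cases); var_i has length k (incomplete cases)."""
    return [combined_variance(var_c[j], var_i[j], m, n) if j < k else var_c[j]
            for j in range(len(var_c))]
```

For example, with $m_i = 10$, $n_i = 20$ and $k = 2$, the first two variables receive equal weight from the complete and incomplete parts, while the third keeps its complete-case estimate.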

We will use these estimates in the construction of the bootstrap confidence intervals for the conditional error rate when the training samples contain missing observations. Basically the same procedure described for $\alpha$ is applied in this situation to obtain the three types of bootstrap confidence intervals, i.e., the percentile method, the bias-corrected percentile method, and the accelerated bias-corrected percentile method. In order to evaluate the properties of the confidence intervals, we conduct a Monte Carlo study similar to the one described for the optimal error rate.

References

Anderson, T. W. (1984). An introduction to multivariate statistical analysis, John Wiley and Sons, New York.

Buckland, S. T. (1983). Monte Carlo methods for confidence interval estimation using the bootstrap technique. Bias, 10, 194-212.

Buckland, S. T. (1984). Monte Carlo confidence intervals. Biometrics, 40, 811-817.

Buckland, S. T. (1985). Calculation of Monte Carlo confidence intervals. Royal Statistical Society, Algorithm AS214, 297-301.

Chung, H. C. and Han, C. P. (2000). Discriminant analysis when a block of observations is missing. Annals of the Institute of Statistical Mathematics, 52, 544-556.

Cho, J. J., Han, J. H. and Lee, I. P. (1999). Better bootstrap confidence intervals for process incapability index. Journal of the Korean Data & Information Science Society, 10, 341-357.

Dempster, A. P., Laird, N. M. and Rubin, R. J. A. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B , 39, 302-306.

DiCiccio, T. J. and Romano, J. P. (1988). A review of bootstrap confidence intervals. Journal of the Royal Statistical Society, Series B, 50, 338-354.

Efron, B. (1982). The jackknife, the bootstrap, and other resampling plans. CBMS-NSF Regional Conference Series in Applied Mathematics, 38, Society for Industrial and Applied Mathematics (SIAM), Philadelphia.

Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82, 171-200.

Hall, P. (1986a). On the bootstrap and confidence intervals. Annals of Statistics, 14, 1431-1452.

Hall, P. (1986b). On the number of bootstrap simulations required to construct a confidence interval. Annals of Statistics, 14, 1453-1462.

Han, C. P. (1970). Distribution of discriminant function in circular models. Annals of the Institute of Statistical Mathematics, 22, 117-125.

Hinkley, D. V. (1988). Bootstrap methods. Journal of the Royal Statistical Society, Ser. B , 50, 321-337.

Patnaik, D. B. (1949). The non-central χ² and F distributions and their applications. Biometrika, 36, 202-232.

Wise, J. (1955). The autocorrelation function and the spectral density function. Biometrika, 42, 151-159.
