Hanyang University
Quest Lab.
Chapter 12
Discriminant Analysis
Fall 2014, Department of IME, Hanyang University
• Classical statistical method for classification and profiling
• Model-based approach to classification
• Classification is based on the “statistical” distance from each class average
• Classification functions are used to produce classification scores
• Common uses: classifying species of microorganisms, loans, credit cards, insurance, university admission decisions, fingerprint analysis, etc.
12.1 Introduction
Example 1: Riding mowers (from Chapter 7)
12 owners and 12 nonowners
Example 1: Riding mowers (from Chapter 9)
The line seems to do a good job, with 4 misclassifications. Can we do better?
12.2 Distance of an observation from a class
General idea:
• To classify a new record, measure its distance from the center of each class
• Then, classify the record to the closest class
Step 1: Measuring Distance
• Euclidean distance between an item and mean of a class
→ Distances change with the units of each predictor (dollars vs. thousands of dollars, etc.)
→ It ignores the variability of each predictor (e.g., even if a record is closer to B than to A, classifying it to A may be more reasonable when B's variability is smaller)
→ Euclidean distance also ignores the correlations between predictors
→ Remedy: "statistical distance" (Mahalanobis distance)
\[ D_{\text{Euclidean}}(\mathbf{x}, \bar{\mathbf{x}}) = \sqrt{(x_1-\bar{x}_1)^2 + (x_2-\bar{x}_2)^2 + \cdots + (x_p-\bar{x}_p)^2} \]
Statistical distance (Mahalanobis distance):

\[ D^2_{\text{Statistical}}(\mathbf{x}, \bar{\mathbf{x}}) = (\mathbf{x} - \bar{\mathbf{x}})^T S^{-1} (\mathbf{x} - \bar{\mathbf{x}}) = \left[ (x_1-\bar{x}_1), (x_2-\bar{x}_2), \ldots, (x_p-\bar{x}_p) \right] S^{-1} \begin{bmatrix} x_1-\bar{x}_1 \\ x_2-\bar{x}_2 \\ \vdots \\ x_p-\bar{x}_p \end{bmatrix} \]

where S is the covariance matrix.
If p = 1 (single predictor), this reduces to the z-score.
If S = I (all predictors are uncorrelated with each other), then
\[ D_{\text{Statistical}}(\mathbf{x}, \bar{\mathbf{x}}) = D_{\text{Euclidean}}(\mathbf{x}, \bar{\mathbf{x}}). \]
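As a minimal sketch of the statistical distance above, in Python with NumPy (the class mean `xbar` and covariance `S` below are illustrative, not from the book):

```python
import numpy as np

def mahalanobis_sq(x, xbar, S):
    """Squared statistical distance: (x - xbar)^T S^{-1} (x - xbar)."""
    d = np.asarray(x, float) - np.asarray(xbar, float)
    return float(d @ np.linalg.inv(S) @ d)

x = np.array([60.0, 18.4])               # income ($K), lot size, as in Record #1
xbar = np.array([70.0, 20.0])            # illustrative class mean
S = np.array([[30.0, 5.0], [5.0, 2.0]])  # illustrative covariance matrix

d2_stat = mahalanobis_sq(x, xbar, S)
# With S = I, the statistical distance reduces to the Euclidean distance:
d2_eucl = float(np.sum((x - xbar) ** 2))
print(d2_stat, np.isclose(mahalanobis_sq(x, xbar, np.eye(2)), d2_eucl))
```

The `S = I` check mirrors the special case stated above.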
12.3 Fisher’s linear classification functions
Step 2: Classification Functions
Idea: find a linear function of the measurements that maximizes the ratio of between-class separation to within-class variability:
• The larger the numerator, the more clearly the classes are separated; the smaller the denominator, the more homogeneous each class is
• For each observation, compute a score with each class's function and assign the observation to the class with the largest score
• One classification function is generated per class
12.3 Fisher’s linear classification functions
Record #1: income = $60K, lot size = 18.4K
Owner score = -73.16 + (0.43)(60) + (5.47)(18.4) = 53.20
Non-owner score = -51.42 + (0.33)(60) + (4.68)(18.4) = 54.48
“Non-owner” score is higher → classify as non-owner
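The score computation above can be reproduced directly (coefficients as quoted; the small differences from the slide's rounded 53.2/54.48 come from rounding in the printed coefficients):

```python
# Classification scores for Record #1 (income = $60K, lot size = 18.4K),
# using the classification-function coefficients quoted above.
income, lot_size = 60.0, 18.4
owner_score = -73.16 + 0.43 * income + 5.47 * lot_size
nonowner_score = -51.42 + 0.33 * income + 4.68 * lot_size
label = "owner" if owner_score > nonowner_score else "non-owner"
print(round(owner_score, 2), round(nonowner_score, 2), label)
# prints 53.29 54.49 non-owner
```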
How do we find these functions?
12.3 Fisher’s linear classification functions
Step 3 (alternative way to classify): Converting to Probabilities
• It is possible to convert classification scores to probabilities of belonging to a class:
\[ P(\text{record belongs to class } k) = \frac{e^{c_k}}{e^{c_1} + e^{c_2} + \cdots + e^{c_K}} \]
where \( c_j \) is the record's classification score for class j
• The probability is then compared to the cutoff value in order to classify a record
• Advantage: a lift chart can be produced without further comparison
12.3 Fisher’s linear classification functions
\[ \frac{e^{53.20}}{e^{53.20} + e^{54.48}} = 0.218 \]
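The conversion can be checked numerically, using the two scores from the worked example:

```python
import math

# Convert the two classification scores into a membership probability
# using the exponential-ratio formula above.
scores = {"owner": 53.20, "nonowner": 54.48}
denom = sum(math.exp(s) for s in scores.values())
p_owner = math.exp(scores["owner"]) / denom
print(p_owner)  # ~0.218, matching the slide
```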
12.3 Fisher’s linear classification functions
The reasoning behind DA
Linear/Quadratic Discriminant Analysis
11/2/2014 15
Bayesian Decision Theory

Posterior probability from Bayes formula:
\[ P(C_j \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_j)\, P(C_j)}{P(\mathbf{x})} \]

• \( P(C_j) \): class probability (prior)
• \( P(\mathbf{x} \mid C_j) \): conditional density of X
• \( P(\mathbf{x}) \): unconditional density of X,
\[ P(\mathbf{x}) = \sum_{j=1}^{n} P(\mathbf{x} \mid C_j)\, P(C_j) \]
Bayesian Classifier

Find the maximum posterior probability:
\[ P(C_i \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_i)\, P(C_i)}{P(\mathbf{x})} \]

For a given x, if \( P(C_1 \mid \mathbf{x}) > P(C_2 \mid \mathbf{x}) \), then x is classified to class 1; otherwise, class 2.

Since \( P(\mathbf{x}) \) is common to both posteriors, equivalently:
For a given x, if \( P(\mathbf{x} \mid C_1)\, P(C_1) > P(\mathbf{x} \mid C_2)\, P(C_2) \), then x is classified to class 1; otherwise, class 2.
Bayesian Classifier

For a given x, if \( P(\mathbf{x} \mid C_1)\, P(C_1) > P(\mathbf{x} \mid C_2)\, P(C_2) \), then x is classified to class 1; otherwise, class 2.

Computing the prior probabilities:
\[ P(C_1) = n_1 / N, \qquad P(C_2) = n_2 / N \]
These are estimates rather than exact values (they approach the true values as N grows).

Computing the likelihood:
Estimate \( P(\mathbf{x} \mid C_i) \) from the training samples belonging to \( C_i \).
Naïve Bayes: \( P(\mathbf{x} \mid C_i) = P(x_1 \mid C_i)\, P(x_2 \mid C_i) \cdots P(x_p \mid C_i) \)
In DA? \( P(\mathbf{x} \mid C_i) \) is assumed to follow a multivariate normal distribution.
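A minimal one-dimensional sketch of this Bayes rule, with priors estimated from class counts; the class means, standard deviations, and the query point are illustrative:

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density, used here as the class-conditional likelihood P(x|Ci)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

n1, n2 = 12, 12                            # class counts in the training set
p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)    # estimated priors P(C1), P(C2)

x = 2.0
post1 = normal_pdf(x, mu=0.0, sigma=1.0) * p1   # proportional to P(x|C1)P(C1)
post2 = normal_pdf(x, mu=3.0, sigma=1.0) * p2   # proportional to P(x|C2)P(C2)
print("class 1" if post1 > post2 else "class 2")  # prints class 2
```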
Classification Based on Normal Populations

The normal (Gaussian) distribution often fits real-world data, and it is fully specified by only two kinds of parameters (the mean and the variance), which makes it mathematically attractive.
Under the assumption that the likelihood follows a normal distribution, we can analyze the properties of the Bayesian classifier.
Multivariate Normal Distribution

\[ p(\mathbf{x} \mid C_j) = \frac{1}{(2\pi)^{p/2} |\Sigma_j|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu}_j)^T \Sigma_j^{-1} (\mathbf{x}-\boldsymbol{\mu}_j) \right] \]

\[ \mathbf{x} = (x_1, \ldots, x_p)^T \sim \text{MVN}(\boldsymbol{\mu}, \Sigma), \quad \text{where } \boldsymbol{\mu} = (\mu_1, \ldots, \mu_p)^T, \quad \Sigma = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{bmatrix} \]

In case of p = 1:
\[ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right] \]
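The density formula can be evaluated directly from its definition; a small NumPy sketch (the mean and covariance are illustrative):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density p(x|Cj), computed from the formula above."""
    p = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5)
    return float(norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff))

mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
# At the mean, with p = 2 and Sigma = I, the density is exactly 1/(2*pi)
print(mvn_pdf(mu, mu, Sigma))  # prints 1/(2*pi) ≈ 0.15915
```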
Discriminant Function

Discriminant function in terms of posterior probability:
\[ d_i(\mathbf{x}) \equiv P(C_i \mid \mathbf{x}) \propto p(\mathbf{x} \mid C_i)\, P(C_i), \quad \text{or} \]
\[ d_i(\mathbf{x}) = \ln p(\mathbf{x} \mid C_i) + \ln P(C_i), \qquad i = 1, 2, \ldots, K \ \text{(where } K \text{ is the number of classes)} \]

The classifier will assign an observation x to class \( C_i \) if \( d_i(\mathbf{x}) > d_j(\mathbf{x}) \) for all \( j \neq i \).
Discriminant Function for Normal Distribution

With
\[ p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{p/2} |\Sigma_i|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) \right], \]

\[ d_i(\mathbf{x}) = \ln p(\mathbf{x} \mid C_i) + \ln P(C_i) = -\frac{p}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) + \ln P(C_i) \]

Decision boundary between class k and class r:
\[ d_k(\mathbf{x}) = d_r(\mathbf{x}) \]
Derivation of Decision Boundary

Setting \( d_1(\mathbf{x}) - d_2(\mathbf{x}) = 0 \):
\[ \left[ \ln P(C_1) - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_1| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T \Sigma_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1) \right] - \left[ \ln P(C_2) - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_2| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^T \Sigma_2^{-1}(\mathbf{x}-\boldsymbol{\mu}_2) \right] = 0 \]

The \( -\frac{p}{2}\ln(2\pi) \) terms cancel, leaving
\[ (\mathbf{x}-\boldsymbol{\mu}_2)^T \Sigma_2^{-1}(\mathbf{x}-\boldsymbol{\mu}_2) - (\mathbf{x}-\boldsymbol{\mu}_1)^T \Sigma_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1) = \ln|\Sigma_1| - \ln|\Sigma_2| - 2\ln\frac{P(C_1)}{P(C_2)} \]
Decision Boundary When S 1 = S 2
) 2 (
) 1 ln ( 2 ln
) (
) (
) (
) (
2 1 2
1 2 2 1
1 1
1
P
x P x
x
x
T T-
S
= S - S -
- - S
- m
-m m
-m
) 0 2 (
) 1 ln ( ) (
) 2 (
) 1
(
1-
2S
-1-
1-
2S
-1 1-
2- = P
x
TP
T
m m m m
m m
( ) ( ) (
1) (
2)
22 1
2 1
1 2
1 1 1
1 1
1
When 1 úS
û ù êë
é
- + - + - úS û ù êë
é
- + -
= - S
= S
=
S n n
n n
n n
A Decision
Assign x into class 1 if A + B > 0, class 2 if A + B < 0.
x + B =0
A + B = 0: Linear Equation
• 결정 경계
• 예제
27
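A sketch of the linear boundary coefficients for the equal-covariance case, following the expansion above, where A = 2(μ1-μ2)^T Σ^{-1} and B collects the remaining constant terms (all parameter values below are illustrative):

```python
import numpy as np

mu1 = np.array([1.0, 1.0])
mu2 = np.array([-1.0, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])  # shared covariance
P1 = P2 = 0.5                               # equal priors

Sinv = np.linalg.inv(Sigma)
A = 2 * (mu1 - mu2) @ Sinv
B = mu2 @ Sinv @ mu2 - mu1 @ Sinv @ mu1 + 2 * np.log(P1 / P2)

def classify(x):
    """Assign class 1 if A x + B > 0, else class 2."""
    return 1 if A @ x + B > 0 else 2

print(classify(np.array([0.9, 1.1])), classify(np.array([-1.0, -0.8])))  # prints 1 2
```

With symmetric means and equal priors, B comes out to zero, so the boundary passes through the origin.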
Decision Boundary When S 1 ¹ S 2
) 2 (
) 1 ln ( 2 ln
) (
) (
) (
) (
2 1 2
1 2 2 1
1 1
1
P
x P x
x
x
T T-
S
= S - S -
- - S
- m
-m m
-m
) 0 2 (
) 1 ln ( )
( ) 2 (
1
12 2 1 1 1 2
2 1
1
- S + S - S - - =
S
-
- - - -P T P x x
x
Tm
Tm
T2
When S
1¹ S
A
Decision
Assign x into class 1 if Ax
2+ Bx + C > 0, class 2 if Ax
2+ Bx + C < 0.
x
2+ =0
Ax
2+ Bx + C = 0: Quadratic Equation
x + C
B
• 예제
11/2/2014 30
Linear Decision Boundary from LDA
(a) Three normal distributions, with the same covariance and different means.
(b) A sample of 30 drawn from each normal distribution, and the fitted LDA decision boundaries.
Quadratic Decision Boundary from QDA
QDA vs. LDA
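Both cases can be exercised with one function: a direct sketch of the normal-population discriminant d_i(x) from the earlier slide (the common -(p/2) ln 2π term is dropped, since it cancels in comparisons). Equal covariances give LDA's linear boundary; unequal covariances, as below, give QDA's quadratic one. All parameter values are illustrative:

```python
import numpy as np

def discriminant(x, mu, Sigma, prior):
    """d_i(x) = -0.5 ln|Sigma_i| - 0.5 (x-mu_i)^T Sigma_i^{-1} (x-mu_i) + ln P(C_i)."""
    diff = x - mu
    return (-0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.inv(Sigma) @ diff
            + np.log(prior))

# Illustrative parameters with unequal covariances (the QDA setting)
mu1, Sigma1 = np.array([0.0, 0.0]), np.eye(2)
mu2, Sigma2 = np.array([3.0, 0.0]), 4.0 * np.eye(2)

x = np.array([1.0, 0.0])
d1 = discriminant(x, mu1, Sigma1, 0.5)
d2 = discriminant(x, mu2, Sigma2, 0.5)
print(1 if d1 > d2 else 2)  # prints 1
```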
12.4 Classification performance of discriminant analysis
Main assumptions:
① Measuring distance within each class assumes a multivariate normal distribution
• When this assumption holds, DA is more effective than any other classification method (meaning the same performance can be achieved with less data)
• Robust even when predictors are non-normal or are dummy variables
• Sensitive to outliers, however, so exploratory analysis is needed
12.4 Classification performance of discriminant analysis
② Assumes the correlation structure within a class is identical across classes
• Check by computing the correlation matrix of each class and comparing them
• When the correlation matrices differ across classes, records tend to be classified to the class with greater variability
• An alternative is to perform quadratic discriminant analysis (if the dataset is very large)
12.5 Prior probabilities
③ The method so far assumes that each class is equally likely to be encountered a priori. What if the classes are not equally likely?
Let p_j = prior probability of membership in class j
Add log(p_j) (natural logarithm) to the classification function of each class
Ex. What if the population proportion of riding-mower owners is 15% (in the sample it was 12 vs. 12, i.e., 50%)?
→ Classification should be nudged away from "owner"
→ Add log(0.15) to the constant term of the owner classification function in Fig 12.3, and log(0.85) to the nonowner function
12.5 Prior probabilities
If a record has income = $60K and lot size = 18.4K:
Owner score = -73.16 + (0.43)(60) + (5.47)(18.4) + log(0.15) = 51.30
Non-owner score = -51.42 + (0.33)(60) + (4.68)(18.4) + log(0.85) = 54.32
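This adjustment is mechanical: add the natural log of each prior to the corresponding score. A sketch with the riding-mower coefficients (small differences from the slide's printed values come from rounding):

```python
import math

# Riding-mower classification scores adjusted for unequal priors:
# add ln(prior) to each class's score, as described above.
income, lot_size = 60.0, 18.4
owner_score = -73.16 + 0.43 * income + 5.47 * lot_size + math.log(0.15)
nonowner_score = -51.42 + 0.33 * income + 4.68 * lot_size + math.log(0.85)
print(round(owner_score, 2), round(nonowner_score, 2))
```

The non-owner score remains the larger one, but the gap widens once ownership is assumed to be rarer in the population.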
12.6 Unequal misclassification costs
④ What if the misclassification costs are not symmetric?
Let q_j = the cost of misclassifying a class-j record
• Add log(q_j) to the classification function of each class (when combined with the prior probabilities, this amounts to adding log(p_j q_j))
• If the individual costs are hard to estimate, use the ratio q_2/q_1; in that case, add log(q_2/q_1) only to the classification function of class 2
12.7 Classifying more than two classes
Example 3: Medical dispatch to accident scene
When emergency (119) resources are temporarily short and a dispatch request comes in, which accident scene should they be sent to?
Outcome (3 classes): No injury, Non-fatal injury, Fatal injury
Predictors: Time of day, day of week, weather, type of road, road surface conditions, …
12.7 Classifying more than two classes
Example 3: Medical dispatch to accident scene
- Applied discriminant analysis to 1,000 records
- Three classification functions per record; the confusion matrix is 3×3
- Compute the classification scores
- Assign each record to the class with the highest classification score
12.7 Classifying more than two classes
Example 3: Medical dispatch to accident scene
\[ \frac{e^{30.93}}{e^{25.94} + e^{31.42} + e^{30.93}} = 0.38 \]
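The quoted probability can be verified from the three scores on the slide (the mapping of scores to outcome classes is not stated, so the list order below is illustrative):

```python
import math

# Three-class membership probability from the three classification scores.
scores = [25.94, 31.42, 30.93]
denom = sum(math.exp(s) for s in scores)
p = math.exp(30.93) / denom
print(round(p, 2))  # prints 0.38
```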
12.8 Advantages and weaknesses
• A statistical classification method rather than a data mining method
• Very popular in the social sciences
• Use and performance are similar to multiple linear regression:
-> uses the same least-squares method
-> normality assumption (predictors approximately multivariate normal)
-> works well in practice even when the predictors are not normal
-> for heavily skewed variables, a log transform helps
• It provides estimates of single-predictor contributions, is computationally simple, and is useful for small datasets.
• 12.1 (Personal Loan)
• 12.2 (System Administrator)
• 12.3 (detecting spam mail)