
Chapter 12 Discriminant Analysis


Fall, 2014 Department of IME Hanyang University

Hanyang University Quest Lab.

• Classical statistical method for classification and profiling

• Model-based approach to classification

• Classification is based on the “statistical” distance from each class average

• Classification functions are used to produce classification scores

• Common uses: classifying species of microorganisms, loans, credit cards, insurance, college admission decisions, fingerprint analysis, etc.

12.1 Introduction


Example 1: Riding mowers (from Chapter 7)

12 owners and 12 nonowners


The line seems to do a good job: 4 misclassifications. Can we do better?


12.2 Distance of an observation from a class

General idea:

• To classify a new record, measure its distance from the center of each class

• Then, classify the record to the closest class


Step 1: Measuring Distance

• Euclidean distance between an item and mean of a class

→ The distance value depends on the units of each predictor (e.g., dollars vs. thousands of dollars)

→ It does not account for the variability of each predictor (e.g., even if a record is closer to B than to A, it may be more reasonable to classify it as A if B's variability is smaller)

→ Euclidean distance does not account for the correlation between predictors

→ Hence the "statistical distance" (Mahalanobis distance)

$$D_{\text{Euclidean}}(\mathbf{x}, \bar{\mathbf{x}}) = \sqrt{(x_1-\bar{x}_1)^2 + (x_2-\bar{x}_2)^2 + \cdots + (x_p-\bar{x}_p)^2}$$

(4)

Hanyang University Quest Lab.

Statistical distance (Mahalanobis distance):

$$D_{\text{Statistical}}(\mathbf{x}, \bar{\mathbf{x}}) = \sqrt{(\mathbf{x}-\bar{\mathbf{x}})^T S^{-1} (\mathbf{x}-\bar{\mathbf{x}})} = \sqrt{[(x_1-\bar{x}_1), (x_2-\bar{x}_2), \ldots, (x_p-\bar{x}_p)]\; S^{-1} \begin{bmatrix} x_1-\bar{x}_1 \\ x_2-\bar{x}_2 \\ \vdots \\ x_p-\bar{x}_p \end{bmatrix}}$$

where S is the covariance matrix.

If p = 1 (single predictor), this reduces to the z-score.

If S = I (all predictors are uncorrelated with each other, with unit variance), then

$$D_{\text{Statistical}}(\mathbf{x}, \bar{\mathbf{x}}) = D_{\text{Euclidean}}(\mathbf{x}, \bar{\mathbf{x}}).$$
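Both distances can be computed directly with NumPy. A minimal sketch, with a made-up two-predictor sample standing in for the income/lot-size data (the numbers are illustrative, not the book's):

```python
import numpy as np

# Hypothetical two-predictor class sample (illustrative values only).
X = np.array([[60.0, 18.4],
              [85.5, 16.8],
              [64.8, 21.6],
              [61.5, 20.8],
              [87.0, 23.6]])

x_bar = X.mean(axis=0)          # class mean vector
S = np.cov(X, rowvar=False)     # sample covariance matrix S

x = np.array([60.0, 18.4])      # record to be measured
d = x - x_bar

euclidean = np.sqrt(d @ d)
# Statistical (Mahalanobis) distance: sqrt((x - x_bar)^T S^{-1} (x - x_bar))
statistical = np.sqrt(d @ np.linalg.solve(S, d))

print(euclidean, statistical)
```

With S replaced by the identity matrix the two values coincide, matching the special case noted above.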


12.3 Fisher’s linear classification functions

Idea: find a linear function of the measurements that maximizes

$$\frac{\text{separation between the classes}}{\text{variability within the classes}}$$

• The larger the numerator, the more clearly the classes are separated; the smaller the denominator, the more homogeneous each class is.

• For each observation, compute a score with this function and assign the observation to the class with the largest score.

• One classification function is generated per class.

Step 2: Classification Functions


Record #1: income = $60K, lot size = 18.4K

Owner score = -73.16 + (0.43)(60) + (5.47)(18.4) = 53.2
Non-owner score = -51.42 + (0.33)(60) + (4.68)(18.4) = 54.48

“Non-owner” score is higher → classify as non-owner
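As a sketch, the two classification functions above can be applied in Python (coefficients copied from the slide):

```python
# Classification functions from the riding-mower example; the coefficients
# are the (rounded) values shown on the slide.
def owner_score(income, lot_size):
    return -73.16 + 0.43 * income + 5.47 * lot_size

def nonowner_score(income, lot_size):
    return -51.42 + 0.33 * income + 4.68 * lot_size

income, lot = 60.0, 18.4
scores = {"owner": owner_score(income, lot),
          "nonowner": nonowner_score(income, lot)}
# Assign the record to the class with the higher classification score.
label = max(scores, key=scores.get)
print(label)  # → nonowner
```

Because the printed coefficients are rounded, the scores come out near 53.3 and 54.5 rather than exactly 53.2 and 54.48, but the decision is the same.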

How do we find these functions?


Step 3 (an alternative way to classify): Converting to Probabilities

• It is possible to convert classification scores to probabilities of belonging to a class:

$$P(\text{record } i \text{ belongs to class } k) = \frac{e^{c_k(i)}}{e^{c_1(i)} + e^{c_2(i)} + \cdots + e^{c_m(i)}}$$

where $c_j(i)$ is the classification score of record $i$ for class $j$ and $m$ is the number of classes.

• The probability is then compared to the cutoff value in order to classify a record

• Advantage: the probabilities make a lift chart possible without further comparison


.

.+ .= 0.218


A first guess at DA

Linear/Quadratic Discriminant Analysis


11/2/2014 15

Bayesian Decision Theory

Posterior probability from Bayes' formula:

$$P(C_j \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_j)\, P(C_j)}{P(\mathbf{x})}$$

• $P(C_j)$: class probability (prior)
• $P(\mathbf{x} \mid C_j)$: conditional density of $\mathbf{x}$
• $P(\mathbf{x})$: unconditional density of $\mathbf{x}$,

$$P(\mathbf{x}) = \sum_{j=1}^{n} P(\mathbf{x} \mid C_j)\, P(C_j)$$


Bayesian Classifier

Find the maximum posterior probability:

For a given $\mathbf{x}$, if $P(C_1 \mid \mathbf{x}) > P(C_2 \mid \mathbf{x})$, then $\mathbf{x}$ is classified to class 1; otherwise, class 2.

Since

$$P(C_i \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_i)\, P(C_i)}{P(\mathbf{x})}$$

and $P(\mathbf{x})$ is common to all classes, this is equivalent to:

For a given $\mathbf{x}$, if $P(\mathbf{x} \mid C_1)\, P(C_1) > P(\mathbf{x} \mid C_2)\, P(C_2)$, then $\mathbf{x}$ is classified to class 1; otherwise, class 2.


Bayesian Classifier

For a given $\mathbf{x}$, if $P(\mathbf{x} \mid C_1)\, P(C_1) > P(\mathbf{x} \mid C_2)\, P(C_2)$, then $\mathbf{x}$ is classified to class 1; otherwise, class 2.

Computing the prior probabilities:

$$P(C_1) = n_1 / N, \quad P(C_2) = n_2 / N$$

These are estimates rather than exact values (they approach the true values as $N$ grows).

Computing the likelihood:

Estimate $P(\mathbf{x} \mid C_i)$ from the training samples belonging to $C_i$.

Naïve Bayes: $P(\mathbf{x} \mid C_i) = P(x_1 \mid C_i)\, P(x_2 \mid C_i) \cdots P(x_p \mid C_i)$

In DA: $P(\mathbf{x} \mid C_i)$ is assumed to follow a multivariate normal distribution.
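A minimal sketch of these estimates, with made-up labels and made-up per-predictor likelihood values (placeholders, not estimates from real data):

```python
import math
from collections import Counter

# Priors estimated as class frequencies n_i / N.
labels = ["owner"] * 12 + ["nonowner"] * 12
N = len(labels)
priors = {c: n / N for c, n in Counter(labels).items()}
print(priors)  # → {'owner': 0.5, 'nonowner': 0.5}

# Naive Bayes: P(x|C_i) approximated as a product of per-predictor
# likelihoods; the numbers below are hypothetical.
per_predictor = {"owner": [0.30, 0.55], "nonowner": [0.40, 0.20]}
likelihood = {c: math.prod(v) for c, v in per_predictor.items()}

# Classify by the larger P(x|C_i) * P(C_i).
posterior = {c: likelihood[c] * priors[c] for c in priors}
label = max(posterior, key=posterior.get)
print(label)  # → owner
```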


Classification Based on Normal Populations

• The normal (Gaussian) distribution often matches real-world data.

• It is fully described by just two kinds of parameters, the mean and the variance, which makes it mathematically attractive.

• Under the assumption that the likelihood is normal, we can analyze the properties of the Bayesian classifier.


Multivariate Normal Distribution

$$p(\mathbf{x} \mid C_j) = \frac{1}{(2\pi)^{p/2}\, |\Sigma_j|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_j)^T \Sigma_j^{-1} (\mathbf{x}-\boldsymbol{\mu}_j)\right]$$

$$\mathbf{x} = (x_1, \ldots, x_p)^T \sim \mathrm{MVN}(\boldsymbol{\mu}, \Sigma), \quad \text{where } \boldsymbol{\mu} = (\mu_1, \ldots, \mu_p)^T, \quad \Sigma = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{bmatrix}$$

In case of $p = 1$:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
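The density can be coded directly from the formula; a sketch with an illustrative mean vector and covariance matrix (NumPy only):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density p(x | C_j) with mean mu, covariance Sigma."""
    p = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5
    return float(np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm_const)

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(mvn_pdf(np.array([1.0, -0.5]), mu, Sigma))

# Sanity check: with p = 1 the formula reduces to the univariate density.
one_d = mvn_pdf(np.array([1.0]), np.array([0.0]), np.array([[1.0]]))
assert np.isclose(one_d, np.exp(-0.5) / np.sqrt(2 * np.pi))
```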


Discriminant Function

Discriminant function in terms of the posterior probability:

$$d_i(\mathbf{x}) \equiv P(C_i \mid \mathbf{x}) \propto p(\mathbf{x} \mid C_i)\, P(C_i), \quad \text{or} \quad d_i(\mathbf{x}) = \ln p(\mathbf{x} \mid C_i) + \ln P(C_i),$$

for $i = 1, 2, \ldots, K$ (where $K$ is the number of classes).

The classifier will assign an observation $\mathbf{x}$ to class $C_i$ if $d_i(\mathbf{x}) > d_j(\mathbf{x})$ for all $j \neq i$.


Discriminant Function for Normal Distribution

Substituting the multivariate normal density

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{p/2}\, |\Sigma_i|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)\right]$$

into $d_i(\mathbf{x})$ gives

$$d_i(\mathbf{x}) = \ln p(\mathbf{x} \mid C_i) + \ln P(C_i) = -\frac{p}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) + \ln P(C_i)$$

Decision boundary between class $k$ and class $r$: $d_k(\mathbf{x}) = d_r(\mathbf{x})$.
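A sketch of this discriminant function with illustrative class parameters (the means, covariances, and priors below are made up, not estimated from the book's data):

```python
import numpy as np

def discriminant(x, mu, Sigma, prior):
    """d_i(x) = -p/2 ln(2*pi) - 1/2 ln|S_i| - 1/2 (x-mu)^T S_i^{-1} (x-mu) + ln P(C_i)."""
    p = len(mu)
    diff = x - mu
    return (-0.5 * p * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.solve(Sigma, diff)
            + np.log(prior))

classes = {
    "C1": (np.array([0.0, 0.0]), np.eye(2), 0.5),
    "C2": (np.array([2.0, 2.0]), np.eye(2), 0.5),
}
x = np.array([0.4, 0.3])
d = {c: discriminant(x, mu, S, pr) for c, (mu, S, pr) in classes.items()}
label = max(d, key=d.get)   # assign to the class with the largest d_i(x)
print(label)  # → C1
```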


Derivation of Decision Boundary

Setting $d_1(\mathbf{x}) - d_2(\mathbf{x}) = 0$:

$$\left[\ln P(C_1) - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_1| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T \Sigma_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1)\right] - \left[\ln P(C_2) - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_2| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^T \Sigma_2^{-1}(\mathbf{x}-\boldsymbol{\mu}_2)\right] = 0$$

Rearranging:

$$(\mathbf{x}-\boldsymbol{\mu}_1)^T \Sigma_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1) - (\mathbf{x}-\boldsymbol{\mu}_2)^T \Sigma_2^{-1}(\mathbf{x}-\boldsymbol{\mu}_2) = \ln\frac{|\Sigma_2|}{|\Sigma_1|} + 2\ln\frac{P(C_1)}{P(C_2)}$$


Decision Boundary When Σ₁ = Σ₂

When $\Sigma_1 = \Sigma_2 = \Sigma$, the determinant terms cancel and the boundary becomes

$$(\mathbf{x}-\boldsymbol{\mu}_1)^T \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_1) - (\mathbf{x}-\boldsymbol{\mu}_2)^T \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}_2) = 2\ln\frac{P(C_1)}{P(C_2)}$$

The quadratic terms $\mathbf{x}^T \Sigma^{-1} \mathbf{x}$ also cancel, leaving a linear equation $\mathbf{A}\mathbf{x} + B = 0$ with

$$\mathbf{A} = 2(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^T \Sigma^{-1}, \quad B = -\left(\boldsymbol{\mu}_1^T \Sigma^{-1}\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2^T \Sigma^{-1}\boldsymbol{\mu}_2\right) + 2\ln\frac{P(C_1)}{P(C_2)}$$

The common covariance is estimated by pooling the two class estimates:

$$\hat{\Sigma} = \frac{(n_1-1)\hat{\Sigma}_1 + (n_2-1)\hat{\Sigma}_2}{n_1+n_2-2}$$

Decision: assign $\mathbf{x}$ to class 1 if $\mathbf{A}\mathbf{x} + B > 0$, to class 2 if $\mathbf{A}\mathbf{x} + B < 0$. $\mathbf{A}\mathbf{x} + B = 0$ is a linear equation.

• Decision boundary

• Example


Decision Boundary When Σ₁ ≠ Σ₂

When $\Sigma_1 \neq \Sigma_2$, the quadratic terms no longer cancel. Expanding the boundary

$$(\mathbf{x}-\boldsymbol{\mu}_1)^T \Sigma_1^{-1}(\mathbf{x}-\boldsymbol{\mu}_1) - (\mathbf{x}-\boldsymbol{\mu}_2)^T \Sigma_2^{-1}(\mathbf{x}-\boldsymbol{\mu}_2) = \ln\frac{|\Sigma_2|}{|\Sigma_1|} + 2\ln\frac{P(C_1)}{P(C_2)}$$

gives the decision region for class 1:

$$\mathbf{x}^T(\Sigma_2^{-1}-\Sigma_1^{-1})\mathbf{x} + 2\left(\boldsymbol{\mu}_1^T\Sigma_1^{-1} - \boldsymbol{\mu}_2^T\Sigma_2^{-1}\right)\mathbf{x} - \boldsymbol{\mu}_1^T\Sigma_1^{-1}\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2^T\Sigma_2^{-1}\boldsymbol{\mu}_2 + \ln\frac{|\Sigma_2|}{|\Sigma_1|} + 2\ln\frac{P(C_1)}{P(C_2)} > 0$$

In the slide's shorthand this is a quadratic $Ax^2 + Bx + C$.

Decision: assign $\mathbf{x}$ to class 1 if $Ax^2 + Bx + C > 0$, to class 2 if $Ax^2 + Bx + C < 0$. $Ax^2 + Bx + C = 0$ is a quadratic equation.

• Example

Linear Decision Boundary from LDA

(a) Three normal distributions, with the same covariance and different means.

(b) A sample of 30 drawn from each normal distribution, and the fitted LDA decision boundaries.



Quadratic Decision Boundary from QDA


QDA vs. LDA


12.4 Classification performance of discriminant analysis

Main assumptions:

① Measuring the distance from each class assumes a multivariate normal distribution.

• When this assumption is appropriate, DA is more effective than any other classification method (meaning the same results can be obtained with less data).

• It is robust even when predictors are non-normal or are dummy variables.

• It is sensitive to outliers, so exploratory analysis is needed.


② The correlation structure within a class is assumed to be the same for every class.

• Check by computing the correlation matrix for each class and comparing them.

• When the correlation matrices differ across classes, records tend to be classified into the class with the greater variability.

• An alternative is to perform quadratic discriminant analysis (if the dataset is very large).


12.5 Prior probabilities

③ The method assumes that each class is equally likely to be encountered in future items. What if the probabilities differ?

Let $p_j$ = prior probability of membership in class $j$.

Add $\log(p_j)$ (natural logarithm) to the classification function of each class.

Ex. What if the population proportion of riding-mower owners is 15% (in the sample it was 12 vs. 12, i.e., 50%)?

→ The rate of classifying records as owners should be lowered.

→ Add log(0.15) to the constant term of the owner classification function in Fig 12.3 and log(0.85) to that of the nonowner function.


For a record with income = $60K and lot size = 18.4K:

Owner score = -73.16 + (0.43)(60) + (5.47)(18.4) + log(0.15) = 51.3
Non-owner score = -51.42 + (0.33)(60) + (4.68)(18.4) + log(0.85) = 53.64
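A sketch of this adjustment in Python (coefficients from the slide; math.log is the natural logarithm; the rounded coefficients give slightly different scores than the slide, but the decision is unchanged):

```python
import math

income, lot = 60.0, 18.4

# Constant terms adjusted by the log prior of each class.
owner = -73.16 + 0.43 * income + 5.47 * lot + math.log(0.15)
nonowner = -51.42 + 0.33 * income + 4.68 * lot + math.log(0.85)

label = "owner" if owner > nonowner else "nonowner"
print(label)  # → nonowner
```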


12.6 Unequal misclassification costs

④ What if the misclassification costs are not symmetric?

Let $q_j$ = the cost of misclassifying a record of class $j$.

• Add $\log(q_j)$ to the classification function of each class (when the prior probability is considered at the same time, this amounts to adding $\log(p_j q_j)$).

• If each cost is hard to estimate, use the ratio $q_2/q_1$; in this case it suffices to add $\log(q_2/q_1)$ to the classification function of class 2 only.


12.7 Classifying more than two classes

Example 3: Medical dispatch to accident scene

When 119 emergency resources are temporarily scarce and a dispatch request comes in, which scene should they be sent to?

Outcome: (3 classes) No injury / Non-fatal injury / Fatal injury

Predictors: Time of day, day of week, weather, type of road, road surface conditions, …


- Applied discriminant analysis to 1,000 records
- Three classification functions for each record; the confusion matrix is 3×3
- Compute the classification scores
- Assign each record to the class with the highest classification score


$$\frac{e^{30.93}}{e^{25.94} + e^{31.42} + e^{30.93}} = 0.38$$
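The same normalization works for any number of classes; a sketch using the three scores shown above:

```python
import math

# Classification scores of one record for the three classes on the slide.
scores = [25.94, 31.42, 30.93]
denom = sum(math.exp(s) for s in scores)
p = math.exp(30.93) / denom
print(round(p, 2))  # → 0.38
```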


12.8 Advantages and weaknesses

• A statistical classification method rather than a DM method

• Very popular in the social sciences

• Its use and performance are similar to multiple linear regression:

-> uses the same least-squares method
-> normality assumption (predictors are approximately multivariate normal)
-> works well in practice even when the predictors are not actually normal
-> for heavily skewed variables, a log transform helps

• It provides estimates of single-predictor contributions, and is computationally simple, useful for small datasets.


Ch. 12 Problems

• 12.1 (Personal Loan)

• 12.2 (SystemAdministrator)

• 12.3 (detecting spam mail)
