Nonparametric Es/ma/on

(1)

(2)

Nonparametric Es/ma/on

’  Parametric (single global model), semiparametric (small number of local models)

’  Nonparametric: Similar inputs have similar outputs

’  Func/ons (pdf, discriminant, regression) change smoothly

’  Keep the training data;“let the data speak for itself”

’  Given x, ﬁnd a small number of closest training instances and interpolate from these

’  Aka lazy/memory-‐based/case-‐based/instance-‐based learning

(3)

’  Given the training set X={x^t}_t drawn iid from p(x)

’  Divide data into bins of size h

’  Histogram es/mator:

’  Naive es/mator:

or

Density Es/ma/on

( ) { }

Nh

x x x

pˆ = # ^t in the same bin as

3

( ) { }

Nh

h x

x h

x x

p ^t

2

+

≤

<

= # − ˆ

( ) ( )

⎩⎨

⎧ <

⎟⎟ =

⎠

⎞

⎜⎜⎝

=

∑

⎛ −

= otherwise

if

0

1 2

1 1

1

u u h w

x w x

x Nh

p ^N

t

t /

ˆ

(4)

(5)

5

(6)

Kernel Es/mator (Parzen windows)

( ) ∑

= ⎟⎟

⎠

⎞

⎜⎜⎝

= ⎛ −

N t

t

h x K x

x Nh p

1

ˆ 1

’  Kernel func/on, e.g., Gaussian kernel:

’  Kernel es/mator (Parzen windows)

( ) _⎥

⎦

⎢ ⎤

⎣

⎡ −

= 2 2

1 u

²

u

K exp

π

(7)

7

(8)

’  Instead of ﬁxing bin width h and coun/ng the number of instances, ﬁx the instances (neighbors) k and check bin width

d_k(x), distance to kth closest instance to x (h=2d_k(x))

k-‐Nearest Neighbor Es/mator

( )

^Nd

( )

^x

x k p

2 k

ˆ =

(9)

9

(10)

’  Kernel density es/mator

Mul/variate Gaussian kernel spheric

ellipsoid

Mul/variate Data

( ) ∑

= ⎟⎟

⎠

⎞

⎜⎜⎝

= ⎛ −

N

t

d K h

p Nh

1

1 x x

ˆ x

( )

^⎡ ^⎤

⎥⎥

⎦

⎤

⎢⎢

⎣

⎡−

⎟⎠

⎜ ⎞

⎝

= ⎛

− u u

u u u

1 2

1 1

2 2 1

TS

d

K K

exp π exp

(11)

’  Es/mate p(x|C_i) and use Bayes’ rule

’  Kernel es/mator

’  k-‐NN es/mator

Nonparametric Classiﬁca/on

( ) ( )

( ) ( ) ( )

i^t

N

t

t i d

i i

i i it

N

t

t d

i i

h r Nh K

C P C p

g

N C N

P h r

h K C N

p

∑

=

⎟⎟⎠

⎞

⎜⎜⎝

= ⎛ −

=

⎟⎟ =

⎠

⎞

⎜⎜⎝

= ⎛ −

1 1

x x x

x

x x x

| ˆ ˆ

ˆ ˆ |

11

( )

( ) ( ) ( ) ( )

( )

^k

k p

C P C C p

V P N C k

p _k _i ⁱ ⁱ ⁱ

i

i = i = =

x x x

x x

ˆ

| ˆ

| ˆ ˆ

ˆ |

(12)

Condensed Nearest Neighbor

’  Time/space complexity of k-‐NN is O (N)

’  Find a subset Z of X that is small and is accurate in classifying X (Hart, 1968)

(13)

Condensed Nearest Neighbor

’  Incremental algorithm: Add instance if needed

13

(14)

Nonparametric Regression

( ) ( )

⎩⎨

= ⎧

=

∑

=

otherwise

with bin

same

the in is if where

0 1

1 1

x x x

x b

x x b

r x x x b

g

t t

N t

t N t

t

,

, ˆ ,

’  Aka smoothing models

’  Regressogram: r^t = g(x^t) + e

(15)

15

(16)

(17)

Running Mean/Kernel Smoother

’  Running mean smoother

’  Running line smoother

’  Kernel smoother

where K( ) is Gaussian

’  Addi/ve models (Has/e and Tibshirani, 1990)

17

( )

⎩⎨

⎧ <

=

⎟⎟⎠

⎞

⎜⎜⎝

⎛ −

⎟⎟⎠

⎞

⎜⎜⎝

⎛ −

=

∑

=

otherwise 1 if

where

ˆ

0 1

1 1

u u w

h x w x

h r x w x

x g

N t

t N t

t

( )

∑

=

⎟⎟⎠

⎜⎜ ⎞

⎝

⎛ −

⎟⎟⎠

⎞

⎜⎜⎝

⎛ −

= N

t

t N t

t

h x K x

h r x K x

x g

1

ˆ

(18)

(19)

19

(20)

(21)

How to Choose k or h?

’  When k or h is small, single instances mader; bias is small, variance is large (undersmoothing): High

complexity

’  As k or h increases, we average over more instances and variance decreases but bias increases (oversmoothing):

Low complexity

’  Cross-‐validaCon is used to ﬁnetune k or h.

21

(22)

Nonparametric Es/ma/on