Nonparametric Es/ma/on
’ Parametric (single global model), semiparametric (small number of local models)
’ Nonparametric: Similar inputs have similar outputs
’ Func/ons (pdf, discriminant, regression) change smoothly
’ Keep the training data;“let the data speak for itself”
’ Given x, find a small number of closest training instances and interpolate from these
’ Aka lazy/memory-‐based/case-‐based/instance-‐based learning
’ Given the training set X={xt}t drawn iid from p(x)
’ Divide data into bins of size h
’ Histogram es/mator:
’ Naive es/mator:
or
Density Es/ma/on
( ) { }
Nh
x x x
pˆ = # t in the same bin as
3
( ) { }
Nh
h x
x h
x x
p t
2
+
≤
<
= # − ˆ
( ) ( )
⎩⎨
⎧ <
⎟⎟ =
⎠
⎞
⎜⎜⎝
=
∑
⎛ −= otherwise
if
0
1 2
1 1
1
u u h w
x w x
x Nh
p N
t
t /
ˆ
5
Kernel Es/mator (Parzen windows)
( ) ∑
= ⎟⎟
⎠
⎞
⎜⎜⎝
= ⎛ −
N t
t
h x K x
x Nh p
1
ˆ 1
’ Kernel func/on, e.g., Gaussian kernel:
’ Kernel es/mator (Parzen windows)
( ) ⎥
⎦
⎢ ⎤
⎣
⎡ −
= 2 2
1 u
2u
K exp
π
7
’ Instead of fixing bin width h and coun/ng the number of instances, fix the instances (neighbors) k and check bin width
dk(x), distance to kth closest instance to x (h=2dk(x))
k-‐Nearest Neighbor Es/mator
( )
Nd( )
xx k p
2 k
ˆ =
9
’ Kernel density es/mator
Mul/variate Gaussian kernel spheric
ellipsoid
Mul/variate Data
( ) ∑
= ⎟⎟
⎠
⎞
⎜⎜⎝
= ⎛ −
N
t
t
d K h
p Nh
1
1 x x
ˆ x
( )
( )
⎡ ⎤⎥⎥
⎦
⎤
⎢⎢
⎣
⎡−
⎟⎠
⎜ ⎞
⎝
= ⎛
− u u
u u u
1 2
1 1
2 2 1
TS
d
K K
exp π exp
’ Es/mate p(x|Ci) and use Bayes’ rule
’ Kernel es/mator
’ k-‐NN es/mator
Nonparametric Classifica/on
( ) ( )
( ) ( ) ( )
itN
t
t i d
i i
i i it
N
t
t d
i i
h r Nh K
C P C p
g
N C N
P h r
h K C N
p
∑
∑
=
=
⎟⎟⎠
⎞
⎜⎜⎝
= ⎛ −
=
⎟⎟ =
⎠
⎞
⎜⎜⎝
= ⎛ −
1 1
1 1
x x x
x
x x x
| ˆ ˆ
ˆ ˆ |
11
( )
( ) ( ) ( ) ( )
( )
kk p
C P C C p
V P N C k
p k i i i i
i
i = i = =
x x x
x x
ˆ
| ˆ
| ˆ ˆ
ˆ |
Condensed Nearest Neighbor
’ Time/space complexity of k-‐NN is O (N)
’ Find a subset Z of X that is small and is accurate in classifying X (Hart, 1968)
Condensed Nearest Neighbor
’ Incremental algorithm: Add instance if needed
13
Nonparametric Regression
( ) ( )
( ) ( )
⎩⎨
= ⎧
=
∑
∑
=
=
otherwise
with bin
same
the in is if where
0 1
1 1
x x x
x b
x x b
r x x x b
g
t t
N t
t N t
t
t
,
, ˆ ,
’ Aka smoothing models
’ Regressogram: rt = g(xt) + e
15
Running Mean/Kernel Smoother
’ Running mean smoother
’ Running line smoother
’ Kernel smoother
where K( ) is Gaussian
’ Addi/ve models (Has/e and Tibshirani, 1990)
17
( )
( )
⎩⎨
⎧ <
=
⎟⎟⎠
⎞
⎜⎜⎝
⎛ −
⎟⎟⎠
⎞
⎜⎜⎝
⎛ −
=
∑
∑
=
=
otherwise 1 if
where
ˆ
0 1
1 1
u u w
h x w x
h r x w x
x g
N t
t N t
t
t
( )
∑
∑
=
=
⎟⎟⎠
⎜⎜ ⎞
⎝
⎛ −
⎟⎟⎠
⎞
⎜⎜⎝
⎛ −
= N
t
t N t
t
t
h x K x
h r x K x
x g
1
1
ˆ
19
How to Choose k or h?
’ When k or h is small, single instances mader; bias is small, variance is large (undersmoothing): High
complexity
’ As k or h increases, we average over more instances and variance decreases but bias increases (oversmoothing):
Low complexity
’ Cross-‐validaCon is used to finetune k or h.
21