(1)

semi-supervised learning

SSL


(2)

Outline of Lecture (1)

Graph Spectral Theory

Definition of Graph

Graph Laplacian

Laplacian Smoothing

Graph Node Clustering

Minimum Graph Cut

Ratio Graph Cut

Normalized Graph Cut

Manifold Learning

Spectral Analysis in Riemannian Manifolds

Dimension Reduction, Node Embedding

Semi-supervised Learning (SSL)

Self-Training Methods

SSL with SVM

SSL with Graph using MinCut

Semi-supervised Learning (SSL): cont.

SSL with Graph using Regularized Harmonic Functions

SSL with Graph using Soft Harmonic Functions

SSL with Graph using Manifold Regularization

SSL with Graph using Laplacian SVMs

SSL with Graph using Max-Margin Graph Cuts

Online SSL and SSL for large graph

Graph Convolution Networks (GCN)

Graph Filtering in GCN

Graph Pooling in GCN

Spectral Filtering in GCN

Spatial Filtering in GCN

(3)

SSL: Classical SVM (Review)

[Figure: maximal-margin hyperplane. The regions $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b < 0$ and $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b > 0$ are separated by the decision boundary $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = 0$, with the margin around it.]

Maximal margin hyperplane = optimal hyperplane

(4)

SSL: Classical SVM (Review)

Max-margin classification, separable case:

$\min_{\mathbf{w},b} \|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1, \ \forall i = 1, \dots, n$

Max-margin classification, non-separable case:

$\min_{\mathbf{w},b} \lambda\|\mathbf{w}\|^2 + \sum_i \xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ \forall i = 1, \dots, n$

[Figure: maximal-margin hyperplane, as on slide (3).]

Maximal margin hyperplane = optimal hyperplane

(5)

SSL: Classical SVM (Review)

Max-margin classification, non-separable case:

$\min_{\mathbf{w},b} \lambda\|\mathbf{w}\|^2 + \sum_i \xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ \forall i = 1, \dots, n$

Unconstrained formulation using hinge loss:

$\min_{\mathbf{w},b} \lambda\|\mathbf{w}\|^2 + \sum_i \max(1 - y_i(\mathbf{w}^T\mathbf{x}_i + b), 0)$

General formulation:

$\min \ \lambda\,\Omega(f(\mathbf{w},b)) + \sum_i \Phi(\mathbf{x}_i, y_i, f(\mathbf{w},b;\mathbf{x}_i))$

(6)

SSL: Classical SVM (Review)

Hinge loss:

$\Phi(\mathbf{x}_i, y_i, f(\mathbf{w},b;\mathbf{x}_i)) = \max(1 - y_i(\mathbf{w}^T\mathbf{x}_i + b), 0)$

(7)

SSL: SVM: Unlabeled Examples

Unconstrained formulation using hinge loss (labeled data only):

$\min_{\mathbf{w},b} \lambda\|\mathbf{w}\|^2 + \sum_{i=1}^{n_l} \max(1 - y_i(\mathbf{w}^T\mathbf{x}_i + b), 0)$

How to incorporate unlabeled examples?

Prediction of $f$ for (any) $\mathbf{x}$: $\hat{y}_i = \mathrm{sgn}(f(\mathbf{x}_i)) = \mathrm{sgn}(\mathbf{w}^T\mathbf{x}_i + b)$

Use $\hat{y}_i$ instead of $y_i$:

$\Phi(\mathbf{x}_i, \hat{y}_i, f(\mathbf{w},b;\mathbf{x}_i)) = \max(1 - \hat{y}_i(\mathbf{w}^T\mathbf{x}_i + b), 0)$
$= \max(1 - \mathrm{sgn}(\mathbf{w}^T\mathbf{x}_i + b)(\mathbf{w}^T\mathbf{x}_i + b), 0)$
$= \max(1 - |\mathbf{w}^T\mathbf{x}_i + b|, 0)$, the hat loss (using $\mathrm{sgn}(z)\,z = |z|$)
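To make the two penalties concrete, here is a minimal NumPy sketch (function names are illustrative) that evaluates both losses over a range of margin values: hinge loss penalizes a labeled point that is misclassified or inside the margin, while hat loss penalizes any point that falls inside the margin band $|f(\mathbf{x})| < 1$, whatever its label.

```python
import numpy as np

def hinge_loss(y, f):
    """Hinge loss for a labeled example: max(1 - y*f(x), 0)."""
    return np.maximum(1.0 - y * f, 0.0)

def hat_loss(f):
    """Hat loss for an unlabeled example: max(1 - |f(x)|, 0)."""
    return np.maximum(1.0 - np.abs(f), 0.0)

f = np.linspace(-3, 3, 7)   # candidate values of f(x) = w^T x + b
print(hinge_loss(+1, f))    # zero once f >= 1, grows as f decreases
print(hat_loss(f))          # zero once |f| >= 1, peaks at f = 0
```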

(8)

SSL: SVM: Unlabeled Examples

What is the difference in the objectives?

What does hinge loss penalize?

What does hat loss penalize?

(9)

SSL: SVM: Formulation

Formulation of SSL via SVM:

$\min_{\mathbf{w},b} \sum_{i=1}^{n_l} \max(1 - y_i(\mathbf{w}^T\mathbf{x}_i + b), 0) + \lambda_1\|\mathbf{w}\|^2 + \lambda_2 \sum_{i=n_l+1}^{n_l+n_u} \max(1 - |\mathbf{w}^T\mathbf{x}_i + b|, 0)$

 The labeled-data term works as a loss for learning from the data.

 The unlabeled-data term works as a regularizer that reduces the effect of noisy data (see the sketch below).

 The term $\|\mathbf{w}\|^2$ works as a regularizer for a large margin.
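As a concrete illustration, here is a minimal NumPy sketch (names and data are illustrative, not from the lecture) that evaluates this objective for a given $(\mathbf{w}, b)$. The hat-loss term makes the objective non-convex, so practical solvers rely on heuristics such as alternating label assignment or continuation methods.

```python
import numpy as np

def s3vm_objective(w, b, X_l, y_l, X_u, lam1=1.0, lam2=0.1):
    """Hinge loss on labeled data + margin regularizer + hat loss on unlabeled data."""
    f_l = X_l @ w + b                                # f(x) on labeled examples
    f_u = X_u @ w + b                                # f(x) on unlabeled examples
    hinge = np.maximum(1.0 - y_l * f_l, 0.0).sum()
    hat = np.maximum(1.0 - np.abs(f_u), 0.0).sum()
    return hinge + lam1 * (w @ w) + lam2 * hat

rng = np.random.default_rng(0)
X_l = rng.normal(size=(4, 2)); y_l = np.array([1, 1, -1, -1])
X_u = rng.normal(size=(10, 2))
print(s3vm_objective(np.array([1.0, -0.5]), 0.0, X_l, y_l, X_u))
```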

(10)

semi-supervised learning

with graphs and harmonic functions

SSL(𝒢)

(11)

SSL with Graphs: Prehistory

Blum/Chawla: Learning from Labeled and Unlabeled Data using Graph Mincuts

http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/mincut.pdf

Some insights came from vision research in the 1980s.

(12)

SSL with Graphs: MinCut

MinCut SSL: an idea similar to MinCut clustering

Where is the link?

What is the formal statement? We look for $f(\mathbf{x}) \in \{\pm 1\}$:

$cut = \sum_{i,j=1}^{n_l+n_u} w_{ij}(f(\mathbf{x}_i) - f(\mathbf{x}_j))^2 = \mathbf{f}^T\mathbf{L}\mathbf{f} = \Omega(f)$

$\min_{f(\mathbf{x}_i);\,\mathbf{x}_i \in \mathcal{U}} \Omega(f)$: minimal smoothness for unsupervised clustering.

What to do for semi-supervised learning?

(13)

SSL with Graphs: using $f(\boldsymbol{\theta}; \mathbf{x}_i)$

Inductive SSL with graph, using a classifier $f(\boldsymbol{\theta}; \mathbf{x}_i)$ with parameters $\boldsymbol{\theta}$:

$\min_{\boldsymbol{\theta}} \lambda \sum_{i,j=1}^{n_l+n_u} w_{ij}(f(\boldsymbol{\theta};\mathbf{x}_i) - f(\boldsymbol{\theta};\mathbf{x}_j))^2 + \gamma \sum_{i=1}^{n_l} (f(\boldsymbol{\theta};\mathbf{x}_i) - y_i)^2$

General formulation

Regularization (Laplacian smoothing):

$\Omega(\{f(\boldsymbol{\theta};\mathbf{x}_i)\}_{i=1}^{n_l+n_u}) = \sum_{i,j=1}^{n_l+n_u} w_{ij}(f(\boldsymbol{\theta};\mathbf{x}_i) - f(\boldsymbol{\theta};\mathbf{x}_j))^2 = \mathbf{f}^T\mathbf{L}\mathbf{f}$

Loss:

$\Phi(f(\boldsymbol{\theta};\mathbf{x}_i), y_i) = (f(\boldsymbol{\theta};\mathbf{x}_i) - y_i)^2, \ \forall i \in \{1, \dots, n_l\}$
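A minimal sketch of this inductive objective for a linear model $f(\boldsymbol{\theta};\mathbf{x}) = \boldsymbol{\theta}^T\mathbf{x}$ (the linear choice and all names are illustrative; any differentiable classifier fits the formulation, with `lam` and `gamma` standing for $\lambda$ and $\gamma$):

```python
import numpy as np

def inductive_ssl_objective(theta, X, W, y_l, n_l, lam=1.0, gamma=1.0):
    """Laplacian-smoothness regularizer + squared loss on the n_l labeled points."""
    f = X @ theta                       # predictions on all n_l + n_u points
    L = np.diag(W.sum(axis=1)) - W      # graph Laplacian L = D - W
    smoothness = f @ L @ f              # f^T L f, proportional to sum_ij w_ij (f_i - f_j)^2
    supervised = ((f[:n_l] - y_l) ** 2).sum()
    return lam * smoothness + gamma * supervised
```

Minimizing this over $\boldsymbol{\theta}$ (e.g., by gradient descent) yields a classifier that can also predict for points outside the graph, which is what makes the approach inductive.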

(14)

SSL with Graphs: Harmonic Functions

Transductive SSL with graph, fixing $f(\mathbf{x}_i)$ for $i \in \mathcal{L}$:

$\min_{\mathbf{f} \in \{\pm 1\}^{n_l+n_u}} \lambda \sum_{i,j=1}^{n_l+n_u} w_{ij}(f(\mathbf{x}_i) - f(\mathbf{x}_j))^2 + \infty \sum_{i=1}^{n_l} (f(\mathbf{x}_i) - y_i)^2$

Solution:

An integer program: NP-hard.

Can we use eigenvectors? No. Why?

We need a better way to reflect the confidence.

(15)

SSL with Graphs: Harmonic Functions

Relaxation: transductive SSL with graph, fixing $f(\mathbf{x}_i)$ for $i \in \mathcal{L}$:

$\min_{\mathbf{f} \in \mathbb{R}^{n_l+n_u}} \lambda \sum_{i,j=1}^{n_l+n_u} w_{ij}(f(\mathbf{x}_i) - f(\mathbf{x}_j))^2 + \infty \sum_{i=1}^{n_l} (f(\mathbf{x}_i) - y_i)^2$

Naïve solution

Right term: constrain $f$ to match the supervised data, $f(\mathbf{x}_i) = y_i \ \forall i \in \{1, \dots, n_l\}$.

Left term: enforce the solution $f$ to be harmonic (cf. aggregation, random walk):

$f(\mathbf{x}_i) = \frac{\sum_j f(\mathbf{x}_j)\,w_{ij}}{\sum_j w_{ij}} \ \forall i \in \{n_l+1, \dots, n_l+n_u\}$

How can we handle unlabeled data?

(16)

SSL with Graphs: Harmonic Functions

Properties of the relaxation from $\{\pm 1\}$ to $\mathbb{R}$:

 There is a closed-form solution for $f$.

 This solution is unique and globally optimal.

 $f(\mathbf{x}_i)$ may not be an integer, but we can threshold it.

 It has an electric-network interpretation.

 It has a random-walk interpretation.

(17)

SSL with Graphs: Harmonic Functions

Random-walk interpretation:

1) Start from the vertex you want to label and walk randomly.

2) $P(j \mid i) = \frac{w_{ij}}{\sum_k w_{ik}} \iff \mathbf{P} = \mathbf{D}^{-1}\mathbf{W}$

3) Finish when a labeled vertex is hit; this is consistent with the harmonic property $f(\mathbf{x}_i) = \frac{\sum_j f(\mathbf{x}_j)\,w_{ij}}{\sum_j w_{ij}}$.

4) $f(\mathbf{x}_i)$ is assigned the average of the labels of the hit vertices.
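A minimal Monte Carlo sketch of this interpretation (illustrative, not from the lecture; it assumes every walk can reach a labeled vertex): simulate walks with transition matrix $\mathbf{P} = \mathbf{D}^{-1}\mathbf{W}$ from an unlabeled start vertex until a labeled vertex absorbs the walk, then average the absorbed labels.

```python
import numpy as np

def random_walk_label(W, labels, start, n_walks=2000, seed=0):
    """Estimate f(x_start) as the average label at absorption.

    labels[i] is +1 or -1 for labeled vertices, 0 for unlabeled ones.
    """
    rng = np.random.default_rng(seed)
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic P = D^{-1} W
    total = 0.0
    for _ in range(n_walks):
        v = start
        while labels[v] == 0:              # keep walking until a labeled vertex is hit
            v = rng.choice(len(W), p=P[v])
        total += labels[v]
    return total / n_walks                 # approximates p_i(+1) - p_i(-1)
```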

(18)

SSL with Graphs: Harmonic Functions

Iterative solution: propagation

Step 1: Set $f$ to match the supervised data: $f(\mathbf{x}_i) = y_i \ \forall i \in \{1, \dots, n_l\}$.

Step 2: Propagate iteratively (only for unlabeled nodes), as in the sketch after this list:

$f(\mathbf{x}_i) \leftarrow \frac{\sum_j f(\mathbf{x}_j)\,w_{ij}}{\sum_j w_{ij}} \ \forall i \in \{n_l+1, \dots, n_l+n_u\}$

Properties:

 this will converge to the harmonic solution

 we can set the initial values for unlabeled nodes arbitrarily

 an interesting option for large-scale data
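A minimal NumPy sketch of this propagation (names are illustrative; labeled nodes are assumed to be indexed first): clamp the labeled values, then repeatedly replace each unlabeled value by the weighted average of its neighbors.

```python
import numpy as np

def harmonic_propagation(W, y_l, n_l, n_iter=1000, tol=1e-8):
    """Iterative harmonic solution: clamp labeled nodes, average the rest."""
    f = np.zeros(W.shape[0])
    f[:n_l] = y_l                              # Step 1: clamp labeled nodes
    deg = W.sum(axis=1)
    for _ in range(n_iter):
        f_new = (W @ f) / deg                  # weighted average over neighbors
        f_new[:n_l] = y_l                      # keep labeled values fixed
        if np.max(np.abs(f_new - f)) < tol:    # converged to the harmonic solution
            return f_new
        f = f_new
    return f
```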

(19)

SSL with Graphs: Harmonic Functions

Closed-form solution:

Define $f_i \triangleq f(\mathbf{x}_i)$, $\mathbf{f} \triangleq [f_1, \dots, f_{n_l+n_u}]$:

$\Omega(\mathbf{f}) = \sum_{i,j=1}^{n_l+n_u} w_{ij}(f_i - f_j)^2 = \mathbf{f}^T\mathbf{L}\mathbf{f}$

Then $\mathbf{L}$ is an $(n_l+n_u) \times (n_l+n_u)$ matrix in block form:

$\mathbf{L} = \begin{bmatrix} \mathbf{L}_{ll} & \mathbf{L}_{lu} \\ \mathbf{L}_{ul} & \mathbf{L}_{uu} \end{bmatrix}$

The problem becomes

$\min_{\mathbf{f}} \Omega(\mathbf{f}) = \mathbf{f}_l^T\mathbf{L}_{ll}\mathbf{f}_l + \mathbf{f}_l^T\mathbf{L}_{lu}\mathbf{f}_u + \mathbf{f}_u^T\mathbf{L}_{ul}\mathbf{f}_l + \mathbf{f}_u^T\mathbf{L}_{uu}\mathbf{f}_u$

The solution can be obtained from

$\nabla_{\mathbf{f}_u}\Omega(\mathbf{f}) = 2\mathbf{L}_{ul}\mathbf{f}_l + 2\mathbf{L}_{uu}\mathbf{f}_u = \mathbf{0}$

$\implies \mathbf{f}_u = \mathbf{L}_{uu}^{-1}(-\mathbf{L}_{ul}\mathbf{f}_l) = \mathbf{L}_{uu}^{-1}(\mathbf{W}_{ul}\mathbf{f}_l)$

(the last step uses $\mathbf{L}_{ul} = -\mathbf{W}_{ul}$, since the off-diagonal blocks of $\mathbf{L} = \mathbf{D} - \mathbf{W}$ contain no degree terms).
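The same closed form as a minimal NumPy sketch (assumes the labeled nodes are indexed first, matching the block partition above):

```python
import numpy as np

def harmonic_closed_form(W, y_l, n_l):
    """f_u = L_uu^{-1} W_ul f_l from the block partition of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[n_l:, n_l:]
    W_ul = W[n_l:, :n_l]
    return np.linalg.solve(L_uu, W_ul @ y_l)   # solve instead of an explicit inverse
```

If every connected component of the graph contains at least one labeled vertex, $\mathbf{L}_{uu}$ is positive definite, so the solve is well defined.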

(20)

SSL with Graphs: Harmonic Functions

Relation between the closed-form solution and the random walk:

$\mathbf{f}_u = \mathbf{L}_{uu}^{-1}\mathbf{W}_{ul}\mathbf{f}_l, \quad \mathbf{P} = \mathbf{D}^{-1}\mathbf{W}$

Note that

$\mathbf{P} = \mathbf{D}^{-1}\mathbf{W} \implies \mathbf{L} = \mathbf{D} - \mathbf{W} = \mathbf{D}(\mathbf{I} - \mathbf{D}^{-1}\mathbf{W}) = \mathbf{D}(\mathbf{I} - \mathbf{P})$

This yields

$\mathbf{f}_u = (\mathbf{I} - \mathbf{P}_{uu})^{-1}\mathbf{D}_{uu}^{-1}(\mathbf{D}_{uu}\mathbf{P}_{ul}\mathbf{f}_l) = (\mathbf{I} - \mathbf{P}_{uu})^{-1}\mathbf{P}_{ul}\mathbf{f}_l$

For $i \in \mathcal{U}$,

$f_i = [(\mathbf{I} - \mathbf{P}_{uu})^{-1}\mathbf{P}_{ul}\mathbf{f}_l]_i = \sum_{j: y_j = +1} [(\mathbf{I} - \mathbf{P}_{uu})^{-1}\mathbf{P}_{ul}]_{ij} - \sum_{j: y_j = -1} [(\mathbf{I} - \mathbf{P}_{uu})^{-1}\mathbf{P}_{ul}]_{ij} = p_i(+1) - p_i(-1)$
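A quick numerical check of this identity on a small random graph (all data illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_l = 8, 3
W = rng.random((n, n)); W = (W + W.T) / 2.0    # symmetric edge weights
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
P = W / W.sum(axis=1, keepdims=True)           # P = D^{-1} W
f_l = np.array([1.0, 1.0, -1.0])

f_u_laplacian = np.linalg.solve(L[n_l:, n_l:], W[n_l:, :n_l] @ f_l)
f_u_walk = np.linalg.solve(np.eye(n - n_l) - P[n_l:, n_l:], P[n_l:, :n_l] @ f_l)
print(np.allclose(f_u_laplacian, f_u_walk))    # True: the two solutions coincide
```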

(21)

SSL with Graphs: Regularized Harmonic Functions

For $i \in \mathcal{U}$,

$f_i = p_i(+1) - p_i(-1) \implies f_i = |f_i| \cdot \mathrm{sgn}(f_i) = \text{confidence} \times \text{label}$

What happens if an outlier sneaks in?

⟹ The prediction for the outlier can mislead the confidence.

How can we control the confidence of the inference?

⟹ Allow the random walk to die!

⟹ We add a sink to the graph, where the sink is an artificial node with label 0, connected to every other vertex.

What will the sink do to the predictions?

(22)

SSL with Graphs: Regularized Harmonic Functions

Regularized harmonic solution on the graph with a sink:

$\nabla_{\mathbf{f}_u}\Omega(\mathbf{f}) = \mathbf{0}, \quad \Omega(\mathbf{f}) = \mathbf{f}^T\mathbf{L}\mathbf{f}$

With the sink (fixed at 0) appended as the last node, the stationarity system reads

$\begin{bmatrix} \mathbf{L}_{ll} + \gamma_g\mathbf{I}_{n_l} & \mathbf{L}_{lu} & -\gamma_g\mathbf{1}_{n_l\times 1} \\ \mathbf{L}_{ul} & \mathbf{L}_{uu} + \gamma_g\mathbf{I}_{n_u} & -\gamma_g\mathbf{1}_{n_u\times 1} \\ -\gamma_g\mathbf{1}_{1\times n_l} & -\gamma_g\mathbf{1}_{1\times n_u} & (n_l+n_u)\gamma_g \end{bmatrix} \begin{bmatrix} \mathbf{f}_l \\ \mathbf{f}_u \\ 0 \end{bmatrix} = \begin{bmatrix} \vdots \\ \mathbf{0}_u \\ \vdots \end{bmatrix}$

Since the sink value is fixed at 0, we can disregard the last column and row:

$\begin{bmatrix} \mathbf{L}_{ll} + \gamma_g\mathbf{I}_{n_l} & \mathbf{L}_{lu} \\ \mathbf{L}_{ul} & \mathbf{L}_{uu} + \gamma_g\mathbf{I}_{n_u} \end{bmatrix} \begin{bmatrix} \mathbf{f}_l \\ \mathbf{f}_u \end{bmatrix} = \begin{bmatrix} \vdots \\ \mathbf{0}_u \end{bmatrix}$

$\implies \mathbf{L}_{ul}\mathbf{f}_l + (\mathbf{L}_{uu} + \gamma_g\mathbf{I})\mathbf{f}_u = \mathbf{0}_u$

(23)

SSL with Graphs: Regularized Harmonic Functions

How do we compute this regularized random walk?

$\mathbf{f}_u = (\mathbf{L}_{uu} + \gamma_g\mathbf{I})^{-1}\mathbf{W}_{ul}\mathbf{f}_l$

How does $\gamma_g$ influence the solution?

What happens to sneaky outliers?
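A minimal extension of the earlier closed-form sketch with the sink regularizer (names illustrative): larger $\gamma_g$ lets more of the walk's probability mass die in the sink, shrinking $|f_i|$, and hence the confidence, for weakly connected nodes such as outliers.

```python
import numpy as np

def regularized_harmonic(W, y_l, n_l, gamma_g=0.1):
    """f_u = (L_uu + gamma_g I)^{-1} W_ul f_l, the sink-regularized solution."""
    L = np.diag(W.sum(axis=1)) - W
    n_u = W.shape[0] - n_l
    A = L[n_l:, n_l:] + gamma_g * np.eye(n_u)   # gamma_g = 0 recovers the harmonic case
    return np.linalg.solve(A, W[n_l:, :n_l] @ y_l)
```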

(24)

SSL with Graphs: Soft Harmonic Functions

Regularized harmonic objective with $\mathbf{Q} = \mathbf{L} + \gamma_g\mathbf{I}$. Define $f_i \triangleq f(\mathbf{x}_i)$, $\mathbf{f} \triangleq [f_1, \dots, f_{n_l+n_u}]$:

$\min_{\mathbf{f} \in \mathbb{R}^{n_l+n_u}} \infty \sum_{i=1}^{n_l} (f_i - y_i)^2 + \lambda\,\mathbf{f}^T\mathbf{Q}\mathbf{f}$

Soft constraints for $f(\mathbf{x}_i) = y_i, \forall i \in \mathcal{L}$: the $\infty$ is replaced by finite values:

$\min_{\mathbf{f} \in \mathbb{R}^{n_l+n_u}} (\mathbf{f} - \mathbf{y})^T\mathbf{C}(\mathbf{f} - \mathbf{y}) + \mathbf{f}^T\mathbf{Q}\mathbf{f}$

$\mathbf{C}$ is diagonal with $C_{ii} = C_l$ for labeled examples and $C_{ii} = C_u$ for unlabeled examples.

$\mathbf{y}$ holds pseudo-targets: the true label for labeled examples, $0$ for unlabeled examples.

(25)

SSL with Graphs: Soft Harmonic Functions

Closed-form soft harmonic solution:

$\mathbf{f}^* = \arg\min_{\mathbf{f} \in \mathbb{R}^{n_l+n_u}} \left[(\mathbf{f} - \mathbf{y})^T\mathbf{C}(\mathbf{f} - \mathbf{y}) + \mathbf{f}^T\mathbf{Q}\mathbf{f}\right] \implies \mathbf{f}^* = (\mathbf{C}^{-1}\mathbf{Q} + \mathbf{I})^{-1}\mathbf{y}$

What are the differences between hard and soft?

Not much difference in practice.

Noisy labels may be smoothed by the soft harmonic solution.

Generalization is improved by the sink node.

(26)

Summary questions on the lecture

 What does hinge loss in SVM penalize?

 What does hat loss in SVM for semi-supervised learning penalize?

 Why can't we use eigenvectors to solve MinCut-based SSL on graphs?

 What is the meaning of a harmonic function in SSL?

 What is the random-walk interpretation of harmonic-function-based SSL on graphs?

 What is a key point of regularized harmonic-function-based SSL on graphs?

 What is a key point of soft harmonic-function-based SSL on graphs?
