전체 글

(1)

J. Y. Choi. SNU

배 움

1

(2)

배움의 자세

 자신이 익힌 지식에 집착함으로 인하여, 우월감과 타인에 대한 멸시함이 자라나지 않도록 한다.

 지식을 과시하고자 하는 충동으로 인하여 무분별한 지식 습득과 사념에 빠지지 않도록 한다.

 자신의 무지와 몰이해에 대해 자책하거나, 일깨움이 더딘 것을 탓하지 말 고 그저 부지런히 행함을 놓치지 않는다.

 언제나 자신에게 솔직할 것이며, 나눌 준비와 겸허한 태도로 자신을 있는 그대로 꾸밈없이 표현하도록 한다.

 매일 일상에서 이루어지는 순간순간의 과정이 그대를 이끌 것이니 미래를 걱정하며 계획에 연연해 하지 않는다.

(3)

J. Y. Choi. SNU

Summary Questions of the last lecture

3

What are the two options for out of sample extension in SSL?

→ One is to add the new sample to the graph and re-compute HF solution transductively and the other is to make the algorithms inductive.

Why do we have to make a classifier be smooth in inductive SSL for out of sample extension?

→ The smoothness deal with noisy samples by providing reasonable interpolation for new samples.

(4)

Summary Questions of the last lecture

What is the meaning of manifold regularization?

→ It enforces the classifier function to yield a reasonably interpolated value for any sample which is not given in the training data.

(5)

J. Y. Choi. SNU

Summary Questions of the last lecture

5

What is the key idea of Max-Margin Graph Cuts for SSL?

→ The self-training for SSL is done by using the confidently predicted labels obtained by regularized HF solution of SSL.

(6)

Summary Questions of the last lecture

What are the two options for online SSL?

→ One is to add the new sample to the graph and re-compute HF solution transductively and the other is to make the algorithms inductive. The key issue of the former option is to keep the computation cost and memory within a reasonable level even for continuous increment of unlabeled samples.

(7)

J. Y. Choi. SNU

Summary Questions of the last lecture

7

What is the key idea for keeping # of representative nodes?

→ Determine the minimum distance (R) between any two centroids and each new sample within a ball centered by a centroid and with radius R is

added to the cluster. If there are samples that do not belong to any cluster, R is doubled.

(8)

Outline of Lecture (1)

Graph Spectral Theory

Definition of Graph

Graph Laplacian

Laplacian Smoothing

Graph Node Clustering

Minimum Graph Cut

Ratio Graph Cut

Normalized Graph Cut

Manifold Learning

Spectral Analysis in Riemannian Manifolds

Dimension Reduction, Node Embedding

Semi-supervised Learning (SSL)

Self-Training Methods

SSL with SVM

SSL with Graph using MinCut

SSL with Graph using Harmonic Functions

Semi-supervised Learning (SSL) : conti.

SSL with Graph using Regularized Harmonic Functions

SSL with Graph using Soft Harmonic Functions

SSL with Graph using Manifold

Regularization (out of sample extension)

SSL with Graph using Laplacian SVMs

SSL with Graph using Max-Margin Graph Cuts

Online SSL

SSL for large graph

Graph Convolution Networks (GCN)

Graph Filtering in GCN

Graph Pooling in GCN

Spectral Filtering in GCN

Spatial Filtering in GCN

Recent GCN papers

(9)

J. Y. Choi. SNU

when 𝓖 does not fit to memory

9

SSL, Huge 𝒢

(10)

Scaling SSL with Graphs to Millions

Soft HS for SSL

𝒇 = argmin

𝒇∈ℝ𝑛𝑙+𝑛𝑢

𝒇 − 𝒚 𝑇𝑪 𝒇 − 𝒚 + 𝒇𝑇𝑳𝒇

Using eigenbasis of 𝑳 = 𝑼𝚲𝑼𝑻 and 𝒇 = 𝑼𝜶 𝜶 = argmin

𝜶∈ℝ𝑛𝑙+𝑛𝑢

𝑼𝜶 − 𝒚 𝑇𝑪 𝑼𝜶 − 𝒚 + 𝜶𝑇𝚲𝜶

What is the problem with scalability?

What

Let’s take

can we do?

only first 𝑘 eigenvectors in 𝒇 = ഥ𝑼ഥ𝜶!

(11)

J. Y. Choi. SNU

Scaling SSL with Graphs to Millions

11

𝑼 is a 𝑁 × 𝑘 matrix, ഥഥ 𝜶 ∈ ℝ𝑘, 𝑘 ≪ 𝑁 𝜶ഥ = argmin

𝜶∈ℝ 𝑘

𝑼ഥഥ𝜶 − 𝒚 𝑇𝑪 ഥ𝑼ഥ𝜶 − 𝒚 + ഥ𝜶𝑇𝚲ഥഥ𝜶

Closed form solution is

𝜶ഥ = (ഥ𝚲 + ഥ𝑼𝑇𝑪ഥ𝑼)−1𝑼ഥ𝑇𝑪𝒚

What is the size of this system of equation now?

It requires 𝑘 × 𝑘 𝑚𝑎𝑡𝑟𝑖𝑥 𝑖𝑛𝑣𝑒𝑟𝑠𝑖𝑜𝑛

https://cs.nyu.edu/~fergus/papers/fwt_ssl.pdf

(12)

Scaling SSL with Graphs to Millions

Investigation on eigenbasis depending on 𝑵

https://cs.nyu.edu/~fergus/papers/fwt_ssl.pdf

𝑵 𝑵

(13)

J. Y. Choi. SNU

Scaling SSL with Graphs to Millions

13

What happens to 𝑳 when 𝑁 → ∞?

We have data 𝒙𝑖 ∈ ℝ𝑑 sampled from 𝑝(𝒙).

When 𝑁 → ∞, instead of vectors 𝒇, we consider functions 𝐹(𝒙).

Instead of 𝑳, we define ℒ𝑝 − weighted smoothness operator ℒ𝑝 𝐹 = 𝟏

𝟐׬ 𝐹 𝒙1 − 𝐹 𝒙2 𝟐 𝑊 𝒙1, 𝒙2 𝑝 𝒙1 𝑝 𝒙2 𝑑𝒙1𝑑𝒙2 where 𝑊 𝒙1, 𝒙2 = exp − 𝒙1 − 𝒙2 2 /(2𝜎2)

As 𝑁 → ∞, 1

𝑁2 𝒇𝑻𝑳𝒇 = 1

2𝑁2 σ𝑖,𝑗 𝑊𝑖𝑗 𝑓𝑖 − 𝑓𝑗 2 → ℒ𝑝 𝐹

(14)

Scaling SSL with Graphs to Millions

𝑳 defines the eigenvectors for increasing smoothness.

As 𝑁 → ∞, 1

𝑁2 𝒇𝑻𝑳𝒇 = 1

2𝑁2 σ𝑖,𝑗 𝑊𝑖𝑗 𝑓𝑖 − 𝑓𝑗 2 → ℒ𝑝 𝐹

What does ℒ𝑝 define?

𝑝 𝐹 = 𝟏

𝟐න 𝐹 𝒙1 − 𝐹 𝒙2 𝟐 𝑊 𝒙1, 𝒙2 𝑝 𝒙1 𝑝 𝒙2 𝑑𝒙1𝑑𝒙2

→ Eigenfunctions!

(15)

J. Y. Choi. SNU

Scaling SSL with Graphs to Millions

15

𝑝 𝐹 = 𝟏

𝟐න 𝐹 𝒙1 − 𝐹 𝒙2 𝟐 𝑊 𝒙1, 𝒙2 𝑝 𝒙1 𝑝 𝒙2 𝑑𝒙1𝑑𝒙2 First eigenfunction: 𝜙1 𝒙 = min

𝐹:׬ 𝐹𝟐 𝒙 𝑝 𝒙 𝐷 𝒙 𝑑𝒙=1𝑝 𝐹 where 𝐷 𝒙 = ׬ 𝑊(𝒙, 𝒙2)𝑝 𝒙2 𝑑𝒙2, lim

𝑁→∞

1

𝑁 σ𝑖 𝑓𝑖2𝐷 𝑖, 𝑖 = ׬ 𝐹𝟐 𝒙 𝑝 𝒙 𝐷 𝒙 𝑑𝒙

What is the solution?

How to define 𝜙2 𝒙 ?

𝜙1 𝒙 = 1 because ℒ𝑝 1 = 0

same, constraining to be orthogonal to 𝜙1 𝒙

׬ 𝐹(𝒙)𝜙1 𝒙 𝑝 𝒙 𝐷 𝒙 𝑑𝒙 = 0

(16)

Scaling SSL with Graphs to Millions

Eeigenfunctions of ℒ𝑝

𝜙3 𝒙 as before, orthogonal to 𝜙1 𝒙 and 𝜙2 𝒙 etc.

How to define eigenvalues?

Relationship to the discrete Laplacian

1

𝑁2 𝒇𝑇𝑳𝒇 = 1

2𝑁2 σ𝑖,𝑗 𝑤𝑖𝑗 𝑓𝑖 − 𝑓𝑗 2

𝑁→∞𝑝(𝐹) Can we compute eigenfunctions from 𝑝 𝒙 numerically?

𝜆𝑘 = ℒ𝑝 𝜙𝑘

(17)

J. Y. Choi. SNU

Scaling SSL with Graphs to Millions

17

(18)

Scaling SSL with Graphs to Millions

https://cs.nyu.edu/~fergus/papers/fwt_ssl.pdf

Factorized data distribution:

𝑝 𝒔 = 𝑝(𝑠1)𝑝(𝑠2) ∙∙∙ 𝑝(𝑠𝑑)

In general, this is not true. But we can rotate data with 𝒔 = 𝑹𝒙.

Treating each factor individually 𝑝𝑘 ≜ marginal distribution of 𝑠𝑘

𝜙𝑖(𝑠𝑘) ≜ eigenfunction of ℒ𝑝𝑘 with eigenvalue 𝜆𝑖

Then: 𝜙𝑖 𝒔 = 𝜙𝑖(𝑠𝑘) is eigenfunction of ℒ𝑝 with 𝜆𝑖 (proof)

(19)

J. Y. Choi. SNU

Scaling SSL with Graphs to Millions

19

https://cs.nyu.edu/~fergus/papers/fwt_ssl.pdf

Graph Quantization 𝑾𝑞 = 𝑽෪𝑾𝑞𝑽

How to approximate 1𝐷 density? Histograms!

Algorithm in [fergus2009semi-supervised] for eigenfunctions

 Find 𝑹 such that 𝒔 = 𝑹𝒙

 For each “independent” 𝑠𝑘 approximate 𝑝(𝑠𝑘)

 Given 𝑝(𝑠𝑘) numerically solve for eigensystem of ℒ𝑝𝑘 𝑫 − 𝑷෪෩ 𝑾𝑷 𝑔 = 𝜆 ෩𝑫𝑔

 𝑔 − vector of length 𝐵 ≡ number of bins

 𝑷 − diagonal elements denote density values at discrete points

 ෩𝑫 − diagonal sum of the columns of 𝑷෪𝑾𝑷

 Order eigenfunctions by increasing eigenvalues

(20)

Scaling SSL with Graphs to Millions

Numerical 𝟏𝑫 Eigenfunctions

𝜙2(𝑠1) 𝜙3(𝑠2) 𝜙1 𝑠 = 1

(21)

J. Y. Choi. SNU

Scaling SSL with Graphs to Millions

21

Computational complexity for 𝑁 × 𝑑 dataset Typical harmonic approach

one diagonalization of 𝑁 × 𝑁 system (too complex)

Numerical eigenfunctions with 𝐵 bins and 𝑘 eigenvectors 𝑫 − 𝑷෪෩ 𝑾𝑷 𝑔 = 𝜆 ෩𝑫𝑔

Laplacian smoothing problem: 𝑘 × 𝑘 least squares solution (𝚲 + ഥ𝑼𝑇𝑪ഥ𝑼) ഥ𝜶 = ഥ𝑼𝑇𝑪𝒚

Final solution:

𝒇 = ഥ𝑼ഥ𝜶

(22)

Scaling SSL with Graphs to Millions

Tiny Image CIFAR label set

(23)

J. Y. Choi. SNU

SSL without Graph (remind)

23

SVM formulation using hinge loss:

min𝒘,𝑏 𝜆 𝒘 2 + σ𝑖=1𝑛𝑙 max(1 − 𝑦𝑖 𝒘𝑇𝒙𝑖 + 𝑏 , 0)

Formulation for Transductive SVM min𝒘,𝑏

𝑖=1 𝑛𝑙

max(1 − 𝑦𝑖 𝒘𝑇𝒙𝑖 + 𝑏 , 0) + 𝜆1 𝒘 2 +𝜆2 σ𝑖=𝑛

𝑙+1 𝑛𝑙+𝑛𝑢

max(1 − 𝒘𝑇𝒙𝑖 + 𝑏 , 0)

(24)

SSL with Graphs (remind)

Inductive SSL with graph: using 𝑓 𝜽; 𝒙𝑖 classifier with parameters 𝜽

min𝜽 𝜆 ෍

𝑖,𝑗=1 𝑛𝑙+𝑛𝑢

𝑤𝑖𝑗 𝑓 𝜽; 𝒙𝑖 − 𝑓 𝜽; 𝒙𝑗 2 + 𝛾 ෍

𝑖 𝑛𝑙

𝑓 𝜽; 𝒙𝑖 − 𝑦𝑖 2 Transductive SSL with graph: fixing 𝑓 𝒙𝑖 for 𝑖 ∈ ℒ

𝒇∈{±𝟏}min𝑛𝑙+𝑛𝑢 𝜆 ෍

𝑖,𝑗=1 𝑛𝑙+𝑛𝑢

𝑤𝑖𝑗 𝑓 𝒙𝑖 − 𝑓 𝒙𝑗 2 + ∞෍

𝑖 𝑛𝑙

𝑓 𝒙𝑖 − 𝑦𝑖 2

𝒇 ∈ 𝑹𝑛𝑙+𝑛𝑢

(25)

J. Y. Choi. SNU

SSL with Graphs (remind)

25

Transductive SSL with graph: fixing 𝑓 𝒙𝑖 for 𝑖 ∈ ℒ

𝒇∈𝑹min𝑛𝑙+𝑛𝑢 𝜆 ෍

𝑖,𝑗=1 𝑛𝑙+𝑛𝑢

𝑤𝑖𝑗 𝑓 𝒙𝑖 − 𝑓 𝒙𝑗 2 + ∞ ෍

𝑖 𝑛𝑙

𝑓 𝒙𝑖 − 𝑦𝑖 2

𝒇𝑙 = 𝒚𝑙 Ω 𝒇 = 𝒇𝑇𝑳𝒇

𝛻𝒇𝑢𝛺 𝒇 = 0

⟹ 𝒇𝑢 = 𝑳𝑢𝑢−1(𝑾𝑢𝑙𝒇𝑙) cf. random work HS: harmonic solution:

⟹ 𝒇𝑢= (𝑳𝑢𝑢 + 𝛾𝑔𝐈)−1 𝑾𝑢𝑙𝒇𝑙 Regularized HS:

sink node weight

robust to outliers

(26)

SSL with Graphs (remind)

Regularized Soft HS 𝒇 = argmin

𝒇∈ℝ𝑛𝑙+𝑛𝑢

𝒇 − 𝒚 𝑇𝑪 𝒇 − 𝒚 + 𝒇𝑇𝑸𝒇 = (𝑪−1𝑸 + 𝐈)−1𝒚 , 𝑸 = 𝑳 + 𝛾𝑔𝐈 , 𝒚𝒖 = 𝟎

Out of sample extension: Laplacian SVM with kernels (Inductive)

𝑓∈ℋmin𝒦

𝑖 𝑛𝑙

max(0, 1 − 𝑦𝑖𝑓 𝑥𝑖 ) + 𝜆𝐴 𝑓 𝒦2 + 𝜆𝑙𝒇𝑇𝑳𝒇 manifold regularization

𝑓 𝒙 = ෍

𝑛𝑙+𝑛𝑢

𝛼𝒦(𝒙, 𝒙 )

(27)

J. Y. Choi. SNU

SSL with Graphs (remind)

27

Self-training with the confident data (MM Graph Cuts) 𝑓 = 𝑎𝑟𝑔min

𝑓∈ℋ𝒦

𝑖: 𝑙𝑖 ≥𝜀

Φ(𝒙𝑖, 𝑠𝑔𝑛 ℓ𝑖 , 𝑓(𝒙𝑖)) + 𝜆 𝑓 𝒦2 s. t. ℓ = 𝑎𝑟𝑔min

ℓ∈ℝ𝑁

𝑇 (𝑳 + 𝛾𝑔𝐈) ℓ

s. t. ℓ𝑖 = 𝑦𝑖, ∀ 𝑖 = 1, … , 𝑛𝑙

Representer theorem is still cool:

𝑓 𝒙 = ෍

𝑖: 𝑙𝑖 ≥𝜀

𝛼𝑖𝒦(𝒙, 𝒙𝑖)

(28)

SSL with Graphs (remind)

Online SSL

(29)

J. Y. Choi. SNU

Scaling SSL with Graphs to Millions (remind)

29

Computational complexity for 𝑁 × 𝑑 dataset Typical harmonic approach

one diagonalization of 𝑁 × 𝑁 system (too complex)

Numerical eigenfunctions with 𝐵 bins and 𝑘 eigenvectors 𝑫 − 𝑷෪෩ 𝑾𝑷 𝑔 = 𝜆 ෩𝑫𝑔

Laplacian smoothing problem: 𝑘 × 𝑘 least squares solution (𝚲 + ഥ𝑼𝑇𝑪ഥ𝑼) ഥ𝜶 = ഥ𝑼𝑇𝑪𝒚

Final solution:

𝒇 = ഥ𝑼ഥ𝜶

(30)

SSL with Graphs:

Graph-based SSL is obviously sensitive to graph construction!

(31)

J. Y. Choi. SNU

Summary Questions of the lecture

31

 What is the key idea to mitigate the scalability problem of SSL for millions of data?

 What happens to 𝑳 when 𝑁 → ∞?

 What does the weighted smoothness operator in Hilbert space define, instead of eigenvectors of Laplacian 𝐿?

 How can we obtain numerical eigenfunctions with sampled large dada?

 How can we determine eigenvectors from the numerical eigenfunctions?

수치

Updating...

참조

관련 주제 :