Summary Questions of the lecture

Explain the spectral filtering of a graph signal.
Decompose the graph signal 𝒇 with the graph Fourier transform, multiply by the filter ĝ to filter the transformed signal in the spectral domain, and then reconstruct the filtered signal in the spatial/temporal domain with the inverse graph Fourier transform.
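The three steps above can be sketched numerically; this is a minimal illustration on a toy 4-node path graph with an illustrative low-pass filter ĝ(λ) = exp(−λ), not the lecture's learned filters.

```python
import numpy as np

# 4-node path graph: adjacency A and unnormalized Laplacian L = D - A
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# Graph Fourier basis: eigenvectors U, eigenvalues lam of L
lam, U = np.linalg.eigh(L)

f = np.array([1.0, -1.0, 1.0, -1.0])   # a "high-frequency" graph signal

f_hat = U.T @ f                        # 1) GFT: decompose into spectral coefficients
g_hat = np.exp(-lam)                   # 2) apply filter g(lambda) on the spectrum
f_out = U @ (g_hat * f_hat)            # 3) inverse GFT: reconstruct in the node domain

print(f_out)                           # smoothed version of f
```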
How to design the spectral filter for GCN?
The spectral filters are learned in a data-driven way: the filters are parametrized so that they can be trained end to end.
What is the channel in a graph signal?
Letting N and d₁ denote the number of nodes and the feature dimension of each node respectively, the graph signal is represented by an N × d₁ matrix. Each column of this matrix is one 'channel'.
How to deal with multi-channel signals in GCN?
To create the n-th output channel, we transform each input channel with a different filter and add up the resulting signals. Thus, we need to learn d₁ × d₂ filters.
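As a sketch of this channel mixing, the snippet below applies d₁ × d₂ = 6 distinct filters on a toy graph; each filter is an illustrative first-order polynomial ĝ_nm(L) = a + b·L with random coefficients standing in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d1, d2 = 4, 2, 3

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

F_in = rng.standard_normal((N, d1))      # graph signal: one column per channel
coef = rng.standard_normal((d2, d1, 2))  # (a, b) for each of the d2*d1 filters

F_out = np.zeros((N, d2))
for n in range(d2):                      # each output channel sums over all
    for m in range(d1):                  # input channels, one filter per pair
        a, b = coef[n, m]
        g_L = a * np.eye(N) + b * L      # g_nm(L) = a*I + b*L (illustrative)
        F_out[:, n] += g_L @ F_in[:, m]

print(F_out.shape)  # (4, 3): d1 x d2 filters were applied in total
```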
Geometric deep learning via graph convolution … (continue)

GCN (𝒢):
h_i^(l+1) = Σ_{v_j ∈ N(v_i)} f(x_i, w_ij, h_j^(l), x_j)

[Figure: a small graph with nodes v₁, …, v₅, each carrying features x_i and hidden state h_i^l; the update of node v_i aggregates over its neighbors N(v_i).]
Outline of Lecture (3)

Graph Convolution Networks (GCN)
- What are the issues in GCN?
- Graph Filtering in GCN
- Graph Pooling in GCN
- Original GNN (Scarselli et al. 2005)
- Spectral GCN
  - Spectral Filtering
  - Graph Spectral Filtering in GCN
  - Spectral Graph CNN (Bruna et al. ICLR 2014)
  - ChebNet (Defferrard et al. NIPS 2016)
  - Simplified ChebNet (Kipf & Welling, ICLR 2017)
- Spatial GCN
  - GraphSage (Hamilton et al. NIPS 2017)
  - GCN (Kipf & Welling, ICLR 2017)
  - GAT (Veličković et al. ICLR 2018)
  - MPNN (Gilmer et al. ICML 2017)
- Link Analysis
  - PageRank
- Diffusion
  - Propagation using graph diffusion
    - Predict Then Propagate [ICLR'19]
    - Graph Diffusion-Embedding Networks [CVPR'19]
  - Making a new graph
    - Diffusion Improves Graph Learning [NIPS'19]
    - Graph Learning-Convolutional Nets. [CVPR'19]
GCN: Types of Operations for Graph Filtering

Spatial filtering:
- Original GNN (Scarselli et al. 2005)
- GraphSage (Hamilton et al. NIPS 2017)
- GAT (Veličković et al. ICLR 2018)
- MPNN (Gilmer et al. ICML 2017)

Spectral filtering:
- Spectral Graph CNN (Bruna et al. ICLR 2014)
- ChebNet (Defferrard et al. NIPS 2016)
- Simplified ChebNet (Kipf & Welling, ICLR 2017)
- …
GCN: Graph Filtering

How to design the spectral filter for GCN?
Data-driven! Learn ĝ(Λ) from data!

How to deal with multi-channel signals?
Each input channel contributes to each output channel:
F_O[:, n] = Σ_{m=1}^{d₁} ĝ_nm(L) F_I[:, m],  n = 1, …, d₂;  F_I ∈ ℝ^{N×d₁} → F_O ∈ ℝ^{N×d₂}
One filter per (input, output) channel pair: learn d₂ × d₁ filters.
GCN: Graph Filtering

ĝ_nm(Λ): non-parametric → not learnable;  n = 1, …, d₂,  m = 1, …, d₁

ĝ_nm(Λ) = diag(ĝ_nm(λ₁), …, ĝ_nm(λ_N))  ⟹  ĝ_nm(L) = U ĝ_nm(Λ) Uᵀ
GCN: Graph Filtering

ĝ_nm(Λ): learnable parameterization;  n = 1, …, d₂,  m = 1, …, d₁

ĝ_nm(Λ) = diag(θ₁^(nm), …, θ_N^(nm))  ⟹  ĝ_nm(L) = U ĝ_nm(Λ) Uᵀ
⟹ d₁ × d₂ × N parameters

Spectral GCN: Spectral Networks and Deep Locally Connected Networks on Graphs (Bruna et al. ICLR 2014)
GCN: Graph Filtering

ĝ_nm(Λ): polynomial parameterization;  n = 1, …, d₂,  m = 1, …, d₁

ĝ_nm(Λ) = Σ_{k=0}^{K−1} θ_k^(nm) Λᵏ = diag(Σ_{k=0}^{K−1} θ_k^(nm) λ₁ᵏ, …, Σ_{k=0}^{K−1} θ_k^(nm) λ_Nᵏ)
⟹ ĝ_nm(L) = U ĝ_nm(Λ) Uᵀ
⟹ d₁ × d₂ × K parameters

F_O[:, n] = Σ_{m=1}^{d₁} U ĝ_nm(Λ) Uᵀ F_I[:, m] = Σ_{m=1}^{d₁} Σ_{k=0}^{K−1} θ_k^(nm) Lᵏ F_I[:, m]
(using U Λᵏ Uᵀ = Lᵏ, so no eigendecomposition is needed at filtering time)
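The identity U (Σ_k θ_k Λᵏ) Uᵀ = Σ_k θ_k Lᵏ can be checked numerically; the snippet below uses a toy graph and arbitrary coefficients to confirm that the spectral and spatial forms of a polynomial filter coincide.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

theta = [0.5, -0.3, 0.1]                 # theta_0 .. theta_{K-1}, K = 3

# Spectral form: filter the eigenvalues, then rotate back with U
g_lam = sum(t * lam**k for k, t in enumerate(theta))
spectral = U @ np.diag(g_lam) @ U.T

# Spatial form: the same polynomial applied directly to L
spatial = sum(t * np.linalg.matrix_power(L, k) for k, t in enumerate(theta))

print(np.allclose(spectral, spatial))    # True
```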
GCN: Graph Filtering

ĝ_nm(Λ): polynomial parameterization;  n = 1, …, d₂,  m = 1, …, d₁
F_O[:, n] = Σ_{m=1}^{d₁} Σ_{k=0}^{K−1} θ_k^(nm) Lᵏ F_I[:, m]

Example: applying L once mixes 1-hop neighborhoods; applying it twice mixes 2-hop neighborhoods:
h = L 𝒇,  h_i = Σ_{v_j ∈ N(v_i)} (f_i − f_j)
g = L h = L L 𝒇 = L² 𝒇,  g_k = Σ_{v_j ∈ N(v_k)} (h_k − h_j), which involves f-values up to 2 hops from v_k
GCN: Graph Filtering

ĝ_nm(Λ): polynomial parameterization;  n = 1, …, d₂,  m = 1, …, d₁
F_O[:, n] = Σ_{m=1}^{d₁} Σ_{k=0}^{K−1} θ_k^(nm) Lᵏ F_I[:, m]

If node v_j is more than K−1 hops away from node v_i, then
(Σ_{k=0}^{K−1} θ_k^(nm) Lᵏ)_ij = 0
⟹ The filter is localized within the (K−1)-hop neighborhood in the spatial domain.
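This localization can be seen directly: on a toy 5-node path graph, (Lᵏ)[i, j] vanishes whenever nodes i and j are more than k hops apart, so a polynomial of degree K−1 mixes only (K−1)-hop neighborhoods.

```python
import numpy as np

N = 5
A = np.zeros((N, N))
for i in range(N - 1):                 # path graph: 0-1-2-3-4
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

L2 = np.linalg.matrix_power(L, 2)      # degree-2 term of a polynomial filter
print(L2[0, 3], L2[0, 4])              # 0.0 0.0 -> nodes 3, 4 are >2 hops from node 0
print(L2[0, 2])                        # nonzero: node 2 is exactly 2 hops away
```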
GCN: Chebyshev Polynomials

A high-order polynomial has the non-orthogonal basis 1, x, x², …:
g(x) = θ₀ + θ₁x + θ₂x² + ⋯
→ unstable under perturbation of the coefficients.

Chebyshev polynomials give a recursive formulation for fast filtering.
Recursive definition:
T₀(x) = 1;  T₁(x) = x
T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x)

The Chebyshev polynomials T_k form an orthogonal basis for the Hilbert space L²([−1, 1], dx/√(1−x²)).
⟹ The Chebyshev filter has the orthogonal basis T₀(x), T₁(x), T₂(x), …
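The recurrence is what makes filtering fast: Σ_k θ_k T_k(L̃) 𝒇 can be built from repeated matrix-vector products, with no eigendecomposition. A minimal sketch on a toy 3-node path graph with arbitrary coefficients:

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

lam_max = np.linalg.eigvalsh(L)[-1]
L_t = 2.0 * L / lam_max - np.eye(3)        # rescale spectrum into [-1, 1]

f = np.array([1.0, 0.0, -1.0])
theta = [0.4, 0.3, 0.2]                    # theta_0, theta_1, theta_2

T_prev, T_curr = f, L_t @ f                # T_0(L_t) f = f,  T_1(L_t) f = L_t f
out = theta[0] * T_prev + theta[1] * T_curr
for k in range(2, len(theta)):
    T_next = 2.0 * L_t @ T_curr - T_prev   # recurrence applied to vectors
    out += theta[k] * T_next
    T_prev, T_curr = T_curr, T_next

print(out.shape)  # (3,)
```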
GCN: ChebNet

Parameterize ĝ_nm(Λ) with Chebyshev polynomials:
ĝ_nm(Λ̃) = Σ_{k=0}^{K} θ_k^(mn) T_k(Λ̃),  with Λ̃ = 2Λ/λ_max − I, so the rescaled eigenvalues lie in [−1, 1]
⟹ d₂ × d₁ × (K+1) parameters

Single channel:
𝒇_O = U ĝ(Λ̃) Uᵀ 𝒇_I = Σ_{k=0}^{K} θ_k T_k(L̃) 𝒇_I,  with L̃ = 2L/λ_max − I

Multiple channels:
F_O[:, n] = Σ_{m=1}^{d₁} Σ_{k=0}^{K} θ_k^(mn) T_k(L̃) F_I[:, m]

ChebNet: Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (Defferrard et al. NIPS 2016)
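One ChebNet layer can be sketched as follows; random coefficients stand in for the learned ones, and a toy 4-cycle graph replaces real data (d₁ = 2 input channels, d₂ = 3 output channels, K = 2).

```python
import numpy as np

rng = np.random.default_rng(1)
N, d1, d2, K = 4, 2, 3, 2

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
L_t = 2.0 * L / np.linalg.eigvalsh(L)[-1] - np.eye(N)

F_in = rng.standard_normal((N, d1))
theta = rng.standard_normal((K + 1, d1, d2))   # d1 x d2 x (K+1) parameters

# Precompute T_k(L_t) @ F_in for k = 0..K with the recurrence
T = [F_in, L_t @ F_in]
for k in range(2, K + 1):
    T.append(2.0 * L_t @ T[-1] - T[-2])

# Each T_k(L_t) F_in is N x d1; contract the input channels with theta_k
F_out = sum(T[k] @ theta[k] for k in range(K + 1))
print(F_out.shape)  # (4, 3)
```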
GCN: Simplified ChebNet

Chebyshev parameterization ĝ_nm(Λ) with K = 1 and λ_max ≈ 2 for L_sym:
ĝ_nm(Λ) = θ₀^(mn) + θ₁^(mn)(Λ − I)
Further constrain θ = θ₀ = −θ₁:
ĝ_nm(Λ) = θ^(mn)(2I − Λ)

Single channel (n = 1, m = 1), with L = L_sym:
𝒇_O = U ĝ(Λ) Uᵀ 𝒇_I = θ(2I − L) 𝒇_I = θ(I + D^(−1/2) A D^(−1/2)) 𝒇_I

Since λ(I + D^(−1/2) A D^(−1/2)) ∈ [0, 2], repeated application leads to instabilities.
Renormalization trick:
I + D^(−1/2) A D^(−1/2) → D̃^(−1/2) Ã D̃^(−1/2),  with Ã = A + I and D̃_ii = Σ_j Ã_ij. This yields
𝒇_O = θ (D̃^(−1/2) Ã D̃^(−1/2)) 𝒇_I
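The effect of the renormalization trick can be checked numerically: on a toy graph, the eigenvalues of D̃^(−1/2) Ã D̃^(−1/2) stay within [−1, 1], so stacked layers do not blow up.

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

A_t = A + np.eye(3)                           # add self-loops: A~ = A + I
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_t.sum(axis=1)))
P = D_inv_sqrt @ A_t @ D_inv_sqrt             # renormalized propagation matrix

ev = np.linalg.eigvalsh(P)
print(ev.min(), ev.max())                     # spectrum stays within [-1, 1]
```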
GCN: Simplified ChebNet

Single channel (m = 1, n = 1):
𝒇_O = θ (D̃^(−1/2) Ã D̃^(−1/2)) 𝒇_I,  with Ã = A + I and D̃_ii = Σ_j Ã_ij

Multiple channels (m = 1, …, d₁; n = 1, …, d₂):
F_O[:, n] = Σ_{m=1}^{d₁} θ^(mn) (D̃^(−1/2) Ã D̃^(−1/2)) F_I[:, m]
         = (D̃^(−1/2) Ã D̃^(−1/2)) Σ_{m=1}^{d₁} θ^(mn) F_I[:, m]
         = (D̃^(−1/2) Ã D̃^(−1/2)) [F_I[:, 1] ⋯ F_I[:, d₁]] [θ^(1n); ⋮; θ^(d₁n)]

Stacking all output channels:
[F_O[:, 1] ⋯ F_O[:, d₂]] = (D̃^(−1/2) Ã D̃^(−1/2)) [F_I[:, 1] ⋯ F_I[:, d₁]] [θ^(11) ⋯ θ^(1d₂); ⋮ ⋱ ⋮; θ^(d₁1) ⋯ θ^(d₁d₂)]

Matrix form:
F_O = (D̃^(−1/2) Ã D̃^(−1/2)) F_I 𝚯,  with 𝚯 ∈ ℝ^{d₁×d₂} and 𝚯[m, n] = θ^(mn)
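The matrix form of one simplified-ChebNet (GCN) layer is a sketch away; here a random 𝚯 stands in for the learned d₁ × d₂ weight matrix, on a toy graph and without a nonlinearity.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d1, d2 = 4, 3, 2

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

A_t = A + np.eye(N)                      # A~ = A + I
d_t = A_t.sum(axis=1)                    # D~ diagonal
P = A_t / np.sqrt(np.outer(d_t, d_t))    # D~^(-1/2) A~ D~^(-1/2), elementwise

F_in = rng.standard_normal((N, d1))
Theta = rng.standard_normal((d1, d2))

F_out = P @ F_in @ Theta                 # one linear GCN layer
print(F_out.shape)  # (4, 2)
```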
GCN: Simplified ChebNet

Matrix form:
F_O = (D̃^(−1/2) Ã D̃^(−1/2)) F_I 𝚯,  with 𝚯 ∈ ℝ^{d₁×d₂} and 𝚯[m, n] = θ^(mn)

Why spectral filtering? The whole filter ĝ(L) is the pipeline:
𝒇 → (GFT: decompose) → 𝒇̂ = Uᵀ𝒇 (coefficients) → (filter) → ĝ(Λ)Uᵀ𝒇 (filtered coefficients) → (IGFT: reconstruct) → U ĝ(Λ)Uᵀ𝒇
Summary Questions of the lecture

Explain the key idea of Spectral GCN: Spectral Networks and Deep Locally Connected Networks on Graphs (Bruna et al. ICLR 2014).
Explain the key idea of ChebNet: Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (Defferrard et al. NIPS 2016).
Explain the key idea of Simplified ChebNet: Semi-Supervised Classification with Graph Convolutional Networks (Kipf & Welling, ICLR 2017).