A gentle introduction to graph neural networks
Seongok Ryu, Department of Chemistry @ KAIST
• Motivation
• Graph Neural Networks
• Applications of Graph Neural Networks
Motivation
Non-Euclidean data structure
Successful deep learning architectures 1. Convolutional neural networks
https://medium.com/comet-app/review-of-deep-learning-algorithms-for-object-detection-c1f3d437b852
Lee, Jason, Kyunghyun Cho, and Thomas Hofmann.
"Fully character-level neural machine translation without explicit segmentation." arXiv preprint arXiv:1610.03017 (2016).
Non-Euclidean data structure
Successful deep learning architectures 2. Recurrent neural network
Sentiment analysis
Neural machine translation
The data structures of the examples shown above are regular:
Image – values on a grid of pixels; Sentence – sequential structure
HOWEVER, there are lots of irregular data structures:
social graphs (Facebook, Wikipedia), 3D meshes, molecular graphs, …
All you need is GRAPH!
Graph structure
𝑮𝒓𝒂𝒑𝒉 = 𝑮(𝑿, 𝑨)
X : Nodes (vertices)
- Individual people in a social network
- Atoms in a molecule
Nodes represent the elements of a system
Graph structure
𝑮𝒓𝒂𝒑𝒉 = 𝑮(𝑿, 𝑨)
A : Adjacency matrix
- Edges of a graph
- Connectivity, relationships
Edges represent the relationships or interactions between the elements of the system
Graph structure
For more detail, see:
Battaglia, Peter W., et al.
"Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).
Graph structure
Node features
- Ariana Grande : 25 years old, female, American, singer, …
- Donald Trump : 72 years old, male, American, president, businessman, …
- Vladimir Putin : 65 years old, male, Russian, president, …
Graph structure
Edge features
- Single bond / Double bond / Aromatic bond
Learning relation and interaction
What can we do with graph neural networks?
Battaglia, Peter W., et al.
"Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).
Learning relation and interaction
What can we do with graph neural networks?
• Node classification
• Link prediction
• Node2Vec, Subgraph2Vec, Graph2Vec
: Embedding node/substructure/graph structure to a vector
• Learning physics law from data
• And you can do more amazing things with GNN!
Graph neural networks
• Overall architecture of graph neural networks
• Updating node states
- Graph Convolutional Network (GCN)
- Graph Attention Network (GAT)
- Gated Graph Neural Network (GGNN)
• Readout : permutation invariance under changing node order
• Graph Auto-Encoders
• Practical issues
- Skip connection
- Inception
- Dropout
Principles of graph neural network
Battaglia, Peter W., et al.
"Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).
Weights used in updating the hidden states of fully-connected networks, CNNs, and RNNs
Overall neural network structure – case of supervised learning
Input node features, $H_i^{(0)}$ : raw node information
Final node states, $H_i^{(L)}$ : how the NN recognizes the nodes
Graph features, $Z$
Principles of graph neural network
Updates in a graph neural network
• Edge update : relationships or interactions; sometimes called 'message passing'
ex) the forces between springs
• Node update : aggregates the edge updates and uses them to update the node state
ex) the net force acting on each ball
• Global update : an update for the global attribute
ex) the net force and total energy of the physical system
Battaglia, Peter W., et al.
"Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).
Principles of graph neural network
Weights used in updating the hidden states of a GNN
The weights are shared across all nodes in the graph, but each node is updated differently because its own features $H_j^{(l)}$ enter the update.
GCN : Graph Convolutional Network
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016)
http://tkipf.github.io/
Max Welling is also famous for the variational autoencoder (VAE)
GCN : Graph Convolutional Network
What the NN learns : convolution weights
$X_i^{(l+1)} = \sigma\left( \sum_{j \in [i-k,\, i+k]} W_j^{(l)} X_j^{(l)} + b_j^{(l)} \right)$
GCN : Graph Convolutional Network
Example on a 4-node graph — node 2 aggregates the states of itself and its neighbors 1, 3, 4:
$H_2^{(l+1)} = \sigma\left( H_1^{(l)} W^{(l)} + H_2^{(l)} W^{(l)} + H_3^{(l)} W^{(l)} + H_4^{(l)} W^{(l)} \right)$
GCN : Graph Convolutional Network
$H_i^{(l+1)} = \sigma\left( \sum_{j \in N(i)} H_j^{(l)} W^{(l)} \right) = \sigma\left( A H^{(l)} W^{(l)} \right)$
What the NN learns : the shared convolution weight $W^{(l)}$
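In matrix form, the update above takes only a few lines of NumPy. A toy sketch: $\sigma$ is taken as ReLU, and the adjacency includes self-loops but is left unnormalized (unlike the renormalized version used later in these slides):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: aggregate neighbor states with A,
    mix with the shared weight W, then apply a ReLU nonlinearity."""
    return np.maximum(A @ H @ W, 0.0)  # sigma = ReLU

# 4-node cycle graph with self-loops so each node keeps its own state
A = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=float)
H = np.eye(4)              # one-hot input node features
W = np.full((4, 2), 0.5)   # weight matrix shared by all nodes
H1 = gcn_layer(A, H, W)
print(H1.shape)  # (4, 2)
```

Note how the same $W$ is applied to every node — only the neighborhood structure in $A$ makes the updates differ.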
GCN : Graph Convolutional Network
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016)
• Classification of nodes in citation networks and a knowledge graph
• Cross-entropy loss over the labeled examples : $L = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$
GAT : Graph Attention Network
• Attention revisited

GCN : $H_i^{(l+1)} = \sigma\left( \sum_{j \in N(i)} H_j^{(l)} W^{(l)} \right) = \sigma\left( A H^{(l)} W^{(l)} \right)$
What the NN learns : convolution weight

GAT : $H_i^{(l+1)} = \sigma\left( \sum_{j \in N(i)} \alpha_{ij}^{(l)} H_j^{(l)} W^{(l)} \right)$
What the NN learns : convolution weight and attention coefficient
Velickovic, Petar, et al.
"Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
GAT : Graph Attention Network
• Attention mechanism in natural language processing
https://devblogs.nvidia.com/introduction-neural-machine-translation-gpus-part-3/figure6_sample_translations-2/
GAT : Graph Attention Network
• Attention mechanism in natural language processing
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
GAT : Graph Attention Network
• Attention mechanism in natural language processing
Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning.
"Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
$a_t(s) = \mathrm{align}(h_t, \bar{h}_s) = \mathrm{softmax}\left( \mathrm{score}(h_t, \bar{h}_s) \right)$
$\mathrm{score}(h_t, \bar{h}_s) = \begin{cases} h_t^T \bar{h}_s & \text{(dot)} \\ h_t^T W_a \bar{h}_s & \text{(general)} \\ v_a^T \tanh\left( W_a [h_t; \bar{h}_s] \right) & \text{(concat)} \end{cases}$
GAT : Graph Attention Network
• Attention revisited
$H_i^{(l+1)} = \sigma\left( \sum_{j \in N(i)} \alpha_{ij}^{(l)} H_j^{(l)} W^{(l)} \right)$
What the NN learns : convolution weight and attention coefficient
Velickovic, Petar, et al.
"Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
$\alpha_{ij} = f(H_i W, H_j W)$
GAT : Graph Attention Network
• Attention revisited
$H_i^{(l+1)} = \sigma\left( \sum_{j \in N(i)} \alpha_{ij}^{(l)} H_j^{(l)} W^{(l)} \right)$, with $\alpha_{ij} = f(H_i W, H_j W)$
What the NN learns : convolution weight and attention coefficient

• Velickovic, Petar, et al. – network analysis :
$\alpha_{ij} = \mathrm{softmax}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}$, $e_{ij} = \mathrm{LeakyReLU}\left( a^T [H_i W \,\|\, H_j W] \right)$

• Seongok Ryu, et al. – molecular applications :
$\alpha_{ij} = \tanh\left( (H_i W)^T C (H_j W) \right)$
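A toy NumPy sketch of Velickovic-style attention coefficients. The scoring vector `a`, the weights, and all shapes are assumptions for illustration; a real GAT trains these parameters and adds multi-head attention:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_coeffs(H, W, a, adj):
    """e_ij = LeakyReLU(a^T [H_i W || H_j W]);
    alpha_ij = softmax of e_ij over the neighbors j of node i
    (non-neighbors are masked out before the softmax)."""
    HW = H @ W                                   # (N, F')
    N = HW.shape[0]
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            e[i, j] = leaky_relu(a @ np.concatenate([HW[i], HW[j]]))
    e = np.where(adj > 0, e, -np.inf)            # mask non-edges
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
a = rng.normal(size=4)           # scoring vector for the concatenated pair
adj = np.array([[1, 1, 0, 1],    # adjacency with self-loops
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [1, 0, 1, 1]])
alpha = attention_coeffs(H, W, a, adj)
```

Each row of `alpha` sums to one, so every node distributes attention only over its own neighborhood.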
GAT : Graph Attention Network
• Multi-head attention
Average : $H_i^{(l+1)} = \sigma\left( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N(i)} \alpha_{ij}^{(l,k)} H_j^{(l)} W^{(l,k)} \right)$
Concatenation : $H_i^{(l+1)} = \big\Vert_{k=1}^{K}\, \sigma\left( \sum_{j \in N(i)} \alpha_{ij}^{(l,k)} H_j^{(l)} W^{(l,k)} \right)$
Velickovic, Petar, et al.
"Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
GGNN : Gated Graph Neural Network
Wu, Zhenqin, et al.
"MoleculeNet: a benchmark for molecular machine learning." Chemical science 9.2 (2018): 513-530.
• Message Passing Neural Network (MPNN) framework
$h_i^{(l+1)} = U\left( h_i^{(l)}, m_i^{(l+1)} \right)$ : the previous node state and the message state are used to update the $i$-th hidden node state.
$m_i^{(l+1)} = \sum_{j \in N(i)} M^{(l+1)}\left( h_i^{(l)}, h_j^{(l)}, e_{ij} \right)$ : the message state is computed from the previous node states of the neighbors and the edge states.
GGNN : Gated Graph Neural Network
Li, Yujia, et al.
"Gated graph sequence neural networks." arXiv preprint arXiv:1511.05493 (2015).
• A recurrent unit (in this case, a GRU) is used to update the node states:
$h_i^{(l+1)} = U\left( h_i^{(l)}, m_i^{(l+1)} \right) \;\rightarrow\; h_i^{(l+1)} = \mathrm{GRU}\left( h_i^{(l)}, m_i^{(l+1)} \right)$
$h_i^{(l+1)} = z^{(l+1)} \odot \tilde{h}_i^{(l+1)} + \left( 1 - z^{(l+1)} \right) \odot h_i^{(l)}$
$\tilde{h}_i^{(l+1)}$ : candidate (temporal) node state, $h_i^{(l)}$ : previous node state, $z^{(l+1)}$ : update gate mixing the two
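A toy NumPy sketch of this gated update — a simplified GRU without a reset gate, with all weight shapes assumed for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_update(h_prev, m, Wz, Uz, Wh, Uh):
    """GRU-flavoured node update (simplified: no reset gate).
    The gate z decides how much of the candidate state replaces
    the previous state, per node and per feature."""
    z = sigmoid(m @ Wz + h_prev @ Uz)        # update gate from message + previous state
    h_cand = np.tanh(m @ Wh + h_prev @ Uh)   # candidate ("temporal") node state
    return z * h_cand + (1.0 - z) * h_prev

rng = np.random.default_rng(1)
d = 3
h_prev = rng.normal(size=(4, d))             # previous node states h_i^(l)
m = rng.normal(size=(4, d))                  # aggregated messages m_i^(l+1)
Wz, Uz, Wh, Uh = (rng.normal(size=(d, d)) for _ in range(4))
h_next = gated_update(h_prev, m, Wz, Uz, Wh, Uh)
```

Because the output is a convex combination of the candidate and previous states, the update can smoothly keep old information where the gate stays near zero.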
Readout : permutation invariance under changing node order
Input node features, $H_i^{(0)}$ : raw node information
Final node states, $H_i^{(L)}$ : how the NN recognizes the nodes
Graph features, $Z$
Readout : permutation invariance under changing node order
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction - Scientific Figure on ResearchGate. Available from:
https://www.researchgate.net/Graph-permutation-invariance-and-structured-prediction-A-graph-labeling-function-F-is_fig1_323217335 [accessed 8 Sep, 2018]
Readout : permutation invariance under changing node order
• Graph feature : $z_G = f\left( \{ H_i^{(L)} \} \right)$
• Node-wise summation : $z_G = \tau\left( \sum_{i \in G} \mathrm{MLP}\left( H_i^{(L)} \right) \right)$
• Graph gathering : $z_G = \tau\left( \sum_{i \in G} \sigma\left( \mathrm{MLP}_1\left( [H_i^{(L)}, H_i^{(0)}] \right) \right) \odot \mathrm{MLP}_2\left( H_i^{(L)} \right) \right)$
• $\tau$ : ReLU activation, $\sigma$ : sigmoid activation
Gilmer, Justin, et al.
"Neural message passing for quantum chemistry." arXiv preprint arXiv:1704.01212 (2017).
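The permutation invariance of such readouts can be checked directly. Below is a toy NumPy sketch of the gated graph-gathering readout, with single linear maps standing in for MLP1/MLP2 (an assumption for brevity):

```python
import numpy as np

def readout(H_final, H_init):
    """Gated graph-gathering readout:
    z_G = ReLU( sum_i sigmoid(MLP1([H_i^L, H_i^0])) * MLP2(H_i^L) ).
    MLP1/MLP2 are stand-ins: single linear maps W1, W2 here."""
    gate_in = np.concatenate([H_final, H_init], axis=1)
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ W1)))   # sigmoid(MLP1)
    z = (gate * (H_final @ W2)).sum(axis=0)        # node-wise sum -> order independent
    return np.maximum(z, 0.0)                      # tau = ReLU

rng = np.random.default_rng(2)
H0 = rng.normal(size=(5, 3))   # initial node features H^(0)
HL = rng.normal(size=(5, 3))   # final node states H^(L)
W1 = rng.normal(size=(6, 4))
W2 = rng.normal(size=(3, 4))

z = readout(HL, H0)
perm = rng.permutation(5)      # shuffle the node order
z_perm = readout(HL[perm], H0[perm])
```

Because the aggregation is a sum over nodes, shuffling the node order leaves the graph feature unchanged — exactly the invariance a readout must provide.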
Graph Auto-Encoders (GAE)
Kipf, Thomas N., and Max Welling.
"Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016).
https://github.com/tkipf/gae
• Clustering
• Link prediction
• Matrix completion and recommendation
Graph Auto-Encoders (GAE)

Input graph $G(X, A)$ → [Encoder] → latent node features $q(Z|X, A)$ → [Decoder] → reconstructed adjacency matrix $\hat{A} = \sigma(Z Z^T)$
Encoder – Inference model
• Two-layer GCN : $\mathrm{GCN}(X, A) = \tilde{A}\, \mathrm{ReLU}\left( \tilde{A} X W_0 \right) W_1$, with $\tilde{A} = D^{-1/2} A D^{-1/2}$ the symmetrically normalized adjacency matrix
• Variational inference : $q(Z|X, A) = \prod_{i=1}^{N} q(z_i | X, A)$, $q(z_i | X, A) = \mathcal{N}\left( z_i \mid \mu_i, \mathrm{diag}(\sigma_i^2) \right)$
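A minimal NumPy sketch of the encoder's mean branch. The random graph, the normalization, and all shapes are assumptions; the real model outputs both a mean and a variance per node:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gae_encoder_mean(X, A_norm, W0, W1):
    """Mean branch of the (V)GAE encoder:
    GCN(X, A) = A_norm ReLU(A_norm X W0) W1,
    where A_norm is the symmetrically normalized adjacency."""
    return A_norm @ relu(A_norm @ X @ W0) @ W1

rng = np.random.default_rng(4)
N, D, Hdim, E = 5, 3, 4, 2
X = rng.normal(size=(N, D))
A = (rng.random((N, N)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                 # random undirected graph
d = A.sum(axis=1) + 1e-9                       # avoid division by zero for isolated nodes
A_norm = A / np.sqrt(np.outer(d, d))           # D^{-1/2} A D^{-1/2}
W0 = rng.normal(size=(D, Hdim))
W1 = rng.normal(size=(Hdim, E))
mu = gae_encoder_mean(X, A_norm, W0, W1)       # one mean vector per node
```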
Decoder – Generative model
• Inner product between latent vectors :
$p(A|Z) = \prod_{i=1}^{N} \prod_{j=1}^{N} p(A_{ij} | z_i, z_j)$, $p(A_{ij} = 1 | z_i, z_j) = \sigma\left( z_i^T z_j \right)$
$A_{ij}$ : elements of the (reconstructed) adjacency matrix, $\sigma(\cdot)$ : sigmoid activation
• Learning :
$\mathcal{L} = \mathbb{E}_{q(Z|X,A)}\left[ \log p(A|Z) \right] - \mathrm{KL}\left( q(Z|X,A) \,\|\, p(Z) \right)$
(reconstruction loss and KL-divergence)
$\hat{A} = \sigma(Z Z^T)$ with $Z = \mathrm{GCN}(X, A)$
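The inner-product decoder is essentially a one-liner. A toy NumPy sketch with hand-picked latent codes (an assumption purely for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(Z):
    """Inner-product decoder of the (V)GAE:
    p(A_ij = 1 | z_i, z_j) = sigmoid(z_i^T z_j), for all pairs at once."""
    return sigmoid(Z @ Z.T)

# toy latent codes: nodes 0/1 point one way, nodes 2/3 the other
Z = np.array([[ 2.0,  0.0],
              [ 2.0,  0.0],
              [-2.0,  0.0],
              [-2.0,  0.0]])
A_hat = decode(Z)
```

Nodes whose latent vectors align get a high edge probability, nodes pointing in opposite directions a low one — which is why nearby points in the latent space end up linked.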
Practical Issues : Inception
GoogLeNet – Winner of 2014 ImageNet Challenge
https://towardsdatascience.com/an-intuitive-guide-to-deep-network-architectures-65fdc477db41
Practical Issues : Inception
Inception module
https://towardsdatascience.com/an-intuitive-guide-to-deep-network-architectures-65fdc477db41
Practical Issues : Inception
Stacked graph convolutions :
$H^{(1)} = \sigma\left( A H^{(0)} W^{(0)} \right)$, $H^{(2)} = \sigma\left( A H^{(1)} W^{(1)} \right)$
• Makes the network wider
• Helps avoid vanishing gradients
Practical Issues
: Skip-connection
ResNet – Winner of 2015 ImageNet Challenge
Going Deeeeeeeeper!
Practical Issues
: Skip-connection
https://towardsdatascience.com/an-intuitive-guide-to-deep-network-architectures-65fdc477db41
ResNet – Winner of 2015 ImageNet Challenge
$y = H_i^{(l+1)} + H_i^{(l)}$

Practical Issues : Dropout
Dropout rate $p = 0.25$
# parameters : 4×4 = 16 → 16×0.75 = 12 → 16×0.75×0.75 = 9
For a dense network, dropout of the hidden states reduces the effective number of parameters from $N_w$ to $N_w (1-p)^2$.
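The count above can be checked numerically with a quick Monte-Carlo sketch in NumPy (the 4×4 layer from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.25                      # dropout rate
n_in = n_out = 4
W = np.ones((n_in, n_out))    # 4x4 = 16 weights

# Dropping units on both sides of a dense layer disables the weights
# connecting them; on average Nw * (1-p)^2 weights stay active.
trials = 10_000
active = 0
for _ in range(trials):
    in_mask = rng.random(n_in) >= p    # keep each input unit with prob 1-p
    out_mask = rng.random(n_out) >= p  # keep each output unit with prob 1-p
    active += np.outer(in_mask, out_mask).sum()

expected = W.size * (1 - p) ** 2       # 16 * 0.75^2 = 9
```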
Practical Issues : Dropout
Hidden states in a graph neural network, $H_1^{(l)}, H_2^{(l)}, H_3^{(l)}, H_4^{(l)}$ :

              Node 1     Node 2           Node 3       ⋅⋅⋅
Age           28         40               36           ⋅⋅⋅
Sex           M          F                F            ⋅⋅⋅
Nationality   Korean     American         French       ⋅⋅⋅
Job           Student    Medical Doctor   Politician   ⋅⋅⋅

Graph Conv. : $H^{(l+1)} = A H^{(l)} W$
Practical Issues : Dropout
• Mask individual persons (drop whole nodes), or
• Mask information (features) of the node states.
And many other options are possible. The proper method depends on your task.
Applications of graph neural networks
• Network Analysis
1. Node classification
2. Link prediction
3. Matrix completion
• Molecular Applications
1. Neural molecular fingerprint
2. Quantitative Structure-Property Relationship (QSPR)
3. Molecular generative model
• Interacting physical system
Network analysis
1. Node classification – karate club network
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016) http://tkipf.github.io/graph-convolutional-networks/
Karate club graph; colors denote communities obtained via modularity-based clustering (Brandes et al., 2008).
GCN embedding (with random weights) for nodes in the karate club network.
• All figures and descriptions are taken from Thomas N. Kipf’s blog.
• Watch video on his blog.
Network analysis
1. Node classification
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016) http://tkipf.github.io/graph-convolutional-networks/
• Good node features → Good node classification results
Network analysis
1. Node classification
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016) http://tkipf.github.io/graph-convolutional-networks/
• Semi-supervised learning – low label rate
• Citation network – Citeseer, Cora, Pubmed / Bipartite graph - NELL
Network analysis
1. Node classification
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016) http://tkipf.github.io/graph-convolutional-networks/
• Outperforms classical machine learning methods
Network analysis
1. Node classification
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016) http://tkipf.github.io/graph-convolutional-networks/
• Spectral graph convolution
$H^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right)$, $\tilde{A} = A + I_N$, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$
• Spectral graph filtering
$g_\theta \star x = U g_{\theta'}(\Lambda) U^T x$, where $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^T$
$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{\Lambda})$ — polynomial approximation (here, Chebyshev polynomials)
Network analysis
1. Node classification
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016) http://tkipf.github.io/graph-convolutional-networks/
• Spectral graph convolution
$H^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right)$, $\tilde{A} = A + I_N$, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$
• Spectral graph filtering
$g_\theta \star x \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{L})\, x$
Linear approximation : $g_\theta \star x \approx \theta'_0 x + \theta'_1 (L - I_N) x = \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x$
Using a single parameter $\theta = \theta'_0 = -\theta'_1$ : $g_\theta \star x \approx \theta\left( I_N + D^{-1/2} A D^{-1/2} \right) x$
Renormalization trick : $I_N + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, so $g_\theta \star x \approx \theta\, \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} x$
Network analysis
1. Node classification
Kipf, Thomas N., and Max Welling.
"Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016) http://tkipf.github.io/graph-convolutional-networks/
• Spectral graph convolution
$H^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right)$, $\tilde{A} = A + I_N$, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$
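The renormalized adjacency is easy to compute directly. A small NumPy sketch on a toy 3-node path graph:

```python
import numpy as np

def renormalized_adj(A):
    """Renormalization trick of Kipf & Welling:
    A_hat = D~^{-1/2} (A + I) D~^{-1/2},
    where D~ is the degree matrix of A + I (self-loops included)."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

# path graph 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_hat = renormalized_adj(A)
```

The result is symmetric with eigenvalues in $[-1, 1]$, which keeps repeated propagation numerically stable — the practical point of the trick.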
Network analysis 2. Link prediction
Kipf, Thomas N., and Max Welling.
"Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016).
https://github.com/tkipf/gae
• Clustering
• Link prediction
• Matrix completion and recommendation
Input graph $G(X, A)$ → [Encoder] → latent node features $q(Z|X, A)$ → [Decoder] → reconstructed adjacency matrix $\hat{A} = \sigma(Z Z^T)$
Network analysis 2. Link prediction
• Trained on an incomplete version of {Cora, Citeseer, Pubmed} datasets where parts of the citation links (edges) have been removed, while all node features are kept.
• Form validation and test sets from previously removed edges and the same number of randomly sampled pairs of unconnected nodes (non-edges).
Kipf, Thomas N., and Max Welling.
"Variational graph auto-encoders." arXiv preprint arXiv:1611.07308 (2016).
https://github.com/tkipf/gae
Network analysis
3. Matrix completion
Berg, Rianne van den, Thomas N. Kipf, and Max Welling.
"Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).
• Matrix completion → can be applied to recommender systems
Network analysis
3. Matrix completion
Berg, Rianne van den, Thomas N. Kipf, and Max Welling.
"Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).
• A rating matrix $M$ of shape $N_u \times N_v$ ($N_u$ : the number of users, $N_v$ : the number of items)
• User $i$ rated item $j$ ($M_{ij}$ = rating level), or the rating is unobserved ($M_{ij} = 0$).
• The matrix completion / recommendation problem
→ a link prediction problem on a bipartite user-item interaction graph.
Input graph : $G(\mathcal{W}, \mathcal{E}, \mathcal{R})$
• $\mathcal{W} = \mathcal{U} \cup \mathcal{V}$ : user nodes $u_i \in \mathcal{U}$, $i \in \{1, \dots, N_u\}$, and item nodes $v_j \in \mathcal{V}$, $j \in \{1, \dots, N_v\}$
• $(u_i, r, v_j) \in \mathcal{E}$ represent rating levels, $r \in \{1, \dots, R\} = \mathcal{R}$.
Network analysis
3. Matrix completion
Berg, Rianne van den, Thomas N. Kipf, and Max Welling.
"Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).
• GAE for the link prediction task
Encoder : $Z = f(X, A)$ — takes an $N \times D$ feature matrix $X$ and a graph adjacency matrix $A$, and produces an $N \times E$ node embedding matrix $Z = [z_1^T, \dots, z_N^T]^T$
Decoder : $\hat{A} = g(Z)$ — takes pairs of node embeddings $(z_i, z_j)$ and predicts the respective entries $\hat{A}_{ij}$
Network analysis
3. Matrix completion
Berg, Rianne van den, Thomas N. Kipf, and Max Welling.
"Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).
• GAE for the bipartite recommender graph $G(\mathcal{W}, \mathcal{E}, \mathcal{R})$
Encoder : $[U, V] = f(X, M_1, \dots, M_R)$, where $M_r \in \{0,1\}^{N_u \times N_v}$ is the adjacency matrix associated with rating type $r \in \mathcal{R}$, and $U, V$ are the user and item embedding matrices of shape $N_u \times E$ and $N_v \times E$, respectively.
Decoder : $\hat{M} = g(U, V)$, a rating matrix of shape $N_u \times N_v$

Side by side :
$G(X, A)$, $Z = f(X, A)$, $\hat{A} = g(Z)$ — GAE for the link prediction task
$G(\mathcal{W}, \mathcal{E}, \mathcal{R})$, $[U, V] = f(X, M_1, \dots, M_R)$, $\hat{M} = g(U, V)$ — GAE for the bipartite recommender
Network analysis
3. Matrix completion
Berg, Rianne van den, Thomas N. Kipf, and Max Welling.
"Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).
Encoder :
$h_i = \sigma\left[ \mathrm{accum}\left( \sum_{j \in \mathcal{N}_{i,1}} \mu_{j \to i, 1}, \; \dots, \; \sum_{j \in \mathcal{N}_{i,R}} \mu_{j \to i, R} \right) \right]$ : intermediate node state
$u_i = \sigma(W h_i)$ : the final user embedding
$\mu_{j \to i, r} = \frac{1}{c_{ij}} W_r x_j$ : message function from item $j$ to user $i$, with normalization $c_{ij} = \sqrt{|\mathcal{N}_i| |\mathcal{N}_j|}$
Network analysis
3. Matrix completion
Berg, Rianne van den, Thomas N. Kipf, and Max Welling.
"Graph convolutional matrix completion." arXiv preprint arXiv:1706.02263 (2017).
Decoder :
$p\left( \hat{M}_{ij} = r \right) = \frac{e^{u_i^T Q_r v_j}}{\sum_{s \in R} e^{u_i^T Q_s v_j}}$ : probability that rating $M_{ij}$ is $r$
$\hat{M}_{ij} = g(u_i, v_j) = \mathbb{E}_{p(\hat{M}_{ij} = r)}[r] = \sum_{r \in R} r \cdot p\left( \hat{M}_{ij} = r \right)$ : expected rating
Loss function :
$\mathcal{L} = -\sum_{i,j} \sum_{r=1}^{R} I\left[ r = M_{ij} \right] \log p\left( \hat{M}_{ij} = r \right)$, where $I[k = l] = 1$ when $k = l$ and 0 otherwise
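A toy NumPy sketch of this bilinear decoder. The sizes are arbitrary and the $Q_r$ matrices are random stand-ins for the learned parameters:

```python
import numpy as np

def expected_rating(u, v, Q):
    """Bilinear decoder of GC-MC:
    p(M_ij = r) = softmax_r(u^T Q_r v); prediction = sum_r r * p(r).
    Q has shape (R, E, E): one matrix per rating level."""
    scores = np.array([u @ Qr @ v for Qr in Q])
    p = np.exp(scores - scores.max())   # numerically stable softmax
    p /= p.sum()
    ratings = np.arange(1, len(Q) + 1)  # rating levels 1..R
    return ratings @ p                  # expected rating

rng = np.random.default_rng(3)
E, R = 4, 5                 # embedding size, number of rating levels
u = rng.normal(size=E)      # user embedding
v = rng.normal(size=E)      # item embedding
Q = rng.normal(size=(R, E, E))
m_hat = expected_rating(u, v, Q)
```

Since the prediction is a probability-weighted average of the levels 1..R, it always lands inside the valid rating range.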
Molecular applications
: Which kinds of datasets exist?
ChEMBL
• Bioactive molecules with drug-like properties
• ~1,828,820 compounds
• https://www.ebi.ac.uk/chembldb/
DrugBank
• Drugs and targets : FDA approved, investigational, experimental, …
• 7,713 drugs, 4,115 targets, …
• https://drugbank.ca/
Molecular applications
: Which kinds of datasets exist?
Tox21 Data Challenge (@ Kaggle)
• 12 types of toxicity
• Molecule species (represented with SMILES) and toxicity labels are given
• But too small to train a DL model
https://tripod.nih.gov/tox21/challenge/data.jsp
Molecular applications
1. Neural molecular fingerprint
Hash functions have been used to generate molecular fingerprints.
* Molecular fingerprint : a vector representation of molecular substructures
https://mc.ai/machine-learning-for-drug-discovery-in-a-nutshell%E2%80%8A-%E2%80%8Apart-ii/
Molecular applications
1. Neural molecular fingerprint
http://kehang.github.io/basic_project/2017/04/18/machine-learning-in-molecular-property-prediction/
Such molecular fingerprints can be easily obtained with open-source packages, e.g., RDKit.
Molecular applications
1. Neural molecular fingerprint
http://kehang.github.io/basic_project/2017/04/18/machine-learning-in-molecular-property-prediction/
Recently, neural fingerprints generated by graph convolutional networks have been widely used for more accurate molecular property predictions.
Molecular applications
1. Neural molecular fingerprint
Duvenaud, David K., et al.
"Convolutional networks on graphs for learning molecular fingerprints." Advances in neural information processing systems. 2015.
Molecular applications
2. Quantitative Structure-Property Relationships (QSPR)
Ryu, Seongok, Jaechang Lim, and Woo Youn Kim.
"Deeply learning molecular structure-property relationships using graph attention neural network." arXiv preprint arXiv:1805.10988 (2018).
Final node states
The neural network recognizes several functional groups differently
Learning solubility of molecules
Final node states
Learning photovoltaic efficiency (QM phenomena)
Interestingly, the NN can also differentiate nodes according to their quantum mechanical characteristics.
Molecular applications
2. Quantitative Structure-Property Relationships (QSPR)
Graph features
Learning solubility of molecules
Similar molecules are located close together in the graph latent space ($\| Z_i - Z_j \|^2$, sorted in ascending order).
Molecular applications
3. Molecular generative model
Motivation : de novo molecular design
• Chemical space is too huge : only ~10^8 molecules have been synthesized as potential drug candidates, whereas it is estimated that there are 10^23 to 10^60 drug-like molecules.
• Limitations of virtual screening
Molecular applications
3. Molecular generative model
Motivation : de novo molecular design
Molecules can be represented as strings according to defined rules:
the Simplified Molecular-Input Line-Entry System (SMILES).
Molecular applications
3. Molecular generative model
Motivation : de novo molecular design
Gómez-Bombarelli, Rafael, et al. "Automatic chemical design using a data-driven continuous representation of molecules." ACS central science4.2 (2018): 268-276.
Segler, Marwin HS, et al. "Generating focused molecule libraries for drug discovery with recurrent neural networks." ACS central science4.1 (2017): 120-131.
Many SMILES-based generative models exist
Molecular applications
3. Molecular generative model
Motivation : de novo molecular design
However, the SMILES representation has a critical problem : small changes in molecular structure can lead to very different strings.
→ It is difficult to reflect the topological information of molecules.
Molecular applications
3. Molecular generative model
Literature
• Li, Yujia, et al. "Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
• Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction tree variational autoencoder for molecular graph generation." arXiv preprint arXiv:1802.04364 (2018).
• Liu, Qi, et al. "Constrained graph variational autoencoders for molecule design." arXiv preprint arXiv:1805.09076 (2018).
• Simonovsky, Martin, and Nikos Komodakis. "GraphVAE: Towards generation of small graphs using variational autoencoders." arXiv preprint arXiv:1802.03480 (2018).
• De Cao, Nicola, and Thomas Kipf. "MolGAN: An implicit generative model for small molecular graphs." arXiv preprint arXiv:1805.11973 (2018).
• …
Molecular applications
3. Molecular generative model
Li, Yujia, et al.
"Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
1. Sample whether to add a new node of a particular type, or to terminate.
2. If a node type is chosen, add a node of this type to the graph.
3. Check if any further edges are needed to connect the new node to the existing graph.
4. If yes, select a node in the graph and add an edge connecting the new node to the selected node.
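The four steps above can be sketched as a loop. In this toy version the learned decision networks (add-node, add-edge, and node scoring) are replaced by fixed coin flips, purely for illustration:

```python
import random

def generate_graph(p_add_node=0.6, p_add_edge=0.5, max_nodes=6, seed=7):
    """Toy sketch of the sequential generation loop of Li et al.:
    coin flips stand in for the learned decision networks."""
    random.seed(seed)
    nodes, edges = [], []
    while len(nodes) < max_nodes and random.random() < p_add_node:  # 1. add node or terminate
        v = len(nodes)
        nodes.append(v)                           # 2. add the node to the graph
        for u in nodes[:-1]:                      # 3./4. decide edges to existing nodes
            if random.random() < p_add_edge:
                edges.append((u, v))
    return nodes, edges

nodes, edges = generate_graph()
```

In the real model, each decision is conditioned on the current graph state via the propagation and readout functions described next.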
Molecular applications
3. Molecular generative model
Li, Yujia, et al.
"Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
• Determine whether or not to add a node.
Molecular applications
3. Molecular generative model
Li, Yujia, et al.
"Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
• If a node is added, determine whether to add edges between the new node and the other nodes.
Graph propagation process : read out all node states and generate a graph feature
$h_G = \sum_{v \in \mathcal{V}} h_v^G$ or $h_G = \sum_{v \in \mathcal{V}} g_v^G \odot h_v^G$, with gate $g_v^G = \sigma\left( g_m(h_v) \right)$
Propagation : $h'_v = f_n(a_v, h_v) \;\; \forall v \in \mathcal{V}$, $a_v = \sum_{(u,v) \in \mathcal{E}} f_e\left( h_u, h_v, x_{u,v} \right) \;\; \forall v \in \mathcal{V}$
Molecular applications
3. Molecular generative model
Li, Yujia, et al.
"Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
Add node : a step to decide whether or not to add a node
$f_{\mathrm{addnode}}(G) = \mathrm{softmax}\left( f_{an}(h_G) \right)$
If "yes", the new node vectors $h_V^{(T)}$ are carried over to the next step.
Molecular applications
3. Molecular generative model
Li, Yujia, et al.
"Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
Add edge : a step to add an edge to the new node
$f_{\mathrm{addedge}}(G, v) = \sigma\left( f_{ae}\left( h_G, h_v^{(T)} \right) \right)$ : the probability of adding an edge to the newly created node $v$
If "yes" : $f_{\mathrm{nodes}}(G, v) = \mathrm{softmax}\left( f_s\left( h_u^{(T)}, h_v^{(T)} \right) \right)$ : score for each node $u$ for connecting the edge
Molecular applications
3. Molecular generative model
Li, Yujia, et al.
"Learning deep generative models of graphs." arXiv preprint arXiv:1803.03324 (2018).
Objective function
$p(G)$ : marginal likelihood over node orderings (permutations $\pi$)
$p(G) = \sum_{\pi \in \mathcal{P}(G)} p(G, \pi) = \mathbb{E}_{q(\pi|G)}\left[ \frac{p(G, \pi)}{q(\pi|G)} \right]$
Summing over all possible permutations is intractable
→ sample from data : $q(\pi|G) \approx p_{data}(\pi|G)$
$\mathbb{E}_{p_{data}(G, \pi)}\left[ \log p(G, \pi) \right] = \mathbb{E}_{p_{data}(G)} \mathbb{E}_{p_{data}(\pi|G)}\left[ \log p(G, \pi) \right]$
Interacting physical system
Kipf, Thomas, et al.
"Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).
• Interacting systems
• Nodes : particles, Edges : (physical) interaction between particle pairs
• Latent code : the underlying interaction graph
Interacting physical system
Kipf, Thomas, et al.
"Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).
Input graph : $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with vertices $v \in \mathcal{V}$ and edges $e = (v, v') \in \mathcal{E}$
$v \to e$ : $h_{(i,j)}^l = f_e^l\left( h_i^l, h_j^l, x_{(i,j)} \right)$
$e \to v$ : $h_j^{l+1} = f_v^l\left( \sum_{i \in \mathcal{N}_j} h_{(i,j)}^l, \; x_j \right)$
$x_i$ : initial node features, $x_{(i,j)}$ : initial edge features
Encoder :
$q_\phi(z_{ij} | x) = \mathrm{softmax}\left( f_{enc,\phi}(x)_{ij, 1:K} \right) = \mathrm{softmax}\left( h_{(i,j)}^2 \right)$
$h_j^1 = f_{emb}(x_j)$, $h_{(i,j)}^1 = f_e^1\left( h_i^1, h_j^1 \right)$, $h_j^2 = f_v^1\left( \sum_{i \neq j} h_{(i,j)}^1 \right)$, $h_{(i,j)}^2 = f_e^2\left( h_i^2, h_j^2 \right)$
Interacting physical system
Kipf, Thomas, et al.
"Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).
Decoder :
$\tilde{h}_{(i,j)}^t = \sum_k z_{ij,k}\, \tilde{f}_e^k\left( h_i^t, h_j^t \right)$, $\mathrm{MSG}_j^t = \sum_{i \neq j} \tilde{h}_{(i,j)}^t$
$h_j^{t+1} = \mathrm{GRU}\left( [\mathrm{MSG}_j^t, x_j^t], \; h_j^t \right)$, $\mu_j^{t+1} = x_j^t + f_{out}\left( h_j^{t+1} \right)$
$p\left( x^{t+1} | x^t, z \right) = \mathcal{N}\left( \mu^{t+1}, \sigma^2 I \right)$
Interacting physical system
Kipf, Thomas, et al.
"Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).
$\mathcal{L} = \mathbb{E}_{q_\phi(z|x)}\left[ \log p_\theta(x|z) \right] - \mathrm{KL}\left( q_\phi(z|x) \,\|\, p(z) \right)$
$p_\theta(x|z) = \prod_{t=1}^{T} p_\theta\left( x^{t+1} | x^t, \dots, x^1, z \right) = \prod_{t=1}^{T} p_\theta\left( x^{t+1} | x^t, z \right)$, since the dynamics are Markovian.
$p(z) = \prod_{i \neq j} p_\theta(z_{ij})$ : the prior is a factorized uniform distribution over edge types.
References
Geometric Deep Learning and Surveys on Graph Neural Networks
https://github.com/SeongokRyu/Graph-neural-networks
• Bronstein, Michael M., et al. "Geometric deep learning: going beyond euclidean data." IEEE Signal Processing Magazine 34.4 (2017): 18-42.
• [NIPS 2017] Tutorial - Geometric deep learning on graphs and manifolds, https://nips.cc/Conferences/2017/Schedule?showEvent=8735
• Goyal, Palash, and Emilio Ferrara. "Graph embedding techniques, applications, and performance: A survey."
Knowledge-Based Systems 151 (2018): 78-94., https://github.com/palash1992/GEM
• Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).
• Awesome Graph Embedding And Representation Learning Papers, https://github.com/benedekrozemberczki/awesome-graph-embedding
References
Graph Convolutional Network (GCN)
https://github.com/SeongokRyu/Graph-neural-networks
• Defferrard, Michaël, Xavier Bresson, and Pierre Vandergheynst. "Convolutional neural networks on graphs with fast localized spectral filtering." Advances in Neural Information Processing Systems. 2016.
• Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).
• van den Berg, Rianne, Thomas N. Kipf, and Max Welling. "Graph Convolutional Matrix Completion." stat 1050 (2017): 7.
• Schlichtkrull, Michael, et al. "Modeling relational data with graph convolutional networks." European Semantic Web Conference. Springer, Cham, 2018.
• Levie, Ron, et al. "Cayleynets: Graph convolutional neural networks with complex rational spectral filters." arXiv preprint arXiv:1705.07664 (2017).
References
Attention mechanism in GNN
https://github.com/SeongokRyu/Graph-neural-networks
• Velickovic, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).
• Choi, Edward, et al. "GRAM: Graph-based attention model for healthcare representation learning." KDD 2017.
• Shang, C., Liu, Q., Chen, K. S., Sun, J., Lu, J., Yi, J., & Bi, J. (2018). Edge Attention-based Multi-Relational Graph Convolutional Networks. arXiv preprint arXiv:1802.04944.
• Lee, John Boaz, et al. "Attention Models in Graphs: A Survey." arXiv preprint arXiv:1807.07984 (2018).
Message Passing Neural Network (MPNN)
• Li, Yujia, et al. "Gated graph sequence neural networks." arXiv preprint arXiv:1511.05493 (2015).
• Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.