Recurrent Neural Network
Wanho Choi
Handling Sequence Data
• For example: speech recognition
• FCNN, CNN, AE, and GAN?
• RNN!
• The current state ➜ the next state
[Figure: an RNN cell shown folded (one cell with input x_t, hidden state h_t, output y_t) and unfolded over a sequence of length n, with inputs x_1 … x_n, hidden states h_1 … h_{n−1}, and outputs y_1 … y_n]

h_t = f(h_{t−1}, x_t)
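A minimal sketch of this recurrence in plain Python (the step function f, initial state h0, and sequence xs are illustrative placeholders, not from the slides):

def unfold(f, h0, xs):
    # Apply the same step function at every time step:
    # h_t = f(h_{t-1}, x_t) -- the current state feeds the next state
    h = h0
    states = []
    for x in xs:
        h = f(h, x)
        states.append(h)
    return states

# e.g. with f = lambda h, x: h + x, unfold(f, 0, [1, 2, 3]) == [1, 3, 6]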
Vanilla RNN Structure
[Figure: the vanilla RNN, folded as a single recurrent cell (x_t → h_t → y_t) and unfolded across time steps 0 … n−1, where n is the sequence length]
Vanilla RNN Weights
[Figure: the same unfolded RNN annotated with its three shared weight matrices: W_xh (input → hidden), W_hh (hidden → hidden), and W_hy (hidden → output); the same weights are reused at every time step]
Vanilla RNN Computation

h_t = tanh(W_hh h_{t−1} + W_xh x_t)
y_t = W_hy h_t

[Figure: inside the cell, W_hh h_{t−1} and W_xh x_t are added (+) and passed through tanh to produce h_t; W_hy then maps h_t to y_t]
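A concrete NumPy sketch of this computation (the sizes, random weights, and input sequence are made-up examples, not from the slides; the output size equals the hidden size, as above):

import numpy as np

H, D = 4, 3                          # hidden size, input size (example values)
W_hh = np.random.randn(H, H) * 0.1   # hidden -> hidden
W_xh = np.random.randn(H, D) * 0.1   # input  -> hidden
W_hy = np.random.randn(H, H) * 0.1   # hidden -> output (output size = hidden size)

h = np.zeros(H)                               # initial hidden state
xs = [np.random.randn(D) for _ in range(5)]   # a length-5 input sequence

for x in xs:
    h = np.tanh(W_hh @ h + W_xh @ x)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y = W_hy @ h                       # y_t = W_hy h_t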
How to use RNN in PyTorch
rnn = torch.nn.RNN(input_size, hidden_size)  # <creation>
outputs, hidden_state = rnn(input_data)      # <run>
input x_t shape = (batch_size, sequence_length, input_size)
output y_t shape = (batch_size, sequence_length, output_size) (output_size = hidden_size)
(these shapes assume batch_first=True; by default, sequence_length comes first)
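A quick shape check of the call above (the sizes here are arbitrary examples):

import torch

rnn = torch.nn.RNN(input_size=4, hidden_size=4, batch_first=True)
input_data = torch.randn(1, 5, 4)   # (batch_size, sequence_length, input_size)
outputs, hidden = rnn(input_data)
print(outputs.shape)                # torch.Size([1, 5, 4]) -> one output per time step
print(hidden.shape)                 # torch.Size([1, 1, 4]) -> final hidden state only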
Example: ‘hello’
import torch
import numpy as np

torch.manual_seed(0)  # to make results deterministic and reproducible

unique_characters = ['h', 'e', 'l', 'o']
INPUT_SIZE = len(unique_characters)   # = 4
HIDDEN_SIZE = len(unique_characters)  # = 4

x_data = [[0, 1, 2, 2]]  # hell (batch_size = 1)
x_one_hot = [[[1, 0, 0, 0],   # 'h'
              [0, 1, 0, 0],   # 'e'
              [0, 0, 1, 0],   # 'l'
              [0, 0, 1, 0]]]  # 'l'
y_data = [[1, 2, 2, 3]]  # ello

x = torch.FloatTensor(x_one_hot)
y = torch.LongTensor(y_data)
# batch_first guarantees the order = (batch_size, sequence_length, data_size)
model = torch.nn.RNN(INPUT_SIZE, HIDDEN_SIZE, batch_first=True)

CostFunc = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for i in range(10):
    outputs, _ = model(x)
    cost = CostFunc(outputs.view(-1, INPUT_SIZE), y.view(-1))
    cost.backward()
    optimizer.step()
    optimizer.zero_grad()

result = outputs.data.numpy().argmax(axis=2)
result = ''.join([unique_characters[c] for c in np.squeeze(result)])
print(result)  # ello
Example: ‘hello’ (one-hot encoding)
‘h’ = [1,0,0,0]
‘e’ = [0,1,0,0]
‘l’ = [0,0,1,0]
‘o’ = [0,0,0,1]
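The same table can be built programmatically rather than by hand; a small sketch using torch.eye (this construction is an assumption, not from the original slides):

import torch

x_data = [[0, 1, 2, 2]]           # character indices for 'hell'
x_one_hot = torch.eye(4)[x_data]  # row i of the identity matrix = one-hot code of i
# x_one_hot has shape (1, 4, 4): one 4-dim one-hot vector per character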
RNN is flexible.
• one-to-one: vanilla RNN
• one-to-many: image captioning (image → sentence, a sequence of words)
• many-to-one: sentiment analysis (sentence → sentiment); see the sketch after this list
• many-to-many: language modeling, translation, and video captioning
• Stacked RNN: cells stacked along the depth axis as well as unrolled along the time axis
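For instance, a many-to-one classifier can read a whole sequence and predict from the final hidden state; a minimal sketch, where all sizes and names are illustrative assumptions:

import torch

class ManyToOne(torch.nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # num_layers=2 stacks two RNN layers along the depth axis
        self.rnn = torch.nn.RNN(input_size, hidden_size,
                                num_layers=2, batch_first=True)
        self.fc = torch.nn.Linear(hidden_size, num_classes)

    def forward(self, x):              # x: (batch, seq_len, input_size)
        _, hidden = self.rnn(x)        # hidden: (num_layers, batch, hidden_size)
        return self.fc(hidden[-1])     # classify from the top layer's final state

logits = ManyToOne(4, 8, 2)(torch.randn(1, 5, 4))  # -> shape (1, 2)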
RNN Applications
• Temporal analysis: time-series anomaly detection and time-series prediction
• Computer vision: image description, video tagging, and video analysis
• NLP: sentiment analysis, speech recognition, language modeling, machine translation, and text generation
CNN vs RNN: Which Neural Network Is Right for You?
https://missinglink.ai/guides/neural-network-concepts/cnn-vs-rnn-neural-network-right/
RNN CNN Hybrids
Object Recognition using RCNN
Advanced RNN
• LSTM (Long Short-Term Memory)
• GRU (Gated Recurrent Unit)
https://www.youtube.com/watch?v=-SHPG_KMUkQ
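Both are drop-in replacements for torch.nn.RNN in the examples above; a minimal sketch (sizes are illustrative):

import torch

lstm = torch.nn.LSTM(input_size=4, hidden_size=4, batch_first=True)
gru  = torch.nn.GRU(input_size=4, hidden_size=4, batch_first=True)

x = torch.randn(1, 5, 4)
out_lstm, (h_n, c_n) = lstm(x)   # an LSTM also carries a cell state c_n
out_gru, h_n = gru(x)            # a GRU has the same interface as nn.RNN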