Lecture 8: Deep Learning Software


(1)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 1 April 27, 2017

Lecture 8:

Deep Learning Software

(2)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 2 April 27, 2017

Administrative

- Project proposals were due Tuesday

- We are assigning TAs to projects, stay tuned

- We are grading A1

- A2 is due Thursday 5/4

- Remember to stop your instances when not in use

- Only use GPU instances for the last notebook

(3)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Last time

Optimization: SGD+Momentum, Nesterov, RMSProp, Adam

Regularization: Dropout (add noise at train time, then marginalize it out at test time)

Transfer Learning: take a pretrained VGG-style network (Image, Conv-64 x2, MaxPool, Conv-128 x2, MaxPool, Conv-256 x2, MaxPool, Conv-512 x2, MaxPool, Conv-512 x2, MaxPool, FC-4096, FC-4096, FC-C), freeze the early layers, then reinitialize the last layer and train

3

(4)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 4 April 27, 2017

Today

- CPU vs GPU

- Deep Learning Frameworks
  - Caffe / Caffe2
  - Theano / TensorFlow
  - Torch / PyTorch

4

(5)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 5 April 27, 2017

CPU vs GPU

(6)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 6 April 27, 2017

My computer

6

(7)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 7 April 27, 2017

Spot the CPU!

(central processing unit)

This image is licensed under CC-BY 2.0

7

(8)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 8 April 27, 2017

Spot the GPUs!

(graphics processing unit)

This image is in the public domain

8

(9)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 9 April 27, 2017

NVIDIA vs AMD

9

(10)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017

NVIDIA vs AMD

10

(11)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 11 April 27, 2017

CPU vs GPU

CPU (Intel Core i7-7700k): 4 cores (8 threads with hyperthreading), 4.4 GHz, memory shared with system, $339

CPU (Intel Core i7-6950X): 10 cores (20 threads with hyperthreading), 3.5 GHz, memory shared with system, $1723

GPU (NVIDIA Titan Xp): 3840 cores, 1.6 GHz, 12 GB GDDR5X, $1200

GPU (NVIDIA GTX 1070): 1920 cores, 1.68 GHz, 8 GB GDDR5, $399

CPU: Fewer cores, but each core is much faster and much more capable; great at sequential tasks

GPU: More cores, but each core is much slower and "dumber"; great for parallel tasks

11

(12)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 12 April 27, 2017

Example: Matrix Multiplication

Multiplying an A x B matrix by a B x C matrix gives an A x C result; each of the A x C output elements is an independent dot product, so they can all be computed in parallel.

12

(13)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 13 April 27, 2017

Programming GPUs

● CUDA (NVIDIA only)

○ Write C-like code that runs directly on the GPU

○ Higher-level APIs: cuBLAS, cuFFT, cuDNN, etc

● OpenCL

○ Similar to CUDA, but runs on anything

○ Usually slower :(

● Udacity: Intro to Parallel Programming https://www.udacity.com/course/cs344

○ For deep learning just use existing libraries

13

(14)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 14 April 27, 2017

CPU vs GPU in practice

Data from https://github.com/jcjohnson/cnn-benchmarks

(CPU performance not well-optimized, so the comparison is a little unfair)

GPU speedups over CPU of roughly 66x, 67x, 71x, 64x, and 76x across the benchmarked networks

(15)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 15 April 27, 2017

CPU vs GPU in practice

Data from https://github.com/jcjohnson/cnn-benchmarks

cuDNN is much faster than “unoptimized” CUDA: speedups of 2.8x, 3.0x, 3.1x, 3.4x, and 2.8x across the benchmarked networks

(16)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 16 April 27, 2017

CPU / GPU Communication

Model is here

Data is here

(17)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 17 April 27, 2017

CPU / GPU Communication

Model is here

Data is here

If you aren’t careful, training can bottleneck on reading data and transferring to GPU!

Solutions:

- Read all data into RAM
- Use SSD instead of HDD
- Use multiple CPU threads to prefetch data

(18)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 18 April 27, 2017

Deep Learning

Frameworks

(19)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 19 April 27, 2017

Last year ...

Caffe

(UC Berkeley)

Torch

(NYU / Facebook)

Theano

(U Montreal)

TensorFlow

(Google)

(20)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 20 April 27, 2017

This year ...

Caffe

(UC Berkeley)

Torch

(NYU / Facebook)

Theano

(U Montreal)

TensorFlow

(Google)

Caffe2

(Facebook)

PyTorch

(Facebook)

CNTK

(Microsoft)

Paddle

(Baidu)

MXNet

(Amazon)

Developed by U Washington, CMU, MIT, Hong Kong U, etc., but it is the main framework of choice at AWS

And others...

(21)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 21 April 27, 2017

Today

Caffe

(UC Berkeley)

Torch

(NYU / Facebook)

Theano

(U Montreal)

TensorFlow

(Google)

Caffe2

(Facebook)

PyTorch

(Facebook)

Mostly these: TensorFlow and PyTorch; a bit about these: Caffe / Caffe2, Theano, Torch

CNTK

(Microsoft)

Paddle

(Baidu)

MXNet

(Amazon)

Developed by U Washington, CMU, MIT, Hong Kong U, etc., but it is the main framework of choice at AWS

And others...

(22)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 22 April 27, 2017

Recall: Computational Graphs

[Figure: computational graph for a linear classifier. Inputs x and W feed a * node producing scores s; s goes through a hinge loss, a regularization term R is added, and the sum is the total loss L]

(23)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 23 April 27, 2017

input image

loss

weights

Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.

Recall: Computational Graphs

(24)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 24 April 27, 2017

Recall: Computational Graphs

Figure reproduced with permission from a Twitter post by Andrej Karpathy.

input image

loss

(25)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 25 April 27, 2017

The point of deep learning frameworks

(1) Easily build big computational graphs

(2) Easily compute gradients in computational graphs

(3) Run it all efficiently on GPU (wrap cuDNN, cuBLAS, etc)

25

(26)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 26 April 27, 2017

Computational Graphs

[Graph: a = x * y, b = a + z, c = Σ b]

Numpy

26
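
One possible Numpy version of this graph (forward pass plus hand-derived gradients); the sizes N and D are illustrative values:

import numpy as np

np.random.seed(0)
N, D = 3, 4
x = np.random.randn(N, D)
y = np.random.randn(N, D)
z = np.random.randn(N, D)

# Forward pass
a = x * y
b = a + z
c = np.sum(b)

# Backward pass: gradients computed by hand
grad_c = 1.0
grad_b = grad_c * np.ones((N, D))
grad_a = grad_b.copy()
grad_z = grad_b.copy()
grad_x = grad_a * y
grad_y = grad_a * x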

(27)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 27 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c Numpy

27

(28)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 28 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c Numpy

Problems:

- Can’t run on GPU
- Have to compute our own gradients

28

(29)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 29 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c Numpy

TensorFlow

29

(30)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 30 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c

TensorFlow

Create forward

computational graph

30

(31)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 31 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c

TensorFlow

Ask TensorFlow to compute gradients

31

(32)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 32 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c

TensorFlow

Tell

TensorFlow to run on CPU

32

(33)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 33 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c

TensorFlow

Tell

TensorFlow to run on GPU

33
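
Put together, a TF 1.x style sketch of these steps (build the forward graph, ask TensorFlow for gradients, pick a device, then run in a session) might look like the following; '/cpu:0' can be swapped for '/gpu:0' to place the graph on the GPU:

import numpy as np
import tensorflow as tf

N, D = 3, 4

with tf.device('/cpu:0'):      # use '/gpu:0' to run on the GPU
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    z = tf.placeholder(tf.float32)
    # Create forward computational graph (no computation yet)
    a = x * y
    b = a + z
    c = tf.reduce_sum(b)

# Ask TensorFlow to add gradient nodes to the graph
grad_x, grad_y, grad_z = tf.gradients(c, [x, y, z])

with tf.Session() as sess:
    values = {
        x: np.random.randn(N, D),
        y: np.random.randn(N, D),
        z: np.random.randn(N, D),
    }
    out = sess.run([c, grad_x, grad_y, grad_z], feed_dict=values)
    c_val, grad_x_val, grad_y_val, grad_z_val = out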

(34)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 34 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c

PyTorch

34

(35)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 35 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c

PyTorch

Define Variables to start building a

computational graph

35

(36)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 36 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c

PyTorch

Forward pass looks just like numpy

36

(37)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 37 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c

PyTorch

Calling c.backward() computes all

gradients

37

(38)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 38 April 27, 2017

Computational Graphs

x y z

*

a +

b Σ c

PyTorch

Run on GPU by casting to .cuda()

38
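
A sketch of the same graph using the lecture-era PyTorch Variable API (newer PyTorch versions fold Variable into Tensor, but the flow is the same):

import torch
from torch.autograd import Variable

N, D = 3, 4
# Define Variables to start building a computational graph
x = Variable(torch.randn(N, D), requires_grad=True)
y = Variable(torch.randn(N, D), requires_grad=True)
z = Variable(torch.randn(N, D), requires_grad=True)

# Forward pass looks just like numpy
a = x * y
b = a + z
c = torch.sum(b)

c.backward()          # computes all gradients
print(x.grad.data)    # gradient of c with respect to x

# To run on GPU, cast to a cuda type, e.g. torch.randn(N, D).cuda()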

(39)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 39 April 27, 2017

[Side-by-side: the same graph written in Numpy, TensorFlow, and PyTorch]

39

(40)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 40 April 27, 2017

TensorFlow

(more detail)

40

(41)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 41 April 27, 2017

TensorFlow:

Neural Net

Running example: Train a two-layer ReLU

network on random data with L2 loss

41

(42)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 42 April 27, 2017

TensorFlow:

Neural Net

(Assume imports at the top of each snippet)

42

(43)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 43 April 27, 2017

TensorFlow:

Neural Net

43

First define the computational graph

Then run the graph many times

(44)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 44 April 27, 2017

TensorFlow:

Neural Net

Create placeholders for input x, weights w1 and w2, and targets y

44

(45)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 45 April 27, 2017

TensorFlow:

Neural Net

Forward pass: compute prediction for y and loss (L2 distance between y and y_pred)

No computation happens here - just building the graph!

45

(46)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 46 April 27, 2017

TensorFlow:

Neural Net

Tell TensorFlow to compute the gradient of the loss with respect to w1 and w2.

Again, no computation happens here - just building the graph

46

(47)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 47 April 27, 2017

TensorFlow:

Neural Net

Now that we are done building the graph, we enter a session so we can actually run it

47

(48)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 48 April 27, 2017

TensorFlow:

Neural Net

Create numpy arrays that will fill in the

placeholders above

48

(49)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 49 April 27, 2017

TensorFlow:

Neural Net

Run the graph: feed in the numpy arrays for x, y, w1, and w2; get

numpy arrays for loss, grad_w1, and grad_w2

49

(50)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 50 April 27, 2017

TensorFlow:

Neural Net

Train the network: Run the graph over and over, use gradient to update weights

50
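
Putting the running example together, a TF 1.x sketch of the placeholder version might look like this (N, D, H, the learning rate, and the iteration count are illustrative values, not from the slides):

import numpy as np
import tensorflow as tf

N, D, H = 64, 1000, 100

# Placeholders for input x, weights w1 and w2, and targets y
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.placeholder(tf.float32, shape=(D, H))
w2 = tf.placeholder(tf.float32, shape=(H, D))

# Forward pass: two-layer ReLU net, L2 loss (graph only, no computation)
h = tf.maximum(tf.matmul(x, w1), 0)
y_pred = tf.matmul(h, w2)
loss = tf.reduce_sum((y_pred - y) ** 2.0)

# Add gradient nodes to the graph
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

with tf.Session() as sess:
    # Numpy arrays that fill in the placeholders
    values = {
        x: np.random.randn(N, D),
        y: np.random.randn(N, D),
        w1: np.random.randn(D, H),
        w2: np.random.randn(H, D),
    }
    learning_rate = 1e-5
    for t in range(50):
        out = sess.run([loss, grad_w1, grad_w2], feed_dict=values)
        loss_val, grad_w1_val, grad_w2_val = out
        # Gradient step on the host copies of the weights
        values[w1] -= learning_rate * grad_w1_val
        values[w2] -= learning_rate * grad_w2_val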

(51)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 51 April 27, 2017

TensorFlow:

Neural Net

Train the network: Run the graph over and over, use gradient to update weights

Problem: copying weights between CPU / GPU on each step

51

(52)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 52 April 27, 2017

TensorFlow:

Neural Net

Change w1 and w2 from placeholder (fed on each call) to Variable (persists in the graph between calls)

52

(53)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 53 April 27, 2017

TensorFlow:

Neural Net

Add assign operations to update w1 and w2 as part of the graph!

53

(54)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 54 April 27, 2017

TensorFlow:

Neural Net

Run graph once to initialize w1 and w2

Run many times to train

54

(55)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 55 April 27, 2017

TensorFlow:

Neural Net

Problem: loss not going down! Assign calls not actually being executed!

55

(56)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 56 April 27, 2017

TensorFlow:

Neural Net

Add a dummy graph node that depends on the updates; tell the graph to compute the dummy node

56
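
A sketch of the Variable version with in-graph assign updates and the dummy group node (sizes and learning rate are again illustrative):

import numpy as np
import tensorflow as tf

N, D, H = 64, 1000, 100
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))

# Weights are now Variables: they persist in the graph between calls
w1 = tf.Variable(tf.random_normal((D, H)))
w2 = tf.Variable(tf.random_normal((H, D)))

h = tf.maximum(tf.matmul(x, w1), 0)
y_pred = tf.matmul(h, w2)
loss = tf.reduce_sum((y_pred - y) ** 2.0)
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Weight updates become part of the graph
learning_rate = 1e-5
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)
updates = tf.group(new_w1, new_w2)   # dummy node depending on both assigns

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # run once to initialize w1, w2
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        # Fetching `updates` forces the assign ops to actually execute
        loss_val, _ = sess.run([loss, updates], feed_dict=values)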

(57)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 57 April 27, 2017

TensorFlow:

Optimizer

Can use an optimizer to compute gradients and update weights

Remember to execute the output of the optimizer!

57
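
A sketch of the same training loop with a built-in optimizer (tf.train.GradientDescentOptimizer shown as one possibility); note the output of the optimizer is fetched so the update actually runs:

import numpy as np
import tensorflow as tf

N, D, H = 64, 1000, 100
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.Variable(tf.random_normal((D, H)))
w2 = tf.Variable(tf.random_normal((H, D)))

y_pred = tf.matmul(tf.maximum(tf.matmul(x, w1), 0), w2)
loss = tf.reduce_sum((y_pred - y) ** 2.0)

# The optimizer adds gradient computation and update ops to the graph
optimizer = tf.train.GradientDescentOptimizer(1e-5)
updates = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        # Fetch `updates` so the optimizer step actually executes
        loss_val, _ = sess.run([loss, updates], feed_dict=values)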

(58)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 58 April 27, 2017

TensorFlow:

Loss

Use predefined common losses

58

(59)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 59 April 27, 2017

TensorFlow:

Layers

Use Xavier initializer

tf.layers automatically sets up the weights (and biases) for us!

59
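
Combining the predefined loss with tf.layers and a Xavier initializer, a sketch (TF 1.x API) might look like:

import numpy as np
import tensorflow as tf

N, D, H = 64, 1000, 100
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))

# tf.layers sets up the weights (and biases) for us, here with Xavier init
init = tf.contrib.layers.xavier_initializer()
h = tf.layers.dense(inputs=x, units=H,
                    activation=tf.nn.relu, kernel_initializer=init)
y_pred = tf.layers.dense(inputs=h, units=D, kernel_initializer=init)

# Predefined common loss
loss = tf.losses.mean_squared_error(labels=y, predictions=y_pred)

optimizer = tf.train.GradientDescentOptimizer(1e0)
updates = optimizer.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        loss_val, _ = sess.run([loss, updates], feed_dict=values)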

(60)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 60 April 27, 2017

Keras: High-Level Wrapper

Keras is a layer on top of TensorFlow that makes common things easy to do

(Also supports Theano backend)

60

(61)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 61 April 27, 2017

Keras: High-Level Wrapper

Define model object as a sequence of layers

61

(62)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 62 April 27, 2017

Keras: High-Level Wrapper

Define optimizer object

62

(63)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 63 April 27, 2017

Keras: High-Level Wrapper

Build the model, specify loss function

63

(64)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 64 April 27, 2017

Keras: High-Level Wrapper

Train the model with a single line!

64
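
Putting the Keras slides together, one possible version of the same two-layer net (argument names vary slightly across Keras releases, so treat this as a sketch):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

N, D, H = 64, 1000, 100

# Define the model as a sequence of layers
model = Sequential()
model.add(Dense(H, input_dim=D))
model.add(Activation('relu'))
model.add(Dense(D))

# Define the optimizer, then build the model with a loss function
model.compile(loss='mean_squared_error', optimizer=SGD(lr=1e0))

# Train the model with a single line
x = np.random.randn(N, D)
y = np.random.randn(N, D)
history = model.fit(x, y, epochs=50, batch_size=N, verbose=0)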

(65)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 65 April 27, 2017

TensorFlow: Other High-Level Wrappers

Keras

(https://keras.io/)

TFLearn

(http://tflearn.org/)

TensorLayer

(http://tensorlayer.readthedocs.io/en/latest/)

tf.layers

(https://www.tensorflow.org/api_docs/python/tf/layers)

TF-Slim

(https://github.com/tensorflow/models/tree/master/inception/inception/slim)

tf.contrib.learn

(https://www.tensorflow.org/get_started/tflearn)

Pretty Tensor

(https://github.com/google/prettytensor)

65

(66)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 66 April 27, 2017

TensorFlow: Other High-Level Wrappers

Keras

(https://keras.io/)

TFLearn

(http://tflearn.org/)

TensorLayer

(http://tensorlayer.readthedocs.io/en/latest/)

tf.layers

(https://www.tensorflow.org/api_docs/python/tf/layers)

TF-Slim

(https://github.com/tensorflow/models/tree/master/inception/inception/slim)

tf.contrib.learn

(https://www.tensorflow.org/get_started/tflearn)

Pretty Tensor

(https://github.com/google/prettytensor)

Ships with TensorFlow

66

(67)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 67 April 27, 2017

TensorFlow: Other High-Level Wrappers

Keras

(https://keras.io/)

TFLearn

(http://tflearn.org/)

TensorLayer

(http://tensorlayer.readthedocs.io/en/latest/)

tf.layers

(https://www.tensorflow.org/api_docs/python/tf/layers)

TF-Slim

(https://github.com/tensorflow/models/tree/master/inception/inception/slim)

tf.contrib.learn

(https://www.tensorflow.org/get_started/tflearn)

Pretty Tensor

(https://github.com/google/prettytensor)

Ships with TensorFlow

From Google

67

(68)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 68 April 27, 2017

TensorFlow: Other High-Level Wrappers

Keras

(https://keras.io/)

TFLearn

(http://tflearn.org/)

TensorLayer

(http://tensorlayer.readthedocs.io/en/latest/)

tf.layers

(https://www.tensorflow.org/api_docs/python/tf/layers)

TF-Slim

(https://github.com/tensorflow/models/tree/master/inception/inception/slim)

tf.contrib.learn

(https://www.tensorflow.org/get_started/tflearn)

Pretty Tensor

(https://github.com/google/prettytensor)

Sonnet

(https://github.com/deepmind/sonnet)

Ships with TensorFlow

From Google

From DeepMind

68

(69)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 69 April 27, 2017

TensorFlow: Pretrained Models

TF-Slim

: (https://github.com/tensorflow/models/tree/master/slim/nets)

Keras:

(https://github.com/fchollet/deep-learning-models)

69

[Transfer Learning figure: VGG-style network (Image, Conv-64 x2, MaxPool, Conv-128 x2, MaxPool, Conv-256 x2, MaxPool, Conv-512 x2, MaxPool, Conv-512 x2, MaxPool, FC-4096, FC-4096, FC-C); freeze the early layers, reinitialize the last layer and train]
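
As one concrete example (not from the slides): the same pretrained models are also packaged in keras.applications, so loading one can be as short as the following sketch; the weights download on first use:

from keras.applications.vgg16 import VGG16

# ImageNet-pretrained VGG-16, ready for feature extraction or finetuning
model = VGG16(weights='imagenet')
model.summary()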

(70)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 70 April 27, 2017

TensorFlow: Tensorboard

Add logging to code to record loss, stats, etc

Run server and get pretty graphs!

(71)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 71 April 27, 2017

TensorFlow: Distributed Version

https://www.tensorflow.org/deploy/distributed

Split one graph over multiple machines!

(72)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 72 April 27, 2017

Side Note: Theano

TensorFlow is similar in many ways to Theano (earlier

framework from Montreal)

(73)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 73 April 27, 2017

Side Note: Theano

Define symbolic variables (similar to TensorFlow placeholder)

(74)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 74 April 27, 2017

Side Note: Theano

Forward pass: compute predictions and loss

(75)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 75 April 27, 2017

Side Note: Theano

Forward pass: compute predictions and loss

(no computation performed yet)

(76)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 76 April 27, 2017

Side Note: Theano

Ask Theano to compute gradients for us

(no computation performed yet)

(77)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 77 April 27, 2017

Side Note: Theano

Compile a function that computes loss, scores, and gradients from data and weights

(78)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 78 April 27, 2017

Side Note: Theano

Run the function many times to train the network
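
A sketch of what the Theano version of these steps might look like (symbolic variables, symbolic gradients, one compiled function called in a loop; sizes and learning rate are illustrative):

import numpy as np
import theano
import theano.tensor as T

N, D, H = 64, 1000, 100

# Symbolic variables (similar to TensorFlow placeholders)
x = T.matrix('x')
y = T.matrix('y')
w1 = T.matrix('w1')
w2 = T.matrix('w2')

# Forward pass: predictions and loss (no computation performed yet)
a = x.dot(w1)
a_relu = T.maximum(a, 0)
y_pred = a_relu.dot(w2)
loss = T.sqr(y_pred - y).sum()

# Ask Theano to compute gradients (still symbolic)
dw1, dw2 = T.grad(loss, [w1, w2])

# Compile a function mapping data and weights to loss and gradients
f = theano.function(inputs=[x, y, w1, w2], outputs=[loss, dw1, dw2])

# Run the compiled function many times to train the network
xx, yy = np.random.randn(N, D), np.random.randn(N, D)
ww1, ww2 = np.random.randn(D, H), np.random.randn(H, D)
learning_rate = 1e-5
for t in range(50):
    loss_val, grad_w1, grad_w2 = f(xx, yy, ww1, ww2)
    ww1 -= learning_rate * grad_w1
    ww2 -= learning_rate * grad_w2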

(79)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 79 April 27, 2017

PyTorch

(Facebook)

79

(80)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 80 April 27, 2017

PyTorch: Three Levels of Abstraction

Tensor: Imperative ndarray, but runs on GPU

Variable: Node in a

computational graph; stores data and gradient

Module: A neural network layer; may store state or learnable weights

80

(81)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 81 April 27, 2017

PyTorch: Three Levels of Abstraction

Tensor: Imperative ndarray, but runs on GPU (TensorFlow equivalent: Numpy array)

Variable: Node in a computational graph; stores data and gradient (TensorFlow equivalent: Tensor, Variable, Placeholder)

Module: A neural network layer; may store state or learnable weights (TensorFlow equivalent: tf.layers, or TF-Slim, or TFLearn, or Sonnet, or ...)

81

(82)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 82 April 27, 2017

PyTorch: Tensors

PyTorch Tensors are just like numpy arrays, but they can run on GPU.

No built-in notion of computational graph, or gradients, or deep learning.

Here we fit a two-layer net using PyTorch Tensors:

82

(83)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 83 April 27, 2017

PyTorch: Tensors

Create random tensors for data and weights

83

(84)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 84 April 27, 2017

PyTorch: Tensors

Forward pass: compute predictions and loss

84

(85)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 85 April 27, 2017

PyTorch: Tensors

Backward pass:

manually compute gradients

85

(86)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 86 April 27, 2017

PyTorch: Tensors

Gradient descent step on weights

86

(87)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 87 April 27, 2017

PyTorch: Tensors

To run on GPU, just cast tensors to a cuda datatype!

87
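
A sketch of the two-layer net written with raw PyTorch Tensors (lecture-era style; gradients are computed by hand, and sizes are illustrative):

import torch

dtype = torch.FloatTensor            # use torch.cuda.FloatTensor to run on GPU
N, D_in, H, D_out = 64, 1000, 100, 10

# Random tensors for data and weights
x = torch.randn(N, D_in).type(dtype)
y = torch.randn(N, D_out).type(dtype)
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predictions and loss
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass: manually compute gradients
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Gradient descent step on the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2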

(88)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 88 April 27, 2017

PyTorch: Autograd

A PyTorch Variable is a node in a computational graph

x.data is a Tensor

x.grad is a Variable of gradients (same shape as x.data)

x.grad.data is a Tensor of gradients

88

(89)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 89 April 27, 2017

PyTorch: Autograd

PyTorch Tensors and Variables have the same API!

Variables remember how they were created (for backprop)

89

(90)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 90 April 27, 2017

PyTorch: Autograd

We do not want gradients of the loss with respect to the data, but we do want gradients with respect to the weights

90

(91)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 91 April 27, 2017

PyTorch: Autograd

Forward pass looks exactly the same as the Tensor version, but everything is a variable now

91

(92)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 92 April 27, 2017

PyTorch: Autograd

Compute gradient of loss with respect to w1 and w2 (zero out grads first)

92

(93)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 93 April 27, 2017

PyTorch: Autograd

Make gradient step on weights

93
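
A sketch of the autograd version with the lecture-era Variable API: wrap Tensors in Variables and let loss.backward() compute the gradients.

import torch
from torch.autograd import Variable

N, D_in, H, D_out = 64, 1000, 100, 10
x = Variable(torch.randn(N, D_in), requires_grad=False)   # no grads w.r.t. data
y = Variable(torch.randn(N, D_out), requires_grad=False)
w1 = Variable(torch.randn(D_in, H), requires_grad=True)   # grads w.r.t. weights
w2 = Variable(torch.randn(H, D_out), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass looks exactly like the Tensor version
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Zero out old gradients, then backprop
    if w1.grad is not None:
        w1.grad.data.zero_()
    if w2.grad is not None:
        w2.grad.data.zero_()
    loss.backward()

    # Gradient step on the weights
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data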

(94)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 94 April 27, 2017

PyTorch: New Autograd Functions

Define your own autograd functions by writing forward and backward for Tensors

(similar to modular layers in A2)

94

(95)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 95 April 27, 2017

PyTorch: New Autograd Functions

Can use our new autograd function in the forward pass

95
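
A sketch of a custom autograd Function in the lecture-era (PyTorch 0.x) style, with a hypothetical MyReLU as the example; newer PyTorch versions instead require static forward/backward methods taking a ctx argument:

import torch
from torch.autograd import Variable


class MyReLU(torch.autograd.Function):
    def forward(self, x):
        # Save the input so backward can use it
        self.save_for_backward(x)
        return x.clamp(min=0)

    def backward(self, grad_output):
        x, = self.saved_tensors
        grad_input = grad_output.clone()
        grad_input[x < 0] = 0       # gradient is zero where the input was negative
        return grad_input


# Use the new autograd function in a forward pass
x = Variable(torch.randn(64, 100), requires_grad=True)
y = MyReLU()(x)         # instantiate, then apply to a Variable
y.sum().backward()      # autograd calls our backward()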

(96)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 96 April 27, 2017

PyTorch: nn

Higher-level wrapper for working with neural nets

Similar to Keras and friends … but only one, and it’s good =)

96

(97)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 97 April 27, 2017

PyTorch: nn

Define our model as a sequence of layers

nn also defines common loss functions

97

(98)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 98 April 27, 2017

PyTorch: nn

Forward pass: feed data to model, and prediction to loss function

98

(99)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 99 April 27, 2017

PyTorch: nn

Backward pass:

compute all gradients

99

(100)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 0

PyTorch: nn

Make gradient step on each model parameter

100
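
A sketch of the nn version: the model is a sequence of layers, and nn provides the loss function.

import torch
from torch.autograd import Variable

N, D_in, H, D_out = 64, 1000, 100, 10
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out))

# Define the model as a sequence of layers
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out))
loss_fn = torch.nn.MSELoss(size_average=False)   # nn also defines common losses

learning_rate = 1e-4
for t in range(500):
    # Forward pass: feed data to the model, prediction to the loss function
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    # Backward pass: compute all gradients
    model.zero_grad()
    loss.backward()

    # Gradient step on each model parameter
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data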

(101)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 1

PyTorch: optim

Use an optimizer for different update rules

101

(102)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 2

PyTorch: optim

Update all parameters after computing gradients

102
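
A sketch using torch.optim for the update rule (Adam shown as one choice):

import torch
from torch.autograd import Variable

N, D_in, H, D_out = 64, 1000, 100, 10
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out))

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out))
loss_fn = torch.nn.MSELoss(size_average=False)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    optimizer.zero_grad()   # zero old gradients
    loss.backward()         # compute new gradients
    optimizer.step()        # update all parameters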

(103)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 3

PyTorch: nn

Define new Modules

A PyTorch Module is a neural net layer; it inputs and outputs Variables.

Modules can contain weights (as Variables) or other Modules.

You can define your own Modules using autograd!

103

(104)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 4

PyTorch: nn

Define new Modules

Define our whole model as a single Module

104

(105)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 5

PyTorch: nn

Define new Modules

Initializer sets up two children (Modules can contain modules)

105

(106)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 6

PyTorch: nn

Define new Modules

Define the forward pass using child modules and autograd ops on Variables

No need to define backward - autograd will handle it

106

(107)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 7

PyTorch: nn

Define new Modules

Construct and train an instance of our model

107
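
A sketch of the whole model defined as a single Module with two child Modules:

import torch
from torch.autograd import Variable


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # Initializer sets up two children (Modules can contain Modules)
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # Forward pass uses child modules and autograd ops on Variables;
        # no need to define backward, autograd handles it
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)


# Construct and train an instance of the model
N, D_in, H, D_out = 64, 1000, 100, 10
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out))

model = TwoLayerNet(D_in, H, D_out)
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()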

(108)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 8

PyTorch: DataLoaders

A DataLoader wraps a Dataset and provides minibatching, shuffling, and multithreading for you

When you need to load custom data, just write your own Dataset class

108

(109)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 10 April 27, 2017 9

PyTorch: DataLoaders

Iterate over loader to form minibatches

Loader gives Tensors so you need to wrap in Variables

109
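
A sketch of the DataLoader pattern (TensorDataset used here as a stand-in for a custom Dataset class; the Variable wrapping is lecture-era style):

import torch
from torch.autograd import Variable
from torch.utils.data import TensorDataset, DataLoader

N, D_in, D_out = 64, 1000, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Wrap the data; the loader handles minibatching and shuffling
loader = DataLoader(TensorDataset(x, y), batch_size=8, shuffle=True)

for epoch in range(10):
    # Iterate over the loader to form minibatches
    for x_batch, y_batch in loader:
        # The loader yields Tensors, so wrap them in Variables
        x_var, y_var = Variable(x_batch), Variable(y_batch)
        # forward / backward / optimizer step would go here, as in the
        # nn and optim sketches above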

(110)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 11 April 27, 2017 0

PyTorch: Pretrained Models

Super easy to use pretrained models with torchvision https://github.com/pytorch/vision

110
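
For example, a minimal sketch (pretrained weights download on first use):

import torchvision

alexnet = torchvision.models.alexnet(pretrained=True)
vgg16 = torchvision.models.vgg16(pretrained=True)
resnet101 = torchvision.models.resnet101(pretrained=True)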

(111)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 111 April 27, 2017

PyTorch: Visdom

This image is licensed under CC-BY 4.0; no changes were made to the image

Somewhat similar to TensorBoard: add logging to your code, then visualize it in a browser

Can’t visualize computational graph structure (yet?)

https://github.com/facebookresearch/visdom

(112)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 112 April 27, 2017

Aside: Torch

Direct ancestor of PyTorch (they share a lot of C backend)

Written in Lua, not Python

PyTorch has 3 levels of abstraction: Tensor, Variable, and Module

Torch only has 2: Tensor and Module

More details: check the 2016 slides

(113)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 113 April 27, 2017

Aside: Torch

Build a model as a sequence of layers, and a loss function

(114)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 114 April 27, 2017

Aside: Torch

Define a callback that takes the weights as input and produces the loss and the gradient on the weights

(115)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 115 April 27, 2017

Aside: Torch

Forward: compute scores and loss

(116)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 116 April 27, 2017

Aside: Torch

Backward: compute gradient

(no autograd, need to pass grad_scores around)

(117)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 117 April 27, 2017

Aside: Torch

Pass callback to

optimizer over and over

(118)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 118 April 27, 2017

Torch vs PyTorch

Torch:
(-) Lua
(-) No autograd
(+) More stable
(+) Lots of existing code
(0) Fast

PyTorch:
(+) Python
(+) Autograd
(-) Newer, still changing
(-) Less existing code
(0) Fast

(119)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 119 April 27, 2017

Torch vs PyTorch

Torch:
(-) Lua
(-) No autograd
(+) More stable
(+) Lots of existing code
(0) Fast

PyTorch:
(+) Python
(+) Autograd
(-) Newer, still changing
(-) Less existing code
(0) Fast

Conclusion: Probably use PyTorch for new projects

(120)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 120 April 27, 2017

Static vs Dynamic Graphs

TensorFlow: Build graph once, then run many times (static)

PyTorch: Each forward pass defines a new graph (dynamic)

Static: build the graph once, then run it on each iteration. Dynamic: build a new graph on each iteration.

120

(121)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 121 April 27, 2017

Static vs Dynamic: Optimization

With static graphs, framework can

optimize the graph for you before it runs!

[Figure: the graph you wrote (Conv, ReLU, Conv, ReLU, Conv, ReLU) next to an equivalent graph with fused Conv+ReLU operations]

(122)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 122 April 27, 2017

Static vs Dynamic: Serialization

Static: once the graph is built, you can serialize it and run it without the code that built the graph!

Dynamic: graph building and execution are intertwined, so you always need to keep the code around

(123)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 123 April 27, 2017

Static vs Dynamic: Conditional

y = w1 * x if z > 0, otherwise y = w2 * x

123

(124)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 124 April 27, 2017

Static vs Dynamic: Conditional

y = w1 * x if z > 0, otherwise y = w2 * x

PyTorch: Normal Python

124

(125)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 125 April 27, 2017

Static vs Dynamic: Conditional

y = w1 * x if z > 0, otherwise y = w2 * x

PyTorch: Normal Python

TensorFlow: Special TF control flow operator!

125
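
A hypothetical scalar example of the two styles (plain Python branch in PyTorch, since the graph is rebuilt each forward pass; tf.cond in TensorFlow, since the branch must live inside the static graph):

import tensorflow as tf
import torch
from torch.autograd import Variable

# PyTorch: ordinary Python control flow
x_t = Variable(torch.Tensor([2.0]))
w1_t, w2_t = Variable(torch.Tensor([3.0])), Variable(torch.Tensor([-3.0]))
z = 1.0
y_t = w1_t * x_t if z > 0 else w2_t * x_t

# TensorFlow: the branch is a graph operator
x = tf.placeholder(tf.float32)
z_ph = tf.placeholder(tf.float32)
w1 = tf.placeholder(tf.float32)
w2 = tf.placeholder(tf.float32)
y = tf.cond(tf.greater(z_ph, 0), lambda: w1 * x, lambda: w2 * x)

with tf.Session() as sess:
    y_val = sess.run(y, feed_dict={x: 2.0, z_ph: 1.0, w1: 3.0, w2: -3.0})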

(126)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 126 April 27, 2017

Static vs Dynamic: Loops

y_t = (y_{t-1} + x_t) * w

[Figure: graph unrolled over x_1, x_2, x_3, starting from y_0, with repeated + and * nodes all sharing the same weight w]

126

(127)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 127 April 27, 2017

Static vs Dynamic: Loops

y_t = (y_{t-1} + x_t) * w

[Same unrolled graph as on the previous slide]

PyTorch: Normal Python

127

(128)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 128 April 27, 2017

Static vs Dynamic: Loops

y_t = (y_{t-1} + x_t) * w

PyTorch: Normal Python

TensorFlow: Special TF control flow

128
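
A hypothetical sketch of the recurrence y_t = (y_{t-1} + x_t) * w in both styles: a plain Python loop in PyTorch, and tf.foldl as one TensorFlow control-flow option:

import tensorflow as tf
import torch
from torch.autograd import Variable

# PyTorch: an ordinary Python loop, unrolled into a fresh graph each time;
# it works for any sequence length
xs = [Variable(torch.randn(1)) for _ in range(3)]
w_t = Variable(torch.randn(1))
y_t = Variable(torch.zeros(1))
for x_t in xs:
    y_t = (y_t + x_t) * w_t

# TensorFlow: the loop is expressed with a control-flow op inside the graph;
# tf.foldl folds the update function over the sequence
x_seq = tf.placeholder(tf.float32, shape=(None,))
w = tf.placeholder(tf.float32)
y0 = tf.constant(0.0)
y = tf.foldl(lambda y_prev, x_cur: (y_prev + x_cur) * w, x_seq, initializer=y0)

with tf.Session() as sess:
    y_val = sess.run(y, feed_dict={x_seq: [1.0, 2.0, 3.0], w: 0.5})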

(129)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 129 April 27, 2017

Dynamic Graphs in TensorFlow

Looks et al, “Deep Learning with Dynamic Computation Graphs”, ICLR 2017 https://github.com/tensorflow/fold

TensorFlow Fold makes dynamic graphs easier in TensorFlow through dynamic batching

(130)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 130 April 27, 2017

Dynamic Graph Applications

Karpathy and Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015

Figure copyright IEEE, 2015. Reproduced for educational purposes.

- Recurrent networks

(131)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 131 April 27, 2017

Dynamic Graph Applications

- Recurrent networks - Recursive networks

The cat ate a big rat

(132)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 132 April 27, 2017

Dynamic Graph Applications

- Recurrent networks - Recursive networks - Modular Networks

Example: "What color is the cat?" is answered by composing modules: find[cat], then query[color], giving "white"

This image is in the public domain

Andreas et al, “Neural Module Networks”, CVPR 2016

Andreas et al, “Learning to Compose Neural Networks for Question Answering”, NAACL 2016

(133)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 133 April 27, 2017

Dynamic Graph Applications

- Recurrent networks - Recursive networks - Modular Networks

Example: "Are there more cats than dogs?" composes find[cat] and find[dog], a count over each, and a compare, giving "no"

This image is in the public domain

Andreas et al, “Neural Module Networks”, CVPR 2016

Andreas et al, “Learning to Compose Neural Networks for Question Answering”, NAACL 2016

(134)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 134 April 27, 2017

Dynamic Graph Applications

- Recurrent networks - Recursive networks - Modular Networks

- (Your creative idea here)

(135)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 135 April 27, 2017

Caffe

(UC Berkeley)

135

(136)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 136 April 27, 2017

● Core written in C++

● Has Python and MATLAB bindings

● Good for training or finetuning feedforward classification models

● Often no need to write code!

● Not used as much in research anymore, still popular for deploying models

Caffe Overview

136

(137)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 137 April 27, 2017

No need to write code!

1. Convert data (run a script)
2. Define net (edit prototxt)
3. Define solver (edit prototxt)
4. Train (with pretrained weights) (run a script)

Caffe: Training / Finetuning

(138)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 138 April 27, 2017

Caffe step 1: Convert Data

● DataLayer reading from LMDB is the easiest

● Create LMDB using convert_imageset

● Need text file where each line is

○ “[path/to/image.jpeg] [label]”

● Create HDF5 file yourself using h5py

(139)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 139 April 27, 2017

Caffe step 1: Convert Data

● ImageDataLayer: Read from image files

● WindowDataLayer: For detection

● HDF5Layer: Read from HDF5 file

● From memory, using Python interface

● All of these are harder to use (except Python)

(140)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 140 April 27, 2017

Caffe step 2: Define Network (prototxt)

(141)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 141 April 27, 2017

Caffe step 2: Define Network (prototxt)

https://github.com/KaimingHe/deep-residual-networks/blob/master/prototxt/ResNet-152-deploy.prototxt

● .prototxt can get ugly for big models

● ResNet-152 prototxt is 6775 lines long!

● Not “compositional”; can’t easily define a residual block and reuse it

(142)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 142 April 27, 2017

Caffe step 3: Define Solver (prototxt)

● Write a prototxt file defining a SolverParameter

● If finetuning, copy existing solver.prototxt file

○ Change net to be your net

○ Change snapshot_prefix to your output

○ Reduce base learning rate (divide by 100)

○ Maybe change max_iter and snapshot

(143)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 143 April 27, 2017

Caffe step 4: Train!

./build/tools/caffe train \
    -gpu 0 \
    -model path/to/trainval.prototxt \
    -solver path/to/solver.prototxt \
    -weights path/to/pretrained_weights.caffemodel

https://github.com/BVLC/caffe/blob/master/tools/caffe.cpp

(144)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 144 April 27, 2017

Caffe step 4: Train!

./build/tools/caffe train \
    -gpu 0 \
    -model path/to/trainval.prototxt \
    -solver path/to/solver.prototxt \
    -weights path/to/pretrained_weights.caffemodel

-gpu -1 for CPU-only

-gpu all for multi-gpu

https://github.com/BVLC/caffe/blob/master/tools/caffe.cpp

(145)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 145 April 27, 2017

Caffe Model Zoo

AlexNet, VGG,

GoogLeNet, ResNet, plus others

https://github.com/BVLC/caffe/wiki/Model-Zoo

(146)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 146 April 27, 2017

Caffe Python Interface

Not much documentation…

Read the code! Two most important files:

● caffe/python/caffe/_caffe.cpp:

○ Exports Blob, Layer, Net, and Solver classes

● caffe/python/caffe/pycaffe.py

○ Adds extra methods to Net class

(147)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 147 April 27, 2017

Caffe Python Interface

Good for:

● Interfacing with numpy

● Extract features: Run net forward

● Compute gradients: Run net backward (DeepDream, etc)

● Define layers in Python with numpy (CPU only)

(148)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 148 April 27, 2017

● (+) Good for feedforward networks

● (+) Good for finetuning existing networks

● (+) Train models without writing any code!

● (+) Python interface is pretty useful!

● (+) Can deploy without Python

● (-) Need to write C++ / CUDA for new GPU layers

● (-) Not good for recurrent networks

● (-) Cumbersome for big networks (GoogLeNet, ResNet)

Caffe Pros / Cons

(149)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 149 April 27, 2017

Caffe2

(Facebook)

149

(150)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 150 April 27, 2017

Caffe2 Overview

● Very new - released a week ago =)

● Static graphs, somewhat similar to TensorFlow

● Core written in C++

● Nice Python interface

● Can train model in Python, then serialize and deploy without Python

● Works on iOS / Android, etc

(151)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 151 April 27, 2017

Google: TensorFlow, “one framework to rule them all” for both research and production

Facebook: PyTorch for research + Caffe2 for production

(152)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 152 April 27, 2017

My Advice:

TensorFlow is a safe bet for most projects. Not perfect but has huge community, wide usage. Maybe pair with high-level wrapper (Keras, Sonnet, etc)

I think PyTorch is best for research. However, it is still new, so there can be rough patches.

Use TensorFlow for one graph over many machines

Consider Caffe, Caffe2, or TensorFlow for production deployment

Consider TensorFlow or Caffe2 for mobile

(153)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 153 April 27, 2017

Next Time:

CNN Architecture Case Studies

(154)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 154 April 27, 2017

(155)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 155 April 27, 2017

(156)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 156 April 27, 2017

Caffe step 2: Define Network (prototxt)

Layers and Blobs often have same name!

(157)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 157 April 27, 2017

Caffe step 2: Define Network (prototxt)

Layers and Blobs often have same name!

Learning rates (weight + bias) Regularization (weight + bias)

(158)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 158 April 27, 2017

Caffe step 2: Define Network (prototxt)

Layers and Blobs often have same name!

Learning rates (weight + bias) Regularization (weight + bias)

Number of output classes

(159)

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - April 27, 2017

Fei-Fei Li & Justin Johnson & Serena Yeung

Lecture 8 - 159 April 27, 2017

Caffe step 2: Define Network (prototxt)

Layers and Blobs often have same name!

Learning rates (weight + bias) Regularization (weight + bias)

Number of output classes

Set these to 0 to freeze a layer
