NVIDIA DGX SUPERPOD
Developer Relations Manager, NVIDIA Korea, March 2021
[Chart: survey of 132 webinar registrants]
- By job level: practitioner, middle management/team lead, executive, CEO, other
- By job function: engineer/programmer, research/development, IT planning, sales, management/strategy, training/HR, marketing, accounting/finance, system operations, network security, customer support/service, data processing/analysis, other
[Chart: respondents' GPU usage — 1~2 GPUs, 4~8 GPUs, 20+ GPUs]
NVIDIA DGX FAMILY
- DGX A100: 8x A100 GPUs, 320GB/640GB GPU memory
- DGX POD/SuperPOD: 4x/8x/…/20x/…/140x DGX A100 systems
- DGX Station A100: 4x A100 GPUs, 160GB/320GB GPU memory
- OEM systems w/ NVIDIA A100 GPU (40GB/80GB)
WHAT IS GPT-3?
- Developed by OpenAI
- GPT model history:
  - GPT-1 (2018; 117 million parameters)
  - GPT-2 (2019; 1.5 billion parameters)
  - GPT-3 (2020; 175 billion parameters)
NVIDIA Maxine (AI-Powered Video Conferencing Platform)
[Video demo]
NVIDIA Clara Guardian (Virtual Patient Assistant)
[Video demo]
DEEP LEARNING NETWORK MODEL
ResNet vs. GPT
- Parameter count of ResNet-50
- Model GPU memory and parameter count of GPT-2
Sources:
- Paper, "An End-to-End Framework for Constrained Deep Learning Model Optimization", Jan 2021
- Paper, "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism", Mar 2020
NATURAL LANGUAGE PROCESSING (NLP) MODEL SIZE GROWTH
[Chart: # parameters (log scale) by year, 2017–2021 — Transformer (65 M), BERT (340 M), GPT-2 8B (8.3 Bn), Turing-NLG (17 Bn), GPT-3 (175 Bn)]
GPU AND GPU MEMORY
Latest NVIDIA Data Center GPU cards, sorted by GPU memory.
Workloads: Training/HPC/Inference (A100) · Training/HPC (V100/V100S) · Graphics (A40, RTX 8000/6000) · Inference (T4)

GPU Memory (GB) | Type   | Cards: <CUDA cores> / <Tensor cores> / <power, W>
80              | HBM2   | A100 SXM4: 6912 / 432 / 400 · A100 PCIe: 6912 / 432 / 250
48              | GDDR6  | A40: 10752 / 336 / 300 · RTX 8000: 4608 / 576 / 250
40              | HBM2   | A100 SXM4: 6912 / 432 / 400 · A100 PCIe: 6912 / 432 / 250
32              | HBM2   | V100: 5120 / 640 / 250 · V100S: 5120 / 640 / 300
32              | GDDR5  |
24              | GDDR6  | RTX 6000: 4608 / 576 / 250
24              | GDDR6X |
16              | HBM2   | V100: 5120 / 640 / 250 · 5120 / 640 / 300
16              | GDDR6  | T4: 2560 / 320 / 70
10              | GDDR6X |
MODEL
- A model can be larger than the memory of a single GPU.
[Figure: Data Parallel Training]
Source: Paper, "Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform", Oct 2019
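A back-of-the-envelope sketch of why large models outgrow a single GPU: just storing the weights in FP16 takes 2 bytes per parameter (optimizer state and activations add a large multiple on top). The parameter counts below come from the slides above; the helper name is my own.

```python
# Rough GPU-memory estimate for holding model weights only,
# assuming FP16 (2 bytes per parameter).
def weight_memory_gb(num_params, bytes_per_param=2):
    return num_params * bytes_per_param / 1e9

models = {
    "ResNet-50": 25.6e6,
    "GPT-2 8B (Megatron)": 8.3e9,
    "GPT-3": 175e9,
}

for name, params in models.items():
    print(f"{name}: {weight_memory_gb(params):.2f} GB")

# Even the largest single GPU in the table above (A100, 80 GB)
# cannot hold GPT-3's ~350 GB of FP16 weights.
```

ResNet-50 fits comfortably on any card in the table; GPT-3 does not fit on any of them, which motivates the parallelism techniques discussed later.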
DEEP LEARNING APPLICATION DEVELOPMENT
[Diagram]
- TRAINING (learning a new capability from existing data): Untrained Neural Network Model + Deep Learning Framework → Trained Model
- INFERENCE (applying this capability to new data): Trained Model, Optimized for Performance → App or Service Featuring Capability → New Capability
Convolution kernels applied to the original image:
- Brighten: [ 0  0  0 ; 0  1.5  0 ; 0  0  0 ]
- Darken:   [ 0  0  0 ; 0  0.5  0 ; 0  0  0 ]
- Blur:     [ .06 .13 .06 ; .13 .25 .13 ; .06 .13 .06 ]
- Sharpen:  [ 0 -1  0 ; -1  5 -1 ; 0 -1  0 ]
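The kernels above can be applied with a few lines of NumPy. This is a minimal sketch (naive loops, 'same' zero padding, no framework); the function name is my own.

```python
import numpy as np

def convolve2d(img, kernel):
    # Slide a 3x3 kernel over a grayscale image with zero padding,
    # so the output has the same spatial size as the input.
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=float)

img = np.full((5, 5), 1.0)          # a flat gray test image
result = convolve2d(img, sharpen)
# In a constant region, sharpening changes nothing (5 - 4 = 1);
# only pixels near the zero-padded border are affected.
```

The darken/brighten kernels scale each pixel by 0.5/1.5, while blur and sharpen mix in neighboring pixels, which is exactly what their off-center entries encode.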
(28, 28, 1) Image Input
→ (3, 3, 1, 2) Kernels → (28, 28, 2) Stacked Images
→ (3, 3, 2, 2) Kernels → (28, 28, 2) Stacked Images
→ Flattened Image Vector (1568)
→ Dense (512)
→ Dense → Output Prediction (10)
IMAGENET'S 1,000 CLASSES (example: dog)
Source: imagenet1000_clsidx_to_labels.txt (https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a)
Image Classification vs. GPT-3
DATA PARALLELISM
What is Data Parallelism?
- Each GPU holds a full replica of the model, and the GPUs synchronize their weights during training; deep learning frameworks handle this automatically (e.g., TensorFlow, PyTorch, Keras).
Source: Paper, "Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform", Oct 2019
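A minimal single-machine sketch of the idea, with NumPy standing in for real GPUs: each replica computes a gradient on its own shard of the batch, and averaging those gradients reproduces the full-batch gradient, which is the role a framework feature like PyTorch's DistributedDataParallel plays across real GPUs. All names below are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)                           # shared model weights (replicated)
x = rng.normal(size=(8, 3))               # global batch
y = x @ np.array([1.0, -2.0, 0.5])        # targets for a linear model

def grad(weights, xb, yb):
    # Gradient of mean squared error for a linear model.
    return 2 * xb.T @ (xb @ weights - yb) / len(xb)

shards = np.split(np.arange(8), 2)        # 2 "GPUs", 4 samples each
local_grads = [grad(w, x[idx], y[idx]) for idx in shards]
avg = sum(local_grads) / len(local_grads)

# Averaging per-shard gradients equals the full-batch gradient,
# so every replica applies the same update and stays in sync.
full = grad(w, x, y)
w -= 0.1 * avg
```

The key property is the last comparison: synchronization by gradient averaging keeps all replicas mathematically identical after each step.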
MODEL
- Data Parallelism alone is not enough: GPT-3, with 175B parameters, is too large to replicate on any single Data Center GPU, so *Model Parallelism* is required.
[Chart: NLP model parameter growth (log scale), 2017–2021 — shown earlier]
[Table: NVIDIA Data Center GPU memory — shown earlier]
MODEL PARALLELISM
What is Model Parallelism?
- The model itself is split across multiple GPUs (e.g., layer by layer); deep learning frameworks do not do this automatically, so it requires extra work from the data scientist.
Source: Paper, "Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform", Oct 2019
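A minimal sketch of layer-wise model parallelism, with separate weight lists standing in for separate GPUs: each "device" stores only its own layers, and activations are handed from one partition to the next. Real systems such as Megatron-LM also split individual layers, but the partitioning idea is the same. All names below are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

# "device 0" holds the first two layers, "device 1" the last two.
device0 = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8))]
device1 = [rng.normal(size=(8, 8)), rng.normal(size=(8, 2))]

def forward(partition, activations):
    # Run only the layers assigned to one device (linear + ReLU).
    for weights in partition:
        activations = np.maximum(activations @ weights, 0)
    return activations

x = rng.normal(size=(1, 4))
hidden = forward(device0, x)    # computed on device 0, then transferred
out = forward(device1, hidden)  # computed on device 1
# Each device only ever stores its own slice of the weights,
# so the full model never has to fit in one device's memory.
```

The cost of this layout is the activation transfer between partitions at the cut point, which is why choosing where to split the model matters in practice.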