NVIDIA Korea, January 20, 2021
NVIDIA DGX FAMILY
NVIDIA DATA CENTER GPU HISTORY
2017: Volta architecture - V100
2018: Turing architecture - T4, Quadro RTX 6000 / RTX 8000
2020: Ampere architecture - A100
NVIDIA DATA CENTER GPU
Lineup: NVIDIA A100 SXM4 / NVIDIA A100 PCIe / NVIDIA T4 / NVIDIA A40
• A100: world's most powerful data center GPU
• T4: versatile data center GPU for mainstream computing
• A40: world's most powerful data center GPU for visual computing

Deep Learning Training: for the absolute fastest model training time
• A100 SXM4: 8-16 GPUs (for new installs); 80GB: for the largest models (DLRM, GPT-2 with over 9.3B parameters in one node)
• A100 PCIe: 4-8 GPUs

Deep Learning Inference: for batch and real-time inference
• A100 SXM4: 1 GPU w/ MIG; 80GB: 7 MIGs at 10GB each for large batch-size-constrained models (RNN-T)
• A100 PCIe: 1 GPU w/ MIG
• T4: 1 GPU

HPC / AI: for scientific computing centers, higher ed, and research institutions
• A100 SXM4: 4 GPUs with MIG for supercomputing centers; 80GB: for the largest datasets and high-memory-bandwidth applications
• A100 PCIe: 1-4 GPUs with MIG for higher ed and research

Render Farms: for batch and real-time rendering
• A40: 4-8 GPUs

Graphics: for the best graphics performance on professional virtual workstations*
• T4: 2-8 GPUs for mid-range virtual workstations for professional graphics
• A40: 4-8 GPUs for mid- and high-end professional graphics and RTX workloads or simulation

Enterprise Acceleration: mixed workloads (graphics, ML, DL, analytics, training, inference)
• A100 SXM4: 1-4 GPUs with MIG for compute-intensive multi-GPU workloads; 80GB: data analytics with the largest datasets
• A100 PCIe: 1-4 GPUs with MIG for compute-intensive single-GPU workloads
• T4: 4-8 GPUs for balanced workloads*
• A40: 2-4 GPUs for graphics-intensive* and compute workloads

Edge: edge solutions with 2-4 GPUs for graphics
CHOOSE THE RIGHT NVIDIA DATA CENTER GPU
A100 GPU
NVIDIA Ampere Architecture
• World's largest 7nm chip: 54B transistors, HBM2
• 3rd Gen Tensor Cores: faster, more flexible, easier to use; 20x AI perf with TF32; 2.5x HPC perf
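The TF32 speedup comes from Tensor Cores computing matrix math in a reduced-precision format (8-bit exponent, 10-bit mantissa, so FP32 range with roughly FP16 precision). As a rough illustration only (a sketch of the numeric format, not NVIDIA's actual implementation, which also rounds accumulations differently), the storage precision can be mimicked by truncating a float32 mantissa to 10 bits:

```python
import struct

def tf32_truncate(x: float) -> float:
    """Approximate TF32 storage precision by truncating a float32
    mantissa from 23 bits down to 10 bits (truncation, not rounding)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits &= ~((1 << 13) - 1)  # clear the low 13 mantissa bits
    (out,) = struct.unpack("<f", struct.pack("<I", bits))
    return out

# 2**-11 falls below TF32's 10-bit mantissa, so the term is lost:
print(tf32_truncate(1.0 + 2**-11))  # 1.0
# 2**-9 still fits in 10 mantissa bits, so it survives:
print(tf32_truncate(1.0 + 2**-9))   # 1.001953125
```

This is why TF32 works transparently for most deep learning workloads: the dynamic range matches FP32, and only low-order mantissa bits are dropped.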
MIG
• Partitions a single A100 GPU into up to 7 GPU slices
• Each GPU instance gets its own dedicated compute, memory, and cache
• One A100 can serve training, inference, analytics, and HPC workloads at the same time
• Up to 19 operating modes per GPU (18 MIG combinations plus MIG-disabled)
• MIG enforces hardware-level isolation and QoS between instances
Multi-Instance GPU
A single A100 GPU supports 18 MIG slice combinations (19 modes including MIG-disabled). Example configurations (instance sizes in slices, out of 7):
• 7
• 4 + 2 + 1
• 4 + 1 + 1 + 1
• 2 + 2 + 3
• 2 + 1 + 1 + 3
• 1 + 1 + 2 + 3
• 1 + 1 + 1 + 1 + 3
• 3 + 3
• 3 + 2 + 1
• 3 + 1 + 1 + 1
• 2 + 2 + 2 + 1
• 2 + 2 + 1 + 1 + 1
• 1 + 1 + 2 + 2 + 1
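The combination count can be sanity-checked by enumerating ways to split 7 slices into MIG instance sizes. A minimal sketch: it ignores NVIDIA's placement rules, which both disallow some size-multisets and permit configurations that leave a slice unused (e.g. 3 + 3 above), so its raw partition count differs from the 18 placed configurations the hardware exposes.

```python
def slice_partitions(total=7, sizes=(7, 4, 3, 2, 1)):
    """Enumerate descending-ordered ways to split `total` slices
    into MIG-style instance sizes (hardware placement ignored)."""
    def rec(remaining, max_size):
        if remaining == 0:
            yield ()
        for s in sizes:
            if s <= max_size and s <= remaining:
                for rest in rec(remaining - s, s):
                    yield (s,) + rest
    return list(rec(total, total))

parts = slice_partitions()
print(len(parts))           # 12 size-multisets that use all 7 slices
print((4, 2, 1) in parts)   # True: one of the configurations listed above
```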
NVIDIA DGX FAMILY
DGX A100
• A100 8-GPU, 320GB/640GB GPU memory
• DGX POD/SuperPOD: 4x/8x/…/20x/…/140x DGX A100
DGX Station A100
• A100 4-GPU, 160GB/320GB GPU memory
Both are built with NVIDIA A100 GPUs (40GB/80GB).
Choosing a DGX platform by GPU count:
• 1-2 GPUs per workload: DGX Station A100
• 4-8 GPUs: DGX A100
• 20+ GPUs: DGX POD/SuperPOD cluster
Key questions: do you need 8 GPUs per node? InfiniBand networking? A pre-built HPC/AI software stack?
Use case: 1-2 GPUs
NVIDIA DGX STATION A100
• NVIDIA HGX A100 4-GPU (160GB or 320GB total GPU memory)
• 3rd Gen NVLink
• 200GB/s GPU-to-GPU bandwidth (3x PCIe Gen4)
Example workstation split:
• 2x A100 for training
• 1x A100 for HPC
• 1x A100 (= 7x MIG) for inference
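One simple way to realize a split like this is to dedicate GPUs per workload via the `CUDA_VISIBLE_DEVICES` environment variable. A hypothetical sketch: the workload names and device indices below are illustrative, not a DGX-provided tool or mapping.

```python
import os

# Hypothetical assignment of the 4 GPUs in a DGX Station A100.
# Indices 0-3 are illustrative, not a fixed DGX device ordering.
WORKLOAD_GPUS = {
    "training": [0, 1],   # 2x A100 for training
    "hpc": [2],           # 1x A100 for HPC
    "inference": [3],     # 1x A100, further divisible into 7 MIG instances
}

def env_for(workload: str) -> dict:
    """Build an environment for a process so it only sees
    the GPUs assigned to its workload."""
    gpus = WORKLOAD_GPUS[workload]
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    return env

print(env_for("training")["CUDA_VISIBLE_DEVICES"])  # 0,1
```

The environment dict would be passed to `subprocess.Popen(..., env=...)` when launching each job; MIG instances are addressed the same way, using their MIG device identifiers.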
NVIDIA DGX STATION A100
Connectivity
• 2x 10GbE (RJ45)
• 4x Mini DisplayPort for display out
• Remote management: 1GbE LAN port (RJ45)
CPU and Memory
• 64-core AMD Epyc CPU, PCIe Gen4
• 512GB system memory
Internal Storage
• NVMe M.2 SSD for OS, NVMe U.2 SSD for data cache
NVIDIA DGX STATION A100
Pre-built HPC/AI software stack:
• Containers for DL training & inference, HPC, analytics
• Pre-trained models and model scripts
• GPU driver, CUDA
A DATA CENTER IN-A-BOX
DGX Station A100 is more than 4x faster than the previous-generation DGX Station in scalability, training, and inference
• Work from anywhere: plugs into any standard wall socket
• Instant productivity: unpack to up-and-running in under an hour
• No data center, no problem: a fully functional AI appliance out of the box
Use case: 4-8 GPUs
NVIDIA DGX A100
Flexible GPU allocation within one system, for example:
• AI training on 4x A100 alongside data analytics on 2x A100
• AI training on 8x A100, or data analytics on 4x A100
• HPC on 4x A100, or as part of a 4-node GPU cluster
With MIG, a single A100 can serve analytics, inference, and training workloads side by side.
Use case: 20+ GPUs
Why GPT-3?
GPT (Generative Pre-trained Transformer): a series of language-generation models from OpenAI
• GPT-1 (2018; 117M parameters)
• GPT-2 (2019; 1.5B parameters)
• GPT-3 (2020; 175B parameters)
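The generation-over-generation growth is roughly an order of magnitude or more per step; a quick arithmetic check using the parameter counts listed above:

```python
# Parameter counts from the GPT series listed above.
params = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}

g2 = params["GPT-2"] / params["GPT-1"]  # ~12.8x
g3 = params["GPT-3"] / params["GPT-2"]  # ~116.7x
print(f"GPT-1 -> GPT-2: {g2:.1f}x")
print(f"GPT-2 -> GPT-3: {g3:.1f}x")
```

That roughly 100x jump into GPT-3 is what pushes training out of single-node territory and into the 20+ GPU cluster configurations this section covers.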
GPT-3 demo examples (1-3)
How was GPT-3 trained?
In July 2020, per NVIDIA, OpenAI trained GPT-3 on a Microsoft Azure-hosted supercomputer with:
• 285,000 CPU cores
• 10,000 GPUs (NVIDIA V100)
• 400 gigabits per second of network connectivity for each GPU server
According to ZDNet, training GPT-3 on a single GPU would take roughly 355 years.
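Those two figures are consistent under an idealized linear-scaling assumption: 355 GPU-years spread across 10,000 GPUs lands on the order of two weeks. A back-of-the-envelope check:

```python
GPU_YEARS = 355   # ZDNet's single-GPU training estimate
N_GPUS = 10_000   # V100 GPUs in the Azure-hosted system

# Assume perfect linear scaling; real distributed training scales
# sub-linearly due to communication overhead, so this is a lower bound.
days = GPU_YEARS * 365.25 / N_GPUS
print(f"~{days:.1f} days on {N_GPUS:,} GPUs")
```

The communication overhead implied by that caveat is exactly why the 400Gb/s-per-server network fabric on the previous slide matters as much as the GPU count.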
Use case: 20+ GPUs
NVIDIA DGX A100: network adapter configuration for the Compute Fabric and the Storage Fabric
• Mellanox ConnectX-6 2-port VPI adapters: InfiniBand HDR for storage, 100Gbps Ethernet for in-band management
Compute Fabric Network Architecture
• 20x DGX A100 nodes, each with 8 HDR InfiniBand links into the compute fabric (160 HDR node ports total, plus a UFM port)
• Leaf layer: 8 compute leaf switches (Mellanox QM8790), 20 node-facing HDR ports per leaf
• Spine layer: 5 spine switches (Mellanox QM8790), 4 uplinks from each leaf to each spine
• Mellanox UFM for fabric management; management, login, scheduler, and provisioning nodes attach via additional leaves (#9, #10)
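The leaf/spine sizing can be cross-checked arithmetically. A sketch assuming the figures on this slide (20 nodes, 8 links per node, 8 compute leaves, 5 spines, 4 uplinks per leaf per spine, and the QM8790's 40 HDR ports):

```python
NODES, LINKS_PER_NODE = 20, 8
LEAVES, SPINES, UPLINKS_PER_SPINE = 8, 5, 4
QM8790_PORTS = 40  # HDR ports on a Mellanox QM8790 switch

node_ports = NODES * LINKS_PER_NODE       # 160 node-facing HDR ports
down_per_leaf = node_ports // LEAVES      # 20 node links per leaf
up_per_leaf = SPINES * UPLINKS_PER_SPINE  # 20 uplinks per leaf

print(node_ports, down_per_leaf, up_per_leaf)
# With 20 down + 20 up, each leaf exactly fills a 40-port QM8790:
print(down_per_leaf + up_per_leaf == QM8790_PORTS)  # True
```

Matching downlinks and uplinks per leaf (20 + 20) gives a non-blocking fat tree: every node can drive all 8 of its HDR links at full rate simultaneously.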
Storage Fabric Network Architecture
• 20x DGX A100 nodes, each with 2 HDR InfiniBand links into the storage fabric (40 HDR node ports total, plus a UFM port)
• Leaf layer: 4 leaf switches (Mellanox QM8790), 10 node-facing HDR ports per leaf
• Spine layer: 2 spine switches (Mellanox QM8790), 4 uplinks from each leaf to each spine
• Storage systems and the management, login, scheduler, and provisioning nodes attach to the same fabric