
NVIDIA DGX FAMILY



(1)

NVIDIA Korea, January 20, 2021

NVIDIA DGX FAMILY

(2)

NVIDIA DGX Family

(3)

NVIDIA DATA CENTER GPU HISTORY

2017 (Volta Architecture): V100

2018-2019 (Turing Architecture): T4, RTX 6000, RTX 8000

2020 (Ampere Architecture): A100

(4)

NVIDIA DATA CENTER GPU

(5)

CHOOSE THE RIGHT NVIDIA DATA CENTER GPU

GPU positioning:
• NVIDIA A100 (SXM4 / PCIe): world's most powerful data center GPU
• NVIDIA T4: versatile data center GPU for mainstream computing
• NVIDIA A40: world's most powerful data center GPU for visual computing

Deep Learning Training (for the absolute fastest model training time)
• A100 SXM4: 8-16 GPUs (for new installs); 80GB for the largest models (DLRM, GPT-2 over 9.3Bn parameters in one node)
• A100 PCIe: 4-8 GPUs

Deep Learning Inference (for batch and real-time inference)
• A100 SXM4: 1 GPU w/ MIG; 80GB gives 7 MIGs at 10GB each for large batch size constrained models (RNN-T)
• A100 PCIe: 1 GPU w/ MIG
• T4: 1 GPU

HPC / AI (for scientific computing centers and higher ed and research institutions)
• A100 SXM4: 4 GPUs with MIG for supercomputing centers; 80GB for the largest datasets and high memory bandwidth applications
• A100 PCIe: 1-4 GPUs with MIG for higher ed and research

Render Farms (for batch and real-time rendering)
• A40: 4-8 GPUs

Graphics (for the best graphics performance on professional virtual workstations*)
• T4: 2-8 GPUs for mid-range virtual workstations for professional graphics
• A40: 4-8 GPUs for mid- and high-end professional graphics and RTX workloads or simulation

Enterprise Acceleration (mixed workloads: graphics, ML, DL, analytics, training, inference)
• A100 SXM4: 1-4 GPUs with MIG for compute-intensive multi-GPU workloads; 80GB for data analytics with the largest datasets
• A100 PCIe: 1-4 GPUs with MIG for compute-intensive single-GPU workloads
• T4: 4-8 GPUs for balanced workloads*
• A40: 2-4 GPUs for graphics-intensive* and compute workloads

Edge (edge solutions with differing requirements)
• 2-4 GPUs for graphics

(6)

A100 GPU

• NVIDIA Ampere Architecture: world's largest 7nm chip, 54B transistors, HBM2
• 3rd Gen Tensor Cores: faster, flexible, easier to use; 20x AI perf with TF32, 2.5x HPC perf
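As a concrete illustration of the TF32 claim above, here is a minimal sketch of how TF32 execution is toggled in PyTorch. The slides do not name a framework, so the framework choice is an assumption; the two flags shown are standard PyTorch settings on Ampere-class GPUs.

```python
# Minimal sketch: making TF32 use explicit on an Ampere GPU in PyTorch.
# TF32 is on by default for matmuls in recent PyTorch releases; setting the
# flags makes the choice visible. Assumes an A100 (or other Ampere) GPU.
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # TF32 Tensor Cores for matmuls
torch.backends.cudnn.allow_tf32 = True        # and for cuDNN convolutions

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # executes on 3rd-gen Tensor Cores in TF32 precision
```

Setting both flags to False falls back to full FP32 arithmetic, which is how the 20x TF32 speedup figure is typically compared.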

(7)

MIG (Multi-Instance GPU)

• One A100 GPU can be partitioned into up to 7 GPU slices
• Each slice behaves as an independent GPU
• Multiple users can share a single A100 GPU
• Suited to training, inference, analytics, and HPC workloads
• 19 configurations per GPU (18 MIG layouts plus MIG disabled)
• MIG provides hardware-level isolation and QoS

(8)

One A100 GPU supports 18 MIG configurations (19 including MIG disabled).

Example slice layouts across the 7 slices:
7
4 + 2 + 1
4 + 1 + 1 + 1
2 + 2 + 3
2 + 1 + 1 + 3
1 + 1 + 2 + 3
1 + 1 + 1 + 1 + 3
3 + 3
3 + 2 + 1
3 + 1 + 1 + 1
2 + 2 + 2 + 1
2 + 2 + 1 + 1 + 1
1 + 1 + 2 + 2 + 1
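To make these layouts tangible, the hedged sketch below enumerates the MIG devices currently visible on GPU 0 using the nvidia-ml-py (pynvml) bindings. It assumes a MIG-enabled A100 and a recent driver; which instances appear depends on the active layout.

```python
# Sketch: list the MIG instances currently carved out of GPU 0.
# Assumes the nvidia-ml-py package (pynvml) and a MIG-enabled A100.
from pynvml import (
    NVMLError, nvmlInit, nvmlShutdown,
    nvmlDeviceGetHandleByIndex, nvmlDeviceGetMigMode,
    nvmlDeviceGetMaxMigDeviceCount, nvmlDeviceGetMigDeviceHandleByIndex,
    nvmlDeviceGetName, nvmlDeviceGetMemoryInfo,
)

nvmlInit()
try:
    gpu = nvmlDeviceGetHandleByIndex(0)
    current_mode, _pending = nvmlDeviceGetMigMode(gpu)
    print("MIG enabled:", bool(current_mode))
    for i in range(nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except NVMLError:
            continue  # slot unused in the current layout
        mem = nvmlDeviceGetMemoryInfo(mig)
        print(i, nvmlDeviceGetName(mig), mem.total // 2**20, "MiB")
finally:
    nvmlShutdown()
```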

(9)

NVIDIA DGX FAMILY

DGX A100: A100 8-GPU, 320GB/640GB

DGX POD/SuperPOD: 4x/8x/…/20x/…/140x DGX A100

DGX Station A100: A100 4-GPU, 160GB/320GB

All w/ NVIDIA A100 GPU 40GB/80GB

(10)

Positioning by GPU scale:

• 1~2 GPUs

• 4~8 GPUs

• 20+ GPUs

(11)

– 1~2 GPUs

Do you really need 8 GPUs? Do you need InfiniBand? What about the HPC/AI software stack…

(12)

– 1~2 GPUs

NVIDIA DGX Station A100

(13)

NVIDIA DGX STATION A100

• NVIDIA HGX A100 4-GPU (160GB or 320GB total GPU memory)
• 3rd Gen NVLink
• GPU-to-GPU bandwidth 200GB/s (about 3x PCIe Gen4)

A workstation whose GPUs can be split across workloads:
2x A100 for training
1x A100 for HPC
1x A100 (= 7x MIG instances) for inference
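One way to realize this "two GPUs for training, one for HPC, one split into seven MIG instances for inference" split is per-process device selection. The sketch below is an assumption-level illustration: the MIG UUID is a hypothetical placeholder, and real values come from `nvidia-smi -L` on the target machine.

```python
# Sketch: pin this process to a single MIG instance before CUDA initializes.
# The UUID below is a hypothetical placeholder; list real ones with
# `nvidia-smi -L` on the target machine.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch  # imported after the env var so CUDA sees only that instance

print(torch.cuda.device_count())      # 1: just the MIG instance
print(torch.cuda.get_device_name(0))  # reported as an A100 MIG device
```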

(14)

NVIDIA DGX STATION A100

Connectivity
• 2x 10GbE (RJ45)
• 4x Mini DisplayPort for display out
• Remote management: 1GbE LAN port (RJ45)

CPU and Memory
• 64-core AMD Epyc CPU, PCIe Gen4
• 512GB system memory

Internal Storage
• NVMe M.2 SSD for OS, NVMe U.2 SSD for data cache

(15)

NVIDIA DGX STATION A100

Pre-built HPC/AI software stack:
• Containers for DL training & inference, HPC, analytics
• Pre-trained models and model scripts
• GPU driver, CUDA

(16)

A DATA CENTER IN-A-BOX

DGX Station A100 is more than 4X faster
[Benchmark panels: Scalability, Training, Inference]

(17)

NVIDIA DGX STATION A100

Work-from-Anywhere AI Appliance: plug into any standard wall socket.

Instant Productivity: unpack to up-and-running in under an hour.

No Data Center, No Problem: a fully functional AI system out of the box.

(18)

– 4~8 GPUs

NVIDIA DGX A100

(19)

– 4~8 GPUs

[Example configurations: AI training on 4x A100 alongside data analytics on 2x A100; AI training on 8x A100 alongside data analytics on 4x A100; a 4-node GPU cluster for HPC with 4x A100]

(20)

– 4~8 GPUs

With A100 MIG, one system can run analytics, inference, and training side by side.
[Figure panels: ANALYTICS, INFERENCE, TRAINING]

(21)

– 20+ GPUs

What is GPT-3?

GPT (Generative Pre-trained Transformer), a language model family developed by OpenAI:

GPT-1 (2018; 117M parameters)

GPT-2 (2019; 1.5B parameters)

GPT-3 (2020; 175B parameters)

(22)

GPT-3

1) GPT-3 48%

(23)

GPT-3

2)

(24)

GPT-3

2)

(25)

GPT-3

3)

(26)

GPT-3

3)

(27)

GPT-3

3)

(28)

How was GPT-3 trained?

The supercomputer OpenAI used to train GPT-3, built on Microsoft Azure:

285,000 CPU cores

10,000 GPUs (NVIDIA V100)

400 gigabits per second of network connectivity for each GPU server

According to ZDNet, training GPT-3 on a single GPU would take about 355 years.
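That 355-year figure can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes the GPT-3 paper's reported total of roughly 3.14e23 training FLOPs (3,640 petaflop/s-days) and an assumed sustained mixed-precision throughput of about 28 TFLOPS on one V100; both numbers are inputs to the estimate, not slide content.

```python
# Back-of-envelope check on the single-GPU training-time claim.
# Assumptions: ~3.14e23 total training FLOPs for GPT-3 (3,640 PF-days,
# per the GPT-3 paper) and ~28 TFLOPS sustained on a single V100.
total_flops = 3.14e23
v100_sustained_flops = 28e12           # FLOP/s, assumed sustained rate

seconds = total_flops / v100_sustained_flops
years = seconds / (365 * 24 * 3600)
print(f"{years:.0f} years")            # ~356 years, matching the ~355-year quote
```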

(29)

– 20+ GPUs

NVIDIA DGX A100: network adapter configuration for the compute fabric and storage fabric

(30)

– 20+ GPUs

NVIDIA DGX A100: network adapter configuration for the compute fabric and storage fabric

Mellanox ConnectX-6 2-port VPI: InfiniBand HDR for storage, 100Gbps Ethernet for in-band management

(31)

– 20+ GPUs

Compute Fabric Network Architecture

[Diagram: DGX A100 #1-#20, each with 8x HDR compute links, one per leaf; leaf switches #1-#8 (Mellanox QM8790) with 20x HDR node-facing ports each, 160x HDR ports plus a UFM port in total; spine switches #1-#5 (Mellanox QM8790); Leaf #9/#10 attach Mellanox UFM, management, login, scheduler, and provisioning nodes]
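The leaf and spine counts in that diagram line up with simple port arithmetic. The sketch below assumes a rail-optimized two-tier fat tree and the 40-port HDR QM8790, both as shown above, and reproduces the 160-port figure.

```python
# Port arithmetic for the 20-node compute fabric sketched above.
# Assumptions: rail-optimized two-tier fat tree; Mellanox QM8790 = 40-port HDR.
nodes = 20
compute_links_per_node = 8   # one HDR link from each node to each leaf (rail)
leaves = compute_links_per_node
spines = 5

down_ports_per_leaf = nodes              # 20 node-facing HDR ports per leaf
total_down_ports = leaves * down_ports_per_leaf
print(total_down_ports)                  # 160, matching "160x HDR port"

uplinks_per_leaf = down_ports_per_leaf   # non-blocking: uplinks == downlinks
print(leaves * uplinks_per_leaf // spines)  # 32 leaf uplinks per spine
```

The diagram's slightly higher per-spine count plausibly includes the UFM and management links, which this sketch ignores.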

(32)

– 20+ GPUs

Storage Fabric Network Architecture

[Diagram: DGX A100 nodes, each with 2x HDR storage links; leaf switches #1-#4 (Mellanox QM8790), 40x HDR ports plus a UFM port in total; spine switches #1-#2 (Mellanox QM8790); storage, management, login, scheduler, and provisioning nodes also attach to the fabric]

(33)

– 4~8 GPUs

Compute Fabric & Storage Fabric Network Architecture

(34)
(35)

NVIDIA DGX POD MANAGEMENT SOFTWARE

Scheduling: K8s/Slurm; monitoring: Prometheus + Grafana; provisioning: Ansible (a minimal Kubernetes sketch follows)
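As a concrete taste of the Kubernetes side of that stack, the sketch below requests a single GPU for a one-shot pod via the official kubernetes Python client. Assumptions: a reachable kubeconfig, the NVIDIA device plugin installed on the cluster, and an illustrative NGC image tag.

```python
# Sketch: submit a one-shot pod that requests a single NVIDIA GPU.
# Assumes a local kubeconfig and the NVIDIA device plugin on the cluster;
# the image tag is illustrative.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvcr.io/nvidia/pytorch:21.02-py3",  # illustrative NGC tag
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"},  # one GPU (or MIG device)
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```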

(36)

[Video]

(37)

References
