NVIDIA A2 vs Intel Gaudi

NVIDIA A2

Intel Gaudi

1280 Shaders 16GB GDDR6 1770MHz	2048 Shaders 32GB HBM2 610MHz
Peak AI Performance 145 TOPS INT4 Tensor Sparse	Peak AI Performance 159.91 TFLOPS BF16 Tensor
FP32 4.53 TFLOPS	- -
FP16 4.53 TFLOPS	- -
Form Factor PCIe Card 1.0-Slots	Form Factor OAM Module -
TDP 60W	TDP 350W

Highlights

Benchmarks

Geekbench 6

GB6 OpenCL 35,625 9%	GB6 OpenCL N/A 0%
GB6 Metal N/A 0%	GB6 Metal N/A 0%
GB6 Vulkan N/A 0%	GB6 Vulkan N/A 0%

Geekbench 5

GB5 OpenCL N/A 0%	GB5 OpenCL N/A 0%
GB5 CUDA N/A 0%	GB5 CUDA N/A 0%
GB5 Metal N/A 0%	GB5 Metal N/A 0%
GB5 Vulkan N/A 0%	GB5 Vulkan N/A 0%

OctaneBench

OCT 2020.1 N/A 0%	OCT 2020.1 N/A 0%
OCT Metal N/A 0%	OCT Metal N/A 0%

Tech Specs

Theoretical Performance

Peak AI Performance 145 TOPS INT4 Tensor Sparse	Peak AI Performance 159.91 TFLOPS BF16 Tensor
FP16 4.53 TFLOPS, 18.13 TFLOPS Tensor (FP16 Accumulate), 36.25 TFLOPS Tensor (FP16 Accumulate) Sparse, 18.13 TFLOPS Tensor (FP32 Accumulate), 36.25 TFLOPS Tensor (FP32 Accumulate) Sparse	-
FP32 4.53 TFLOPS	-
FP64 70 GFLOPS	-
BF16 4.53 TFLOPS, 18.13 TFLOPS Tensor, 36.25 TFLOPS Tensor Sparse	BF16 159.91 TFLOPS Tensor
TF32 9.06 TFLOPS Tensor, 18.12 TFLOPS Tensor Sparse	-
INT4 72.5 TOPS Tensor, 145 TOPS Tensor Sparse	-
INT8 36.25 TOPS Tensor, 72.5 TOPS Tensor Sparse	-
INT32 2.27 TOPS	-
Ray Tracing 8.9 TOPS	-
Pixel Fillrate 56.64 GPixel/s	Pixel Fillrate -
Texture Fillrate 70.8 GTexel/s	Texture Fillrate -
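
The A2 figures above appear to follow from the unit counts and the 1770MHz clock listed elsewhere on this page. The sketch below reproduces them under two assumptions that the page does not state: each shader retires one fused multiply-add (2 FLOPs) per clock, and each ROP or TMU handles one pixel or texel per clock. The function names are illustrative only.

```python
def peak_fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, assuming 2 FLOPs (one FMA) per shader per clock."""
    return 2 * shaders * clock_mhz * 1e6 / 1e12


def fillrate_g_per_s(units: int, clock_mhz: float) -> float:
    """Fillrate in G/s, assuming one pixel (ROP) or texel (TMU) per unit per clock."""
    return units * clock_mhz * 1e6 / 1e9


# NVIDIA A2 values taken from this page: 1280 shaders, 32 ROPs, 40 TMUs, 1770MHz.
a2_fp32 = peak_fp32_tflops(1280, 1770)
a2_pixel = fillrate_g_per_s(32, 1770)
a2_texel = fillrate_g_per_s(40, 1770)

print(f"FP32: {a2_fp32:.2f} TFLOPS")             # FP32: 4.53 TFLOPS
print(f"Pixel fillrate: {a2_pixel:.2f} GPixel/s")  # Pixel fillrate: 56.64 GPixel/s
print(f"Texture fillrate: {a2_texel:.2f} GTexel/s")  # Texture fillrate: 70.80 GTexel/s
```

The same formula is not applied to Gaudi here, because the page lists no FP32 or fillrate figures for it to check against.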

Chip

Manufacturer NVIDIA	Manufacturer Intel
Chip Designer NVIDIA	Chip Designer Intel
Architecture Ampere	Architecture Gaudi
Family Server	Family Gaudi
Codename NV177 GA107 - -	Codename Gaudi HL-2000 Variant HL-2000
Market Segment Server	Market Segment Server
Release Date 11/10/2021	Release Date 6/17/2019

Fabrication

Foundry Samsung	Foundry TSMC
Fabrication Node 8N	Fabrication Node 16FF
Die Size 200 mm²	-
Transistor Count 8.7 Billion	-
Transistor Density 43.50M/mm²	-
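
As a quick consistency check, the A2 transistor density is simply the transistor count divided by the die size. A minimal sketch (the function name is hypothetical):

```python
def transistor_density_m_per_mm2(transistors: float, die_size_mm2: float) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return transistors / die_size_mm2 / 1e6


# NVIDIA A2 values from this page: 8.7 billion transistors on a 200 mm² die.
a2_density = transistor_density_m_per_mm2(8.7e9, 200)
print(f"{a2_density:.2f}M/mm²")  # 43.50M/mm²
```

No density can be derived for Gaudi, since neither its die size nor its transistor count is listed.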

Form

Form PCIe Card	Form OAM Module

Core Configuration

Shading Units 1280 Shaders -	Shading Units 2048 Shaders -
Texture Mapping Units 40 TMUs	Texture Mapping Units -
Render Output Units 32 ROPs	Render Output Units -
Tensor Cores 40 T-Cores	Tensor Cores 8 T-Cores
Ray-Tracing Cores 10 RT-Cores	- -
Streaming Multiprocessors 10 SMs	- -
- -	Compute Units 1 CU

Clock Speeds

1440MHz Base, 1770MHz	610MHz

Cache

L1 64KB/SM Tex 128KB/SM	L1 Unknown
L2 2MB Shared	L2 Unknown

Memory

16GB GDDR6 -	32GB HBM2 ECC
Bus Width 128Bit	Bus Width 4096Bit
Clock 782MHz Transfer Rate 6.3GT/s Bandwidth 100GB/s	Clock 980MHz Transfer Rate 2GT/s Bandwidth 1003.5GB/s
-	eSRAM 24MB 3200GB/s
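
The bandwidth figures in this section are consistent with bus width multiplied by transfer rate. The sketch below assumes 8 transfers per listed clock for GDDR6 and 2 for HBM2; those multipliers are inferred from the clock and transfer-rate pairs on this page rather than stated by it, and the function name is illustrative.

```python
def bandwidth_gb_s(bus_width_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    transfer_rate_gt_s = clock_mhz * transfers_per_clock / 1000
    return bus_width_bits / 8 * transfer_rate_gt_s


# NVIDIA A2: 128-bit GDDR6 at 782MHz (assumed 8 transfers per clock, about 6.3GT/s).
a2_bw = bandwidth_gb_s(128, 782, 8)
# Intel Gaudi: 4096-bit HBM2 at 980MHz (assumed 2 transfers per clock, about 2GT/s).
gaudi_bw = bandwidth_gb_s(4096, 980, 2)

print(f"A2: {a2_bw:.1f} GB/s")        # A2: 100.1 GB/s
print(f"Gaudi: {gaudi_bw:.1f} GB/s")  # Gaudi: 1003.5 GB/s
```

Under these assumptions Gaudi's HBM2 delivers roughly ten times the A2's bandwidth, which matches the 100GB/s and 1003.5GB/s values listed above.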

Power & Thermals

TDP 60W	TDP 350W

Ports

No Ports	No Ports

Video Output

Max Resolution Unknown	Max Resolution Unknown
Max Resolution Refresh Rate -	Max Resolution Refresh Rate -
Variable Refresh Rate G-Sync FreeSync -	Variable Refresh Rate - - -
Display Stream Compression (DSC) Not Supported	Display Stream Compression (DSC) Not Supported
Multi Monitor Support Unknown	Multi Monitor Support Unknown

Video Encoder

Model NVENC 7	No Encoders
Codec AVC (H.264) HEVC (H.265)	-

Video Decoder

Model NVDEC 5	No Decoders
Codec MPEG-1 MPEG-2 MPEG-4 VC-1 VP8 VP9 AVC (H.264) HEVC (H.265) AV1	-

API Support

DirectX 12 Direct3D 12_2	-
OpenGL 4.6 OpenCL 3.0 Vulkan 1.2	OpenCL 3.0
Shader Model 6.6 CUDA 8.6 PureVideo HD VP11 VDPAU Feature Set K	-

Card

-	Not a Card
Slots Required 1.0 PCIe Version 4.0 PCIe Lanes 8	PCIe Version 4.0 PCIe Lanes 16
-	Multi GPU Support Supported Type RoCE
Height 69 mm (2.72 in) Width 168 mm (6.61 in) Depth 20 mm (0.79 in)	-
