NVIDIA L40S vs AMD Instinct MI250X

NVIDIA L40S

AMD Instinct MI250X

18176 Shaders 48GB GDDR6 2520MHz	14080 Shaders 128GB HBM2e 1700MHz
Peak AI Performance 2.93 POPS INT4 Tensor Sparse	Peak AI Performance 382.98 TOPS INT4 Tensor
FP32 91.61 TFLOPS	FP32 47.87 TFLOPS
FP16 91.61 TFLOPS	FP16 95.74 TFLOPS
Form Factor PCIe Card 2.0-Slots	Form Factor OAM Module -
TDP 300W	TDP 560W
Power Connectors 1x 16-Pin 12VHPWR	Power Connectors -

Benchmarks

Geekbench 6

GB6 OpenCL 361,480 94%	GB6 OpenCL N/A 0%
GB6 Metal N/A 0%	GB6 Metal N/A 0%
GB6 Vulkan 245,275 65%	GB6 Vulkan N/A 0%

Geekbench 5

GB5 OpenCL N/A 0%	GB5 OpenCL N/A 0%
GB5 CUDA N/A 0%	GB5 CUDA N/A 0%
GB5 Metal N/A 0%	GB5 Metal N/A 0%
GB5 Vulkan N/A 0%	GB5 Vulkan N/A 0%

OctaneBench

OCT 2020.1 N/A 0%	OCT 2020.1 N/A 0%
OCT Metal N/A 0%	OCT Metal N/A 0%

Tech Specs

Theoretical Performance

Peak AI Performance 2.93 POPS INT4 Tensor Sparse	Peak AI Performance 382.98 TOPS INT4 Tensor
FP8 - 732.86 TFLOPS Tensor (FP16 Accumulate) 1.47 PFLOPS Tensor (FP16 Accumulate) Sparse 732.86 TFLOPS Tensor (FP32 Accumulate) 1.47 PFLOPS Tensor (FP32 Accumulate) Sparse	- - - - - -
FP16 91.61 TFLOPS 366.43 TFLOPS Tensor (FP16 Accumulate) 732.86 TFLOPS Tensor (FP16 Accumulate) Sparse 366.43 TFLOPS Tensor (FP32 Accumulate) 732.86 TFLOPS Tensor (FP32 Accumulate) Sparse	FP16 95.74 TFLOPS 382.98 TFLOPS Tensor (FP16 Accumulate) - 382.98 TFLOPS Tensor (FP32 Accumulate) -
FP32 91.61 TFLOPS - -	FP32 47.87 TFLOPS 95.74 TFLOPS Tensor -
FP64 1.43 TFLOPS -	FP64 47.87 TFLOPS 95.74 TFLOPS Tensor
BF16 91.61 TFLOPS 366.43 TFLOPS Tensor 732.86 TFLOPS Tensor Sparse	BF16 - 382.98 TFLOPS Tensor -
TF32 183.21 TFLOPS Tensor 366.43 TFLOPS Tensor Sparse	- - -
INT4 1.47 POPS Tensor 2.93 POPS Tensor Sparse	INT4 382.98 TOPS Tensor -
INT8 - 732.86 TOPS Tensor 1.47 POPS Tensor Sparse	INT8 - 382.98 TOPS Tensor -
INT32 45.8 TOPS	- -
Ray Tracing 211.7 TOPS	- -
Pixel Fillrate 483.84 GPixel/s	Pixel Fillrate -
Texture Fillrate 1431.36 GTexel/s	Texture Fillrate 1496 GTexel/s
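
The tensor figures above appear to be simple multiples of each GPU's vector FP32 rate, with every "Sparse" entry doubling its dense counterpart. A minimal Python sketch of that relationship follows. The multipliers are inferred from the table itself, not taken from vendor documentation, and `tensor_rate` is a hypothetical helper name.

```python
# Unrounded vector FP32 rates, computed from the shader counts and clocks
# listed on this page (shaders * 2 FLOPs per clock * clock in MHz / 1e6).
L40S_FP32_TFLOPS = 18176 * 2 * 2520 / 1e6    # 91.60704
MI250X_FP32_TFLOPS = 14080 * 2 * 1700 / 1e6  # 47.872

# Assumption: multipliers read off the table above, not a vendor formula.
L40S_MULTIPLIERS = {"TF32": 2, "FP16": 4, "BF16": 4, "FP8": 8, "INT8": 8, "INT4": 16}
MI250X_MULTIPLIERS = {"FP32": 2, "FP64": 2, "FP16": 8, "BF16": 8, "INT8": 8, "INT4": 8}


def tensor_rate(fp32_tflops, multiplier, sparse=False):
    """Return a tensor rate in TFLOPS/TOPS; sparsity doubles the dense rate."""
    return fp32_tflops * multiplier * (2 if sparse else 1)


if __name__ == "__main__":
    for fmt, mult in L40S_MULTIPLIERS.items():
        dense = tensor_rate(L40S_FP32_TFLOPS, mult)
        sparse = tensor_rate(L40S_FP32_TFLOPS, mult, sparse=True)
        print(f"L40S   {fmt:5s} dense {dense:8.2f}  sparse {sparse:8.2f}")
    for fmt, mult in MI250X_MULTIPLIERS.items():
        # The table lists no sparse figures for the MI250X.
        print(f"MI250X {fmt:5s} dense {tensor_rate(MI250X_FP32_TFLOPS, mult):8.2f}")
```

Values above 1000 correspond to the POPS/PFLOPS entries in the table (for example, 1465.71 TOPS is listed as 1.47 POPS).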

Chip

Manufacturer NVIDIA	Manufacturer AMD
Chip Designer NVIDIA	Chip Designer AMD
Architecture Ada Lovelace	Architecture CDNA 2
Family Server	Family Instinct
Codename NV182 AD102 Variant AD102-895-A1	Codename Aldebaran Aldebaran XTX Variant Aldebaran XTX
Market Segment Server	Market Segment Server
Release Date 8/8/2023	Release Date 11/8/2021

Fabrication

Foundry TSMC	Foundry TSMC
Fabrication Node 4N	Fabrication Node N6
Die Size 608 mm²	Die Size 2x 724 mm²
Transistor Count 76.3 Billion	Transistor Count 2x 28 Billion
Transistor Density 125.41M/mm²	Transistor Density 38.67M/mm²

Form

Form Factor PCIe Card	Form Factor OAM Module

Core Configuration

Shading Units 18176 Shaders	Shading Units 14080 Shaders
Texture Mapping Units 568 TMUs	Texture Mapping Units 880 TMUs
Render Output Units 192 ROPs	Render Output Units -
Tensor Cores 568 T-Cores	Tensor Cores 880 T-Cores
Ray-Tracing Cores 142 RT-Cores	- -
Streaming Multiprocessors 142 SMs	- -
- -	Compute Units 220 CUs
Graphics Processing Clusters 12 GPCs	- -
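
The headline FP32 and fill-rate figures on this page can be reproduced from the unit counts above and the listed clocks (2520MHz for the L40S, 1700MHz for the MI250X). The sketch below assumes the conventional formulas: 2 FLOPs per shader per clock (one fused multiply-add), and one pixel or texel per ROP or TMU per clock. The function names are illustrative only.

```python
def peak_fp32_tflops(shaders, clock_mhz):
    """Peak FP32 in TFLOPS, assuming 2 FLOPs (one FMA) per shader per clock."""
    return shaders * 2 * clock_mhz / 1e6


def fillrate_giga(units, clock_mhz):
    """Fill rate in GPixel/s or GTexel/s, assuming one element per unit per clock."""
    return units * clock_mhz / 1e3


if __name__ == "__main__":
    # NVIDIA L40S: 18176 shaders, 568 TMUs, 192 ROPs, 2520MHz
    print(round(peak_fp32_tflops(18176, 2520), 2))  # FP32 TFLOPS
    print(round(fillrate_giga(192, 2520), 2))       # pixel fill rate
    print(round(fillrate_giga(568, 2520), 2))       # texture fill rate

    # AMD Instinct MI250X: 14080 shaders, 880 TMUs, 1700MHz (no ROP count listed)
    print(round(peak_fp32_tflops(14080, 1700), 2))  # FP32 TFLOPS
    print(round(fillrate_giga(880, 1700), 2))       # texture fill rate
```

These are theoretical peaks at the listed clock; sustained throughput depends on power limits and workload.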

Clock Speeds

Base Clock 1110MHz Boost Clock 2520MHz	Base Clock 1000MHz Boost Clock 1700MHz

Cache

L1 64KB/SM Tex 128KB/SM	L1 16KB/CU
L2 96MB Shared	L2 16MB Shared

Memory

48GB GDDR6 -	128GB HBM2e ECC
Bus Width 384Bit	Bus Width 8192Bit
Clock 2250MHz Transfer Rate 18GT/s Bandwidth 864GB/s	Clock 1600MHz Transfer Rate 3.2GT/s Bandwidth 3276.8GB/s
- - - - - - - - -	- - - - - - - - -
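
The bandwidth figures above follow from bus width and transfer rate. As a minimal sketch, assuming the usual relation (bytes per transfer times transfers per second), with `memory_bandwidth_gbs` as an illustrative name:

```python
def memory_bandwidth_gbs(bus_width_bits, transfer_rate_gts):
    """Peak memory bandwidth in GB/s: (bus width in bytes) * (GT/s)."""
    return bus_width_bits / 8 * transfer_rate_gts


if __name__ == "__main__":
    # NVIDIA L40S: 384-bit GDDR6 at 18 GT/s
    print(memory_bandwidth_gbs(384, 18))
    # AMD Instinct MI250X: 8192-bit HBM2e at 3.2 GT/s
    print(memory_bandwidth_gbs(8192, 3.2))
```

The MI250X's much wider HBM2e bus more than offsets its lower per-pin transfer rate, which is where its roughly 3.8x bandwidth advantage comes from.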

Power & Thermals

TDP 300W	TDP 560W

Ports

4x DisplayPort 1.4	No Ports

Video Output

Max Resolution 7680x4320	Max Resolution Unknown
Max Resolution Refresh Rate 120Hz	Max Resolution Refresh Rate -
Variable Refresh Rate G-Sync FreeSync -	Variable Refresh Rate - - -
Display Stream Compression (DSC) Supported	Display Stream Compression (DSC) Not Supported
Multi Monitor Support 4	Multi Monitor Support Unknown
Content Protection HDCP 2.3	- -

Video Encoder

Model 2x NVENC 8	Model 2x VCN 2.6
Codec AVC (H.264), HEVC (H.265), AV1	Codec AVC (H.264), HEVC (H.265)

Video Decoder

Model NVDEC 5	No Decoders
Codec MPEG-1, MPEG-2, MPEG-4, VC-1, VP8, VP9, AVC (H.264), HEVC (H.265), AV1	-

API Support

DirectX 12 Direct3D 12_3	-
OpenGL 4.6 OpenCL 3.0 Vulkan 1.3	OpenCL 3.0
Shader Model 6.7 CUDA 8.9 PureVideo HD VP12 VDPAU Feature Set L	GFX 9.4

Card

Power Connectors 1x 16-Pin 12VHPWR	Power Connectors -
Slots Required 2.0 PCIe Version 4.0 PCIe Lanes 16	PCIe Version 4.0 PCIe Lanes 16
-	Multi GPU Support Supported Type Infinity Fabric
Height 111 mm (4.37 in) Width 267 mm (10.51 in) Depth 40 mm (1.57 in)	-

Competitors

NVIDIA L40S

NVIDIA L40

NVIDIA L40S vs NVIDIA L40

AMD Instinct MI250X

AMD Instinct MI250

AMD Instinct MI250X vs AMD Instinct MI250
