NVIDIA has further expanded its professional data center lineup of Ampere GPUs with the A2 Tensor Core GPU accelerator. The new accelerator is the most entry-level design we have seen from NVIDIA so far and boasts some decent specs for its entry-level market positioning.
NVIDIA A2 Tensor Core GPU Is An Entry-Level Data Center Design Powered By Ampere GA107
The NVIDIA A2 Tensor Core GPU is designed specifically for inferencing and replaces the Turing-powered T4 Tensor Core GPU. In terms of specs, the card features a variant of the Ampere GA107 GPU which offers 1280 CUDA cores and 40 Tensor cores. These cores run at a boost clock of 1.77 GHz, and the chip is built on Samsung's 8nm process node. Only the higher-end GA100-based SKUs use TSMC's 7nm process node.
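The core count and boost clock are enough to reproduce the FP32 figure quoted in the table below. The sketch assumes the common spec-sheet convention that each CUDA core completes one fused multiply-add (2 FLOPs) per clock; the function name `peak_fp32_tflops` is purely illustrative.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPs.

    Assumption: every CUDA core retires one fused multiply-add
    (2 floating-point operations) per clock at the boost frequency.
    """
    flops_per_clock = cuda_cores * 2
    # GHz -> Hz, then FLOP/s -> TFLOP/s
    return flops_per_clock * boost_clock_ghz * 1e9 / 1e12


if __name__ == "__main__":
    # A2: 1280 CUDA cores at 1.77 GHz
    print(f"A2:   {peak_fp32_tflops(1280, 1.77):.2f} TFLOPs")  # 4.53
    # A100: 6912 CUDA cores at 1.41 GHz, for comparison
    print(f"A100: {peak_fp32_tflops(6912, 1.41):.2f} TFLOPs")  # 19.49
```

The A2 result of roughly 4.5 TFLOPs matches the table, and the same formula reproduces the A100's 19.49 TFLOPs. These are theoretical peaks; sustained throughput depends on the workload and on whether the card holds its boost clock within its power limit.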
The memory subsystem pairs 16 GB of GDDR6 with a 128-bit bus, running at an effective 12.5 Gbps for a total bandwidth of 200 GB/s. The GPU is configured to operate at a TDP between 40 and 60 Watts. Because of its entry-level design, it also comes in a small, passively cooled Half-Height, Half-Length (HHHL) form factor. Thanks to its low TDP, it does not require any external power connectors. The card also uses a PCIe Gen 4.0 x8 interface instead of the standard x16 link.
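The 200 GB/s figure follows directly from bus width and per-pin data rate, and the same arithmetic shows what the x8 host link gives up. The sketch below assumes that the GDDR6 "memory clock" listed in the table is one eighth of the effective per-pin data rate, and uses the PCIe 4.0 per-lane rate of 16 GT/s with 128b/130b encoding; all function names are hypothetical.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def gddr6_data_rate_gbps(memory_clock_mhz: float) -> float:
    """Assumption: the table's GDDR6 memory clock is 1/8 of the per-pin data rate."""
    return memory_clock_mhz * 8 / 1000


def pcie4_bandwidth_gbs(lanes: int) -> float:
    """One-direction PCIe 4.0 bandwidth: 16 GT/s per lane, 128b/130b encoding."""
    return lanes * 16 * (128 / 130) / 8


if __name__ == "__main__":
    # A2 using the quoted 12.5 Gbps effective rate
    print(memory_bandwidth_gbs(128, 12.5))  # 200.0
    # A40 derived from its 1812 MHz memory clock on a 384-bit bus
    print(f"{memory_bandwidth_gbs(384, gddr6_data_rate_gbps(1812)):.1f}")  # 695.8
    # Host link: the A2's x8 slot versus a standard x16 slot
    print(f"x8:  {pcie4_bandwidth_gbs(8):.2f} GB/s")   # 15.75
    print(f"x16: {pcie4_bandwidth_gbs(16):.2f} GB/s")  # 31.51
```

Halving the lane count halves host-to-GPU bandwidth to roughly 15.75 GB/s per direction, a trade-off that matters less for inference, where models typically stay resident in the card's 16 GB of local memory.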
The NVIDIA A2 Tensor Core GPU provides entry-level inference with low power, a small footprint, and high performance for NVIDIA AI at the edge. Featuring a low-profile PCIe Gen4 card and a low 40-60W configurable thermal design power (TDP) capability, the A2 brings versatile inference acceleration to any server for deployment at scale.
NVIDIA Ampere Professional GPU Lineup
GPU Name | A100 | A40 | A30 | A16 | A10 | A2 |
---|---|---|---|---|---|---|
Process Node | TSMC 7nm | Samsung 8nm | TSMC 7nm | Samsung 8nm | Samsung 8nm | Samsung 8nm |
GPU SKU | GA100-884 | GA102-895 | GA100-890 | 4x GA107 | GA102-890 | GA107 |
GPU Transistors | 54.2B | 28.3B | 54.2B | TBA | 28.3B | TBA |
CUDA Cores | 6912 | 10752 | 3584 | 2560 x4 | 9216 | 1280 |
Tensor Cores | 432 | 336 | 224 | 80 x4 | 288 | 40 |
Boost Clock | 1.41 GHz | 1.74 GHz | 1.44 GHz | 1.69 GHz | 1.69 GHz | 1.77 GHz |
FP32 Compute | 19.49 TFLOPs | 37.42 TFLOPs | 10.32 TFLOPs | 8.678 TFLOPs x4 | 31.24 TFLOPs | 4.5 TFLOPs |
FP64 Compute | 9.74 TFLOPs | 1.16 TFLOPs | 5.16 TFLOPs | 0.27 TFLOPs x4 | 0.97 TFLOPs | 0.14 TFLOPs |
FP16 Compute | 77.97 TFLOPs | 37.42 TFLOPs | 10.32 TFLOPs | 8.67 TFLOPs x4 | 31.24 TFLOPs | 4.5 TFLOPs |
INT8 Tensor Compute | 624 TOPs | 598.6 TOPs | 330 TOPs | TBA | 500 TOPs | 36 TOPs |
TF32 Tensor Compute | 156 TFLOPs | 149.6 TFLOPs | 82 TFLOPs | TBA | 125 TFLOPs | 9 TFLOPs |
Interconnect | NVLink 3 (12 Links) | PCIe 4.0 x16 | PCIe 4.0 x16 + NVLink 3 (4 Links) | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x8 |
Memory Capacity | 40 GB HBM2e | 48 GB GDDR6 | 24 GB HBM2e | 16 GB x4 GDDR6 | 24 GB GDDR6 | 16 GB GDDR6 |
Memory Bus | 5120-bit | 384-bit | 3072-bit | 128-bit x4 | 384-bit | 128-bit |
Memory Clock | 1215 MHz | 1812 MHz | 1215 MHz | 1812 MHz | 1563 MHz | 1563 MHz |
Bandwidth | 1.55 TB/s | 695.8 GB/s | 933.1 GB/s | 231.9 GB/s x4 | 600.2 GB/s | 200 GB/s |
TDP | 400W | 300W | 165W | 250W | 150W | 40-60W |
Form Factor | SXM4 | PCIe Dual Slot, Full Length | PCIe Dual Slot, Full Length | PCIe Dual Slot, Full Length | PCIe Single Slot, FHFL | PCIe Single Slot, HHHL |