3nm With 20 ARM v9.2 CPU Cores, 1000 TOPS NVFP4 Blackwell GPU, LPDDR5x-9400 Memory Support, 140W TDP

NVIDIA has just detailed its GB10 Superchip with Blackwell GPU, which is being used to power several DGX AI Mini supercomputers.

NVIDIA GB10 Superchip Features an SoC and GPU Dielet Based on 3nm & Packaged Using 2.5D Tech

NVIDIA’s DGX Spark, the first system to be announced with the GB10 Superchip, has been making headlines. The system is NVIDIA’s foray into the “AI PC” segment, and since the announcement, several other vendors have introduced their own GB10 “AI PC” platforms. Today at Hot Chips 2025, NVIDIA is giving a deep dive into its GB10 Superchip and how it scales the Blackwell architecture down to mini developer systems and workstations.

The idea behind DGX Spark was to design a Mini AI Supercomputer with the Blackwell architecture. To make this happen, NVIDIA developed the GB10 Superchip, which brings datacenter innovations such as NVFP4, CUDA, SLANG, TensorRT, vLLM, CX-7 NIC, NVLINK C2C, TMEM, and more down to a Mini PC platform. The small form factor is made possible by multi-die packaging tech, a very low-power C2C interface, and a Unified Memory Architecture (UMA).

As a result, the DGX Spark Workstation was built, which offers the following key features and benefits:

  • GB10 Grace Blackwell Superchip: Accelerates AI, Data Science, Compute, Rendering & Visualization
  • 128GB Coherent Unified System Memory: Work with large AI models of up to 200 billion parameters and fine-tune models of up to 70 billion parameters
  • ConnectX-7 Networking: Connect two DGX Spark systems together to work with models of up to 405 billion parameters
  • DGX Base OS and NVIDIA AI Software Stack: Seamlessly move workloads from DGX Spark to DGX Cloud or any accelerated data center or cloud infrastructure
  • Flexible deployment configurations: Configure as an AI Workstation or a network-connected personal AI cloud
  • Great Desktop Experience: Multi-head display support and flexible connectivity
  • Compact, power-efficient design: Easily fits on any desk, powered by a standard wall outlet
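
As a rough sanity check on the model-size figures in the list above, weight storage can be estimated as parameter count times bits per weight. The sketch below assumes 4-bit weights (in the spirit of NVFP4) and ignores scale factors, KV cache, activations, and memory used by the OS; the article does not state which precision the 200-billion and 405-billion figures assume, and the function name is purely illustrative.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Approximate weight storage in GB (1 GB = 10^9 bytes).

    Assumption: every parameter is stored at `bits_per_weight` bits with
    no extra overhead, which understates real memory use.
    """
    return params_billion * bits_per_weight / 8


# (model size in billions of parameters, unified memory available in GB)
scenarios = [
    (200, 128),  # one DGX Spark
    (405, 256),  # two DGX Spark systems linked over ConnectX-7
]

for params_b, memory_gb in scenarios:
    need = weight_footprint_gb(params_b)
    print(f"{params_b}B params at 4 bits: ~{need:.1f} GB of weights "
          f"vs {memory_gb} GB of memory")
```

Under these assumptions, a 200-billion-parameter model needs about 100 GB for weights and a 405-billion-parameter model about 202.5 GB, which is consistent with the single-system and dual-system limits quoted above.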

So let’s dive into the specifications of the GB10 Superchip. First up, we have the SoC composition: the chip is composed of two dielets, an S-Dielet, which houses the CPU, memory subsystem, etc., and a G-Dielet, which houses the GPU core. The two dielets are fabricated on TSMC’s 3nm process technology and packaged together using advanced 2.5D packaging.

The CPU is based on the ARM v9.2 architecture with 20 cores in total, arranged in 2 clusters of 10 cores each. Each core has a private L2 cache, and each cluster has a 16 MB L3 cache, for 32 MB of L3 in total.

The GPU is based on the GB100 Blackwell architecture and is considered an iGPU since it sits on the same package as the CPU. It features 5th Gen Tensor Cores with DLSS 4 support and RTX Ray Tracing cores. It delivers up to 31 TFLOPS of FP32 and 1000 TOPS of NVFP4 (FP4) compute for AI workloads. The GPU also gets an additional 24 MB of L2 cache.

Moving on to the memory system, the NVIDIA GB10 Superchip SoC supports 256-bit LPDDR5x (UMA) at speeds of up to 9400 MT/s, enabling up to 301 GB/s of raw bandwidth and a maximum capacity of 128 GB. The system fabric is a high-performance coherent fabric that supports the CHI-E coherency protocol. The GPU has access to the entire system bandwidth of 600 GB/s (aggregate) over the C2C interface.
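
The raw bandwidth figure follows directly from the bus width and transfer rate quoted above. A minimal sketch of that arithmetic, with an illustrative function name and using decimal gigabytes:

```python
def peak_dram_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes).

    Each transfer moves bus_width_bits / 8 bytes; the rate is given in
    mega-transfers per second. Real sustained bandwidth will be lower.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000


# 256-bit LPDDR5x at 9400 MT/s, as quoted for GB10
print(f"{peak_dram_bandwidth_gbs(256, 9400):.1f} GB/s")
```

This works out to 300.8 GB/s, which rounds to the 301 GB/s stated above.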

There’s also 16 MB of System Level Cache, which serves as L4 for the CPU, and enables power-efficient data-sharing between the multiple engines on the SoC. The C2C interface is also high-bandwidth and low-power, enabled through NVIDIA’s NVLINK architecture.

On the connectivity side, NVIDIA’s GB10 Superchip SoC offers PCIe, USB, and Ethernet over PCIe, and drives up to 4 concurrent displays (3 DP + 1 HDMI) at up to 4K @ 120Hz with DP Alt-mode and up to 8K @ 120Hz with HDMI 2.1a. Security features include Dual Secure Root support, an SROOT processor, an OSROOT processor, and support for both fTPM and discrete TPM. The whole chip has a TDP of 140W.

Following is the block diagram of the NVIDIA GB10 Superchip SoC:

Scalability is another notable aspect of the GB10 Superchip. You can connect multiple GB10 chips through NVIDIA’s ConnectX technology and scale throughput, bandwidth, and DRAM capacity to support larger AI models. The ConnectX NIC is connected to the GB10 SoC using a PCIe Gen5 x8 interface, and the units communicate with each other over Ethernet.
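
For a sense of scale, the theoretical ceiling of the PCIe Gen5 x8 link between the SoC and the NIC can be estimated from the PCIe 5.0 signaling rate (32 GT/s per lane) and its 128b/130b line encoding. The sketch below is an upper bound only: it ignores packet headers and other protocol overhead, and the function name is illustrative.

```python
def pcie_link_bandwidth_gbs(gt_per_s: float, lanes: int,
                            encoding_efficiency: float = 128 / 130) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s (1 GB = 10^9 bytes).

    Assumption: only line-encoding overhead is modeled (128b/130b for
    PCIe 3.0 and later); packet and flow-control overhead are ignored.
    """
    usable_gbit_per_s = gt_per_s * lanes * encoding_efficiency
    return usable_gbit_per_s / 8


# PCIe Gen5 (32 GT/s per lane) with 8 lanes, as used for the ConnectX NIC
print(f"~{pcie_link_bandwidth_gbs(32, 8):.1f} GB/s per direction")
```

That is roughly 31.5 GB/s in each direction, about a tenth of the raw local memory bandwidth, which is worth keeping in mind when splitting a model across two units.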

NVIDIA calls the GB10 Superchip SoC a successful collaboration with MediaTek, since the CPU IP comes from MediaTek. The chip underwent extensive performance modeling of GPU memory traffic into MediaTek’s memory subsystem.

What makes the GB10 Superchip so interesting is that we are eventually going to see it trickle down to consumer platforms such as laptops and Mini PCs. There have been several reports that the N1X and N1 SoCs will be the first consumer-centric NVIDIA SoCs, and GB10 is our first look at what these chips are going to be and what they will offer.

