"Million Token Monster" was born!
Yesterday, NVIDIA suddenly launched a big move and launched Rubin CPX, a brand new GPU designed for large-scale context reasoning.

Its platform, the Vera Rubin NVL144 CPX, delivers more than twice the performance of the Vera Rubin NVL144 platform, and 7.5 times that of the GB300 NVL72 rack system based on Blackwell Ultra!
A single rack provides 8 EFLOPS of NVFP4 compute, 100 TB of high-speed memory, and 1.7 PB/s of memory bandwidth, while each Rubin CPX GPU carries 128 GB of cost-effective GDDR7 memory.
Compared with the NVIDIA GB300 NVL72 system, Rubin CPX delivers 3x the attention processing capability.
And this performance beast's monetization potential should not be underestimated.
Every US$100 million invested can generate up to US$5 billion in token revenue!

Rubin CPX
Creating a new CPX processor category
Built on the Rubin architecture, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI: models that can reason across millions of tokens at once.
Rubin CPX is, in effect, a "special forces unit" built to crack AI's "long context" bottleneck.
Its arrival brings a new breakthrough in performance and efficiency for million-token-scale inference scenarios.
On the new NVIDIA Vera Rubin NVL144 CPX platform, Rubin CPX works in close concert with the NVIDIA Vera CPU and Rubin GPU to support multi-step reasoning, persistent memory, and long-horizon context, making it far more capable on complex tasks in fields such as software development, video generation, and deep research.
This means that with Rubin CPX, AI coding can evolve from a simple code-generation tool into a complex system that understands and optimizes large-scale software projects.
Likewise, it can serve long-video and research applications that must maintain consistency and memory across millions of tokens.
Such requirements are approaching the limits of today's infrastructure.
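A back-of-the-envelope calculation shows why million-token context strains current infrastructure: the attention computation grows quadratically with sequence length. The sketch below uses standard transformer attention-FLOP estimates with illustrative model dimensions (`d_model`, `n_layers` are my assumptions, not any disclosed model's shape):

```python
# Rough attention cost for a single prefill pass. The ~4*n^2*d-per-layer
# estimate covers the QK^T score matmul and the scores@V matmul; model
# dimensions below are hypothetical, chosen only to show the scaling.
def attention_flops(seq_len, d_model, n_layers):
    """Approximate FLOPs for attention matmuls: ~4 * n^2 * d per layer."""
    return 4 * seq_len**2 * d_model * n_layers

d_model, n_layers = 8192, 80          # illustrative large-model shape
short = attention_flops(8_000, d_model, n_layers)
long = attention_flops(1_000_000, d_model, n_layers)
print(f"8K-token prefill: {short / 1e15:.1f} PFLOPs")
print(f"1M-token prefill: {long / 1e18:.1f} EFLOPs")
print(f"ratio: {long // short:,}x")   # quadratic: (1M / 8K)^2 = 15,625x
```

Going from an 8K-token prompt to a 1M-token prompt multiplies the attention work by more than 15,000x, which is why a compute-dense prefill accelerator matters.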
NVIDIA founder and CEO Jensen Huang said the Vera Rubin platform will once again push the frontier of AI computing, and with CPX it also creates an entirely new processor category.
"Just as RTX revolutionized graphics and physical AI, Rubin CPX is the first CUDA GPU purpose-built for massive-context AI, where models reason across millions of tokens at once."
At present, AI pioneers such as Cursor, Runway, and Magic are actively exploring how Rubin CPX can accelerate their applications.

30-50 times ROI
Rewriting the inference economy
Through its decoupled-inference innovation, Rubin CPX can bring enterprises a 30-50x ROI, rewriting the economics of inference.
Large-model inference is divided into two main stages: context (prefill) and generation (decode).
The two stages place essentially different demands on infrastructure.
The context stage is compute-bound: it needs high-throughput processing to ingest and analyze massive input data and produce the first output token. The generation stage, by contrast, is memory-bandwidth-bound, depending on fast memory transfers and high-speed interconnects to sustain per-token output.
Decoupled inference lets the two stages be processed independently, so compute and memory resources can each be optimized in a targeted way, improving throughput, reducing latency, and raising overall resource utilization.
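The two-stage split can be sketched as two worker pools connected by a KV-cache handoff. This is a minimal illustration of the disaggregated-serving idea only; the class, function, and queue names are mine and do not reflect NVIDIA Dynamo's actual API:

```python
# Minimal sketch of decoupled (disaggregated) inference: a compute-bound
# prefill stage and a bandwidth-bound decode stage run separately and
# exchange state via a KV-cache handoff. All names here are illustrative.
from dataclasses import dataclass, field
from queue import Queue

@dataclass
class Request:
    prompt: str
    kv_cache: list = field(default_factory=list)  # filled by prefill
    output: list = field(default_factory=list)    # filled by decode

def prefill_worker(req: Request) -> None:
    """Context stage: ingest the whole prompt in one high-throughput pass,
    producing the KV cache and the first token (a CPX-style GPU's job)."""
    req.kv_cache = [f"kv({tok})" for tok in req.prompt.split()]
    req.output.append("<first-token>")

def decode_worker(req: Request, max_new_tokens: int = 3) -> None:
    """Generation stage: one token per step, reusing the KV cache handed
    over from the prefill pool (an HBM-class GPU's job)."""
    for i in range(max_new_tokens):
        req.output.append(f"<tok{i + 1}>")

handoff: Queue = Queue()          # the orchestration layer's job in practice
req = Request(prompt="summarize this million token repo")
prefill_worker(req)               # stage 1: compute-bound
handoff.put(req)                  # KV cache transferred between pools
decode_worker(handoff.get())      # stage 2: bandwidth-bound
print(req.output)                 # ['<first-token>', '<tok1>', '<tok2>', '<tok3>']
```

In a real system the handoff queue is replaced by the orchestration layer moving the KV cache across GPUs, which is exactly the role the article assigns to NVIDIA Dynamo.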

None of this works without NVIDIA Dynamo, which plays the key role of orchestration layer for the components above.

Rubin CPX is a "special accelerator" designed for large language model inference, especially million-token contexts.
Rubin CPX works together with NVIDIA Vera CPUs and with Rubin GPUs, which handle the generation stage, to form a complete high-performance decoupled serving solution for long-context scenarios.
The launch of CPX marks the latest evolution of the decoupled inference infrastructure and sets a new benchmark for the inference economy.
At scale, the NVIDIA Vera Rubin NVL144 CPX platform can deliver a 30-50x return on investment (ROI).
That means $100 million of capital expenditure (CAPEX) can be converted into up to $5 billion in revenue.
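The article's economics claim reduces to plain arithmetic: the quoted CAPEX-to-revenue figures sit at the top of the 30-50x range.

```python
# The quoted reasoning-economics claim as arithmetic: $100M of CAPEX
# returning up to $5B of token revenue is a 50x multiple, the top of
# the article's 30-50x ROI range.
capex = 100_000_000          # $100M capital expenditure
revenue = 5_000_000_000      # up to $5B token revenue
roi_multiple = revenue / capex
print(f"{roi_multiple:.0f}x")  # 50x
```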

Million Token Monster
Redefining next-generation AI applications
The Vera Rubin NVL144 CPX platform redefines what enterprises can build in next-generation generative AI applications.

The NVIDIA Vera Rubin NVL144 CPX rack and tray, equipped with Rubin context GPUs (Rubin CPX), Rubin GPUs, and Vera CPUs
Rubin CPX, NVIDIA Vera CPUs, and Rubin GPUs are jointly integrated into the new NVIDIA Vera Rubin NVL144 CPX platform.
The NVIDIA Vera Rubin NVL144 CPX platform adopts the latest GPU architecture, delivers extremely high compute and energy efficiency, and supports rack-scale deployment based on the MGX architecture.
1. A leap in computing power
In the NVIDIA MGX rack-scale system, a single rack integrates 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs, providing 8 EFLOPS of NVFP4 compute along with 100 TB of high-speed memory and 1.7 PB/s of memory bandwidth.
2. Efficient processing and optimization of long sequences
Rubin CPX is optimized for efficient processing of long sequences and is the key to high-value reasoning use cases such as software application development and high-definition (HD) video generation.
3. Video memory upgrade
A single Rubin CPX GPU delivers up to 30 petaflops of NVFP4 compute and comes with 128 GB of cost-effective GDDR7 memory to accelerate the most demanding context workloads.
4. Accelerated attention mechanism
Compared with the NVIDIA GB300 NVL72 system, Rubin CPX delivers 3x the attention processing capability, significantly improving a model's ability to handle longer context sequences without slowing down.
5. Multiple configurations
Rubin CPX is offered in multiple configurations, including Vera Rubin NVL144 CPX, and can be used with the NVIDIA Quantum-X800 InfiniBand scale-out compute network.
For large-scale deployment, it can also be paired with the NVIDIA Spectrum-X Ethernet networking platform, featuring NVIDIA Spectrum-XGS Ethernet technology and NVIDIA ConnectX-9 SuperNICs.
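The rack-level numbers in the list above can be cross-checked from the per-GPU figures. Note the split of the 8 EFLOPS between CPX and Rubin GPUs is my inference from the quoted specs, not a disclosed breakdown:

```python
# Cross-checking the quoted rack specs. The per-GPU split of the 8 EFLOPS
# figure is an assumption: 144 Rubin CPX GPUs at 30 PFLOPS NVFP4 each
# account for ~4.3 EFLOPS, leaving ~3.7 EFLOPS for the 144 Rubin GPUs.
cpx_gpus = 144
cpx_pflops = 30                                 # NVFP4, per Rubin CPX GPU
cpx_total_eflops = cpx_gpus * cpx_pflops / 1000
rack_eflops = 8                                 # quoted rack total
rubin_share_eflops = rack_eflops - cpx_total_eflops

print(f"CPX contribution:  {cpx_total_eflops:.2f} EFLOPS")
print(f"Rubin GPU balance: {rubin_share_eflops:.2f} EFLOPS")
print(f"GDDR7 across CPX:  {cpx_gpus * 128 / 1024:.0f} TB")  # 18 TB
```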

Rubin CPX
Embracing the NVIDIA full-stack AI ecosystem
On the ecosystem side, Rubin CPX will be supported by the complete NVIDIA AI stack.
NVIDIA Rubin CPX is expected to be available at the end of 2026.
Its launch will unlock stronger capabilities for developers and creators worldwide, redefining what enterprises can build in the next generation of generative AI applications.