Exploring Next

Exploring Next — Ep 463 w/ Justy & Cody — NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents | NVIDIA Technical Blog

NVIDIA’s Nemotron 3 Ultra (550B total parameters, 55B active) targets long-running agent workflows by combining hybrid Mamba-Transformer layers, NVFP4 quantization, LatentMoE routing, and multi-token prediction. NVIDIA claims 5x throughput and up to 30% cost savings on agent tasks through token efficiency, and reports leading scores on several benchmarks, including PinchBench for agent productivity (91%) and RULER @1M for long context (95%). Open weights, open recipes, and a transparent RL data pipeline are intended to support broad fine-tuning and domain specialization.

