Nvidia Launches Nemotron 3 Nano Omni for Multimodal AI Inference

Nvidia introduced Nemotron 3 Nano Omni, a multimodal AI model for vision, audio, image, and text processing. Launch adopters include Palantir, Foxconn, and Oracle. The model delivers 9x higher output throughput than Qwen3-Omni, a competing open model, at comparable interactivity levels, strengthening Nvidia's position in the AI market.

Nvidia unveiled Nemotron 3 Nano Omni on April 28, a new open multimodal AI model designed for enterprise inference applications that handles vision, audio, image, and text processing in a unified architecture. The model uses a hybrid Mamba-Transformer Mixture-of-Experts design with 30 billion total parameters, of which only 3 billion are active per inference pass. Because only about a tenth of the weights take part in any given pass, the design sharply reduces compute requirements while maintaining high output quality. Key benchmarks show it outperforms competitors across six leaderboards, including MMLongBench-Doc, OCRBenchV2, WorldSense, and VoiceBench.
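To put the parameter counts in perspective, the short Python sketch below works through the arithmetic. The 30 billion and 3 billion figures come from the announcement; the rule of thumb of roughly two floating-point operations per participating parameter is a common approximation and an assumption here, not a number published by Nvidia. The function names are purely illustrative.

```python
# Back-of-the-envelope sketch. The 2-FLOPs-per-parameter rule is a common
# approximation and an assumption here, not a figure from the announcement.

TOTAL_PARAMS = 30e9   # total parameters, as reported
ACTIVE_PARAMS = 3e9   # parameters active per inference pass, as reported


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights that participate in one pass."""
    return active / total


def approx_flops(params: float) -> float:
    """Rough forward-pass cost: about 2 FLOPs per participating parameter."""
    return 2.0 * params


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
moe_flops = approx_flops(ACTIVE_PARAMS)      # sparse model: active weights only
dense_flops = approx_flops(TOTAL_PARAMS)     # hypothetical dense 30B model
reduction = dense_flops / moe_flops

print(f"Active fraction: {fraction:.0%}")
print(f"Approximate compute reduction vs. a dense 30B model: {reduction:.0f}x")
```

The estimate covers compute only. All 30 billion parameters still have to be held in memory, and real throughput depends on factors this sketch ignores, such as expert routing overhead and memory bandwidth.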

The performance profile is particularly notable for real-time applications: Nemotron 3 Nano Omni achieves 9x higher output throughput than Qwen3-Omni at comparable interactivity levels, and reaches 5,000 output tokens per second on a single Nvidia B200 GPU. Enterprise adopters at launch include Palantir, Foxconn, and Oracle, with Dell Technologies, DocuSign, and Infosys evaluating the model. The model is available through Hugging Face, Nvidia NIM, and OpenRouter, making it accessible to both enterprises and individual developers.

The launch reinforces Nvidia's strategy of building an end-to-end AI stack, from GPU hardware to foundation models, that creates ecosystem lock-in beyond silicon alone. By releasing Nemotron as an open model optimized for its B200 architecture, the company drives additional demand for its data center GPU platforms while giving enterprises a compelling alternative to closed models from OpenAI and Anthropic. The multimodal capability is particularly timely as enterprises increasingly require AI agents that can process diverse data types in single-pass workflows.