Nvidia launches Nemotron 3 Super to power enterprise AI agents

Nvidia has introduced a new reasoning-focused AI model that combines multiple neural network architectures in a bid to improve how enterprise systems handle complex tasks and automation.

The company said its Nemotron 3 Super model combines Mamba sequence modeling, transformer attention, and Mixture-of-Experts routing to support so-called “agentic” AI systems that can plan and execute multi-step workflows across enterprise applications.

In a statement, Nvidia said multi-agent systems can generate up to 15 times more tokens than standard chat interactions. This can lead to “context explosion,” in which the accumulated context causes agents to drift from the original goal and drives up costs, since a large reasoning model is invoked for each subtask.
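To make the scale of that problem concrete, here is a back-of-envelope sketch of how subtask fan-out inflates token counts in a multi-agent pipeline. All figures below are hypothetical illustrations (chosen to land on a 15x multiplier), not Nvidia's numbers or pricing:

```python
# Illustrative accounting for "context explosion" in a multi-agent workflow.
# Every constant here is hypothetical, not an Nvidia figure.

CHAT_TOKENS = 2_000          # tokens in a typical single chat exchange
AGENTS = 5                   # agents in the pipeline
STEPS_PER_AGENT = 3          # planning/tool-call rounds per agent
TOKENS_PER_STEP = 2_000      # prompt + response tokens per round
COST_PER_1K_TOKENS = 0.01    # hypothetical price, in dollars

agent_tokens = AGENTS * STEPS_PER_AGENT * TOKENS_PER_STEP
multiplier = agent_tokens / CHAT_TOKENS

print(f"multi-agent tokens: {agent_tokens:,}")   # 30,000
print(f"vs. single chat:    {multiplier:.0f}x")  # 15x
print(f"est. cost per run:  ${agent_tokens / 1000 * COST_PER_1K_TOKENS:.2f}")
```

Because every subtask re-reads context and emits its own reasoning, token volume grows with the product of agents and rounds rather than staying flat, which is why per-subtask model efficiency matters so much in these systems.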

“We are releasing Nemotron 3 Super to address these limitations,” Nvidia said. “The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging.”

Nvidia said the model is released with open weights, datasets, and training recipes, allowing developers to modify it and deploy it on their own infrastructure.

The release reflects a broader shift in the AI industry as vendors move beyond chatbots toward models designed to power autonomous AI agents.

“Enhanced reasoning directly supports better task planning, error correction, and workflow decomposition, which collectively increase the reliability of AI agents for enterprise use,” said Jaishiv Prakash, director analyst at Gartner. “However, the success of agentic systems will not just depend on model capability but on the overall system architecture, including orchestration, data integration, context management, and governance.”

Architecture for enterprise efficiency

Nemotron 3 Super reflects Nvidia’s push to improve performance for enterprise AI workloads that involve sustained reasoning and long-context processing. The model’s hybrid architecture, analysts say, could help organizations run complex agent workloads more efficiently on existing infrastructure.

“Nemotron 3 Super combines Mamba’s linear-time sequence processing with Transformer attention and MoE routing, delivering higher throughput, lower latency, and better memory efficiency than pure transformers for long-context and multi-step workloads,” said Charlie Dai, VP and principal analyst at Forrester. “For enterprises, this translates into lower TCO, better utilization of on-prem or sovereign GPU clusters, and faster agent execution.”

Tulika Sheel, senior vice president at Kadence International, said the model’s architecture is designed to activate only a subset of parameters for each task, which helps improve efficiency.

“This design significantly improves throughput and lowers compute costs while maintaining accuracy,” Sheel said. “For enterprises, that can translate into faster inference, better performance on long-context workloads, and more cost-efficient deployment of large models.”
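Sheel's point about activating only a subset of parameters describes Mixture-of-Experts routing in general terms: a router scores a set of expert networks per token and only the top-k are evaluated. A minimal sketch of that mechanism (the sizes, top-2 choice, and plain matrix "experts" are illustrative, not Nemotron's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K, DIM = 8, 2, 16   # illustrative sizes, not Nemotron's

# Each "expert" is stood in for by a small weight matrix; a router scores
# experts per token and only the top-k are actually evaluated.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((DIM, N_EXPERTS))

def moe_layer(x: np.ndarray) -> np.ndarray:
    scores = x @ router_w                      # router logits, one per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only TOP_K of N_EXPERTS weight matrices are touched: sparse activation.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(DIM)
out = moe_layer(token)
print(f"evaluated {TOP_K}/{N_EXPERTS} experts; output shape {out.shape}")
```

In the same spirit, a model like Nemotron 3 Super holds 120B parameters in total but runs only about 12B of them per token, which is where the throughput and cost gains come from.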

Open models reshape strategy

Open reasoning models are emerging as an option for enterprises seeking greater control over how AI systems are built and deployed. Research by McKinsey & Company attributes this interest to strong performance, ease of use, and lower implementation and maintenance costs compared with proprietary alternatives.

“As a result, many organizations may adopt a hybrid strategy, combining open models for internal workloads with proprietary models for external or high-performance tasks,” Sheel said. “Open reasoning models could push enterprises toward more customizable, self-hosted AI strategies rather than full reliance on proprietary platforms.”

Analysts also said that the ability to fine-tune and inspect models is becoming increasingly important as enterprises expand AI into regulated sectors such as finance, healthcare, and government.

“Open reasoning models give enterprises a credible alternative to proprietary foundation models by enabling fine-tuning, inspection, and on-prem deployment,” Dai said. “This supports customization for domain logic, regulatory compliance, and data residency, while reducing dependency on closed APIs and usage-based pricing.”
