To many people, AI embodies one of sci-fi’s central plot points: machines that think and act independently of a human supervisor. From my perspective, though, we haven’t yet fully realized that vision.
For this reason, many thought leaders describe world models as AI’s next big paradigm shift. These models learn from the full physical environment — synthetic or real — and can understand the spatial and physical complexity of worlds, unlike LLMs, which are restricted to language and images.
Yann LeCun believes in them so strongly that he quit his role as chief AI scientist at Meta to found his own organization, AMI, to advance world models. “I’ve not been making friends in various corners of Silicon Valley, including at Meta, saying that within three to five years, [world models] will be the dominant model for AI architectures, and nobody in their right mind would use LLMs of the type that we have today,” LeCun said.
Admittedly, LLMs have achieved groundbreaking results. But they can only improve with more compute and more data, which are increasingly expensive and unwieldy and deliver diminishing returns.
World models are the vital prerequisite for AGI
I believe that world models have the potential to actualize many of the capabilities that sci-fi dreams about.
To truly achieve artificial general intelligence (AGI), world models will need to go beyond pattern recognition to capture how the world actually works. A system capable of general reasoning must understand physical, social and causal relationships well enough to transfer knowledge between unfamiliar situations.
Without that holistic perspective, a model may perform impressively when conditions perfectly match those described in its training, yet it will fail when those conditions suddenly change. To be effective “generally,” AI needs the ability to revise its internal understanding when it encounters new situations.
A comprehensive world model allows an agent to simulate outcomes, reason about constraints and adapt to new environments, turning static predictions into flexible problem-solving.
With the right levels of adaptability, an agent can update its beliefs, reinterpret context and devise new strategies rather than relying on static rules. This capacity mirrors human intelligence, where prior knowledge is continuously reshaped to handle new situations, from learning unfamiliar technologies to navigating entirely new cultures.
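“Updating its beliefs” has a precise minimal form in Bayes’ rule: an agent weighs new evidence against what it previously assumed. The sketch below is purely illustrative (the scenario and all probabilities are invented for this example, not drawn from any real system).

```python
# Illustrative only: an agent revises a prior belief when new evidence arrives.
prior = 0.5                # P(hypothesis): initial belief that the road ahead is blocked
p_evidence_if_true = 0.9   # P(sensor alarm | road blocked)
p_evidence_if_false = 0.2  # P(sensor alarm | road clear)

# Bayes' rule: posterior = likelihood * prior / total probability of the evidence
evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
posterior = p_evidence_if_true * prior / evidence

print(round(posterior, 3))  # the alarm pushes belief well above the 0.5 prior
```

The point is not the arithmetic but the behavior: the agent’s internal state changes in response to observation, rather than following a static rule.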
After all, real-world decisions are rarely isolated. Actions interact with physics, timing, goals and human behavior, all at once. To plan effectively, an AGI must anticipate consequences, identify causation, and integrate knowledge across domains. Replicating humans’ integrated understanding and open-ended problem solving is what separates narrow intelligence from general intelligence.
A world model is worlds apart from an LLM
In short, world models provide AI with common sense to understand how things operate in a given environment — and what might happen if conditions or objects are altered.
Meta’s JEPA, for example, was built toward this goal: it predicts abstract representations rather than raw pixels, and it serves as a key building block for future world models.
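The core idea — scoring predictions in representation space rather than pixel space — can be sketched in a few lines. Everything below is a hypothetical stand-in (a random linear encoder and an identity predictor), not Meta’s actual JEPA networks.

```python
import numpy as np

rng = np.random.default_rng(42)

def encode(frame, W):
    """Toy encoder: project a flattened 16-pixel frame into a 4-dim embedding."""
    return np.tanh(W @ frame)

# Two consecutive "frames" and a shared random encoder.
frame_t, frame_next = rng.normal(size=16), rng.normal(size=16)
W = rng.normal(size=(4, 16)) / 4.0

z_t, z_next = encode(frame_t, W), encode(frame_next, W)

# A placeholder predictor guesses the next embedding from the current one.
P = np.eye(4)
z_pred = P @ z_t

# JEPA-style objective: compare in the compact latent space ...
latent_loss = np.mean((z_pred - z_next) ** 2)
# ... rather than reconstructing every pixel, as a generative model would.
pixel_loss = np.mean((frame_t - frame_next) ** 2)

print(latent_loss, pixel_loss)
```

The design choice this illustrates: the latent loss ignores pixel-level detail the encoder has discarded, which is what lets the model focus on predictable structure in the scene.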
Large language models (LLMs) seem very powerful today, but their scope is narrow compared with what world models promise: multimodal AI that is self-learning, capable of general reasoning and spatially aware. LLMs, by contrast, are simply very good at predicting what comes next in a pattern.
Here’s my take on the main differences between a world model and an LLM:
- Learning methods. World models can train themselves through continuous reinforcement learning, observing their environment and inferring missing data, as the PlaNet model-based reinforcement learning system does. LLMs, in contrast, are data-inefficient and require extensive training on massive datasets.
- Spatial awareness. World models like Genie 3 interact dynamically with multidimensional environments, enabling them to imagine and generate 3D, 4D and 5D visualizations of consistent, interactive worlds. LLMs, on the other hand, have no innate awareness of space.
- Deep understanding. World models extrapolate from partial information to understand concepts like cause and effect and object permanence, whereas LLMs are limited by a shallow understanding of the world. They can predict the next word based on learned patterns, but they don’t understand what that word means.
- Long-term planning. By executing thousands of simulations, agents like those based on the DreamerV3 model can find the optimal sequence to achieve a goal, allowing them to plan for different contingencies and make informed decisions in new circumstances. LLM long-term planning, on the other hand, is fragile and unreliable.
- Multimodal inputs and outputs. World models can consume inputs and produce outputs in many different modes. For example, World Labs’ Marble is a multimodal world model that can reconstruct and simulate 3D environments from still images. LLMs are largely restricted to text and static 2D images.
How does a world model work?
A world model is made up of three connected modules:
- The perception module. This section takes raw sensory inputs such as images, video and proprioception and encodes them into a compact latent representation of the environment.
- The prediction module. This is a dynamics model that captures causality and temporal structure. It probabilistically predicts the next latent state and the expected results of any actions.
- The planning (control) module. This module uses the output of the prediction model to simulate future trajectories and select actions that optimize progress toward a goal.
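The three modules above can be wired together in a minimal sketch. Everything here is a toy stand-in under stated assumptions — a random linear encoder, linear latent dynamics, and a random-shooting planner — chosen only to show how perception, prediction and planning connect, not how any production world model is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

class Perception:
    """Encodes raw observations into a compact latent state (toy linear map)."""
    def __init__(self, obs_dim, latent_dim):
        self.W = rng.normal(size=(latent_dim, obs_dim)) / np.sqrt(obs_dim)

    def encode(self, obs):
        return np.tanh(self.W @ obs)

class Dynamics:
    """Predicts the next latent state from the current latent and an action."""
    def __init__(self, latent_dim, act_dim):
        self.A = np.eye(latent_dim) * 0.9
        self.B = rng.normal(size=(latent_dim, act_dim)) * 0.1

    def predict(self, z, a):
        return self.A @ z + self.B @ a

class Planner:
    """Scores random candidate action sequences by rolling the dynamics forward."""
    def __init__(self, dynamics, act_dim, horizon=5, n_candidates=64):
        self.dyn, self.act_dim = dynamics, act_dim
        self.horizon, self.n = horizon, n_candidates

    def plan(self, z, goal):
        best_seq, best_cost = None, np.inf
        for _ in range(self.n):
            seq = rng.normal(size=(self.horizon, self.act_dim))
            zt, cost = z.copy(), 0.0
            for a in seq:                      # simulate the trajectory in latent space
                zt = self.dyn.predict(zt, a)
                cost += np.sum((zt - goal) ** 2)
            if cost < best_cost:
                best_seq, best_cost = seq, cost
        return best_seq[0]                     # execute only the first action (MPC-style)

obs_dim, latent_dim, act_dim = 8, 4, 2
perc = Perception(obs_dim, latent_dim)
dyn = Dynamics(latent_dim, act_dim)
planner = Planner(dyn, act_dim)

obs = rng.normal(size=obs_dim)       # raw sensory input
z = perc.encode(obs)                 # perception: compress to a latent state
action = planner.plan(z, goal=np.zeros(latent_dim))  # predict + plan
print(action.shape)                  # (2,)
```

In a real system each toy component would be a learned neural network and the planner would be far more sophisticated, but the control flow — encode, imagine trajectories, act — is the same.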
“At its core, a world model is an internal representation that an AI system constructs to simulate the external environment. By continuously processing sensory data, a robot builds a dynamic blueprint of its surroundings,” explains Aurorain founder Luhui Hu. “This fusion of perception, prediction and planning mirrors cognitive processes in humans, setting the stage for more advanced robotic behavior.”
World models open up immense possibilities
There seem to be almost no limits to the potential waiting within world models, even if we set aside AGI aspirations for the moment. Here are just a few of the many ways world models could impact our lives.
Immersive visual experiences
With world models, it is finally becoming possible to build convincing worlds that you can interact with and experience. These capabilities are the first to come online, thanks to models like those developed by Decart, which can even run as playable simulations without a game engine.
“Because what’s running your game or your environment is an AI, you can interact with it in the ways we’re used to interacting with AI,” says Dean Leitersdorf, Decart’s CEO and cofounder.
“You’d be able to say, ‘Hey, can you turn this into Elsa themed?’ And then, boom, everything becomes Elsa-themed. ‘And can you add a flying elephant?’ And there’s a flying elephant in the game. And it’s not just there as a picture. You can actually interact with it. You can, I don’t know, punch the elephant, it’ll punch you back, or whatever you can do with an elephant.”
Fast iteration for innovations
Interactive, consistent world generation has consequences that go far beyond entertainment.
Models like Marble and Oasis, which can generate persistent, downloadable 3D environments from text prompts, photos, videos, 3D layouts or panoramic images, currently focus on gaming and VR, but they also open the door to robotics training in simulated environments.
Multi-dimensional computational modeling enables use cases like exploring molecular chemistry, developing novel biomedical treatments, probing the makeup of the universe, designing earthquake-proof buildings, understanding complex climate patterns and researching new materials.
Video that obeys real-world laws
Among the use cases for world models, creating hyper-realistic AI-generated video stands out to me as especially compelling.
As AI systems improve their understanding of physical dynamics, the distinction between video generation and world models is becoming less clear.
Runway’s GWM-1 is a good example. It simulates reality through autoregressive, frame-by-frame video generation, a step Runway positions as progress toward “general world models” that fully replicate the physics of simulated environments. Luma AI’s Modify Video has a similar goal.
Safer, more accurate decisions
Because world models can extrapolate from partial information, rapidly simulate many possible outcomes of multiple decisions and accurately forecast consequences, they can significantly improve decision-making across a wide range of use cases.
Possibilities include complicated multi-factor economic modeling, understanding climate patterns that are currently unpredictable and supporting complex long-term planning for regional and international policy decisions.
They can also improve the safety of self-driving cars by enabling them to predict the outcomes of actions, such as changing lanes, to avoid collisions.
Realistic robots
Robots that serve as lab assistants, carers, 24/7 industrial workers and explorers in inaccessible and/or hostile environments are an old sci-fi dream. World models can help overcome a serious ongoing obstacle to making “physical AI” possible: the lack of relevant training data.
NVIDIA’s Cosmos 2.5 platform was built to predict and generate physics-aware videos of future environment states, enabling synthetic training data generation for autonomous vehicles and robotics at massive scale.
“Unlike language models, training data is scarce for today’s robotic research. World models will play a defining role in this,” says Fei-Fei Li, CEO and founder of World Labs. “As they increase their perceptual fidelity and computational efficiency, outputs of world models can rapidly close the gap between simulation and reality. This will, in turn, help train robots across simulations of countless states, interactions and environments.”
World models sit at AI’s next frontier
With so much power and so many possibilities, world models promise to propel AI in a great leap beyond LLMs, making some of our long-held sci-fi wishes come true.
This article is published as part of the Foundry Expert Contributor Network.