RAG has become the default way to make AI systems useful, but it was never a true solution to reasoning. As new approaches emerge—from structured retrieval to world models—it’s becoming clear where RAG works, where it breaks, and what comes next.
We’ve spent the last two years pretending that if we just fed models better context, they would start to behave like they understand the world. And now the research is catching up to what many builders have already felt in production: they don’t.
Apple’s latest work on Retrieval-Augmented Generation forces a more honest look at what RAG actually is.
At its core, RAG assumes that the model is missing knowledge, not structure. So we retrieve documents, chunk them, embed them, rank them, and inject them into the prompt—hoping that somewhere in that pile the model will assemble the right answer. And sometimes it does. But the failure cases are consistent and revealing. The model pulls the right documents and still hallucinates. It mixes facts across sources that should never be combined. It fails to reason across multiple pieces of evidence because it has no mechanism for composing them beyond token-level pattern matching.
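The pipeline described above can be made concrete with a deliberately minimal sketch. This is not any particular production system: the bag-of-words "embedding," the cosine ranker, and the sample policy chunks are all illustrative stand-ins (real systems use dense vector embeddings and learned rankers), but the shape of the flow is the same: chunk, embed, rank, inject.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use dense learned vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk against the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Retrieved passages are injected as a flat, unordered pile of context —
    # the model gets no signal about how the pieces relate to each other.
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

chunks = [
    "Residential zoning permits buildings up to three storeys.",
    "Commercial zoning permits buildings up to ten storeys.",
    "Parking requirements differ between residential and commercial zones.",
]
print(build_prompt("How tall can residential buildings be?", chunks))
```

Note what the prompt does not contain: any indication of which clauses belong together or which constraints govern them. That missing structure is exactly where the composition failures above originate.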
Apple’s contribution is not just better retrieval. It is an attempt to restructure how information is presented to the model so that it can actually reason over it. Instead of treating retrieved context as a flat list of passages, they introduce more structured representations and tighter coupling between retrieval and generation. The model is guided through the information rather than overwhelmed by it.
This shows up in practice in ways that matter. Take something as simple as a municipal policy assistant. In a traditional RAG setup, you ask a question about zoning regulations and the system retrieves a handful of policy documents. The model then tries to stitch together an answer, often blending clauses from different sections incorrectly.
With Apple’s approach, the retrieval layer organizes the information in a way that preserves relationships and constraints, so the model is less likely to produce something that sounds right but is legally wrong. The improvement is not just accuracy; it is reliability under composition, which is where most real-world systems break.
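To make the contrast concrete, here is a hedged sketch of what "preserving relationships and constraints" can mean in practice. This is not Apple's actual method — the `Clause` type, the `zone` constraint field, and the grouping scheme are all invented for illustration — but it shows the key move: filtering and organizing retrieved clauses by their governing constraints before the model ever sees them, so incompatible clauses are never presented side by side.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    section: str  # which part of the policy this clause belongs to
    zone: str     # the zone the clause applies to (a hard constraint)
    text: str

def structured_context(clauses: list[Clause], zone: str) -> str:
    # Keep only clauses whose constraint matches the question's zone,
    # then present them in section order with their provenance attached.
    relevant = [c for c in clauses if c.zone == zone]
    lines = [f"[section {c.section} | zone={c.zone}] {c.text}"
             for c in sorted(relevant, key=lambda c: c.section)]
    return "\n".join(lines)

clauses = [
    Clause("4.1", "residential", "Maximum height is three storeys."),
    Clause("4.2", "commercial", "Maximum height is ten storeys."),
    Clause("5.1", "residential", "One parking space is required per unit."),
]
print(structured_context(clauses, "residential"))
```

The commercial height limit never enters the prompt for a residential question, so the model cannot blend it into a legally wrong answer. Flat RAG relies on the model to notice that constraint; structured retrieval enforces it upstream.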
But even here, there is a limit. Apple is making RAG more disciplined, more structured, more aligned with how reasoning should happen. Yet the model itself still does not have an internal representation of the world it is reasoning about. It is still reacting to context rather than operating on a model of reality.
This is exactly the gap that Yann LeCun has been pointing to. His argument, and the direction behind so-called world models (sometimes called world machines), is that intelligence cannot emerge from systems that only predict the next token. These systems do not build persistent representations. They do not simulate outcomes. They do not understand cause and effect in any meaningful way.
You can see this in simple examples. Ask a language model to plan a multi-step physical task, like rearranging objects in a room, and it quickly loses coherence. It cannot track state over time because there is no underlying model of the environment. Everything is recomputed from scratch with each prompt.
LeCun’s approach flips the problem. Instead of asking how we can better feed information into a model, he asks how we can build models that learn the structure of the world directly. This involves learning latent representations that capture how things evolve, predicting future states, and grounding reasoning in something more stable than text correlations.
In practical terms, imagine a system that does not need to retrieve a policy document every time you ask a question about it, because it has already internalized the rules, constraints, and relationships within that domain. Or consider a logistics system that can simulate the impact of a road closure before it happens, rather than retrieving past examples and hoping they generalize.
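The road-closure example is worth spelling out, because "simulate the impact before it happens" is a concrete computational act, not a metaphor. The sketch below is a toy stand-in for a world model — the road network and travel times are invented, and a real system would learn its model rather than being handed one — but it shows the essential difference: the system holds an explicit representation of the environment and can recompute consequences under a counterfactual change, instead of retrieving past examples and hoping they generalize.

```python
import heapq

def shortest_time(edges: dict[tuple[str, str], int], start: str, goal: str) -> float:
    # Dijkstra's algorithm over an explicit model of the road network.
    graph: dict[str, list[tuple[str, int]]] = {}
    for (a, b), t in edges.items():
        graph.setdefault(a, []).append((b, t))
        graph.setdefault(b, []).append((a, t))
    dist = {start: 0}
    pq = [(0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nxt, t in graph.get(node, []):
            nd = d + t
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(pq, (nd, nxt))
    return float("inf")

# A tiny invented road network: travel times in minutes.
roads = {("depot", "a"): 10, ("a", "hub"): 5,
         ("depot", "b"): 12, ("b", "hub"): 9}

before = shortest_time(roads, "depot", "hub")  # via a: 15 minutes
# Counterfactual: close the a–hub road and re-run the same model.
closed = {k: v for k, v in roads.items() if k != ("a", "hub")}
after = shortest_time(closed, "depot", "hub")  # via b: 21 minutes
print(before, after)
```

A token predictor has no state to edit and re-run; it can only describe road closures it has seen text about. A system with even this crude an internal model can answer questions about situations that have never occurred.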
This is not just more data or better prompts; it is a different kind of model altogether.
What makes this moment interesting is that both approaches are advancing at the same time, and they are solving different layers of the same problem.
Apple is addressing the immediate pain that anyone deploying AI systems faces today. RAG, as it is commonly implemented, is fragile. It breaks under edge cases, it produces answers that are hard to trust, and it requires constant tuning. By introducing more structure into the retrieval and reasoning process, Apple is making these systems usable in environments where correctness matters.
This is especially relevant in government and enterprise settings, where a wrong answer is not just inconvenient, it can have real consequences. You can imagine a city deploying an AI assistant for permit applications. With naive RAG, the assistant might give conflicting guidance depending on which documents are retrieved. With a more structured approach, the assistant can enforce consistency and better reflect the underlying rules.
At the same time, LeCun’s direction is less about fixing today’s systems and more about questioning whether they are the right foundation at all. If models cannot form internal world representations, then no amount of retrieval will give them true reasoning capabilities. You can keep adding context, refining pipelines, and improving embeddings, but you are still working around a limitation rather than removing it.
The analogy here is useful. RAG is like giving someone a stack of books every time they need to solve a problem, while world models are about teaching them how the world works so they can reason without constantly looking things up.
For builders, this creates a very practical tension. The systems that deliver value today will almost certainly rely on improved versions of RAG. You need grounding, you need access to up-to-date information, and you need mechanisms to control what the model says. Apple’s work moves this forward in a meaningful way by making the interaction between retrieval and reasoning more coherent.
But if you stop there, you risk building systems that are fundamentally limited in what they can do.
The next wave of capability will come from models that can hold and manipulate structured representations internally, reducing the need for constant retrieval and enabling more robust reasoning over time.
The real opportunity is in combining these perspectives without confusing them. Use structured RAG to make systems reliable and deployable today, but design with the assumption that the underlying models will evolve. This means thinking carefully about where knowledge lives, how it is represented, and how it can transition from being externally retrieved to internally modeled.
It also means being honest about what current systems can and cannot do. They can answer questions, summarize documents, and assist with workflows when given the right context. They cannot yet reason about the world in a way that is stable, persistent, and grounded in causality.