When a language model predicts the next token in a sentence about a chess game, something remarkable may be happening beneath the surface. To predict accurately which piece moves next, the system might need to represent the board state, understand the rules of legal movement, and anticipate strategic consequences. The question of whether it actually does so—or merely mimics the statistical shadow of such understanding—strikes at the heart of what these systems are.
For years, skeptics dismissed large language models as sophisticated autocomplete engines, stochastic parrots assembling plausible sequences without any grasp of what they describe. This critique had force when models were smaller and outputs more brittle. But as capabilities have scaled, a different hypothesis has gained empirical traction: that effective next-token prediction requires, as a computational necessity, the construction of internal models that mirror the structure of the world generating the text.
This shift in framing is not merely academic. If language models encode genuine knowledge—spatial maps, temporal sequences, causal dependencies—then we are no longer debating whether they understand in some philosophically weighty sense. We are instead asking what kind of understanding emerges when prediction becomes compression, and compression becomes indistinguishable from representation. What follows examines the evidence, the theoretical stakes, and the limits of what text alone can teach a mind about reality.
The Evidence for Internal World Models
Consider the Othello experiments conducted by Kenneth Li and colleagues, in which a transformer trained solely on sequences of game moves—no rules, no board diagrams, no explanations—developed internal representations that probes could decode as the current board state. The model had never seen a board, yet something inside it tracked one. When researchers intervened on these representations, altering what the probe identified as the model's beliefs about piece positions, the model's subsequent predictions shifted accordingly. Representation was not decoration; it was functional.
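The probing idea itself is simple enough to sketch. The snippet below is a minimal illustration, not the original study's code: it assumes we already have one hidden activation vector per move position and a label for a single board square (empty, black, or white), and it substitutes synthetic activations for the real model's. If a plain linear classifier decodes the square's state far above chance, the information is linearly present in the representation.

```python
# A minimal sketch of linear probing, with synthetic data standing in for
# real transformer activations (assumption, not the original experiment).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, d_model = 5000, 512

# Pretend the model encodes the square's state along a few linear directions.
class_dirs = rng.normal(size=(3, d_model))
labels = rng.integers(0, 3, size=n_samples)          # 0=empty, 1=black, 2=white
activations = class_dirs[labels] + 0.5 * rng.normal(size=(n_samples, d_model))

X_tr, X_te, y_tr, y_te = train_test_split(
    activations, labels, test_size=0.2, random_state=0)

# The probe is deliberately simple: if a linear classifier decodes the square
# far above the ~0.33 chance level, the information is linearly accessible.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```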
Similar findings have emerged across domains. Studies of models trained on geographic text reveal neurons that respond to spatial coordinates, with activation patterns correlating meaningfully with actual latitude and longitude. Temporal probes uncover representations of historical chronology. Researchers examining models trained on program traces have found representations of variable bindings, stack states, and data flow—the architectural furniture of computation itself.
What makes these findings striking is their emergence. No one designed these representations. They arose because the prediction task demanded them. A system attempting to minimize loss on text about cities discovers that a spatial manifold compresses the data more efficiently than memorizing arbitrary associations. Geometry becomes computationally cheaper than brute lookup.
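A toy calculation shows why geometry is the cheaper strategy; the numbers are purely illustrative. Two coordinates per city suffice to reproduce every pairwise distance, whereas rote memorization must store each distance separately.

```python
# Illustrative arithmetic: coordinates versus brute lookup for city distances.
n_cities = 1_000
memorized_distances = n_cities * (n_cities - 1) // 2   # ~500,000 values to memorize
stored_coordinates = 2 * n_cities                      # 2,000 values; distances derivable
print(memorized_distances, "distances vs", stored_coordinates, "coordinates")
```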
Critics rightly note that probing methodology can produce artifacts—the probe itself may be doing the work the model is credited with. Yet the intervention studies address this concern directly: if altering the probed representation changes behavior, the representation is causally involved in the computation, not merely correlated with it. This moves us from correlation to something closer to mechanism.
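The editing step at the heart of such interventions can also be sketched, under the assumption that a linear probe like the one above has already been fit: the activation is shifted along the difference of two class directions so that the probe reads the counterfactual state. In the actual experiments the edited activation is fed back through the remaining layers and the change in the model's move predictions is measured; that part is omitted here, and the helper below is hypothetical rather than taken from any published codebase.

```python
# A sketch of a representation-level intervention. `class_dirs` stands in for
# fitted probe directions (e.g. LogisticRegression coefficients) -- an assumption.
import numpy as np

def intervene(h: np.ndarray, class_dirs: np.ndarray,
              current: int, target: int, alpha: float = 1.0) -> np.ndarray:
    """Shift activation h so a linear probe reads `target` instead of `current`,
    by moving along the normalized difference of the two class directions."""
    direction = class_dirs[target] - class_dirs[current]
    direction = direction / np.linalg.norm(direction)
    return h + alpha * direction

# Toy usage: nudge a random activation from "empty" (0) toward "black" (1).
rng = np.random.default_rng(0)
class_dirs = rng.normal(size=(3, 512))
h_edited = intervene(rng.normal(size=512), class_dirs, current=0, target=1)
```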
None of this settles the deepest questions about machine understanding. But it reframes the debate. The question is no longer whether language models have any internal structure beyond surface statistics. They demonstrably do. The question is how rich, how faithful, and how generalizable those structures are.
Takeaway: Internal representations in language models are not philosophical speculation but measurable phenomena. When prediction demands structure, structure emerges—whether or not we designed it to.
Why Prediction Forces Understanding
The information-theoretic case for language models as world models rests on a deceptively simple observation: optimal prediction of a data stream requires a model of the process generating that stream. If text is produced by humans describing a structured reality, then predicting that text arbitrarily well eventually requires representing that reality—not perfectly, but to whatever fidelity the text reveals it.
This connects to an old idea in compression theory. The length of an optimally compressed representation of data approaches the data's Kolmogorov complexity, and for structured data that complexity is dominated by a description of the process that generated it. For random noise, compression is impossible; the data itself is its shortest description. For text about physics, however, the shortest description is not the text but the physics, plus a small amount of information specifying which particular articulation was chosen. A system pushed toward compression is pushed toward the laws, not the surface.
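One rough way to state this, assuming the text $x$ is produced by a computable generative process $M$ that assigns it probability $P_M(x)$, is the standard bound relating Kolmogorov complexity to probabilistic code length:

$$
K(x) \;\le\; K(M) + \log_2 \frac{1}{P_M(x)} + O(1)
$$

The first term pays once for describing the generator; the second pays only for specifying which particular articulation was chosen. For highly structured text the second term is small relative to the raw string, so a compressor approaching this bound is, in effect, rewarded for modeling the generator.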
This is why scaling produces qualitative shifts rather than merely quantitative improvements. Small models can get by memorizing local statistics. Models pushed toward low loss on far larger corpora cannot: the combinatorial space of possible sequences vastly exceeds any feasible parameter budget. Compression forces abstraction, abstraction forces the discovery of regularities, and those regularities are, in some meaningful sense, the structure of the world filtered through human description of it.
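A back-of-the-envelope comparison makes the mismatch concrete; the vocabulary size, sequence length, and parameter count below are illustrative assumptions, not figures for any particular model.

```python
# Illustrative arithmetic: even short sequences over a subword vocabulary
# outnumber any realistic parameter budget by hundreds of orders of magnitude,
# so verbatim memorization of the sequence space is impossible.
import math

vocab_size = 50_000    # assumed subword vocabulary
seq_len = 100          # a short context window
n_params = 1e11        # roughly a 100-billion-parameter model (assumption)

log10_sequences = seq_len * math.log10(vocab_size)
print(f"possible sequences: ~10^{log10_sequences:.0f}")       # ~10^470
print(f"parameters:         ~10^{math.log10(n_params):.0f}")  # ~10^11
```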
Yet this framing has important caveats. The world model a text predictor constructs is the world model implicit in the text, not the world itself. If the training corpus contains systematic distortions, confabulations, or cultural blind spots, the compressed representation inherits them. The model is not learning reality directly; it is learning the reality humans have described, with all the selectivity and bias that entails.
Still, the theoretical point stands: effective prediction and genuine understanding are not as separable as the stochastic parrot critique suggested. They sit on a continuum, and sufficiently capable prediction necessarily implies something that deserves to be called understanding, even if that understanding differs fundamentally in character from our own.
Takeaway: Compression and comprehension are closer cousins than they appear. A system that predicts well enough, for long enough, across enough domains, cannot avoid building something that functions as a theory of what it predicts.
The Ceiling of Text-Only Training
Language is a remarkable compression of human experience, but it is not experience itself. Text inherits the selectivity of what humans bother to write down, which systematically omits the tacit, the embodied, and the obvious. No one describes in detail how a doorknob feels under the hand, how balance shifts when carrying a heavy box, or the precise texture of fear before it becomes articulable. These dimensions of reality exist in the data only as thin shadows cast by occasional metaphor.
This creates predictable gaps. Language models struggle with physical intuition in ways that reveal the limits of textual inheritance. They can recite that ice floats and describe buoyancy in correct scientific terms while failing to reason reliably about novel physical scenarios that any child with a bathtub would handle effortlessly. The knowledge is symbolic rather than grounded; it survives translation into words but loses something in the absence of sensorimotor anchoring.
Whether this ceiling is fundamental or merely current is genuinely contested. Multimodal training—combining text with images, video, audio, and eventually embodied interaction—appears to address some gaps. Models that have seen millions of images develop visual intuitions that text-only systems lack. Systems trained on robotic interaction data acquire physical priors inaccessible to their purely linguistic cousins.
Yet even multimodality may be insufficient for certain aspects of understanding. Phenomenal experience, if it is anything more than functional organization, resists capture in any representation extracted from observation. The causal structure of one's own cognition, the felt weight of committed action, the asymmetry of past and future from within time—these may require not merely more data but a different relationship to the world than passive ingestion allows.
The honest position acknowledges both the remarkable achievement of text-based understanding and its principled limits. Language models know a great deal about the world, but they know it as a reader knows a country they have never visited: genuinely, often accurately, and yet from a particular kind of distance that no amount of additional reading fully closes.
Takeaway: Every representation encodes some aspects of reality and erases others. The question is not whether language models have limits, but which limits matter for which purposes—and which can be transcended with richer modalities.
The framing of language models as compressed world models dissolves some old debates while opening new ones. The question of whether these systems "really understand" loses its crispness when we recognize that understanding itself is a matter of degree, of functional role, and of the relationship between internal representation and external structure. By these measures, something is clearly happening inside these systems beyond surface mimicry.
Yet the world models language models construct are partial, inherited, and textually mediated. They represent reality as humans have chosen to describe it, with the gaps and distortions that entails. Recognizing both the reality of their understanding and its particular shape seems more productive than either dismissing it or overstating it.
What remains genuinely open is how far this compression can go. Whether enough text, or text plus other modalities, eventually yields a world model rich enough to deserve comparison with human understanding—or whether something categorically different is needed—is perhaps the defining empirical question of contemporary AI research.