Occam's razor—the principle that simpler explanations should be preferred—has haunted epistemology for centuries. Philosophers invoke it constantly, scientists treat it as methodological bedrock, and yet no one has offered a fully satisfactory account of what simplicity actually means. Curve-fitting problems illustrate the difficulty starkly: given a finite set of data points, infinitely many polynomials of varying degree pass through them. We prefer the lower-degree polynomial, but justifying that preference without circularity has proven remarkably elusive.
Algorithmic information theory, pioneered independently by Ray Solomonoff, Andrei Kolmogorov, and Gregory Chaitin in the 1960s, offers a resolution that is both mathematically precise and philosophically profound. By identifying the complexity of an object with the length of the shortest program that produces it on a universal Turing machine, we obtain a machine-independent, objective measure of simplicity grounded in computation rather than in aesthetic intuition or ad hoc parameter counting.
The implications for formal epistemology are far-reaching. Solomonoff's universal prior, constructed from Kolmogorov complexity, provides a theoretically optimal solution to the problem of induction—one that converges to the true distribution generating any computable sequence. Yet the measure is itself uncomputable, raising deep questions about the boundary between ideal rationality and feasible inference. This article traces the formal architecture of these results and examines what they tell us about the nature of knowledge, prediction, and rational belief.
Complexity as Program Length
The Kolmogorov complexity of a finite binary string x, denoted K(x), is defined as the length of the shortest binary program p such that a fixed universal Turing machine U outputs x when given p as input, then halts. Formally: K(x) = min{|p| : U(p) = x}. This definition captures an intuitive idea with surprising precision: a string is simple if it can be described concisely, and complex if no description shorter than the string itself exists.
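To make the definition concrete, here is a brute-force sketch of a "complexity" measure relative to a deliberately tiny, non-universal reference machine (a run-length decoder). The machine, the alphabet, and the search bound are all invented for illustration; genuine Kolmogorov complexity is defined relative to a universal machine and, as discussed below, cannot be computed.

```python
# Toy illustration of "complexity as shortest program length".
# The reference machine is a trivial run-length decoder, not a universal
# Turing machine, so this is only an analogy for K(x).

from itertools import product

def run(program: str) -> str:
    """Toy machine: 'a3b2' decodes to 'aaabb'; malformed programs output nothing."""
    if len(program) % 2 != 0:
        return ""
    out = []
    for i in range(0, len(program), 2):
        char, count = program[i], program[i + 1]
        if not count.isdigit():
            return ""
        out.append(char * int(count))
    return "".join(out)

def toy_complexity(target: str, alphabet: str = "ab0123456789", max_len: int = 4):
    """Length of the shortest toy program producing `target`, found by brute force."""
    for length in range(1, max_len + 1):
        for candidate in product(alphabet, repeat=length):
            if run("".join(candidate)) == target:
                return length
    return None  # no description within the search bound: "incompressible" here

print(toy_complexity("aaaaaaaaa"))  # 2  ('a9'): a regular string has a short description
print(toy_complexity("abab"))       # None: no program of length <= 4 produces it
```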
A critical result is the invariance theorem. If we choose a different universal Turing machine U', the resulting complexity measure K'(x) differs from K(x) by at most a constant that depends on U and U' but not on x. That is, |K(x) − K'(x)| ≤ c for all x. This means the definition is robust—up to an additive constant, Kolmogorov complexity is an intrinsic property of the string, not an artifact of the reference machine. For sufficiently long strings, the choice of formalism becomes negligible.
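The reason the constant exists is worth making explicit: a universal machine can simulate any other machine via a fixed interpreter program, so any description for one machine becomes a description for the other at the cost of prepending that interpreter. In symbols, with constants that depend only on the pair of machines and not on x:

```latex
K_{U}(x) \;\le\; K_{U'}(x) + c_{U,U'}
\qquad\text{and}\qquad
K_{U'}(x) \;\le\; K_{U}(x) + c_{U',U},
\quad\text{hence}\quad
|K_{U}(x) - K_{U'}(x)| \;\le\; \max\bigl(c_{U,U'},\, c_{U',U}\bigr).
```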
Why does this matter for epistemology? Consider competing hypotheses as programs that generate observed data. A hypothesis with lower Kolmogorov complexity literally requires fewer bits to specify. The preference for simplicity is thus recast as a preference for shorter descriptions in a universal programming language. This is not a vague aesthetic; it is a quantitative, formally defined bias toward compressibility.
The framework also yields a rigorous notion of randomness. A string x of length n is algorithmically random if K(x) ≥ n—that is, no program shorter than the string itself can produce it. Such strings are maximally complex and incompressible. Most strings of length n are random in this sense, because there are far fewer short programs than long strings. This gives us a precise partition: structured, law-like data is compressible; noise is not.
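The counting behind "most strings are random" is short enough to state exactly. There are 2^n binary strings of length n but fewer than 2^(n−c) programs shorter than n − c bits, and each program outputs at most one string, so:

```latex
\#\{\,p : |p| < n - c\,\} \;=\; \sum_{k=0}^{n-c-1} 2^{k} \;=\; 2^{\,n-c} - 1
\quad\Longrightarrow\quad
\Pr_{x \in \{0,1\}^n}\bigl[\,K(x) < n - c\,\bigr] \;<\; 2^{-c}.
```

Already for c = 10, fewer than one string in a thousand can be compressed by ten bits or more.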
Kolmogorov complexity thus formalizes the intuition behind Occam's razor without relying on subjective judgments about elegance or parsimony. It replaces the question "which hypothesis is simpler?" with the well-defined question "which hypothesis has the shorter minimal description?" The shift from informal heuristic to formal measure is exactly the kind of progress that formal epistemology aspires to achieve.
Takeaway: Simplicity is not an aesthetic preference—it is the length of the shortest computation that produces your hypothesis, and this measure is essentially unique across all sufficiently powerful formal systems.
Solomonoff Induction
Solomonoff's theory of inductive inference, first proposed in 1964 and refined over subsequent decades, constructs a universal prior probability distribution over hypotheses using Kolmogorov complexity. The idea is elegant: assign to each hypothesis (modeled as a program p for a universal Turing machine) a prior probability proportional to 2^(−|p|), where |p| is the length of the program in bits. Shorter programs receive exponentially higher prior weight. The resulting distribution, often denoted M, is called the universal semimeasure.
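Written out, in the form usually given for a monotone universal machine U, the universal semimeasure sums the weights of all minimal programs whose output begins with the string x (denoted U(p) = x∗):

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}.
```

Every program whose output starts with x contributes weight 2^(−|p|), so short programs dominate the sum.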
The central convergence theorem establishes that Solomonoff's universal prior is optimal for prediction in the following sense. Suppose data is generated by any computable probability distribution μ. Then the predictions made by M converge to the predictions made by μ with probability 1 under μ. More precisely, the expected total KL-divergence between μ and M is bounded by K(μ)·ln 2, where K(μ) is the Kolmogorov complexity of the true distribution. This is a remarkable result: a single prior, defined without knowledge of the true generating process, dominates every computable alternative.
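Stated as an equation, in the form the bound usually takes for binary sequence prediction (with x_<t denoting the first t − 1 symbols), the result reads:

```latex
\sum_{t=1}^{\infty} \mathbb{E}_{x_{<t} \sim \mu}\!\left[
  \sum_{x_t} \mu(x_t \mid x_{<t}) \,
  \ln \frac{\mu(x_t \mid x_{<t})}{M(x_t \mid x_{<t})}
\right] \;\le\; K(\mu)\,\ln 2.
```

Because this infinite sum of expected divergences is finite, the per-step divergence must vanish, which is the precise sense in which M's predictions converge to μ's.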
The philosophical significance is profound. Solomonoff induction provides a complete and formally justified solution to the problem of induction—at least within the class of computable environments. It vindicates Bayesian epistemology's core claim that rational belief revision via conditionalization is optimal, while simultaneously providing the prior that Bayesian approaches notoriously leave unspecified. The universal prior is not arbitrary; it emerges from the structure of computation itself.
Solomonoff's framework also illuminates why Occam's razor works. The 2^(−|p|) weighting encodes a built-in simplicity bias that is not merely a heuristic but a provably necessary feature of optimal prediction. Complex hypotheses are not excluded—they simply receive lower initial credence. As evidence accumulates, the posterior concentrates on the true hypothesis regardless of its complexity. The simplicity bias accelerates convergence but does not prevent eventual accuracy.
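A small simulation makes the point vivid. The hypotheses, their "description lengths", and the data source below are all invented for the example; this is ordinary Bayesian updating with a complexity-weighted prior, not Solomonoff induction itself.

```python
# Three toy hypotheses about a stream of bits, each with a made-up
# description length in bits.  Prior weight is 2^(-length), so the most
# complex hypothesis starts with the least credence -- but it is the one
# actually generating the data, and the posterior finds it anyway.

import random

random.seed(0)

hypotheses = {              # name: (P(next bit = 1), assumed description length in bits)
    "almost always 0":   (0.001, 5),
    "fair coin":         (0.5,   8),
    "bit is 1 w.p. 0.9": (0.9,  20),   # the true source, and the most "complex"
}

posterior = {h: 2.0 ** (-length) for h, (_, length) in hypotheses.items()}
norm = sum(posterior.values())
posterior = {h: w / norm for h, w in posterior.items()}

for step in range(1, 201):
    bit = 1 if random.random() < 0.9 else 0            # data from the true source
    for h, (p_one, _) in hypotheses.items():
        posterior[h] *= p_one if bit == 1 else 1.0 - p_one
    z = sum(posterior.values())
    posterior = {h: w / z for h, w in posterior.items()}
    if step in (1, 20, 200):
        print(step, {h: round(w, 4) for h, w in posterior.items()})
```

The simplicity-weighted prior initially favors the simpler rivals, but after enough observations the posterior mass sits on the complex hypothesis that actually fits the data.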
Frank Ramsey's vision of subjective probability as the foundation of rational action finds its most rigorous expression here. Solomonoff induction shows that there exists a uniquely privileged way to assign degrees of belief over computable hypotheses—one that is demonstrably better, in a precise information-theoretic sense, than any computable alternative. The universal prior is not one Bayesian prior among many; it is, in a deep formal sense, the prior for inductive reasoning.
Takeaway: Solomonoff's universal prior is the only prior you would ever need for prediction—it provably converges to the truth for any computable data source, turning Occam's razor from a philosophical intuition into a mathematical theorem.
Incomputability and Its Epistemological Weight
The power of Kolmogorov complexity comes with a fundamental limitation: K(x) is not computable. No algorithm can take an arbitrary string x as input and output K(x). The proof is a Berry-paradox-style argument, closely tied to the halting problem. Suppose K were computable. Then a short program could enumerate strings in a fixed order, compute the complexity of each, and output the first one whose complexity exceeds some large bound B. But that program, together with a binary encoding of B, is only about log2(B) bits plus a constant long, so the string it outputs has complexity far below B when B is large. The contradiction is a clean diagonal argument of the kind that recurs throughout computability theory.
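The contradiction can be written out as a short, deliberately non-functional program sketch. The function assumed_K below is the hypothesis being refuted; nothing like it can actually be implemented.

```python
from itertools import count, product

# Assumption for contradiction: a computable K.  No such function exists;
# it is declared here only so the structure of the argument can be read off.
def assumed_K(s: str) -> int:
    raise NotImplementedError("Kolmogorov complexity is not computable")

def first_string_more_complex_than(bound: int) -> str:
    """Enumerate binary strings in order and return the first whose
    (assumed computable) complexity exceeds `bound`."""
    for n in count(1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if assumed_K(s) > bound:
                return s

# If assumed_K existed, this program plus the bound would be a description of
# only about log2(BOUND) + constant bits, yet it would output a string whose
# complexity exceeds BOUND.  For large BOUND that is a contradiction.
BOUND = 10 ** 6
# first_string_more_complex_than(BOUND)   # would raise: the assumption is false
```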
The incomputability of K has immediate consequences for Solomonoff induction. The universal prior M is only semicomputable: it can be approximated from below, by enumerating ever more programs and running them ever longer, but it can never be computed exactly. This means that Solomonoff induction, despite being the theoretical gold standard for prediction, cannot be implemented by any real computational agent. It exists as an ideal, a limit concept against which practical inference methods can be measured but never fully realized.
Some epistemologists regard this incomputability as a devastating objection. If the ideal reasoner cannot exist even in principle as a finite computational process, what use is the theory? The response from formal epistemology is instructive: theoretical ideals routinely outrun practical computation, and this does not diminish their normative force. Full Bayesian decision theory, Nash equilibrium computation, and logical omniscience are all intractable or outright uncomputable in the general case, yet they structure our understanding of rationality, strategic interaction, and deductive reasoning respectively.
Moreover, Kolmogorov complexity admits computable approximations that are practically powerful. The Minimum Description Length (MDL) principle, developed by Jorma Rissanen, replaces the universal Turing machine with a restricted model class and uses two-part code lengths as a tractable proxy for Kolmogorov complexity. Lempel-Ziv compression algorithms provide another approximation. These methods inherit the philosophical insights of algorithmic information theory while remaining implementable.
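A crude but runnable version of this idea uses an off-the-shelf compressor as a computable stand-in for the shortest description. The absolute numbers mean little (zlib is nothing like a universal machine), but the contrast between law-like and random data comes through clearly.

```python
import os
import zlib

def description_length(data: bytes) -> int:
    """Bytes needed by zlib at maximum effort: a rough, computable upper
    bound on description length, standing in for K."""
    return len(zlib.compress(data, 9))

structured = b"ab" * 5_000      # 10,000 bytes of pure pattern
noise = os.urandom(10_000)      # 10,000 bytes, incompressible with overwhelming probability

print("structured:", description_length(structured), "bytes")  # a few dozen bytes
print("noise:     ", description_length(noise), "bytes")        # roughly 10,000 bytes
```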
The incomputability result thus has a dual epistemological significance. On one hand, it establishes a hard ceiling on what any finite agent can know about complexity—we cannot, in general, determine the simplest explanation for a given data set. On the other hand, it reveals that the structure of optimal induction is mathematically well-defined even when not effectively realizable. The gap between ideal and feasible rationality is itself a formal object worthy of study, and Kolmogorov complexity gives us the tools to measure it precisely.
Takeaway: The fact that perfect simplicity measurement is provably impossible does not weaken the theory—it sharpens our understanding of the exact boundary between what ideal rationality demands and what finite minds can achieve.
Kolmogorov complexity transforms the ancient problem of simplicity from a philosophical puzzle into a precise mathematical concept. By grounding simplicity in computation—the length of the shortest program generating a given output—algorithmic information theory provides the formal backbone that Occam's razor has always lacked.
Solomonoff's universal prior takes this foundation and constructs from it a complete theory of inductive inference, one that provably converges to the truth for any computable environment. The price is incomputability, but the payoff is a normative standard against which all practical inference methods can be evaluated. Formal epistemology gains both a benchmark and a boundary.
What emerges is a picture of rationality that is at once triumphant and humble. We can characterize the ideal reasoner with mathematical precision. We can prove that no computable reasoner matches it. And we can measure, formally, how close our best approximations come. That triple achievement—ideal, limit, and gap—is the lasting contribution of algorithmic information theory to the theory of knowledge.