Consider a learning algorithm that has observed ten thousand sunrises. It predicts the sun will rise tomorrow. Now consider a second algorithm that has observed the same data but predicts the sun will not rise tomorrow. Both algorithms are perfectly consistent with every observation collected so far. No purely deductive argument can distinguish between them. This is Hume's problem of induction, and after nearly three centuries, it remains logically unsolved.
The challenge cuts deeper than intuition suggests. Every attempt to justify induction—appealing to past success, invoking uniformity of nature, or grounding it in probability—ultimately presupposes the very inductive principle it seeks to establish. The circularity is not a technical oversight. It is a structural feature of the problem. Deductive logic alone cannot bridge the gap between finite observation and universal generalization. Full stop.
And yet computation depends on induction. Machine learning systems generalize from training data to unseen inputs billions of times per second. Spam filters, medical diagnostics, protein folding predictors—all perform inductive inference as a matter of routine engineering. Something has clearly been rehabilitated, even if the philosophical problem persists. The question is what, exactly, and how. Two formal frameworks—PAC learning theory and Solomonoff induction—offer precise answers that sidestep the philosophical circularity without pretending to resolve it. They redefine what it means for induction to work.
The Philosophical Challenge: Hume's Indestructible Argument
David Hume's argument against the rational justification of induction, articulated in 1739, proceeds with deceptive simplicity. All inductive inferences assume that the future will resemble the past. But this assumption is itself an empirical claim—one that can only be supported by induction. Therefore, any justification of induction is circular. The argument is valid, and every serious attempt to refute it has failed.
Philosophers have not lacked creativity in their responses. Karl Popper tried to eliminate induction entirely, replacing it with falsificationism—science proceeds by refuting hypotheses, not confirming them. But falsificationism cannot explain how we choose which hypotheses to test, or how we decide that a well-corroborated theory is preferable to an untested one. These decisions are inductive at their core. Peter Strawson argued that asking for a justification of induction is like asking whether the law is legal—the question is malformed. But computational systems that perform induction need more than a dissolution of the question; they need actual procedures that produce reliable outputs.
The deeper issue is that Hume's argument targets deductive justification of induction—the demand that inductive conclusions follow necessarily from premises. Under this standard, induction is guaranteed to fail, because by definition, inductive conclusions go beyond what the premises logically entail. No finite dataset deductively implies any universal law. The problem is not that we lack a good argument. It is that the standard of justification being applied is categorically inappropriate for the task.
This is where the computational perspective becomes essential. Computer science does not ask whether induction is justified in some absolute philosophical sense. It asks a more tractable question: under what formal conditions can an algorithm generalize from finite data with quantifiable reliability? This reframing does not resolve Hume's problem. It dissolves the need to resolve it. The guarantees provided are conditional—if certain assumptions hold about the data-generating process, then certain performance bounds follow. The assumptions themselves are not justified a priori. They are structural commitments that can be empirically evaluated.
What makes this move powerful rather than evasive is that it is explicit about its assumptions. Classical induction hides its presuppositions behind intuitions about the uniformity of nature. Computational learning theory writes them down as mathematical axioms and derives consequences. The circularity does not vanish, but it becomes transparent, bounded, and subject to formal analysis. This is a genuinely different epistemic situation than the one Hume confronted.
Takeaway: Hume's argument is irrefutable on its own terms because it demands deductive certainty for an inherently non-deductive process. The computational turn does not solve the problem—it replaces an impossible demand with a tractable one.
PAC Learning: Rehabilitating Induction Through Probabilistic Guarantees
In 1984, Leslie Valiant introduced the Probably Approximately Correct (PAC) learning framework, and it fundamentally changed what it means to say an inductive procedure works. The framework does not claim that a learning algorithm will find the truth. It claims something weaker but formally precise: given enough data, the algorithm will, with high probability, produce a hypothesis that is approximately correct. The two parameters—probability of failure (δ) and approximation error (ε)—are explicitly specified and controllable.
Here is the formal structure. Let H be a hypothesis class and D an unknown distribution over instances. A learning algorithm PAC-learns H if, for any target concept in H, any distribution D, and any chosen ε and δ, there exists a sample size m such that after observing m labeled examples drawn independently from D, the algorithm outputs a hypothesis with error at most ε with probability at least 1 − δ. The required sample size m is polynomial in 1/ε, 1/δ, and the complexity of the hypothesis class.
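For a finite hypothesis class in the realizable setting, the bound takes an especially simple form: m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice. A minimal sketch of that arithmetic (the function name is ours, not a library API):

```python
import math

def pac_sample_size(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Sample size sufficient to PAC-learn a finite hypothesis class in the
    realizable case: m >= (1/eps) * (ln|H| + ln(1/delta))."""
    m = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(m)

# With |H| = 2**20 hypotheses, 5% error tolerance, 1% failure probability:
print(pac_sample_size(2**20, epsilon=0.05, delta=0.01))  # → 370
```

Note how mild the dependence on δ is: demanding ten times more confidence adds only ln 10 ≈ 2.3 to the numerator, while halving ε doubles the whole bound.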
The philosophical significance is subtle but profound. PAC learning does not justify induction by proving the future will resemble the past. Instead, it provides a conditional guarantee: if the target concept lies within the hypothesis class and if examples are drawn independently from a fixed distribution, then generalization is formally guaranteed. These are strong assumptions, but they are stated openly. The framework is honest about what it presupposes.
Critically, the guarantee is distribution-free—it holds for any data distribution, not just well-behaved ones. This is a significant departure from classical statistical approaches that assume specific distributional forms. The VC dimension, introduced by Vapnik and Chervonenkis, measures the complexity of a hypothesis class and determines the sample complexity bound. Finite VC dimension is both necessary and sufficient for PAC learnability. This gives us an exact characterization of which inductive problems are solvable and which are not.
The result is a form of induction that Hume could not have anticipated: one that makes no claim to certainty, specifies exactly how wrong it might be and with what probability, and derives its guarantees from combinatorial properties of hypothesis spaces rather than from metaphysical assumptions about nature. It is induction stripped of philosophical pretension and rebuilt on mathematical bedrock. The circularity is not eliminated—the assumption that future data comes from the same distribution is still inductive—but it is quarantined into a single, explicit, testable assumption.
Takeaway: PAC learning does not justify induction philosophically—it engineers around the philosophical gap by converting vague confidence in generalization into precise, parameterized, mathematically derived bounds on error.
Solomonoff Induction: The Optimal Predictor You Cannot Run
Ray Solomonoff's 1964 theory of inductive inference takes a radically different approach. Rather than bounding error for specific hypothesis classes, it constructs a universal prior probability distribution over all computable hypotheses. Given an observed binary string, Solomonoff induction assigns to each possible continuation a probability proportional to the sum of 2^(−|p|) over all programs p whose output begins with the observed string followed by that continuation, where |p| is the length of the program in bits. Shorter programs contribute more. This is Occam's razor made formally precise.
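A toy version of this prior can be computed once the space of programs is artificially restricted. The sketch below uses a hand-picked "machine" of four programs with made-up bit lengths, so the numbers are purely illustrative; real Solomonoff induction sums over all programs of a universal machine and is uncomputable:

```python
# Each "program" is (length in bits, output rule). The lengths are invented.
programs = [
    (3, lambda n: "01" * n),          # alternating pattern: short program
    (5, lambda n: "0" * n),           # all zeros
    (5, lambda n: "1" * n),           # all ones
    (7, lambda n: "0101" + "1" * n),  # agrees with 0101, then all ones
]

def prior_mass(x: str) -> float:
    """M(x): sum of 2^(-|p|) over programs whose output extends x."""
    return sum(2.0 ** -length for length, out in programs
               if out(len(x)).startswith(x))

def predict(x: str, bit: str) -> float:
    """Probability that `bit` is the next symbol after observing x."""
    return prior_mass(x + bit) / prior_mass(x)

# After "0101", the 3-bit alternating program dominates the 7-bit rival:
print(predict("0101", "0"))  # 16/17 ≈ 0.941
print(predict("0101", "1"))  # 1/17 ≈ 0.059
```

The 16:1 ratio comes directly from the length gap: the competing programs differ by 4 bits, and 2^4 = 16. Simplicity preference is not a tunable knob here; it falls out of the weighting.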
The theoretical properties are remarkable. Solomonoff induction converges to the true distribution generating the data, provided that distribution is computable. The total expected divergence—measured by Kullback-Leibler divergence—between the Solomonoff predictor and the true distribution is bounded by the Kolmogorov complexity of the true distribution. In plain terms: the predictor starts making accurate predictions quickly, and the more structured the environment, the faster it converges.
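The same convergence behavior can be illustrated with a computable stand-in: a Bayesian mixture over a handful of Bernoulli predictors with a 2^(−k) simplicity prior (the "lengths" here are invented, and a true universal mixture would range over all computable hypotheses). The mixture's cumulative log-loss regret against the true coin is provably bounded by the negative log of the truth's prior mass, no matter how long the sequence runs:

```python
import math, random

hypotheses = [0.5, 0.25, 0.75, 0.1]              # candidate coin biases
weights = [2.0 ** -(k + 1) for k in range(4)]    # 1/2, 1/4, 1/8, 1/16
post = [w / sum(weights) for w in weights]       # normalized prior

random.seed(0)
true_theta = 0.75                                # normalized prior mass 2/15
mixture_loss = truth_loss = 0.0

for _ in range(1000):
    bit = 1 if random.random() < true_theta else 0
    likelihoods = [th if bit else 1 - th for th in hypotheses]
    p_mix = sum(w * l for w, l in zip(post, likelihoods))
    post = [w * l / p_mix for w, l in zip(post, likelihoods)]  # Bayes update
    mixture_loss -= math.log2(p_mix)
    truth_loss -= math.log2(true_theta if bit else 1 - true_theta)

# Regret never exceeds -log2(2/15) ≈ 2.91 bits, for any sequence length.
print(mixture_loss - truth_loss)
```

This is the finite analogue of the Kolmogorov-complexity bound above: the price of not knowing the true hypothesis in advance is a constant number of bits, set by how much prior mass the truth received.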
This framework provides what is arguably the strongest formal vindication of induction available. It shows that a single, fixed prediction method can learn any computable environment without being told in advance which environment it is in. The prior encodes a preference for simplicity—shorter programs are weighted more heavily—but this preference is not an arbitrary aesthetic choice. It follows from the structure of computation itself: there are exponentially more long programs than short ones, so any universal enumeration necessarily concentrates probability mass on simpler hypotheses.
The devastating catch is that Solomonoff induction is uncomputable. Evaluating the universal prior requires summing over all programs that produce a given output, which would require solving the halting problem. No physical machine can implement it. This is not a merely practical limitation—it is a fundamental result of computability theory. AIXI, Marcus Hutter's extension of Solomonoff induction to the reinforcement learning setting, inherits the same uncomputability.
Yet the uncomputability is itself philosophically instructive. It tells us that optimal induction—the best possible prediction strategy—exists as a well-defined mathematical object but cannot be realized by any finite computational process. Every practical learning algorithm is an approximation of Solomonoff induction, trading optimality for tractability. PAC learning, neural networks, Bayesian inference with computable priors—all of these occupy different points on the tradeoff curve between Solomonoff's ideal and computational feasibility. The uncomputable optimum serves as a fixed point against which all practical induction can be measured and understood.
Takeaway: Solomonoff induction proves that a formally optimal inductive method exists and that it is impossible to compute—establishing both the deepest vindication and the deepest limitation of induction simultaneously.
Hume was right: induction cannot be deductively justified. But computational learning theory reveals that this is not the devastating blow it appeared to be. The demand for deductive justification was the wrong demand. PAC learning and Solomonoff induction offer two distinct responses—one providing tractable probabilistic guarantees under explicit assumptions, the other defining an optimal but uncomputable ideal.
Together, they show that induction does not need to be justified in the classical sense. It needs to be characterized—its assumptions made explicit, its limitations quantified, its guarantees derived rather than asserted. This is what computation contributes to a problem that philosophy alone could not solve.
The implication for AI research is direct: every learning system embodies a stance on the problem of induction, whether its designers recognize it or not. The hypothesis class, the prior, the loss function—these are all answers to Hume's question, expressed in code rather than prose. Understanding them as such is not philosophical indulgence. It is engineering clarity.