The problem of induction has haunted epistemology since Hume: how can finite observations justify general claims about unobserved cases? Philosophers have offered numerous qualitative responses—appeals to natural kinds, inference to the best explanation, pragmatic vindication—but none has provided a precise, quantitative answer to the most basic version of the question: how many examples do you actually need before generalization becomes rational?
Computational learning theory, and in particular Leslie Valiant's Probably Approximately Correct (PAC) framework, offers something remarkable. It supplies exact bounds on the number of observations required to learn a concept from examples, given explicit tolerances for error and uncertainty. The framework does not eliminate inductive risk. Instead, it quantifies it with mathematical precision, converting an ancient philosophical impasse into a tractable engineering problem.
This matters far beyond computer science. PAC theory demonstrates that learnability is not a uniform property of all possible generalizations but depends critically on the structural complexity of the hypothesis class under consideration. The key measure of that complexity—the Vapnik-Chervonenkis dimension—turns out to formalize something epistemologists have long intuited: that simpler hypothesis spaces require less evidence to learn from. What follows is an examination of how these results bear on foundational questions about induction, evidence, and the conditions under which generalization from experience is epistemically justified.
Probably Approximately Correct: Formalizing Inductive Success
Valiant's 1984 PAC framework begins with a deceptively simple setup. A learner receives labeled examples drawn independently from an unknown probability distribution D over some instance space X. There exists a target concept c—some subset of X—that the learner is trying to identify. The learner selects a hypothesis h from a predefined hypothesis class H, aiming to approximate c well with respect to D.
The crucial innovation is the double parameterization of success. The learner need not get the concept exactly right; it must only find a hypothesis whose error under D is at most ε (the accuracy parameter). And it need not succeed every time; it must succeed with probability at least 1 − δ (the confidence parameter). A concept class is PAC-learnable if there exists an algorithm that, for any ε > 0 and any δ > 0, outputs such a hypothesis using a number of examples polynomial in 1/ε and 1/δ.
For finite hypothesis classes, the result is elegant and immediate. If |H| is finite, then drawing m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffices to guarantee PAC learning via the consistent hypothesis strategy—simply pick any h ∈ H that agrees with all observed data. The proof relies on a union bound argument: the probability that any bad hypothesis (error > ε) survives m examples decays exponentially in m, and summing over all bad hypotheses yields the bound.
Notice what this achieves epistemologically. The sample complexity bound is distribution-free—it holds for any underlying distribution D. The learner need not assume uniformity, stationarity, or any specific distributional form. This is a direct response to a key Humean worry: the framework does not require assuming that the future resembles the past in any particular way. It requires only that training and test data share a distribution, a far weaker and more defensible assumption.
The logarithmic dependence on |H| is equally significant. It tells us that the number of examples needed to learn grows only with the logarithm of the number of hypotheses. Doubling the size of the hypothesis space adds only a constant to the sample requirement. This already hints at a deep structural insight: what matters for learnability is not how many hypotheses you entertain, but the effective complexity of the class they form.
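Both points can be made concrete with a short calculation. The sketch below computes the finite-class bound m ≥ (1/ε)(ln|H| + ln(1/δ)) from above; the particular tolerances and hypothesis counts are illustrative choices, not values from the text.

```python
from math import ceil, log

def pac_sample_bound(eps: float, delta: float, num_hypotheses: int) -> int:
    """Sample size sufficient for PAC learning a finite class via the
    consistent-hypothesis strategy: m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return ceil((1 / eps) * (log(num_hypotheses) + log(1 / delta)))

# 1% error tolerance, 95% confidence, a million hypotheses:
m = pac_sample_bound(eps=0.01, delta=0.05, num_hypotheses=10**6)

# Doubling |H| adds only ln(2)/eps extra examples (about 70 here),
# not twice as many -- the logarithmic dependence at work:
m_doubled = pac_sample_bound(eps=0.01, delta=0.05, num_hypotheses=2 * 10**6)
print(m, m_doubled - m)
```

Note how modest the numbers are: a million-hypothesis class is learnable to 1% error with high confidence from under two thousand examples, and doubling the class barely moves the requirement.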
Takeaway: Inductive success need not be certain or exact to be epistemically valuable. PAC theory shows that precisely quantified confidence and accuracy—probably approximately correct—is both achievable and sufficient for rational generalization.
VC Dimension and the Geometry of Learnability
Finite hypothesis classes are the easy case. The deeper question concerns infinite hypothesis classes—linear classifiers, polynomial functions, neural networks—where the logarithm of cardinality is undefined. The Vapnik-Chervonenkis dimension provides the answer. Introduced by Vladimir Vapnik and Alexey Chervonenkis in 1971, the VC dimension of a hypothesis class H is the size of the largest set of points that H can shatter, meaning that H can realize all 2ⁿ possible labelings of those n points.
Consider linear classifiers in ℝ². Three non-collinear points can be shattered: for every possible labeling of three points as positive or negative, some line separates them correctly. But no set of four points can be shattered by lines—there always exists a labeling no line can achieve. So the VC dimension of linear classifiers in the plane is exactly 3. More generally, the VC dimension of hyperplanes in ℝᵈ is d + 1, giving a precise measure of how expressive the class is.
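The shattering claim for the plane can be checked numerically. The sketch below sweeps candidate directions for a separating line; sweeping a fixed grid of angles is an approximate separability test in general, but it is reliable for the specific point sets used here, and the helper names are mine, not standard API.

```python
from itertools import product
from math import cos, sin, pi

def separable(points, labels, steps=720):
    """Approximate linear-separability check in the plane: sweep unit
    directions and ask whether every positive projection falls strictly
    below every negative projection along some direction."""
    pos_pts = [p for p, lab in zip(points, labels) if lab]
    neg_pts = [p for p, lab in zip(points, labels) if not lab]
    if not pos_pts or not neg_pts:
        return True  # a one-class labeling is trivially separable
    for k in range(steps):
        theta = 2 * pi * k / steps
        w = (cos(theta), sin(theta))
        pos = [x * w[0] + y * w[1] for x, y in pos_pts]
        neg = [x * w[0] + y * w[1] for x, y in neg_pts]
        if max(pos) < min(neg):
            return True
    return False

def shatters(points):
    """True if every labeling of the points is linearly separable."""
    return all(separable(points, list(labs))
               for labs in product([False, True], repeat=len(points)))

triangle = [(0, 0), (1, 0), (0, 1)]          # three non-collinear points
square   = [(0, 0), (1, 1), (1, 0), (0, 1)]  # four points, XOR labeling
print(shatters(triangle))                     # all 8 labelings achievable
print(separable(square, [True, True, False, False]))  # XOR defeats any line
```

The XOR labeling on the square, with the two diagonal corners positive, is the classic witness that four points cannot be shattered by lines.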
The Fundamental Theorem of PAC Learning establishes that a hypothesis class is PAC-learnable if and only if its VC dimension is finite. When the VC dimension is d, the sample complexity is Θ((d + ln(1/δ))/ε). This is a necessary and sufficient characterization—no other structural property of H determines learnability. The result unifies the finite and infinite cases: for finite classes, the VC dimension is at most log₂|H|, recovering the earlier bound up to constants.
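The Θ notation hides an unspecified universal constant, so the bound can only be sketched up to that constant; setting it to 1 purely for illustration, the linear growth of sample complexity with VC dimension is easy to see.

```python
from math import ceil, log

def vc_sample_bound(eps, delta, vc_dim, c=1.0):
    """Sample size of order (d + ln(1/delta)) / eps from the Fundamental
    Theorem of PAC learning; c stands in for the theorem's unspecified
    universal constant and is set to 1 here for illustration only."""
    return ceil(c * (vc_dim + log(1 / delta)) / eps)

# Hyperplanes in R^d have VC dimension d + 1, so the sample requirement
# grows only linearly with the input dimension:
for d in (2, 10, 100):
    print(d, vc_sample_bound(eps=0.05, delta=0.05, vc_dim=d + 1))
```

Contrast this with the finite case: there, sample complexity grows with the logarithm of the class size; here, it grows linearly with the class's combinatorial dimension.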
The philosophical import of this theorem is substantial. It tells us that whether a domain of inquiry admits reliable generalization from evidence is a structural fact about the hypothesis class, not about the evidence itself or the distribution generating it. A class of hypotheses either admits finite-sample learning or it does not, and this is determined entirely by its combinatorial geometry—how many distinct classification patterns it can produce.
This reframes the problem of induction in a fundamental way. Hume asked whether induction is ever justified. The PAC framework answers: generalization is justified relative to a hypothesis class of bounded VC dimension. The justification is conditional, but the conditions are precise and checkable. An epistemologist can now ask not whether induction works in general, but whether the specific class of hypotheses under consideration has finite VC dimension—and if so, exactly how many observations rational confidence requires.
Takeaway: Learnability is not a property of evidence or of nature's uniformity—it is a structural property of the hypothesis class. The VC dimension tells you exactly how complex a space of hypotheses can be before finite evidence becomes insufficient for generalization.
Philosophical Implications: Induction Conditional on Complexity
What does PAC learning theory actually resolve about the philosophical problem of induction? It does not answer Hume's challenge head-on—it does not prove that unobserved cases will resemble observed ones. But it accomplishes something arguably more useful: it decomposes the problem into tractable components. The justification of induction becomes conditional on two explicit assumptions—that training and test data share a distribution, and that the target concept lies within a hypothesis class of bounded complexity.
The first assumption—shared distribution—is itself philosophically interesting. It is weaker than assuming the uniformity of nature. It does not require that the future be like the past in any qualitative sense. It requires only that the mechanism generating examples does not change between the learning phase and the deployment phase. This is a form of stationarity, but a minimal one, and it is precisely the assumption that, when violated, should undermine our confidence in generalization.
The second assumption—bounded VC dimension—formalizes the role that constraints on hypothesis space play in making induction possible. Unrestricted induction over all possible Boolean functions has infinite VC dimension and is provably unlearnable. This is the formal analogue of Goodman's new riddle: without constraints on which predicates are admissible, any finite evidence is compatible with infinitely many incompatible generalizations. VC theory quantifies exactly how much constraint is needed to escape this predicament.
There is a deeper lesson about the relationship between computational learning theory and Bayesian epistemology. PAC bounds are frequentist in character—they hold uniformly over all distributions—while Bayesian approaches encode background knowledge through priors. Yet the structural insights converge: in both frameworks, learning requires that the space of hypotheses be appropriately constrained. A Bayesian with a prior that spreads mass over hypotheses of unbounded complexity will also fail to converge. PAC theory makes this constraint explicit and measurable.
Finally, sample complexity results offer a novel perspective on epistemic humility. They tell us not merely that we might be wrong, but how wrong we might be and how likely that degree of error is, given a specific amount of evidence. This is a richer form of fallibilism than philosophy has traditionally articulated. It transforms vague appeals to human limitations into precise, falsifiable claims about the relationship between evidence, hypothesis complexity, and the reliability of generalization.
Takeaway: PAC theory does not solve the problem of induction so much as dissolve it into two checkable conditions: distributional stability and bounded hypothesis complexity. Where those conditions hold, induction is not just pragmatically useful; it succeeds with mathematically quantified accuracy and confidence.
Computational learning theory does not replace traditional epistemology—it sharpens it. By translating qualitative questions about justification into quantitative questions about sample complexity, PAC theory reveals that the problem of induction is not monolithic. It fractures along a precise axis: the combinatorial complexity of the hypothesis class.
The VC dimension gives epistemologists something they have long lacked—a measure of when and why generalization from finite evidence succeeds. Learnability is not an article of faith or a pragmatic bet. It is a structural theorem, conditional on assumptions we can state, examine, and in many cases verify.
Where Hume saw an impassable gap between observation and generalization, formal epistemology now sees a bridge with known load limits. The question is no longer whether induction is justified, but for which hypothesis classes, at what sample sizes, and with what guarantees.