The Weight of Evidence: Fisher, Good, and the Measure of Evidential Support

Keynes identified that probability alone fails to capture how much evidence supports a credence, introducing the concept of evidential weight.

I.J. Good formalized weight as the logarithm of the likelihood ratio, yielding a measure with crucial additive properties across independent evidence.

Log-likelihood ratios connect Bayesian inference to information theory, making weight literally measurable in bits of discriminatory information.

Weight is decision-theoretically inert for terminal choices but decisive for information acquisition, since it determines the value of further inquiry.

Orthodox Bayesian representations using a single probability function are expressively incomplete; rational belief requires tracking both credence and its evidential foundations.

Consider two scenarios. In the first, you assign probability 0.5 to a coin landing heads because you've never seen it flipped. In the second, you assign probability 0.5 because you've flipped it ten thousand times and observed exact balance. The credences are identical; the epistemic situations are not. What distinguishes them?

This puzzle, articulated with characteristic precision by John Maynard Keynes in his 1921 Treatise on Probability, exposes a fundamental limitation of any purely probabilistic representation of belief. Probability quantifies the balance of evidence for and against a hypothesis. But it remains silent on the quantity and quality of that evidence—what Keynes called its weight.

The formal epistemological tradition has produced several rigorous proposals for measuring evidential weight, most notably I.J. Good's log-likelihood ratio framework, which is built on the likelihood function that R.A. Fisher placed at the center of statistical inference. These approaches reveal that weight and probability are mathematically and conceptually distinct, and that rational agents must track both. The implications extend beyond academic taxonomy: in decision contexts involving information acquisition, the value of further evidence cannot be captured by probability alone. To understand why, we need to examine the formal structure of evidential support itself.

Probability Versus Weight: Keynes's Distinction Formalized

Keynes's insight begins with a deceptively simple observation. Let P(h|e) denote the probability of hypothesis h given evidence e. As we accumulate additional relevant evidence e', the conditional probability P(h|e ∧ e') may increase, decrease, or remain unchanged. But something else changes monotonically: our epistemic state becomes better grounded.

Keynes proposed that weight V(h|e) increases with the addition of any relevant evidence, regardless of whether that evidence favors or disfavors h. This stands in sharp contrast to probability, which represents a balance and can move in either direction. Formally, if e' is relevant to h given e—meaning P(h|e ∧ e') ≠ P(h|e)—then V(h|e ∧ e') > V(h|e).

The two-coins puzzle illustrates the practical import. Both situations yield P(heads) = 0.5, but the second has dramatically higher weight. A Bayesian who reports only credence has compressed away epistemically crucial information. The agent with high-weight evidence is far more resistant to revision by new data: ten thousand flips have concentrated their beliefs about the coin's bias so tightly that any single further observation can shift the credence only marginally.
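The contrast can be made concrete with a minimal sketch. It assumes, purely for illustration, that both agents model the coin as independent flips with a uniform Beta(1, 1) prior over its bias; the function name posterior_mean and the 5,000/5,000 split are hypothetical choices, not anything Keynes specified.

```python
from fractions import Fraction


def posterior_mean(heads, tails, prior_a=1, prior_b=1):
    """Credence that the next flip lands heads under a Beta-Bernoulli model.

    Assumption: a Beta(prior_a, prior_b) prior over the coin's bias,
    updated on the observed counts. The posterior mean is the predictive
    probability of heads.
    """
    return Fraction(prior_a + heads, prior_a + prior_b + heads + tails)


# Agent 1 has seen no flips; agent 2 has seen 5000 heads and 5000 tails.
ignorant_before = posterior_mean(0, 0)
informed_before = posterior_mean(5000, 5000)

# Both agents now observe one additional head.
ignorant_after = posterior_mean(1, 0)
informed_after = posterior_mean(5001, 5000)

# Same starting credence, very different resilience.
ignorant_shift = ignorant_after - ignorant_before   # 1/6
informed_shift = informed_after - informed_before   # 1/20006

print(ignorant_before, informed_before)
print(ignorant_shift, informed_shift)
```

Under these assumptions both agents start at exactly 1/2, but one further head moves the first agent's credence by 1/6 and the second agent's by 1/20006. The single number 0.5 hides that difference.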

Subsequent work by Branden Fitelson, James Joyce, and others has refined this distinction. Joyce's analysis treats weight as related to the resilience of a probability assignment—how much it would shift under hypothetical further evidence. This connects weight to higher-order uncertainty: probability captures first-order belief, while weight captures something about our confidence in that belief itself.

The philosophical payoff is substantial. Many disagreements that appear to be about probabilities are actually about weights. Two agents may agree that P(h) = 0.7 while disagreeing radically about how much evidence justifies that figure—and consequently about what further inquiry is warranted.

Takeaway
A probability of 0.5 from ignorance and a probability of 0.5 from overwhelming balanced evidence are epistemically different states. Credence alone cannot distinguish them; weight can.

Good's Log-Likelihood Ratio and Additive Weight

I.J. Good, working in the cryptanalytic tradition that included Turing at Bletchley Park, proposed a precise mathematical operationalization of weight. The weight of evidence that observation e provides for hypothesis h against alternative ¬h is defined as the logarithm of the likelihood ratio: W(h:e) = log[P(e|h)/P(e|¬h)].

This definition has remarkable formal properties. First, it equals the change in log-odds produced by observing e: W(h:e) = log[O(h|e)] − log[O(h)], where O denotes odds. Bayes's theorem, expressed in log-odds form, becomes simple addition: posterior log-odds equal prior log-odds plus weight. Second, and this is the decisive virtue, weights from independent pieces of evidence add: W(h:e₁ ∧ e₂) = W(h:e₁) + W(h:e₂) when e₁ and e₂ are conditionally independent given h and also given ¬h.
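Both properties can be checked numerically. The sketch below uses made-up likelihoods and a made-up prior (all numbers are illustrative assumptions), takes logarithms base 2, and builds the joint likelihoods by multiplying, which is exactly what conditional independence given h and given ¬h licenses.

```python
import math


def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """Good's weight W(h:e) = log2[P(e|h) / P(e|not-h)], in bits."""
    return math.log2(p_e_given_h / p_e_given_not_h)


def log_odds(p):
    """Log-odds (base 2) of a probability strictly between 0 and 1."""
    return math.log2(p / (1 - p))


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Ordinary Bayes's theorem for h against not-h."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)


prior = 0.2      # hypothetical prior P(h)
e1 = (0.8, 0.2)  # hypothetical P(e1|h), P(e1|not-h): favors h
e2 = (0.3, 0.6)  # hypothetical P(e2|h), P(e2|not-h): counts against h

w1 = weight_of_evidence(*e1)
w2 = weight_of_evidence(*e2)

# Conditional independence given h and given not-h: joint likelihoods factorize.
joint = (e1[0] * e2[0], e1[1] * e2[1])
w_joint = weight_of_evidence(*joint)

post = posterior(prior, *joint)
log_odds_change = log_odds(post) - log_odds(prior)

print(w1, w2, w_joint)   # approximately 2, -1, 1 (bits)
print(log_odds_change)   # approximately 1, matching w_joint
```

The signed weights of roughly +2 and −1 bits sum to the joint weight of roughly +1 bit, and that same quantity is the shift in log-odds from prior to posterior.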

This additivity is what makes weight, in Good's sense, behave like a genuine measure. Information theorists will recognize the connection: when logarithms are taken base 2, weight is measured in bits, and the expected weight of evidence in favor of h, computed on the supposition that h is true, is the Kullback-Leibler divergence between the distribution of the evidence under h and its distribution under ¬h, which ties Good's framework directly to Shannon's information theory. Each piece of evidence contributes its quantum of discriminatory information toward distinguishing competing hypotheses.
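As a small illustration of the information-theoretic reading, the sketch below computes the expected weight of evidence per observation when h is true, which is the Kullback-Leibler divergence of the evidence distribution under h from that under ¬h. It assumes a simple alternative (a single fair-coin hypothesis standing in for ¬h) and a hypothetical bias of 0.75; with a composite ¬h one would first need a prior over the alternatives.

```python
import math


def expected_weight_bits(p_given_h, p_given_not_h):
    """Expected weight of evidence for h, in bits, when h is true.

    Computes sum_e P(e|h) * log2[P(e|h) / P(e|not-h)], i.e. the
    Kullback-Leibler divergence KL(P(.|h) || P(.|not-h)).
    Assumes P(e|not-h) > 0 wherever P(e|h) > 0.
    """
    return sum(
        p * math.log2(p / q)
        for p, q in zip(p_given_h, p_given_not_h)
        if p > 0
    )


# Hypothetical example: h says the coin lands heads with probability 0.75,
# not-h says the coin is fair. Outcomes are ordered (heads, tails).
biased = [0.75, 0.25]
fair = [0.5, 0.5]

per_flip = expected_weight_bits(biased, fair)
print(per_flip)  # roughly 0.19 bits per flip
```

On these assumptions each flip of the biased coin yields, on average, a little under a fifth of a bit in favor of h, and additivity means n independent flips yield n times that in expectation.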

Notice that Good's measure is signed: evidence favoring h contributes positive weight, evidence against contributes negative weight. This differs from Keynes's original conception, where any relevant evidence increased weight monotonically. The reconciliation lies in distinguishing signed weight of evidence for a specific hypothesis (Good) from absolute weight as evidential bearing (Keynes). Both quantities are coherent; they answer different questions.

The practical applications are extensive. Forensic statistics uses likelihood ratios as the orthodox measure of evidential strength. Medical diagnostics expresses test informativeness through likelihood ratios precisely because of their additive composition across independent findings. Sequential hypothesis testing, due to Wald, exploits the same additive structure to determine stopping rules.
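The sequential case shows the additive structure most directly: the running total of log-likelihood ratios is the test statistic. What follows is a minimal sketch of a sequential probability ratio test for Bernoulli data, using Wald's approximate thresholds; the function name sprt, the two simple hypotheses, and the default error rates are illustrative assumptions.

```python
import math


def sprt(observations, p_h1, p_h0, alpha=0.05, beta=0.05):
    """Sequential probability ratio test for Bernoulli observations.

    H1: P(success) = p_h1; H0: P(success) = p_h0.
    alpha and beta are the target type I and type II error rates.
    Thresholds are Wald's approximations, not exact boundaries.
    Returns (decision, number of observations used, accumulated weight in nats).
    """
    upper = math.log((1 - beta) / alpha)   # stop and accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # stop and accept H0 at or below this
    total = 0.0
    for n, success in enumerate(observations, start=1):
        # Each observation adds its own log-likelihood ratio to the total.
        if success:
            total += math.log(p_h1 / p_h0)
        else:
            total += math.log((1 - p_h1) / (1 - p_h0))
        if total >= upper:
            return "accept H1", n, total
        if total <= lower:
            return "accept H0", n, total
    return "continue sampling", len(observations), total


# Usage: a run of successes accumulates weight for H1 until the threshold is hit.
decision, n_used, weight = sprt([1] * 20, p_h1=0.75, p_h0=0.5)
print(decision, n_used)
```

The stopping rule never needs the raw data again once each observation's weight has been added to the total, which is the practical payoff of working in log-likelihood ratios.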

Takeaway
Logarithms convert Bayesian multiplication into addition, making evidence behave like a currency that accumulates linearly. This is why log-likelihood ratios are the natural unit of evidential support.

Decision-Theoretic Implications: When Weight Matters

If a rational agent's decisions depend only on probabilities and utilities—as standard expected utility theory maintains—then weight appears decision-theoretically inert. Given the same credence, two agents should choose identically regardless of how that credence was formed. This conclusion, however, holds only for terminal decisions: choices made without the option of further inquiry.

The picture transforms when we consider information acquisition. I.J. Good himself proved a foundational result: a rational Bayesian offered costless evidence before deciding should never prefer to decline it, because the expected utility of acting after observation is at least as great as that of acting now. The gain is strictly positive only when some possible observation would change the chosen act, and that in turn depends on how much weight the evidence can be expected to carry relative to what the agent already has.

This is where weight reasserts itself decision-theoretically. An agent with credence 0.5 based on negligible evidence faces a fundamentally different inquiry decision than one with credence 0.5 based on saturated evidence. The first agent stands to gain substantially from further investigation; the second has reached diminishing returns. Value-of-information calculations make this explicit: they average over the possible observations, and an observation is worth something only to the extent that its likelihood ratio can move the posterior enough to change what the agent does.
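A toy calculation shows the asymmetry. The sketch below assumes a deliberately simple decision problem that is not in the original discussion: the agent bets on heads or tails for an upcoming flip, wins 1 if right and 0 if wrong, holds a Beta(a, b) belief about the coin's bias, and may watch one costless flip before betting. All function names are hypothetical.

```python
from fractions import Fraction


def credence_heads(a, b):
    """Predictive probability of heads under a Beta(a, b) belief."""
    return Fraction(a, a + b)


def eu_bet_now(a, b):
    """Expected utility of betting immediately on the likelier side."""
    m = credence_heads(a, b)
    return max(m, 1 - m)


def eu_bet_after_one_flip(a, b):
    """Expected utility of watching one costless flip, updating, then betting."""
    m = credence_heads(a, b)
    return m * eu_bet_now(a + 1, b) + (1 - m) * eu_bet_now(a, b + 1)


def value_of_one_flip(a, b):
    """Expected gain from observing first; never negative."""
    return eu_bet_after_one_flip(a, b) - eu_bet_now(a, b)


low_weight = value_of_one_flip(1, 1)         # credence 1/2, no data behind it
high_weight = value_of_one_flip(5001, 5001)  # credence 1/2, ten thousand flips behind it
no_gain = value_of_one_flip(3, 1)            # no single flip would change the bet

print(low_weight, high_weight, no_gain)
```

In this toy problem the poorly informed agent gains 1/6 in expected utility from one look, the saturated agent gains 1/20006, and an agent whose bet would not change whatever the flip shows gains exactly 0, the boundary case of weak dominance.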

There is a further consideration that strict Bayesianism struggles to accommodate: resilience under adversarial scrutiny. In contexts of legal proof, scientific publication, or policy justification, a credence supported by minimal evidence is appropriately discounted. The epistemic community implicitly demands not just probability but warrant—the weight backing the probability. Formal frameworks like Dempster-Shafer theory and imprecise probability attempt to encode this directly into the belief structure.

The upshot for formal epistemology is that the orthodox Bayesian representation—a single probability function over a sigma-algebra—is expressively incomplete. A more adequate model tracks both first-order credence and the evidential foundations that support it. Whether this is achieved through interval-valued probabilities, sets of priors, or explicit weight functions remains a live research question.

Takeaway
Probability tells you what to believe; weight tells you how much to trust what you believe—and therefore whether to investigate further before you act.

The distinction between probability and weight is not a baroque refinement of Bayesian theory but a recognition of something the orthodox formalism omits. Probability captures the balance of evidence; weight captures its substance. Keynes saw this in 1921; Good gave it precise mathematical form through log-likelihood ratios; contemporary formal epistemology continues to elaborate the consequences.

What emerges is a richer picture of rational belief. The Bayesian agent is not merely a probability function but an evidentially situated reasoner whose credences carry histories of justification. Two agents may share posteriors while differing in everything that matters about how they should proceed—what to investigate, when to commit, how much weight to place on their own conclusions.

Formal methods clarify rather than dissolve the underlying philosophical insight. Knowledge has dimensions that a single number cannot capture. Recognizing this is the beginning of a more honest account of what it means to be rationally entitled to belief.