In 1854, physician John Snow traced a cholera outbreak in London to a single water pump on Broad Street. He didn't just notice that people who drank from the pump got sick more often—that was mere correlation. What made his work groundbreaking was the additional reasoning that linked contaminated water to disease transmission.
The phrase "correlation doesn't imply causation" has become almost a cliché. But understanding why this is true—and what actually does establish causation—remains philosophically challenging. Science constantly confronts this gap between noticing patterns and understanding mechanisms. The difference matters enormously, from medical treatments to policy decisions.
Confounding Variables: The Hidden Third Factor
Imagine you discover that ice cream sales and drowning deaths rise together each summer. Should we ban ice cream to save lives? Obviously not. A third factor—hot weather—causes both. People swim more and buy more ice cream when it's warm. The ice cream has no causal connection to the drownings whatsoever.
This is the problem of confounding variables: hidden common causes that create correlations between phenomena that have no direct relationship. The correlation is real, but the causal story we might tell is completely wrong. Confounders lurk everywhere in observational data. Countries with more Nobel laureates consume more chocolate per capita. But chocolate doesn't make scientists—wealth likely explains both.
Identifying confounders requires thinking beyond the data itself. You need background knowledge about how the world works. No statistical technique alone can reveal whether a hidden factor explains your correlation. This is why purely data-driven approaches have limits. The pattern in the numbers cannot tell you whether it reflects genuine causation or a lurking confounder.
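The ice-cream example can be simulated in a few lines. The sketch below is illustrative only: all coefficients and noise levels are invented, and the point is simply that a shared cause (temperature) produces a strong correlation between two variables that never influence each other.

```python
import random

random.seed(0)

# Hot weather (the confounder) drives both variables; neither affects the other.
n = 1000
temps = [random.gauss(25, 5) for _ in range(n)]            # daily temperature
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temps]  # sales rise with heat
drownings = [0.5 * t + random.gauss(0, 2) for t in temps]  # swimming rises with heat

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Strongly positive, despite zero direct causal connection.
print(pearson(ice_cream, drownings))
```

Nothing in the output distinguishes this spurious correlation from a genuine causal one; only the background knowledge encoded in the simulation (that temperature drives both) tells us which story is right.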
Takeaway: A correlation between A and B might mean A causes B, B causes A, or some hidden C causes both. The numbers alone cannot distinguish these possibilities.
Temporal Priority: Causes Must Come First
One fundamental requirement for causation seems almost too obvious to mention: causes must precede their effects. You cannot explain yesterday's headache by pointing to today's aspirin. Time's arrow gives us a crucial tool for distinguishing correlation from causation.
When we observe that education levels correlate with income, temporal analysis helps. People typically complete their education before earning their highest salaries. This ordering is consistent with education causing higher earnings—though it doesn't prove it, since other factors present early in life might cause both.
But temporal priority alone is insufficient. Post hoc ergo propter hoc—"after this, therefore because of this"—remains a fallacy even when the timing is perfect. The rooster crows before sunrise, but doesn't cause it. Many phenomena have regular sequences without causal connections. Still, if B consistently occurs before A, we can rule out A causing B. Time provides a necessary condition for causation, not a sufficient one.
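The asymmetry above can be captured as a simple filter on timestamped events. This toy check (names and timestamps are made up for illustration) implements only the necessary condition: a hypothesis that fails it is ruled out, while a hypothesis that passes it, like the rooster and the sunrise, is merely not yet eliminated.

```python
def temporally_possible(cause_times, effect_times):
    """Necessary condition only: every effect event must have at least one
    earlier candidate cause event. Passing proves nothing (post hoc fallacy)."""
    return all(any(c < e for c in cause_times) for e in effect_times)

# Rooster crows (hour 5, hour 29) precede sunrises (hour 6, hour 30),
# so the crowing hypothesis survives the filter without being confirmed.
print(temporally_possible([5.0, 29.0], [6.0, 30.0]))   # True

# A supposed cause that only ever occurs after its effect is eliminated.
print(temporally_possible([7.0], [6.0]))               # False
```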
Takeaway: If the effect precedes the supposed cause, causation is impossible. But just because A comes before B doesn't mean A caused B; sequence is necessary but not sufficient for causation.
Interventionist Criteria: The Power of Manipulation
Philosopher James Woodward developed an influential account of causation based on intervention. The core idea: A causes B if manipulating A (while holding other factors fixed) changes B. This captures something essential about causal relationships that mere correlation misses.
Consider testing whether a drug works. We don't just observe who takes it and who recovers—confounders would contaminate that data. Instead, we intervene: randomly assign some patients to receive the drug and others a placebo. If the drug group recovers more often, and we've controlled for other factors, the manipulation reveals causation.
This is why randomized controlled trials are the gold standard in medicine. Random assignment breaks the link between treatment and potential confounders. When we intervene on one variable and observe changes in another, we've done something observation alone cannot accomplish. We've actively probed the causal structure of the world. Not all causal questions permit intervention—we cannot randomize people to smoking—but the interventionist framework clarifies what would, in principle, establish causation.
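The contrast between observation and intervention can be made concrete with a small simulation. In this sketch (all probabilities invented), a drug has no effect at all, yet the observational comparison shows a large "benefit" because healthier people are more likely to take it; random assignment breaks that link and recovers the true null effect.

```python
import random

random.seed(1)

n = 10_000
health = [random.random() for _ in range(n)]  # baseline health in [0, 1]

def recovery(h, treated):
    # True model: recovery depends on baseline health only; the drug does nothing.
    return random.random() < h

# Observational data: healthier people seek out the drug more often (confounding).
obs_treated = [random.random() < h for h in health]
# Randomized trial: a coin flip decides treatment, independent of health.
rct_treated = [random.random() < 0.5 for _ in range(n)]

def effect(treated_flags):
    """Difference in recovery rates: treated minus untreated."""
    rec = [recovery(h, t) for h, t in zip(health, treated_flags)]
    n_treated = sum(treated_flags)
    t_rate = sum(r for r, t in zip(rec, treated_flags) if t) / n_treated
    c_rate = sum(r for r, t in zip(rec, treated_flags) if not t) / (n - n_treated)
    return t_rate - c_rate

print(effect(obs_treated))  # large spurious "benefit"
print(effect(rct_treated))  # near zero: the drug truly does nothing
```

The observational estimate is biased not because the data are wrong but because treatment assignment carries information about health; randomization destroys exactly that information, which is what Woodward's "holding other factors fixed" demands.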
Takeaway: Causation means that intervening on one factor changes another. Where intervention is possible, it reveals causal structure that passive observation cannot.
Moving from correlation to causation requires more than better statistics. It demands theoretical reasoning about confounders, attention to temporal ordering, and—where possible—active intervention. Each element addresses a different way correlations can mislead us.
John Snow's cholera investigation succeeded because he combined all three: he identified alternative explanations, traced the timing of cases, and ultimately recommended removing the pump handle—an intervention that stopped the outbreak. Understanding causation means knowing what questions to ask, not just what patterns to find.