Imagine a headline claiming that a new supplement significantly improves memory. Sounds impressive, right? But what if the actual improvement was a few milliseconds on a reaction-time test, a difference so small you'd never notice it in daily life? The word "significant" in science doesn't mean what most people think it means.
Statistical significance tells you whether an effect is likely to be real, that is, unlikely to be a fluke of random sampling. But it says nothing about whether the effect is big enough to matter. This gap between "detectable" and "meaningful" is one of the most misunderstood ideas in science, and understanding it can transform how you evaluate every claim you encounter.
Practical Importance: Detectable Doesn't Mean Meaningful
Here's the core issue: with a large enough sample, you can detect absurdly tiny differences. Run a study with a million participants and you might find that people who eat blue M&Ms score 0.01 points higher on a happiness survey. That result could be statistically significant, meaning a gap that size would be unlikely to appear by random chance alone, but it's also completely meaningless in practice. The difference is real but trivial.
Statistical significance is essentially a filter for noise. It answers one narrow question: "Could this result have appeared by accident?" If the answer is "probably not," you get the magic label of p < 0.05. But this tells you nothing about the size of the effect, its practical relevance, or whether anyone should change their behavior because of it.
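If you want to see this for yourself, here's a minimal Python sketch (all numbers invented for illustration) simulating the blue-M&M scenario: with a million people per group, a 0.01-point difference sails past p < 0.05 even though the standardized effect is vanishingly small.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
n = 1_000_000  # one million people per group

# Hypothetical happiness scores (0-10 scale): the blue-M&M group
# averages just 0.01 points higher.
control = rng.normal(loc=5.00, scale=1.5, size=n)
blue_mm = rng.normal(loc=5.01, scale=1.5, size=n)

t_stat, p_value = stats.ttest_ind(blue_mm, control)

# Standardized effect size (Cohen's d): mean gap over pooled spread.
pooled_sd = np.sqrt((blue_mm.var(ddof=1) + control.var(ddof=1)) / 2)
d = (blue_mm.mean() - control.mean()) / pooled_sd

print(f"p-value:   {p_value:.1e}")   # typically far below 0.05
print(f"Cohen's d: {d:.4f}")         # around 0.007: real but trivial
```

The detector beeps loudly; the find is a bottle cap.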
This is why scientists increasingly insist on reporting effect sizes alongside p-values. Think of it this way: statistical significance is like a metal detector beeping. It tells you something is there. But it doesn't tell you whether you've found a gold coin or a bottle cap. Effect size is what tells you whether the find is worth picking up.
Takeaway: Statistical significance answers whether an effect exists. Effect size answers whether it matters. Always ask both questions before letting a finding change your mind.
Magnitude Measurement: Putting Numbers on How Much Something Matters
So how do scientists actually measure whether an effect is big or small? One of the most common tools is called Cohen's d, which expresses the difference between two groups in standardized units. A Cohen's d of 0.2 is generally considered small, 0.5 is medium, and 0.8 is large. These benchmarks aren't perfect, but they give you a starting vocabulary for thinking about magnitude.
Let's make this concrete. Suppose a tutoring program raises math scores by 2 points on a 100-point test where scores typically vary with a standard deviation of about 20 points. Dividing the 2-point gain by that 20-point spread gives a Cohen's d of about 0.1, a tiny effect. Now suppose a different program raises scores by 15 points on the same test. That's a Cohen's d around 0.8, a large effect. Both results might be statistically significant, but only one would justify spending money to implement the program widely.
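Here's that arithmetic as a quick sketch, with the 20-point standard deviation as a hypothetical assumption:

```python
def cohens_d(mean_difference: float, standard_deviation: float) -> float:
    """Standardized mean difference: raw gap divided by the score spread."""
    return mean_difference / standard_deviation

SD = 20.0  # assumed spread of test scores (hypothetical)

print(cohens_d(2.0, SD))   # 0.1  -> tiny effect
print(cohens_d(15.0, SD))  # 0.75 -> large effect, near the 0.8 benchmark
```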
Other effect size measures include correlation coefficients (how tightly two things move together) and odds ratios (how much more likely something becomes). The specific metric matters less than the habit of asking: "How big is this effect, and does the size justify action?" Without this question, you're flying blind — trusting a beeping metal detector without ever looking at what's in the ground.
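Here's a rough sketch of both measures using made-up numbers, just to show what the calculations look like:

```python
import numpy as np

# Correlation: how tightly two things move together (-1 to 1).
hours_studied = np.array([1, 2, 3, 4, 5, 6, 7, 8])
test_scores = np.array([52, 55, 61, 60, 68, 70, 75, 79])
r = np.corrcoef(hours_studied, test_scores)[0, 1]
print(f"correlation r = {r:.2f}")  # ~0.99: very tightly linked

# Odds ratio: how much more likely an outcome becomes.
# Hypothetical counts: (had outcome, did not) in each group of 100.
exposed = (30, 70)
unexposed = (10, 90)
odds_ratio = (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1])
print(f"odds ratio = {odds_ratio:.2f}")  # ~3.86: nearly 4x the odds
```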
Takeaway: Effect size metrics like Cohen's d translate raw results into a universal language of magnitude. Learning to read that language lets you compare findings across entirely different fields and questions.
Context Interpretation: When Small Effects Are Big and Big Effects Are Small
Here's where it gets interesting: a "small" effect size isn't always unimportant. Context changes everything. A medication that cuts heart attack risk by just 1 percentage point sounds trivial, until you realize it's being given to 50 million people. That single point translates to 500,000 fewer heart attacks. Scale can turn a tiny effect into an enormous impact. Conversely, a "large" effect found in 12 college students in a psychology lab might evaporate when tested in the broader population.
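The arithmetic is worth seeing explicitly. The sketch below also computes the "number needed to treat," a related figure not mentioned above but commonly used to express the same idea:

```python
absolute_risk_reduction = 0.01   # 1 percentage point
population = 50_000_000          # people taking the medication

events_averted = absolute_risk_reduction * population
number_needed_to_treat = 1 / absolute_risk_reduction  # to prevent one event

print(f"{events_averted:,.0f} fewer heart attacks")         # 500,000
print(f"treat {number_needed_to_treat:.0f} people per event avoided")  # 100
```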
The field also matters. In particle physics, scientists demand extraordinary statistical certainty before claiming a discovery, even when the effect itself is subtle. In education research, an effect size of 0.3 might be celebrated because even modest improvements across millions of students add up. There's no universal cutoff for "big enough"; it depends on stakes, costs, and alternatives.
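To make "extraordinary" concrete: particle physicists famously require a five-sigma result before declaring a discovery (a detail added here for illustration, not stated above). A quick calculation shows how demanding that is compared to the everyday p < 0.05 bar:

```python
from scipy import stats

# Tail probability beyond five standard deviations (one-tailed).
five_sigma_p = stats.norm.sf(5)

print("typical threshold:    p < 0.05")
print(f"five-sigma threshold: p < {five_sigma_p:.1e}")  # about 2.9e-07
```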
This is the real skill in scientific reasoning: interpreting effect sizes in context. You need to ask who is affected, how many people are involved, what it costs to act on the finding, and what happens if you ignore it. A single number never tells the whole story. But a single number combined with thoughtful context tells you almost everything you need to make a good decision.
Takeaway: The importance of an effect depends not just on its size but on its context — who it affects, how many people are involved, and what's at stake. Never judge a number in isolation.
Next time you see a headline trumpeting a "significant" finding, pause and ask two follow-up questions: how big is the effect, and does that size matter in this context? These two questions alone will make you a sharper reader of science than most people.
Statistical significance is the beginning of understanding, not the end. The real insight lives in magnitude and context — in knowing not just that something is there, but whether it's worth your attention. That distinction is one of the most powerful thinking tools science has to offer.