Imagine a pharmaceutical company runs a clinical trial on a promising new drug. After months of careful data collection, the result comes back: no statistically significant effect. The drug doesn't outperform the placebo. The team is disappointed. The study goes unpublished. And the next research group, unaware this trial ever happened, starts the whole expensive process over again.
This scenario plays out thousands of times each year across every scientific discipline. We have a cultural obsession with breakthroughs and discoveries—the experiments that find something. But the experiments that find nothing? They quietly vanish, taking valuable information with them.
The irony is sharp. Null results—experiments where the hypothesis isn't supported—are among the most useful data points science can produce. They narrow the search space, calibrate our expectations, and keep us honest about what we actually know. The problem isn't that most experiments fail. The problem is that we've built a publication system that pretends they never happened.
The Hidden Value of Finding Nothing
When an experiment produces a null result, it feels like a dead end. But statistically, it's anything but. A well-designed study that fails to detect an effect provides critical information about the boundaries of that effect. If a drug trial with 10,000 participants finds no significant benefit, that tells us something powerful: if any benefit exists at all, it's almost certainly too small to matter clinically.
This is the concept of bounding the effect size. Null results don't just say "we didn't find anything." They say "the true effect, if it exists, is probably smaller than X." That's enormously useful for other researchers deciding where to invest their time and funding. It's the difference between a blank map and a map that says "we checked here—there's nothing."
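The "bounding" idea above can be made concrete with a back-of-the-envelope calculation. The sketch below, using a normal approximation and entirely made-up numbers for the observed difference and standard error, computes the upper edge of a confidence interval around a null result; that upper edge is the "probably smaller than X" the text describes.

```python
import math

def effect_upper_bound(mean_diff, std_err, confidence=0.95):
    """Upper bound of a two-sided confidence interval for an effect,
    using a normal approximation (reasonable for large samples)."""
    # Critical z values for common two-sided confidence levels.
    z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[confidence]
    return mean_diff + z * std_err

# A hypothetical null result: observed difference of 0.4 points with a
# standard error of 0.5 (z = 0.8, nowhere near significant).
upper = effect_upper_bound(0.4, 0.5)
print(f"If any effect exists, it is at most ~{upper:.2f} (95% CI)")
```

Even though the study "found nothing," it rules out any effect larger than about 1.4 points, which is exactly the kind of map-marking the paragraph describes.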
Consider Thomas Edison's famous remark about finding thousands of ways that don't work. That's not just motivational rhetoric—it's sound statistical reasoning. Each failed experiment in a systematic search eliminates a region of possibility. The information content of a null result depends on the experiment's statistical power: a high-powered study that finds nothing is far more informative than a small, underpowered one. A null result from a study with 95% power to detect a meaningful effect is strong evidence that an effect of that size simply isn't there.
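The power point can be illustrated with a small Monte Carlo sketch. The numbers below (a true effect of 0.3 standard deviations, arms of 20 vs. 250 participants) are illustrative assumptions, not figures from the text: when a real effect exists, the small study usually misses it, so its null result tells us little, while the large study almost always finds it, so its null result would be genuinely informative.

```python
import math
import random

def detect_rate(n, true_effect, sigma=1.0, trials=2000):
    """Monte Carlo estimate of power: the fraction of simulated
    two-group studies (n per arm) whose z-test reaches p < 0.05."""
    z_crit = 1.96  # two-sided threshold at alpha = 0.05
    se = sigma * math.sqrt(2 / n)  # standard error of the mean difference
    hits = 0
    for _ in range(trials):
        diff = random.gauss(true_effect, se)  # simulated observed difference
        if abs(diff) / se > z_crit:
            hits += 1
    return hits / trials

random.seed(0)
# Same true effect, very different power:
print(detect_rate(20, 0.3))    # underpowered: misses most of the time
print(detect_rate(250, 0.3))   # well-powered: detects almost always
```

The design choice here is deliberate simplicity: a plain z-test simulation rather than a closed-form power formula, because the simulation makes visible *why* an underpowered null result is weak evidence.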
Null results also serve as replication checks. When one lab reports a dramatic finding and three others can't reproduce it, those three null results are collectively telling us something important about the reliability of the original claim. Without them, the original finding stands unchallenged—and potentially misleading—in the literature.
Takeaway: A well-powered experiment that finds nothing isn't a failure—it's a map marking where the treasure isn't, saving everyone who comes after from digging in the same empty spot.
The File Drawer Problem: Science's Silent Distortion
In 1979, psychologist Robert Rosenthal gave this phenomenon a name: the file drawer problem. His reasoning was straightforward. If journals overwhelmingly publish positive results—studies where the hypothesis was confirmed—then for every published finding, there could be dozens of unpublished null results sitting in researchers' file drawers. The published literature becomes a funhouse mirror, reflecting only the hits and hiding the misses.
The statistical consequences are severe. Imagine 20 independent labs each testing whether a particular supplement improves memory. By chance alone, at the standard significance threshold of p < 0.05, roughly one of those 20 labs can be expected to find a statistically significant result even if the supplement does absolutely nothing; the odds that at least one lab does are nearly two in three. If only that one lab publishes and the other 19 file their null results away, the scientific record now contains convincing-looking evidence for a nonexistent effect.
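The arithmetic behind the 20-labs scenario is a two-line calculation: if each null-effect study has a 5% chance of a false positive, the chance that at least one of 20 independent studies comes up "significant" is one minus the chance that all of them stay null.

```python
# Probability that at least one of 20 studies of a null effect reaches
# "significance" at alpha = 0.05 purely by chance.
alpha, labs = 0.05, 20
p_at_least_one = 1 - (1 - alpha) ** labs
print(f"{p_at_least_one:.0%}")  # prints "64%"
```

If only significant results are published, that ~64% is roughly the chance the literature ends up with at least one convincing-looking study of an effect that doesn't exist.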
This isn't hypothetical. Meta-analyses—studies that combine results from multiple experiments—consistently find that published effect sizes are inflated compared to the true effects. A landmark analysis of psychological research estimated that published effects were roughly twice as large as what large-scale replication attempts found. The culprit wasn't fraud. It was selection bias baked into the publication system itself.
The incentive structure makes it worse. Researchers need publications to earn grants, tenure, and career advancement. Journals want exciting findings that attract citations. Reviewers are more enthusiastic about novel discoveries than confirmations of the null hypothesis. Every actor in the system is rationally responding to incentives—and the collective result is a literature that systematically overstates what science actually knows.
Takeaway: When only positive results get published, the scientific literature stops being a record of what's true and starts being a highlight reel—and highlight reels are terrible tools for making decisions.
Fixing the Record: Registered Reports and the Push for Transparency
The scientific community hasn't ignored this problem. Over the past decade, a suite of reforms has emerged targeting publication bias at its root. The most promising is the registered report—a format where researchers submit their study design and analysis plan to a journal before collecting data. The journal evaluates the methodology and commits to publishing the results regardless of outcome. Positive or null, the study sees the light of day.
This flips the traditional incentive structure entirely. Peer review focuses on whether the question is important and the methods are sound, not whether the results are exciting. Early evidence suggests it works: registered reports produce null results at roughly five to six times the rate of traditional publications in the same journals. That's not because the science is worse—it's because the filter has been removed.
Pre-registration is a lighter-weight cousin of registered reports. Researchers publicly log their hypotheses and analysis plans before running experiments, typically on platforms like the Open Science Framework or ClinicalTrials.gov. This doesn't guarantee publication, but it creates a verifiable record that makes it harder to quietly bury inconvenient null results or retroactively reframe exploratory analyses as confirmatory ones.
Dedicated journals have also emerged. The Journal of Negative Results in Biomedicine, PLOS ONE's commitment to methodological soundness over novelty, and various discipline-specific null results outlets are slowly building a home for the data that traditional journals reject. Combined with growing funder mandates for open data and transparent reporting, the ecosystem is shifting—unevenly, and not yet fast enough, but in a direction that treats all well-conducted evidence as worth preserving.
Takeaway: The most powerful reform in modern science isn't a new statistical technique—it's the simple idea that we should decide a study is worth publishing before we know what it found.
Science doesn't advance only through discoveries. It advances through the systematic elimination of wrong answers. Every null result, properly reported, makes the remaining landscape of possibilities a little clearer and a little more honest.
The publication bias problem isn't just an academic concern. It affects which drugs reach the market, which policies get funded, and which public health recommendations you follow. When null results disappear, everyone downstream makes decisions based on incomplete evidence.
The fix is cultural as much as structural. We need to value the researcher who carefully demonstrates that something doesn't work just as much as the one who finds something that does. Both are doing science. Only one has been getting credit.