In 2004, economists Edward Miguel and Michael Kremer published findings that seemed almost too good to be true. Their study of school-based deworming in Kenya showed that treating intestinal parasites didn't just improve children's health: it sharply raised school attendance, and later follow-ups of the same children suggested substantially higher earnings in adulthood.

The study became a cornerstone of evidence-based development. It launched massive deworming campaigns, influenced billions in aid spending, and helped establish the randomized controlled trial as development's gold standard. GiveWell, the influential charity evaluator, ranked deworming among the most cost-effective interventions on the planet.

Then came the reanalyses. When other researchers tried to verify the original findings, they encountered problems. Statistical corrections, different analytical choices, and additional data painted a murkier picture. What followed wasn't just a technical dispute—it became a proxy war over how development should evaluate evidence and make decisions under uncertainty.

The Original Claims That Changed Development Policy

The Kenya deworming study emerged from the Primary School Deworming Project, which randomly assigned schools to receive treatment at different times starting in 1998. This created natural treatment and control groups that researchers could follow over decades.

The initial results were striking. Treated children attended school more frequently, and the benefits appeared to spill over to untreated children in the same schools and even neighboring communities. Worms were so prevalent, and treatment so cheap, that the cost-effectiveness calculations were remarkable—potentially just a few dollars per additional year of schooling generated.
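
To make that arithmetic concrete, here is a minimal sketch of the calculation. The inputs are illustrative assumptions for exposition, not the study's published figures:

```python
# Illustrative cost-effectiveness arithmetic for mass deworming.
# All inputs below are assumptions for exposition, not the study's exact figures.

cost_per_child_per_year = 0.50  # assumed treatment cost per child per year (USD)
extra_school_days = 25          # assumed extra days attended per treated child,
                                # including spillovers to untreated children
school_days_per_year = 180      # assumed length of the school year

# Additional years of schooling generated per child treated for one year
extra_school_years = extra_school_days / school_days_per_year

# Cost per additional year of schooling generated
cost_per_school_year = cost_per_child_per_year / extra_school_years
print(f"Cost per additional year of schooling: ${cost_per_school_year:.2f}")
# With these assumed inputs: 0.50 / (25 / 180) = $3.60 per additional year.
```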

But the truly influential findings came from long-term follow-ups. When researchers tracked down the original participants as adults, treated individuals worked more hours, earned substantially higher wages, and showed better life outcomes across multiple measures. The implied rate of return on a 50-cent deworming pill dwarfed almost any other development investment.

These findings weren't published in obscure journals—they appeared in top economics outlets and were championed by Nobel Prize winners. The study became the example development economists cited when explaining why rigorous evidence matters. It suggested that simple, cheap interventions could generate transformative returns, if only we measured carefully enough to detect them.

Takeaway

Extraordinary claims require extraordinary scrutiny. When a study's findings seem dramatically better than comparable interventions, that's precisely when independent verification becomes most important.

When Replication Revealed Cracks in the Foundation

The controversy ignited in 2015 when epidemiologists at the London School of Hygiene and Tropical Medicine reanalyzed the original data. They found computational errors, questionable statistical choices, and results that weren't robust to reasonable alternative specifications. Some findings, notably the spillover effects on neighboring schools, weakened or disappeared entirely once the errors were corrected.
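
"Not robust to alternative specifications" has a concrete meaning: estimate the same effect under several defensible analytical choices and see whether the conclusion survives. The sketch below illustrates the idea on synthetic data; it does not reproduce the actual deworming analyses, and the variable names are invented for the example:

```python
# Toy specification-robustness check: estimate the same "treatment effect"
# under several reasonable analytical choices. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_schools, pupils = 60, 40
school = np.repeat(np.arange(n_schools), pupils)
treated = np.repeat(rng.integers(0, 2, n_schools), pupils)     # school-level assignment
school_shock = np.repeat(rng.normal(0, 2, n_schools), pupils)  # clustered noise
age = rng.normal(10, 2, n_schools * pupils)
# True effect (0.5) is small relative to the school-level noise.
attendance = (100 + 0.5 * treated + 1.0 * age
              + school_shock + rng.normal(0, 5, n_schools * pupils))
df = pd.DataFrame({"attendance": attendance, "treated": treated,
                   "age": age, "school": school})

specs = {
    "naive OLS, no controls": smf.ols("attendance ~ treated", df).fit(),
    "OLS with age control":   smf.ols("attendance ~ treated + age", df).fit(),
    "clustered by school":    smf.ols("attendance ~ treated + age", df).fit(
        cov_type="cluster", cov_kwds={"groups": df["school"]}),
}
for name, res in specs.items():
    print(f"{name}: effect={res.params['treated']:.2f}, "
          f"p={res.pvalues['treated']:.3f}")
# A finding that is significant only under the most favorable specification
# is fragile; a robust effect holds up across reasonable alternatives.
```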

The original authors disputed these critiques vigorously, arguing that the reanalysis made inappropriate methodological choices. Other statisticians weighed in on both sides. The technical back-and-forth became so heated and complex that even experts struggled to adjudicate the competing claims.

Meanwhile, a separate Cochrane systematic review, widely considered the gold standard for synthesizing medical evidence, examined over 40 deworming trials worldwide. Its conclusion was sobering: on average, mass deworming programs showed no substantial effect on nutrition, school attendance, or cognitive performance.
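
The "on average" matters. In a meta-analysis, each trial's estimate is typically weighted by its precision, so one striking result gets diluted by many near-null trials. A minimal sketch of inverse-variance pooling, with hypothetical effect sizes:

```python
# Fixed-effect inverse-variance pooling, the core of meta-analysis.
# Effect sizes and standard errors below are hypothetical.
import numpy as np

effects = np.array([0.25, 0.02, -0.01, 0.03, 0.00, 0.01, -0.02])  # one outlier, many near-nulls
ses     = np.array([0.05, 0.03,  0.04, 0.03, 0.02, 0.03,  0.04])  # standard errors

weights = 1 / ses**2                                   # precision weights
pooled = np.sum(weights * effects) / np.sum(weights)   # weighted average effect
pooled_se = np.sqrt(1 / np.sum(weights))
print(f"pooled effect = {pooled:.3f} +/- {pooled_se:.3f}")
# The outlier contributes only in proportion to its precision, so the
# pooled estimate sits close to the many near-null results.
```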

The disconnect was jarring. How could the Kenya study show such remarkable effects while the broader literature found little impact? Possible explanations included: Kenya had unusually high worm burdens, the long-term follow-up captured effects other studies missed, publication bias inflated the broader literature's null findings, or the original study's results were statistical noise amplified by flexible analytical choices.

Takeaway

No single study, however well-designed, can definitively establish what works. Development decisions should weight systematic reviews and replication patterns more heavily than any individual finding, especially when results conflict.

What the Controversy Teaches About Evidence Standards

The deworming debate exposed a fundamental tension in development economics. The field had embraced randomized controlled trials as a solution to the unreliable evidence that plagued earlier development research. But RCTs, it turned out, weren't immune to researcher degrees of freedom, publication incentives, and the challenge of generalizing from specific contexts.

Some observers drew a cautious conclusion: raise evidence standards, require pre-registration of analyses, and demand multiple replications before scaling interventions. Others worried this approach would paralyze development practice, leaving harmful status quos in place while waiting for perfect evidence that never arrives.

The organizations that had championed deworming faced difficult choices. GiveWell responded by dramatically downgrading their estimates of deworming's cost-effectiveness—but still kept it on their recommended list, arguing that even much smaller effects justified the tiny cost. This revealed something important: evidence-based development isn't binary. It requires making decisions under uncertainty, with explicit reasoning about how much confidence different findings deserve.
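
One way to encode that kind of reasoning explicitly, in the spirit of GiveWell's published adjustments, is to discount a claimed effect by a stated replicability factor before comparing cost-effectiveness. The sketch below uses hypothetical numbers, not GiveWell's actual figures:

```python
# Toy sketch of decision-making under uncertainty: discount each claimed
# effect by an explicit replicability adjustment, then compare benefit per
# dollar. All numbers are hypothetical, not GiveWell's actual estimates.

def adjusted_value_per_dollar(claimed_effect, replicability, cost_per_person):
    """Expected benefit per dollar after discounting for replication doubt."""
    return claimed_effect * replicability / cost_per_person

interventions = {
    # (claimed effect in benefit units, replicability adjustment, cost per person USD)
    "deworming":     (10.0, 0.10, 0.50),  # large claimed effect, heavy discount, tiny cost
    "cash transfer": ( 1.0, 0.90, 1.00),  # modest effect, well replicated
}

for name, (effect, replic, cost) in interventions.items():
    value = adjusted_value_per_dollar(effect, replic, cost)
    print(f"{name}: {value:.2f} benefit units per dollar")
# Even after a 90% discount, deworming's low cost can keep it competitive:
# 10.0 * 0.10 / 0.50 = 2.0 versus 1.0 * 0.90 / 1.00 = 0.9.
```

Writing the discount down has a side benefit: skeptics and advocates can argue about the replicability factor itself instead of talking past each other.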

The controversy also highlighted the difference between academic and operational standards of evidence. Academics debate statistical significance and effect sizes. Program implementers need to know: should we spend limited resources on this intervention or something else? These are related but distinct questions, and the deworming debate showed how poorly the field had thought through their relationship.

Takeaway

Evidence-based development requires explicit frameworks for decision-making under uncertainty, not just standards for what counts as good evidence. The question isn't whether a study is perfect, but whether available evidence justifies action compared to alternatives.

The deworming controversy didn't destroy evidence-based development—it matured it. The field learned that single studies, even excellent ones, are starting points for understanding rather than final answers. Replication, systematic reviews, and healthy skepticism are features, not bugs.

For practitioners, the lesson isn't to abandon deworming or dismiss the original research. It's to hold all evidence—especially evidence we find compelling—to consistent standards while remaining willing to act on imperfect information when the stakes are high.

The ultimate irony is that the deworming debate itself demonstrates why evidence-based development matters. Without rigorous scrutiny, neither the original findings nor the critiques would have surfaced. The process was messy, contentious, and uncomfortable—and that's exactly how science is supposed to work.