The development sector has made remarkable progress in demanding rigorous impact evidence before scaling interventions. Randomized controlled trials have become standard practice for major funders, and systematic reviews guide billions in aid allocation. Yet this victory for evidence-based policy contains a troubling blind spot: knowing that something works tells you almost nothing about whether you should fund it.

Consider two hypothetical deworming programs. Program A reduces school absenteeism by 15% at $2 per child treated. Program B reduces absenteeism by 25% at $50 per child. Impact evaluations would correctly identify Program B as more effective. But with a fixed budget of $10,000, Program A reaches 5,000 children while Program B reaches only 200. The aggregate impact of Program A dwarfs that of the more effective intervention. This arithmetic seems obvious, yet development organizations systematically fail to apply it.
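
The calculation is simple enough to write down. The sketch below uses only the figures from the hypothetical above; under a fixed budget, the cheaper program's aggregate impact (children reached weighted by the per-child effect) is roughly fifteen times that of the more effective one.

```python
# Toy comparison of the two hypothetical deworming programs described above.
# All figures come from the hypothetical, not from real program data.

budget = 10_000  # fixed budget in dollars

programs = {
    "A": {"cost_per_child": 2.0,  "absenteeism_reduction": 0.15},
    "B": {"cost_per_child": 50.0, "absenteeism_reduction": 0.25},
}

for name, p in programs.items():
    children_reached = budget / p["cost_per_child"]
    # Aggregate impact: children reached weighted by the per-child effect size.
    aggregate_effect = children_reached * p["absenteeism_reduction"]
    print(f"Program {name}: {children_reached:,.0f} children reached, "
          f"aggregate effect {aggregate_effect:,.0f}")

# Program A: 5,000 children reached, aggregate effect 750.
# Program B:   200 children reached, aggregate effect  50.
```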

The underinvestment in cost-effectiveness analysis represents one of the most consequential failures in modern development practice. Organizations commission expensive impact evaluations, then allocate resources based on effect sizes alone, ignoring the denominator that determines actual lives improved per dollar spent. The result is a sector that congratulates itself on methodological rigor while potentially leaving enormous value on the table. Understanding why this happens—and how to fix it—requires examining both the technical challenges of cost measurement and the institutional incentives that perpetuate inefficiency.

Beyond Impact to Efficiency: Why Effect Sizes Alone Cannot Guide Resource Allocation

The fundamental insight of cost-effectiveness analysis seems almost embarrassingly simple: divide impact by cost to determine value per dollar. Yet this simplicity masks why development organizations so consistently avoid the calculation. Effect sizes feel scientific and precise; cost-effectiveness ratios feel contingent and debatable. The preference for the former over the latter reflects deep institutional biases that prioritize apparent rigor over actual usefulness.

When a randomized trial reports that cash transfers increased consumption by 30%, this finding appears portable and generalizable. The number can be compared to other studies, aggregated in meta-analyses, and cited in policy documents. But consumption gains per dollar spent depend on transfer size, targeting costs, delivery mechanisms, and administrative overhead—all factors that vary enormously across contexts. The effect size travels well; the efficiency ratio stays local.
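
As a rough sketch of why the efficiency ratio stays local, the function below recomputes consumption gain per program dollar under two invented cost structures. The effect size, baseline consumption, transfer size, and cost figures are all assumptions chosen for illustration, not drawn from any real evaluation.

```python
# Illustrative only: the same reported effect size maps to very different
# efficiency ratios once full delivery costs are counted. All numbers invented.

def consumption_gain_per_dollar(
    baseline_consumption: float,   # annual consumption per recipient ($)
    effect_size: float,            # proportional consumption increase from the trial
    transfer: float,               # transfer amount per recipient ($)
    targeting_cost: float,         # identifying and enrolling a recipient ($)
    delivery_cost: float,          # payment delivery per recipient ($)
    overhead_rate: float,          # admin overhead as a share of direct costs
) -> float:
    direct_cost = transfer + targeting_cost + delivery_cost
    total_cost = direct_cost * (1 + overhead_rate)
    absolute_gain = baseline_consumption * effect_size
    return absolute_gain / total_cost

# Same 30% effect size, two delivery contexts with different cost structures.
lean = consumption_gain_per_dollar(800, 0.30, transfer=300, targeting_cost=15,
                                   delivery_cost=10, overhead_rate=0.10)
heavy = consumption_gain_per_dollar(800, 0.30, transfer=300, targeting_cost=80,
                                    delivery_cost=60, overhead_rate=0.35)
print(f"Lean delivery:  ${lean:.2f} of consumption gain per program dollar")
print(f"Heavy delivery: ${heavy:.2f} of consumption gain per program dollar")
```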

This distinction creates perverse incentives. Researchers earn citations and career advancement by publishing novel impact findings, not by conducting the unglamorous work of comprehensive cost accounting. Funders demonstrate accountability by showing interventions passed rigorous evaluation, not by proving resources were allocated optimally across their portfolio. The entire incentive structure rewards proving something works rather than determining whether it works well enough to justify funding over alternatives.

The consequences compound across the sector. Organizations champion their most impressive effect sizes while burying programs with modest impacts but exceptional efficiency. Funders compare proposals based on expected outcomes without adjusting for cost variations that can span orders of magnitude. The development community has built an elaborate apparatus for answering the wrong question.

GiveWell's approach offers a counterexample. Their estimates suggest that top charities achieve outcomes at costs that differ from average development programs by factors of 10 to 100. This variation dwarfs differences in effect sizes across most interventions. Yet the sector continues to obsess over whether an intervention achieves statistical significance while ignoring whether it achieves meaningful value relative to alternatives.

Takeaway

An intervention's effect size tells you whether it works; only cost-effectiveness tells you whether it deserves funding over alternatives—and ignoring this distinction can waste orders of magnitude more resources than backing interventions that fail entirely.

Cost Measurement Challenges: The Practical Obstacles to Meaningful Efficiency Analysis

Even organizations committed to cost-effectiveness analysis face formidable obstacles in measuring true program costs. The denominator in any efficiency calculation is far more difficult to establish than the numerator. Impact evaluations have developed sophisticated methods for estimating causal effects, but cost estimation remains methodologically underdeveloped and practically inconsistent.

Start with the most basic question: what costs should be included? Direct program expenses appear straightforward—materials, personnel, transportation. But overhead allocation introduces immediate ambiguity. Should headquarters staff time be charged to field programs? At what rate? How should shared infrastructure be apportioned across multiple interventions? Organizations answer these questions differently, making cross-program comparisons unreliable even when both report costs.
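
A toy example makes the ambiguity concrete. The figures below are invented; the point is only that two defensible rules for allocating shared headquarters costs, by budget share or by headcount, produce different costs per unit delivered for the same two programs.

```python
# Illustrative only: two defensible overhead-allocation rules give different
# per-unit costs for the same programs. All figures invented.

hq_overhead = 400_000  # shared headquarters cost to allocate ($)

programs = {
    "deworming": {"direct_cost": 300_000, "staff": 10, "units_delivered": 150_000},
    "nutrition": {"direct_cost": 700_000, "staff": 40, "units_delivered": 50_000},
}

total_direct = sum(p["direct_cost"] for p in programs.values())
total_staff = sum(p["staff"] for p in programs.values())

for name, p in programs.items():
    by_spend = hq_overhead * p["direct_cost"] / total_direct  # allocate by budget share
    by_staff = hq_overhead * p["staff"] / total_staff         # allocate by headcount
    cost_spend = (p["direct_cost"] + by_spend) / p["units_delivered"]
    cost_staff = (p["direct_cost"] + by_staff) / p["units_delivered"]
    print(f"{name}: ${cost_spend:.2f}/unit (budget-share rule) "
          f"vs ${cost_staff:.2f}/unit (headcount rule)")
```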

Opportunity costs create deeper complications. When a government health ministry implements a new screening program, the direct costs may be minimal—training materials, some additional supplies. But the nurses conducting screenings cannot simultaneously perform other duties. The true cost includes displaced activities, yet these rarely appear in program budgets. Academic evaluations typically report only incremental costs, dramatically understating what full-scale implementation would require.
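
A back-of-the-envelope sketch, using invented figures, shows how large the gap between incremental and economic costs can be once staff time is valued:

```python
# Illustrative only: incremental budget costs versus economic costs that
# value the staff time a new screening program absorbs. Figures invented.

screenings = 20_000              # screenings performed per year
supplies_per_screening = 0.75    # incremental supplies ($)
training_cost = 15_000           # one-off training, charged to this year ($)

minutes_per_screening = 12
nurse_cost_per_hour = 6.0        # salary plus benefits ($/hour)

incremental_cost = screenings * supplies_per_screening + training_cost
staff_time_cost = screenings * (minutes_per_screening / 60) * nurse_cost_per_hour
economic_cost = incremental_cost + staff_time_cost

print(f"Incremental (budget) cost:   ${incremental_cost:,.0f}")
print(f"Displaced nurse time:        ${staff_time_cost:,.0f}")
print(f"Cost per screening:          ${economic_cost / screenings:.2f} economic "
      f"vs ${incremental_cost / screenings:.2f} on the budget")
```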

Implementation intensity adds another dimension of incomparability. A nutrition program might show impressive effect sizes when delivered by highly trained staff with intensive supervision and small caseloads. Scale the same program with lower-cost personnel, reduced oversight, and larger caseloads, and both costs and impacts change unpredictably. The cost-effectiveness ratio from a research trial may bear little relationship to what a government achieves at scale.

Finally, time horizons complicate any calculation. Many development interventions—education, early childhood programs, infrastructure—generate benefits over decades. Discounting future benefits to present value requires assumptions about discount rates, benefit persistence, and counterfactual trajectories. Reasonable methodological choices can shift cost-effectiveness estimates by factors of two or three, often overwhelming differences between programs.
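
A small sketch illustrates the sensitivity. The cost, annual benefit, discount rates, and persistence assumptions below are invented; varying only the last two shifts the benefit-cost ratio by more than a factor of two.

```python
# Illustrative only: how discount-rate and benefit-persistence assumptions
# shift a cost-effectiveness estimate. Figures invented.

def present_value(annual_benefit: float, years: int, discount_rate: float) -> float:
    """Discounted sum of a constant annual benefit over `years`."""
    return sum(annual_benefit / (1 + discount_rate) ** t for t in range(1, years + 1))

cost_per_pupil = 50.0       # program cost per pupil ($)
annual_earning_gain = 20.0  # assumed annual earnings gain per pupil ($)

for rate, years in [(0.03, 30), (0.07, 30), (0.07, 15)]:
    pv = present_value(annual_earning_gain, years, rate)
    ratio = pv / cost_per_pupil
    print(f"discount {rate:.0%}, {years}y persistence: "
          f"PV ${pv:,.0f}, benefit-cost ratio {ratio:.1f}")
```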

Takeaway

Before trusting any cost-effectiveness comparison, interrogate what costs were included, how overhead was allocated, whether opportunity costs appear, and what implementation intensity produced the measured effects—these methodological choices often matter more than the final numbers.

Comparative Benchmarks: Frameworks for Interpreting Ratios Across Outcome Domains

Suppose you've overcome the measurement challenges and produced credible cost-effectiveness estimates for your programs. A new problem emerges: how do you compare ratios measuring fundamentally different outcomes? Dollars per disability-adjusted life year (DALY) averted cannot be directly compared to dollars per additional year of schooling completed or dollars per percentage point reduction in poverty. Yet development portfolios must allocate across these domains.

The health economics community has developed the most mature frameworks for cross-domain comparison. Metrics like QALYs (quality-adjusted life years) and DALYs attempt to reduce diverse health outcomes to a common unit of healthy life. While philosophically contested—they require valuing different health states and implicitly comparing lives across conditions—these measures enable at least rough comparisons across interventions targeting different diseases and disabilities.
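
As a sketch of how the reduction works in practice, the function below combines years of life lost and years lived with disability into DALYs averted and divides program cost by the result. The disability weight and all other figures are invented placeholders, not estimates for any real condition or program.

```python
# Illustrative only: reducing unlike health outcomes to DALYs averted, then to
# cost per DALY. Disability weights and figures are invented placeholders.

def dalys_averted(deaths_averted: float, years_lost_per_death: float,
                  cases_averted: float, disability_weight: float,
                  years_disabled_per_case: float) -> float:
    yll = deaths_averted * years_lost_per_death                         # years of life lost
    yld = cases_averted * disability_weight * years_disabled_per_case   # years lived with disability
    return yll + yld

program_cost = 250_000
burden = dalys_averted(deaths_averted=30, years_lost_per_death=60,
                       cases_averted=2_000, disability_weight=0.2,
                       years_disabled_per_case=0.5)
print(f"{burden:,.0f} DALYs averted, ${program_cost / burden:,.0f} per DALY")
```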

Development economics lacks equivalent consensus metrics for non-health outcomes. Education researchers sometimes calculate returns in lifetime earnings, but these estimates require heroic assumptions about labor markets, discount rates, and attribution. Poverty measures face similar challenges—consumption gains today cannot be straightforwardly compared to reduced vulnerability tomorrow or increased agency that never appears in household surveys.

One pragmatic approach uses revealed preferences from existing allocation decisions. If major funders consistently value averting one child death at roughly $5,000 in low-income contexts, this implicit threshold provides a benchmark. Programs achieving better ratios than this threshold demonstrate comparative advantage; those performing worse require justification. The benchmark isn't philosophically defensible as the true value of life, but it enables practical portfolio decisions.
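
In code, the benchmark becomes a simple screen. The candidate programs and their figures below are invented; only the $5,000 threshold comes from the hypothetical above.

```python
# Illustrative only: screening proposals against an implicit funding threshold.
# The $5,000-per-death-averted benchmark is the hypothetical figure above.

threshold = 5_000  # implied ceiling: dollars per child death averted

candidates = {
    "bednet distribution": {"cost": 900_000, "deaths_averted": 250},
    "clinic upgrade":      {"cost": 600_000, "deaths_averted": 60},
}

for name, c in candidates.items():
    ratio = c["cost"] / c["deaths_averted"]
    verdict = "clears benchmark" if ratio <= threshold else "needs justification"
    print(f"{name}: ${ratio:,.0f} per death averted -> {verdict}")
```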

GiveWell's methodology offers another template. Rather than seeking universal metrics, they develop explicit models converting diverse outcomes to a common unit, typically expressed as lives saved equivalents. A deworming program's educational benefits get translated to income gains, then compared to mortality interventions using explicit conversion factors. The conversions are debatable, but making assumptions transparent enables productive disagreement about values rather than hiding them in incomparable metrics.
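
A sketch of the general pattern, not of GiveWell's actual model: the conversion factors below are invented placeholders, but writing them down is what makes the disagreement productive.

```python
# Illustrative only: converting unlike outcomes into one explicit common unit.
# The conversion factors are invented placeholders, not GiveWell's figures.

VALUE_PER_DEATH_AVERTED = 100.0     # arbitrary value units per child death averted
VALUE_PER_DOUBLING_OF_INCOME = 1.0  # value units per person-year of doubled income

def mortality_program_value(deaths_averted: float) -> float:
    return deaths_averted * VALUE_PER_DEATH_AVERTED

def deworming_program_value(children_treated: float,
                            income_gain_pct: float,
                            years_of_gain: float) -> float:
    # Rough linear approximation: a 2% income gain counts as 0.02 of a doubling.
    doublings = income_gain_pct / 100.0
    return children_treated * doublings * years_of_gain * VALUE_PER_DOUBLING_OF_INCOME

budget = 100_000
mortality_value = mortality_program_value(deaths_averted=20)
deworming_value = deworming_program_value(children_treated=50_000,
                                          income_gain_pct=2.0, years_of_gain=8)

print(f"Mortality program: {mortality_value / budget:.4f} value units per dollar")
print(f"Deworming program: {deworming_value / budget:.4f} value units per dollar")
```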

Takeaway

When comparing cost-effectiveness across different outcome domains, make your value assumptions explicit—whether using standard thresholds from health economics, revealed preferences from funder behavior, or explicit conversion factors—because hidden assumptions about relative values are still assumptions.

The development sector's underinvestment in cost-effectiveness analysis represents a failure of institutional design rather than individual competence. Researchers, funders, and implementers all respond rationally to incentives that reward demonstrated impact over demonstrated efficiency. Changing this requires redesigning what gets measured, published, and rewarded.

Practical steps exist for organizations willing to lead. Require cost reporting alongside impact evaluation as a condition of funding. Develop standardized cost templates that enable meaningful comparison across grantees. Invest in the unglamorous work of establishing benchmark ratios for common intervention types. Build career paths for researchers who specialize in efficiency analysis rather than novel impact findings.

The ultimate argument for cost-effectiveness analysis is moral, not technical. Every dollar spent on a less efficient intervention is a dollar not spent on a more efficient one. In a world of constrained resources and unlimited need, refusing to maximize impact per dollar is not methodological humility—it is a choice to help fewer people than we could.