Every year, billions of dollars flow into health systems across low- and middle-income countries, yet the quality of care delivered at the point of service remains stubbornly uneven. A growing body of experimental evidence suggests that how we compensate and motivate health workers may matter as much as how many we train or deploy. Performance-based financing—tying some portion of remuneration to measurable outputs or outcomes—has become one of the most debated instruments in the development health toolkit.
The logic is deceptively simple: reward the behaviors you want to see more of, and rational agents will supply them. But health systems are not textbook markets. Health workers operate under severe resource constraints, face multitasking problems that standard principal-agent models struggle to capture, and serve populations whose needs are difficult to contractualize. The gap between incentive design on paper and incentive effects in practice is where the most important empirical questions live.
This article examines the experimental evidence on performance pay in development health systems—not as an advocacy exercise, but as a rigorous assessment of what we actually know. We consider the design space for incentive mechanisms, synthesize impact findings across multiple randomized evaluations, and confront the uncomfortable evidence on gaming, distortion, and unintended consequences. The goal is to move beyond the binary debate of whether incentives "work" and toward a more sophisticated understanding of when, how, and for whom they generate value.
The Design Space: More Than Just Bonuses
Performance-based financing in health is not a single intervention—it is a family of design choices, and the specific configuration matters enormously for downstream effects. The most basic distinction is the unit of incentive: individual health workers, health facilities as organizations, or regional administrative units. Each choice embeds different assumptions about where agency resides and how production functions operate. Individual bonuses assume that personal motivation is the binding constraint; facility-level rewards assume coordination and collective effort matter more.
Beyond the unit, designers must choose the basis of payment. Output-based schemes reward measurable service delivery—immunizations administered, antenatal visits completed, deliveries attended. Outcome-based schemes attempt to tie payment to health results—reductions in child mortality, improvements in nutritional status. The former are easier to verify but may incentivize volume over value. The latter are more aligned with ultimate goals but introduce attribution problems and long feedback loops that weaken the incentive signal.
Non-financial incentives represent a parallel design dimension that randomized evidence increasingly takes seriously. Public recognition, career advancement opportunities, peer comparison dashboards, and intrinsic motivation mechanisms can complement or substitute for financial rewards. The Rwanda performance-based financing scheme, one of the most rigorously evaluated programs, combined facility-level financial bonuses with enhanced autonomy over resource allocation—making it difficult to isolate which design feature drove observed effects.
A critical and often underappreciated design parameter is the verification architecture. Who measures performance, how frequently, and with what consequences for measurement error? Independent verification by external agents is costly but reduces manipulation. Self-reporting is cheap but invites gaming. Community-based monitoring introduces local accountability but may reflect power dynamics rather than objective quality. The Haut-Katanga experiment in the DRC demonstrated that verification intensity itself can be a binding constraint on program effectiveness.
Finally, the interaction between incentive design and existing institutional context determines whether a scheme strengthens or undermines the health system. In settings where baseline salaries are adequate and supply chains function, marginal incentives may sharpen effort allocation. In settings where workers are unpaid for months and essential medicines are unavailable, performance pay may simply reward those who were already positioned to deliver—or worse, redirect scarce administrative capacity toward measurement rather than service provision.
Takeaway: Performance-based financing is not one intervention but a constellation of design choices—unit, basis, verification, and institutional context—each of which independently shapes whether the program improves care or merely reshuffles where effort goes.
What the Experiments Actually Show
The headline finding from the most influential randomized evaluations—Rwanda, Argentina's Plan Nacer, Cameroon, and several others—is that performance-based financing can increase the utilization of targeted health services. Rwanda's landmark RCT showed significant increases in institutional deliveries, preventive care visits for young children, and some improvements in quality-of-care process measures. These results were meaningful and have rightly shaped policy discourse. But the nuance beneath the headlines deserves equal attention.
A recurring pattern across evaluations is that quantity effects are more robust than quality effects. Programs tend to increase the number of services delivered—more vaccinations, more prenatal visits, more facility-based births—while evidence on whether those services are delivered competently is weaker and more mixed. The Cameroon PBF evaluation found improvements in some structural quality indicators like drug availability but limited effects on clinical process quality. This distinction is consequential: a prenatal visit where no blood pressure is measured and no danger signs are assessed may check a performance box while failing the patient.
Perhaps the most important methodological contribution of recent experimental work is the use of factorial designs that decompose the incentive effect from the resource effect. When a facility receives performance-based payments, it receives both a motivational signal and additional revenue. Cameroon's evaluation included an arm that provided equivalent unconditional financing, revealing that much of the observed improvement could be attributed to additional resources rather than the incentive structure itself. This finding fundamentally challenges the theoretical case for performance pay as distinct from simple budget increases.
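The decomposition logic can be made concrete with a minimal simulation. The numbers below are invented for illustration, not the Cameroon estimates: three trial arms (control, unconditional financing, PBF) let the analyst subtract the resource component from the total effect, leaving the residual attributable to the incentive structure itself.

```python
import numpy as np

# Illustrative simulation (hypothetical effect sizes, not real trial data).
# Three arms decompose the total PBF effect into a resource component
# and a residual incentive component.
rng = np.random.default_rng(0)
n = 500  # facilities per arm

# Assumed true effects on a standardized service-delivery index:
# extra money alone adds 0.30; the incentive contract adds a further 0.10.
control = rng.normal(0.00, 1.0, n)
unconditional = rng.normal(0.30, 1.0, n)  # same money, no conditions
pbf = rng.normal(0.40, 1.0, n)            # money + performance contract

total_effect = pbf.mean() - control.mean()
resource_effect = unconditional.mean() - control.mean()
incentive_effect = total_effect - resource_effect

print(f"total PBF effect:    {total_effect:.2f}")
print(f"resource component:  {resource_effect:.2f}")
print(f"incentive component: {incentive_effect:.2f}")
```

In this stylized setup, most of the "PBF effect" is simply the money—which is the pattern the factorial evaluations documented.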
Heterogeneity of treatment effects across settings is substantial and policy-relevant. Programs tend to show larger effects in facilities with higher baseline capacity—better-staffed, better-supplied, and better-managed. This creates a troubling equity implication: the facilities least equipped to deliver quality care may benefit least from incentive schemes, potentially widening within-system disparities. The Zambia RCT documented precisely this pattern, with performance bonuses yielding minimal improvements in the most resource-constrained facilities.
On distal health outcomes—the metrics that ultimately justify health system investments—the evidence remains thin. Few evaluations have been powered to detect changes in mortality, morbidity, or nutritional status, and those that have attempted it generally find modest or null effects. This is not necessarily a failure of the programs; detecting population-level health outcome changes requires large samples, long time horizons, and attribution strategies that most PBF evaluations were not designed to provide. But it does mean that the strongest empirical claims we can make remain at the level of service delivery processes rather than health impact.
Takeaway: The evidence shows performance-based financing reliably increases the volume of targeted services, but the crucial question—whether this translates into better health—remains largely unanswered, and much of the observed effect may stem from additional resources rather than incentive motivation itself.
Gaming, Distortion, and the Multitask Problem
The theoretical concern most frequently raised against performance pay is the multitask problem formalized by Holmström and Milgrom: when agents perform multiple tasks but only some are incentivized, effort migrates toward measured activities at the expense of unmeasured ones. Health workers do not simply deliver discrete, contractible services—they counsel, they triage, they manage chronic conditions, they respond to emergencies, they maintain community trust. Incentivizing a subset of these functions risks degrading the rest.
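The effort-migration logic can be written down compactly. The following is a stylized two-task sketch, with notation introduced here for illustration: the worker splits effort between a measured task and an unmeasured one, and the effort cost C is convex (and initially decreasing in the unmeasured effort, so the worker supplies some of it even unrewarded).

```latex
% Only task 1 is measured; pay is linear in the measured signal:
\[
  x_1 = e_1 + \varepsilon, \qquad w = \alpha + \beta x_1 .
\]
% The worker maximizes expected pay net of effort cost C(e_1, e_2):
\[
  \max_{e_1,\, e_2}\; \alpha + \beta e_1 - C(e_1, e_2)
  \;\Longrightarrow\;
  C_1(e_1, e_2) = \beta, \qquad C_2(e_1, e_2) = 0 .
\]
% Comparative statics: when efforts are substitutes in cost (C_{12} > 0),
% strengthening the incentive on the measured task reduces effort on the
% unmeasured one:
\[
  \frac{\partial e_2}{\partial \beta}
  = -\,\frac{C_{12}}{\,C_{11} C_{22} - C_{12}^2\,} \;<\; 0 .
\]
```

The last expression is the multitask result in miniature: the bonus on immunizations is, mechanically, a tax on counseling time whenever the two compete for the same hours.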
Empirical evidence on task distortion is growing and concerning. Several evaluations have documented significant declines in non-incentivized services when incentive schemes are introduced. In one well-documented case, health workers increased performance on rewarded indicators while reducing time spent on patient counseling and community outreach activities that were not measured. The net welfare effect of such reallocation is ambiguous—it depends on the relative value of incentivized versus non-incentivized activities—but it is rarely formally assessed in program evaluations.
Cream-skimming—the selective targeting of easier-to-serve populations to maximize performance metrics—represents another documented distortion. When health workers or facilities are rewarded for aggregate performance indicators, they face incentives to concentrate effort on patients who are most likely to generate countable outputs. In immunization-focused schemes, this can mean prioritizing already-accessible populations while further neglecting hard-to-reach communities. The patients who most need the health system may become the least profitable to serve.
Data manipulation and strategic reporting constitute perhaps the most direct form of gaming. When financial rewards depend on reported numbers, the incentive to misreport is structurally embedded. Independent verification audits in multiple PBF programs have uncovered significant discrepancies between reported and verified service delivery, sometimes exceeding 20 percent inflation. The cost of maintaining verification systems sufficient to contain this problem can be substantial—in some implementations, verification costs consume a meaningful fraction of the total incentive budget, raising questions about cost-effectiveness.
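The cost-effectiveness question reduces to simple arithmetic. The figures below are hypothetical (chosen to be roughly in line with the audit discrepancies the article cites), and compare the overpayment an audit claws back against what the audit itself costs.

```python
# Back-of-envelope check with hypothetical figures: verification pays for
# itself only when clawed-back overpayments exceed the audit cost.
reported_services = 10_000
verified_services = 8_200              # ~18% inflation of reported counts
payment_per_service = 2.00             # USD bonus per reported service
audit_cost_per_claim = 0.30            # USD to verify one reported claim

overpayment_avoided = (reported_services - verified_services) * payment_per_service
verification_cost = reported_services * audit_cost_per_claim

print(f"overpayment avoided: ${overpayment_avoided:,.2f}")
print(f"verification cost:   ${verification_cost:,.2f}")
print(f"net saving:          ${overpayment_avoided - verification_cost:,.2f}")
```

Under these assumptions the audit barely breaks even—which is why verification intensity, as the Haut-Katanga experience suggests, can itself become a binding constraint on program design.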
The deepest concern, and the hardest to measure experimentally, is the potential erosion of intrinsic motivation. A substantial body of evidence from behavioral economics suggests that external rewards can crowd out internal motivation—the sense of professional duty, community commitment, and vocational identity that sustains health workers through conditions that no bonus could adequately compensate. If performance pay signals that effort is primarily transactional, it may undermine the very motivational foundations that sustain health systems in resource-poor settings. This crowding-out effect is theoretically well-established but empirically elusive in development health contexts, precisely because it operates on timescales and through channels that standard RCT designs are poorly equipped to detect.
Takeaway: The multitask nature of health work means that any incentive scheme is also an implicit tax on unmeasured activities—and the most important dimensions of care quality may be precisely those that resist measurement and contractualization.
Performance-based financing in development health systems is neither the transformative silver bullet its advocates once promised nor the distortionary menace its critics fear. It is a complex institutional intervention whose effects depend profoundly on design specifics, implementation context, and the counterfactual it is compared against.
The honest synthesis of experimental evidence points toward a more modest and more useful conclusion: incentive schemes can shift measurable service delivery at the margin, but they are not substitutes for functioning supply chains, adequate baseline financing, or competent health workforce management. The decomposition findings—showing that unconditional resources explain much of the observed effect—should fundamentally reframe the policy conversation.
For program designers and evaluators, the frontier is not whether to use incentives but how to design measurement systems that capture what matters without distorting what they cannot capture. That is a harder problem than writing bonus formulas—and a more important one.