Development practitioners face a persistent design question that carries enormous fiscal and human consequences: should anti-poverty programs deliver a single well-targeted intervention, or bundle multiple components into a comprehensive package? The intuition behind bundling is compelling—poverty is multidimensional, so the response should be too. But intuition is not evidence, and the cost differentials between single-component and multifaceted programs are substantial enough to demand rigorous empirical scrutiny.
The last decade has produced a remarkable body of experimental evidence on this question, anchored by multi-country randomized evaluations of the so-called graduation approach—integrated programs that combine productive asset transfers, skills training, consumption support, savings facilitation, and regular coaching. These trials, spanning South Asia, Sub-Saharan Africa, and Latin America, offer some of the most methodologically robust findings in development economics. They also raise uncomfortable questions about cost-effectiveness and scalability.
What emerges from this evidence is not a simple verdict for or against bundling. Instead, the data reveal a more nuanced picture: complementarities between program components are real but context-dependent, the marginal contribution of individual elements varies dramatically across settings, and the operational complexity of comprehensive programs introduces implementation risks that randomized trials—conducted under research-grade conditions—may systematically understate. Understanding when bundling adds genuine value requires disaggregating the bundle itself.
Graduation Approach Evidence: What the Multi-Country Trials Actually Show
The canonical evidence on multifaceted anti-poverty programs comes from the six-country randomized evaluation coordinated by Banerjee, Duflo, Goldberg, Karlan, Osei, Parienté, Shapiro, Thuysbaert, and Udry, published in Science in 2015. The graduation model—originally developed by BRAC in Bangladesh—was tested in Ethiopia, Ghana, Honduras, India, Pakistan, and Peru. Each site implemented a bundled intervention comprising a productive asset transfer, technical skills training, regular life-skills coaching, temporary consumption support, and access to savings mechanisms. The results were striking: statistically significant positive impacts on consumption, food security, assets, and income persisted at least one year after program support ended across nearly all sites.
The durability of these effects is what distinguishes the graduation evidence from many other anti-poverty evaluations. Follow-up studies in Bangladesh and India have tracked participants for seven and ten years respectively, finding that treatment effects not only persist but in some cases grow over time. In the Indian context, Banerjee, Duflo, and Sharma documented sustained asset accumulation and income gains a full decade after the intervention, suggesting the program shifted participants onto a different economic trajectory rather than providing a temporary boost.
However, the magnitude of effects varied considerably across countries. In Honduras, impacts were modest and statistically insignificant on several key outcomes. In Ghana, certain components appeared to contribute little beyond what the asset transfer alone achieved. This heterogeneity matters enormously for policy design—it suggests that the graduation model is not uniformly effective and that local economic conditions, market structures, and baseline poverty profiles mediate impact in ways that a single pooled estimate obscures.
A critical methodological caveat: these trials evaluated the full bundle against a control, not individual components against each other. This design answers the question "does the package work?" but not "which elements of the package drive the results?" Without factorial designs that randomly vary the inclusion of each component, we cannot cleanly attribute outcomes to specific program elements. The few studies that have attempted component-level decomposition—notably Banerjee, Duflo, and Sharma's work in West Bengal—suggest asset transfers and coaching carry disproportionate weight, but the evidence base for component attribution remains thin.
The internal validity of these trials is high—randomization was well-implemented, attrition rates were manageable, and pre-analysis plans constrained specification searching. But external validity is an open question. These programs were implemented by well-resourced NGOs with significant research team oversight. Whether government agencies operating at national scale can replicate these conditions is far from certain, and the gap between efficacy under trial conditions and effectiveness in routine implementation is a recurring challenge in development evaluation.
Takeaway: Multi-country graduation trials demonstrate that bundled anti-poverty programs can produce durable welfare gains, but the experimental designs that established this evidence largely cannot tell us which specific components within the bundle are doing the heavy lifting—a gap with profound implications for efficient program design.
Complementarity vs Redundancy: When Components Reinforce and When They Don't
The theoretical case for bundling rests on complementarity—the idea that the combined effect of multiple interventions exceeds the sum of their individual effects. A productive asset without the skills to manage it may be sold or lost. Training without capital cannot be applied. Consumption support prevents distress sales of the asset during the vulnerable early period. Coaching sustains motivation and problem-solving. Each element reinforces the others, and removing any one could undermine the whole. This is a coherent economic argument, rooted in poverty trap models where households face multiple binding constraints simultaneously.
But complementarity is an empirical claim, not a logical necessity. The alternative hypothesis—redundancy—holds that one or two components account for most of the impact and the remaining elements add cost without proportional benefit. Evidence from several settings supports this possibility. Blattman, Fiala, and Martinez's work in Uganda found that unconditional cash grants alone produced large and sustained gains in income and consumption for young adults, without any bundled training or coaching. The Haushofer and Shapiro GiveDirectly evaluation in Kenya found similarly robust effects from cash transfers alone, raising the question of whether the additional programmatic infrastructure of graduation models is cost-justified.
The most informative designs for adjudicating this debate are factorial experiments that randomize access to individual components. These remain rare in development economics due to their sample size requirements and operational complexity. One notable exception is the work of Bouguen, Filmer, Leight, and Tiongson in the Philippines, which varied the delivery of livelihoods training and asset transfers independently. They found significant complementarities: the effect of the combined treatment exceeded the sum of the individual component effects. But this finding is context-specific and cannot be assumed to generalize.
Cost-effectiveness analysis sharpens the stakes considerably. Graduation programs typically cost between $1,500 and $5,000 per household at purchasing power parity—several times more than unconditional cash transfers delivering equivalent monetary value. If a $500 cash transfer achieves 70% of the impact of a $4,000 graduation package, the simpler intervention dominates on cost-effectiveness grounds even though the comprehensive program produces larger absolute effects. The relevant policy question is rarely "what maximizes impact for one household?" but rather "what maximizes impact per dollar across the eligible population?"
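The budget logic above can be made concrete with a toy calculation using the illustrative numbers in the text (a $500 cash transfer achieving 70% of the impact of a $4,000 graduation package). Impact is normalized to arbitrary units; only the per-dollar ratio matters.

```python
# Toy cost-effectiveness comparison using the article's illustrative figures.
# All impact numbers are hypothetical and normalized; only ratios are meaningful.

def impact_per_dollar(impact: float, cost: float) -> float:
    """Impact achieved per program dollar spent."""
    return impact / cost

graduation_impact = 1.0               # normalize the bundle's impact to 1.0
graduation_cost = 4000                # illustrative cost per household (USD)

cash_impact = 0.7 * graduation_impact  # cash achieves 70% of the bundle's impact
cash_cost = 500

grad_ipd = impact_per_dollar(graduation_impact, graduation_cost)
cash_ipd = impact_per_dollar(cash_impact, cash_cost)

# With a fixed budget, the cheaper intervention reaches 8x as many households,
# so it delivers more total impact despite a smaller per-household effect.
print(round(cash_ipd / grad_ipd, 1))  # → 5.6
```

Under these assumptions the cash transfer is 5.6 times as cost-effective, which is the sense in which a simpler program can dominate even while producing smaller absolute effects per household.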
Recent work by Bedoya, Coville, Haushofer, Isaqzadeh, and Shapiro evaluates a graduation-style program in Afghanistan and benchmarks it against what equivalent-cost cash transfers would likely achieve. The results suggest the bundled program outperforms cash on some dimensions—particularly psychosocial outcomes and food security—but the differences are modest relative to the substantially higher delivery costs of the multifaceted approach. This kind of comparison, with careful cost accounting, is exactly what the field needs more of to move beyond ideological commitments to bundling or simplicity.
Takeaway: Complementarity between program components is possible but not guaranteed—the critical question is not whether a bundled program works, but whether its incremental impact over simpler, cheaper alternatives justifies the incremental cost, and answering that requires factorial designs the field has been slow to produce.
Scalability Trade-offs: From Research-Grade Implementation to Government Systems
Even if we accept that graduation-style bundling produces genuine complementarities in certain contexts, a formidable challenge remains: operational scalability. Comprehensive programs are logistically demanding. They require coordinated delivery of multiple services—asset procurement and distribution, sequential training curricula, regular individual coaching visits, savings group facilitation, and consumption support transfers—often to remote and hard-to-reach populations. Each component introduces a point of potential implementation failure, and because full delivery requires every component to succeed, the probability that at least one component fails compounds rapidly as complexity grows.
The coaching component is particularly difficult to scale. In research implementations, coaches are typically well-trained, well-supervised, and carry manageable caseloads. Government-run programs operating at scale face entirely different human resource constraints. India's National Rural Livelihoods Mission—which incorporates elements of the graduation approach—serves millions of households but with coaching quality and frequency that bear little resemblance to the original BRAC model. The gap between the evaluated intervention and the scaled version is wide enough to question whether impact estimates from trials remain informative at all.
This is not merely a theoretical concern. Muralidharan and Niehaus have documented the "voltage drop" phenomenon in Indian development programs—the systematic attenuation of treatment effects as interventions move from NGO-led pilots to government implementation at scale. The drivers are predictable: weaker staff selection and training, reduced monitoring intensity, political interference in targeting, procurement delays, and the loss of institutional culture that characterized the implementing organization in the original trial. For multifaceted programs with many moving parts, the voltage drop is likely to be especially severe.
An emerging alternative is adaptive bundling—using diagnostic tools or machine learning to identify which constraints bind for specific households, then tailoring the intervention package accordingly. Rather than delivering a uniform bundle to all participants, programs could allocate coaching only to those who need behavioral support, training only to those with skill deficits, and asset transfers calibrated to local market opportunities. This approach preserves the logic of addressing multiple constraints while reducing unnecessary expenditure on redundant components. Work by Bryan, Chowdhury, and Mobarak on migration facilitation suggests that targeting interventions to specific constraint profiles can dramatically improve cost-effectiveness.
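The targeting logic described above can be sketched as a simple constraint-to-component mapping. Everything here is hypothetical: the constraint labels, component names, and costs are invented for illustration, not drawn from any actual program.

```python
# Hypothetical sketch of adaptive bundling: map each household's diagnosed
# binding constraints to a tailored component package instead of delivering
# the full bundle uniformly. All names and costs below are illustrative.

COMPONENT_COSTS = {"asset": 500, "training": 150, "coaching": 300, "consumption": 200}

CONSTRAINT_TO_COMPONENT = {
    "no_capital": "asset",         # lacks productive capital -> asset transfer
    "skill_deficit": "training",   # lacks technical skills   -> training
    "behavioral": "coaching",      # needs behavioral support -> coaching
    "food_insecure": "consumption" # at risk of distress sales -> consumption support
}

def tailor_package(constraints: set[str]) -> dict[str, int]:
    """Select only the components matching a household's diagnosed constraints."""
    chosen = {CONSTRAINT_TO_COMPONENT[c] for c in constraints
              if c in CONSTRAINT_TO_COMPONENT}
    return {comp: COMPONENT_COSTS[comp] for comp in chosen}

# A household constrained only by capital and behavioral support gets a
# two-component package, at less cost than the full four-component bundle.
pkg = tailor_package({"no_capital", "behavioral"})
print(sorted(pkg), sum(pkg.values()))  # → ['asset', 'coaching'] 800
```

The design choice this illustrates is that the diagnostic step, not the bundle, becomes the core of the program: cost savings come from withholding components a household does not need, which is only safe if the diagnosis is reliable.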
The policy frontier is moving toward a more disciplined eclecticism—acknowledging that bundling can add value while insisting on empirical justification for each included component and realistic assessment of implementation capacity. Governments and donors increasingly demand cost-effectiveness evidence rather than accepting impact estimates in isolation. The question is no longer whether comprehensive programs can work under ideal conditions, but whether their demonstrated benefits survive the transition to the institutional environments where they must ultimately operate—and whether those benefits justify their costs relative to simpler alternatives that are easier to deliver well at scale.
Takeaway: The greatest risk in multifaceted program design is not that bundling fails in principle, but that operational complexity degrades implementation quality at scale—making the simplicity of an intervention a feature, not a limitation, when designing for real-world government delivery systems.
The evidence on multifaceted poverty interventions resists easy summary. Graduation-model trials have demonstrated that bundled programs can produce durable, meaningful welfare improvements—a genuine achievement in a field littered with disappointing evaluations. But the same evidence base reveals significant heterogeneity across contexts and leaves the question of component-level attribution largely unresolved.
The field needs more factorial designs that isolate component contributions, more head-to-head comparisons against equivalent-cost simpler alternatives, and—critically—more evaluations conducted under routine implementation conditions rather than research-grade oversight. Without these, policy decisions about program design rest on incomplete evidence and untested assumptions about scalability.
Bundling is a design choice, not a virtue. It should be justified by evidence of complementarity, constrained by realistic assessment of implementation capacity, and benchmarked against the cost-effectiveness of simpler approaches. The goal is not the most comprehensive program—it is the most impactful use of scarce development resources.