A vocational training program in Kenya shows remarkable results: participants increase their earnings by 40%, find formal employment at twice the baseline rate, and report higher life satisfaction. The randomized controlled trial is methodologically impeccable. Policymakers celebrate. Funding flows. The program scales nationally.

Three years later, evaluators return to find that the intervention's impact has evaporated. Wages for the trained occupations have fallen. Employers report being overwhelmed with identically credentialed applicants. Workers who would have entered these jobs without training have been displaced into lower-paying alternatives. The program that succeeded brilliantly at small scale has produced no net welfare gains, and has possibly caused harm.

This pattern haunts development economics. We have grown sophisticated at measuring treatment effects in controlled environments, yet we remain dangerously naive about what happens when successful interventions encounter the full complexity of economic systems. The distinction between partial equilibrium analysis (where your intervention is too small to move markets) and general equilibrium effects (where scaling fundamentally reshapes prices, wages, and opportunities) represents perhaps the most consequential gap in our evaluation toolkit. Understanding this gap is not merely academic. It determines whether billions in development spending create lasting change or simply reshuffle who captures existing opportunities.

Partial vs General Equilibrium: The Invisible Threshold

Every randomized controlled trial operates under an implicit assumption: the intervention is small enough that it doesn't change the underlying economic environment. Train 500 tailors in a city of two million, and the garment labor market absorbs them without noticing. The wages they receive, the prices customers pay, the demand for tailoring services—all remain essentially unchanged by your modest injection of skilled workers.

This is partial equilibrium analysis, and it's extraordinarily useful for measuring direct treatment effects. We can credibly identify how much the training changed outcomes for participants compared to controls, precisely because we've held the economic environment constant. The logic is clean, the identification strategy sound, the results interpretable.

But scale that training program to reach 50,000 tailors—or imagine what happens if every training organization in the country adopts your successful model—and the assumption collapses. General equilibrium effects emerge when interventions become large enough to move markets. The supply of trained tailors shifts outward. Wages fall to clear the new labor surplus. Customers benefit from lower prices; existing tailors see their earnings erode. Some potential tailors who would have trained without the program now face worse prospects because the subsidy went to others.

The fundamental challenge is that individually randomized RCTs cannot directly measure general equilibrium effects, because their identification strategy requires holding constant precisely what changes at scale. We observe the impact on treated individuals relative to untreated individuals within the same market environment. We cannot observe what happens when the market environment itself transforms.
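One way to make this gap precise is with potential outcomes indexed by treatment saturation. The notation is an assumption of this sketch, not something defined elsewhere in this article: Y_i(d, s) is person i's outcome with own treatment status d (1 if trained, 0 if not) when a share s of the relevant market is treated, and S is the share reached at scale.

```latex
\begin{align*}
\tau_{\text{RCT}}(s) &= \mathbb{E}\left[ Y_i(1, s) - Y_i(0, s) \right],
  \qquad s \approx 0 \text{ in a small pilot}, \\
\tau_{\text{scale}}(S) &=
  \underbrace{S \, \mathbb{E}\left[ Y_i(1, S) - Y_i(0, 0) \right]}_{\text{change for the treated at scale}}
  + \underbrace{(1 - S) \, \mathbb{E}\left[ Y_i(0, S) - Y_i(0, 0) \right]}_{\text{spillover onto the untreated}}.
\end{align*}
```

Under this framing, a pilot identifies only the first quantity near s = 0. The second quantity, the average change per person relative to a world with no program, depends on Y_i(1, S) and Y_i(0, S), neither of which is observed while the market environment stays at its pilot-stage state.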

Some researchers have attempted creative solutions: randomizing at the market level rather than individual level, using saturation designs that vary treatment intensity across communities, building structural models that extrapolate from partial to general equilibrium. Each approach has merit but also severe limitations. Market-level randomization requires enormous sample sizes and faces spillover problems. Saturation designs can identify local general equilibrium effects but may miss economy-wide responses. Structural models depend on assumptions about market structure that are difficult to validate.
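A saturation design can be sketched as a two-stage randomization: first assign each market a treatment share, then randomly treat that share of individuals within it. The function name, the cluster labels, and the saturation levels below are all hypothetical, and the sketch ignores stratification and power calculations that a real design would need.

```python
import random

def assign_saturation_design(clusters, saturations, seed=0):
    """Two-stage randomization for a saturation design (illustrative only).

    clusters: dict mapping a cluster id to a list of individual ids.
    saturations: list of treatment shares, e.g. [0.0, 0.25, 0.75].
    Returns (cluster_saturation, treated): a dict of cluster id -> share,
    and the set of treated individual ids.
    """
    rng = random.Random(seed)
    cluster_ids = sorted(clusters)
    rng.shuffle(cluster_ids)

    cluster_saturation = {}
    treated = set()
    for position, cid in enumerate(cluster_ids):
        # Stage 1: cycle through saturation levels over the shuffled
        # clusters so each level receives a balanced number of clusters.
        share = saturations[position % len(saturations)]
        cluster_saturation[cid] = share
        # Stage 2: randomly treat that share of individuals in the cluster.
        members = list(clusters[cid])
        n_treat = round(share * len(members))
        treated.update(rng.sample(members, n_treat))
    return cluster_saturation, treated

# Hypothetical example: 6 markets with 20 eligible people each.
clusters = {f"market_{m}": [f"m{m}_p{i}" for i in range(20)] for m in range(6)}
cluster_saturation, treated = assign_saturation_design(clusters, [0.0, 0.25, 0.75])
```

Comparing untreated individuals in positive-saturation markets with individuals in zero-saturation markets then gives an estimate of local spillovers, subject to the limitation noted above that economy-wide responses may still be missed.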

Takeaway

When evaluating an intervention's scalability, explicitly identify which market prices, wages, and quantities your analysis holds fixed—these are precisely the parameters that may shift when the program expands beyond its pilot phase.

Labor Market Distortions: The Hidden Displacement Tax

Labor market interventions are particularly vulnerable to general equilibrium effects because employment is fundamentally rivalrous. A job taken by one person cannot simultaneously be held by another. Skills training, wage subsidies, and employment services all share a troubling characteristic: their measured success may partly reflect displacement rather than job creation.

Consider the mechanics carefully. A skills training program increases participants' productivity and employability. In a partial equilibrium world with fixed labor demand, these participants compete more effectively for existing positions. They get jobs; their measured outcomes improve. But their success comes partly at the expense of workers who would have obtained those positions absent the intervention. The control group in your RCT—which doesn't receive training—may actually experience worse outcomes than they would have without any program at all, because they're now competing against subsidized competitors.

Standard RCT methodology cannot detect this displacement. Control group members still represent counterfactual outcomes in a world where treatment group members received training. But they don't represent counterfactual outcomes in a world where no training program existed. The distinction matters enormously for welfare calculations.
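The displacement logic can be illustrated with a toy job queue. Everything here is an assumption made for illustration: the number of jobs is fixed, employers hire the top-scoring applicants, and training adds a constant boost to a worker's score. It is a sketch of the mechanism, not an estimate of any real program.

```python
import random

def simulate_job_queue(n_workers=1000, n_jobs=300, treat_share=0.5,
                       boost=1.0, seed=0):
    """Compare hiring with and without a training program when jobs are fixed."""
    rng = random.Random(seed)
    ids = list(range(n_workers))
    ability = [rng.gauss(0.0, 1.0) for _ in ids]
    treated = set(rng.sample(ids, int(treat_share * n_workers)))
    control = [i for i in ids if i not in treated]

    def hired(scores):
        # Employers fill the fixed number of jobs with top-scoring applicants.
        ranked = sorted(ids, key=lambda i: scores[i], reverse=True)
        return set(ranked[:n_jobs])

    def rate(group, hired_set):
        return sum(1 for i in group if i in hired_set) / len(group)

    # World A: no program exists, so everyone competes on ability alone.
    hired_no_program = hired(ability)
    # World B: the program runs and trained workers get a score boost.
    boosted = [a + (boost if i in treated else 0.0) for i, a in enumerate(ability)]
    hired_with_program = hired(boosted)

    return {
        "treated_rate": rate(treated, hired_with_program),
        "control_rate": rate(control, hired_with_program),
        "control_rate_no_program": rate(control, hired_no_program),
        "total_hired_with_program": len(hired_with_program),
        "total_hired_no_program": len(hired_no_program),
    }

result = simulate_job_queue()
# What an RCT would report: the treated-minus-control gap in World B.
rct_estimate = result["treated_rate"] - result["control_rate"]
# What the RCT cannot see: how far controls fell relative to World A.
displacement = result["control_rate_no_program"] - result["control_rate"]
```

In this toy setup the treated-minus-control gap is positive even though total employment is identical in both worlds, because part of the gap comes from controls being pushed down the queue.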

Wage effects compound the problem at scale. When you train enough workers in a particular skill, you shift the supply curve outward. Unless labor demand is perfectly elastic, equilibrium wages fall, and the less elastic demand is, the further they fall. This benefits employers and consumers of goods produced by these workers. It harms all workers in that occupation, including those you trained. The brilliant 40% earnings gain from your pilot dissolves because the baseline wage has dropped. Your trained workers might still earn more than untrained workers, but everyone in the occupation earns less than comparable workers did before the scaled intervention.
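How much wages fall for a given supply shift depends on the elasticities. A minimal sketch, assuming constant-elasticity curves (labor demand L = A * w^(-eta), labor supply L = B * w^sigma) and a training program that raises B by a fixed proportion: the functional forms, parameter names, and example values are illustrative assumptions, not estimates.

```python
def wage_ratio_after_supply_shift(supply_growth, demand_elasticity, supply_elasticity):
    """Return new wage / old wage after an outward shift in labor supply.

    Assumes demand L = A * w**(-demand_elasticity) and
    supply L = B * w**supply_elasticity, with B rising by `supply_growth`
    (0.10 means 10%). Equating supply and demand before and after the
    shift gives ratio = (1 + supply_growth) ** (-1 / (eta + sigma)).
    """
    total = demand_elasticity + supply_elasticity
    if total <= 0:
        raise ValueError("elasticities must sum to a positive number")
    return (1.0 + supply_growth) ** (-1.0 / total)

# Hypothetical scale-up that expands effective supply by 50%,
# evaluated at three assumed demand elasticities.
for eta in (0.5, 2.0, 10.0):
    ratio = wage_ratio_after_supply_shift(0.5, demand_elasticity=eta, supply_elasticity=0.5)
    print(f"demand elasticity {eta:>4}: wage changes by {100 * (ratio - 1):.1f}%")
```

Under these assumptions the wage drop shrinks toward zero as demand becomes more elastic, and a negligible supply shift, as in a small pilot, leaves the wage essentially unchanged.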

Some interventions are more vulnerable than others. Programs targeting occupation-specific skills face higher displacement risk because they increase labor supply in narrow market segments. Programs building general human capital—numeracy, literacy, problem-solving—spread beneficiaries across many occupations, reducing supply pressure in any single market. Similarly, interventions in labor markets with highly elastic demand (growing sectors, tradeable goods production) generate smaller displacement effects than those targeting inelastic occupations.

Takeaway

Before scaling employment programs, estimate labor demand elasticity in target occupations—interventions that succeed in elastic, growing sectors may fail catastrophically when applied to saturated or declining labor markets.

Anticipating Market Responses: A Framework for Scale-Up Risk

Not all interventions face equal general equilibrium risk. Developing systematic frameworks for predicting which programs will maintain their effectiveness at scale—and which will see their impacts erode or reverse—is essential for responsible development practice.

The first diagnostic dimension is market scope. Interventions targeting thick, integrated markets face higher general equilibrium risk than those operating in thin, segmented ones. A nationwide digital payments platform will reshape financial intermediation across the economy. A village savings group operates largely independently of formal finance, creating localized effects that don't aggregate into market-wide price changes. Ask: How connected is this intervention's market to broader economic systems?

The second dimension concerns supply versus demand interventions. Programs that increase supply—training workers, subsidizing production, expanding credit—tend to face more severe general equilibrium effects than those stimulating demand. Supply increases in competitive markets translate largely into price reductions, with gains accruing to consumers rather than producers. Demand stimulation can generate multiplier effects that partially offset displacement. Interventions that simultaneously shift both supply and demand may preserve their effectiveness better than pure supply-side programs.

Third, consider factor specificity. Interventions creating assets or skills with highly specific uses concentrate their effects in narrow markets, amplifying price impacts. Programs building general-purpose capabilities (financial literacy, basic numeracy, physical infrastructure) diffuse their effects across many markets, reducing pressure on any single price. The more specific the intervention's output, the more carefully you must model market absorption capacity.

Finally, examine institutional response pathways. Some interventions trigger adaptive responses from firms, governments, or households that either amplify or dampen initial effects. Wage subsidies may encourage firms to create jobs they wouldn't otherwise offer (amplification) or may simply subsidize hires they would have made anyway (dampening). Mapping these response pathways requires detailed institutional knowledge that pure experimental methods cannot provide—another argument for combining experimental evidence with structural economic reasoning.

Takeaway

Create a systematic checklist scoring interventions on market integration, supply-demand balance, factor specificity, and institutional response potential—high scores on multiple dimensions signal interventions requiring general equilibrium modeling before scaling.
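Such a checklist might look like the sketch below. The 0 to 2 scale, the field names, the flagging threshold, and the example scores are all hypothetical choices made for illustration; a real checklist would need scoring criteria grounded in institutional knowledge of the market in question.

```python
from dataclasses import dataclass, asdict

@dataclass
class ScaleUpRiskScore:
    """Scores each diagnostic dimension from 0 (low risk) to 2 (high risk)."""
    market_integration: int       # how connected the market is to the wider economy
    supply_side_tilt: int         # 2 = pure supply-side program, 0 = demand-side
    factor_specificity: int       # 2 = narrow, occupation-specific output
    institutional_response: int   # 2 = responses likely to erode initial effects

    def __post_init__(self):
        for name, value in asdict(self).items():
            if value not in (0, 1, 2):
                raise ValueError(f"{name} must be 0, 1, or 2, got {value!r}")

    def high_risk_dimensions(self, high=2):
        """Names of the dimensions scored at or above `high`."""
        return [name for name, value in asdict(self).items() if value >= high]

    def needs_ge_modeling(self, high=2, min_high_dimensions=2):
        """Flag the program when several dimensions score high at once."""
        return len(self.high_risk_dimensions(high)) >= min_high_dimensions

# Hypothetical scores for two of the program types discussed in this article.
tailoring_training = ScaleUpRiskScore(1, 2, 2, 1)
village_savings_group = ScaleUpRiskScore(0, 1, 0, 1)
```

With these illustrative scores, `tailoring_training.needs_ge_modeling()` flags the occupation-specific training program, while the village savings group is not flagged.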

The general equilibrium problem exposes a fundamental tension in evidence-based development practice. Our most rigorous methods—randomized controlled trials—are optimized for measuring effects that may not survive the very success we're trying to achieve. Partial equilibrium estimates can guide pilot design, but they cannot guarantee scaled impact.

This doesn't mean abandoning experimental methods. It means combining them with structural reasoning about market responses, displacement effects, and institutional adaptation. It means designing saturation experiments that deliberately vary treatment intensity to observe local general equilibrium effects. It means being honest about what our evidence can and cannot tell us.

The development economics profession has made enormous progress in rigorous impact evaluation. The next frontier is developing equally rigorous methods for predicting—not just measuring—how successful interventions transform the economic systems into which they scale. Until we bridge this gap, we risk building development policy on foundations that dissolve precisely when we need them most.