How do you measure something as diffuse as "good governance" across centuries when the people living under those institutions never bothered to rate them on a scale of one to ten? This is the central puzzle of quantitative institutional analysis—and it is far from trivial. Institutions shape everything from capital accumulation to demographic transitions, yet they resist the kind of direct measurement that prices, wages, and mortality rates permit. The entire edifice of modern growth economics rests on claims about institutional quality, but the empirical foundations of those claims deserve more scrutiny than they typically receive.
The challenge is not merely archival. It is fundamentally methodological. We need variables that can be coded consistently across polities and centuries, that capture meaningful variation in how societies enforce contracts, protect property, and constrain arbitrary power. Get the measurement wrong, and every downstream regression is compromised—garbage in, garbage out, regardless of how sophisticated the econometric technique.
What follows is an examination of three critical dimensions of this problem: the proxy indicators researchers have developed to stand in for institutional quality, the influential colonial-origins literature that exploits historical accidents as natural experiments, and the persistent identification challenges that haunt any attempt to isolate institutional effects from confounding factors. Each dimension reveals both the ingenuity and the fragility of quantitative approaches to one of economic history's most consequential questions.
Proxy Indicators: Making the Intangible Measurable
Institutional quality cannot be observed directly in historical data. No medieval king published a rule-of-law index. So researchers build proxies—observable quantities that should correlate with the underlying institutional characteristic of interest. The logic is straightforward: if property rights are secure, capital should be cheap; if courts enforce contracts reliably, litigation patterns should reflect confidence in adjudication rather than desperation. The art lies in selecting proxies that are both available in the historical record and defensibly connected to the institutional dimension being measured.
Interest rate spreads are among the most widely used proxies. The reasoning runs as follows: the gap between sovereign borrowing costs and a risk-free benchmark reflects the market's assessment of default risk, which in turn depends on fiscal discipline, political stability, and the credibility of contractual commitments. Stasavage's work on early modern European sovereign debt demonstrates that constitutional constraints on executive power were associated with lower borrowing costs—a measurable footprint of institutional quality. Similarly, Epstein used interest rate convergence across Italian city-states to track the integration effects of institutional reform.
Litigation rates offer a different window. High rates of commercial litigation in a society with functioning courts may signal confidence in the legal system—parties expect adjudication to work. Conversely, very low litigation in the presence of active commerce may indicate that parties have given up on formal enforcement and rely instead on informal mechanisms, reputation networks, or coercion. The interpretive difficulty is obvious: the same data point can support opposite conclusions depending on context. This is why multi-proxy approaches, combining interest rates, litigation records, land registration patterns, and political stability indices, are methodologically preferable.
Political stability indices—coded from records of coups, civil conflicts, regime changes, and executive turnover—provide yet another dimension. Barro and others have shown strong correlations between political instability and poor economic outcomes, but the coding decisions matter enormously. Does a peaceful constitutional transition count the same as a violent overthrow? How do we handle periods of foreign occupation? Every coding rule is a theoretical commitment disguised as a practical decision, and small changes in classification can shift regression coefficients substantially.
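The sensitivity to coding rules can be made concrete with a toy sketch. The events, dates, and rules below are all invented for illustration; the point is only that the same historical record yields very different "instability" scores depending on whether peaceful transitions and foreign occupations are counted as unstable episodes.

```python
# Toy illustration (invented events and rules): how a coding decision
# changes a political-instability index computed from the same record.
events = [
    ("1830", "violent_overthrow"),
    ("1848", "violent_overthrow"),
    ("1867", "peaceful_constitutional_transition"),
    ("1871", "foreign_occupation"),
    ("1889", "peaceful_constitutional_transition"),
]

def instability_index(events, count_peaceful, count_occupation):
    """Share of recorded episodes coded as 'unstable' under a given rule."""
    unstable = 0
    for _, kind in events:
        if kind == "violent_overthrow":
            unstable += 1
        elif kind == "peaceful_constitutional_transition" and count_peaceful:
            unstable += 1
        elif kind == "foreign_occupation" and count_occupation:
            unstable += 1
    return unstable / len(events)

narrow = instability_index(events, count_peaceful=False, count_occupation=False)
broad = instability_index(events, count_peaceful=True, count_occupation=True)
print(narrow, broad)  # 0.4 vs 1.0: same record, very different "instability"
```

Any regression using such an index inherits whichever theoretical commitment the coding rule encodes.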
The deeper problem with all proxy approaches is construct validity. We assume that interest rate spreads measure institutional quality, but they also capture war risk, monetary instability, information asymmetries, and market liquidity. No single proxy is clean. The best quantitative work acknowledges this explicitly, runs robustness checks across multiple proxies, and treats measurement error as a first-order concern rather than an afterthought. When researchers find results that hold across diverse proxies with different sources of measurement error, our confidence in the underlying institutional story rises considerably.
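The multi-proxy logic can be sketched with synthetic data. Everything here is simulated (a latent "quality" variable and three noisy proxies with independent, invented error scales); the point is that when an outcome regression gives the same signed relationship across proxies with different error sources, attributing the result to any one proxy's measurement error becomes harder.

```python
# Sketch (synthetic data): one latent institutional quality, three noisy
# proxies with independent error sources, same outcome regression on each.
import numpy as np

rng = np.random.default_rng(0)
n = 500
quality = rng.normal(size=n)                    # unobserved institutional quality
outcome = 2.0 * quality + rng.normal(size=n)    # e.g., log income

# Three proxies, each contaminated by a different error process
proxies = {
    "interest_spread": -quality + rng.normal(scale=0.8, size=n),  # good institutions -> low spread
    "litigation_rate": quality + rng.normal(scale=1.2, size=n),
    "stability_index": quality + rng.normal(scale=0.5, size=n),
}

slopes = {}
for name, proxy in proxies.items():
    X = np.column_stack([np.ones(n), proxy])          # OLS with intercept
    slopes[name] = np.linalg.lstsq(X, outcome, rcond=None)[0][1]
    print(f"{name}: slope = {slopes[name]:+.2f}")
```

Note that each slope is attenuated toward zero by its proxy's measurement error, which is itself a reason to treat proxy-based coefficients as lower bounds on association rather than point estimates of effect size.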
Takeaway: A proxy is only as good as the theory connecting it to the thing you cannot observe. When measuring institutional quality, using multiple imperfect proxies with different sources of error is more honest—and more informative—than relying on any single indicator.
Colonial Origins: Natural Experiments in Institutional Divergence
The most influential attempt to quantify institutional effects on long-run development exploits a grim historical accident. Acemoglu, Johnson, and Robinson's 2001 paper argued that European settler mortality rates in colonial territories determined whether colonizers established extractive institutions or inclusive, property-rights-protecting ones. Where settlers died in large numbers—from malaria, yellow fever, and other tropical diseases—Europeans set up extractive regimes designed to transfer resources to the metropole. Where settlers could survive and settle permanently, they replicated European institutions with broader protections. The statistical instrument is settler mortality; the dependent variable is modern income per capita.
The logic of the instrumental variable is elegant. Settler mortality is plausibly exogenous to modern economic outcomes except through its effect on institutional development. Diseases that killed Europeans in 1800 do not directly cause poverty in 2000—but the institutions established under those mortality conditions persisted and, the argument goes, shaped subsequent economic trajectories. The first-stage regression shows that settler mortality strongly predicts current institutional quality (measured by expropriation risk indices). The second stage shows that instrumented institutional quality strongly predicts income levels.
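The two-stage logic can be made concrete with a minimal simulation. All numbers below are invented (this is not the AJR data): an instrument affects the regressor, an omitted confounder contaminates both the regressor and the outcome, and the comparison shows why naive OLS is biased while 2SLS recovers the true coefficient.

```python
# Sketch of two-stage least squares on synthetic data (all values invented):
# instrument -> institutions -> outcome, with an omitted confounder that
# biases the naive OLS estimate.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
mortality = rng.normal(size=n)                  # instrument (exogenous by construction)
confound = rng.normal(size=n)                   # e.g., omitted geography or culture
institutions = -0.8 * mortality + confound + rng.normal(size=n)
log_income = 1.5 * institutions + 2.0 * confound + rng.normal(size=n)

def ols(y, X):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS: biased upward, since the confounder raises both variables
beta_ols = ols(log_income, institutions)[1]

# First stage: predict institutions from the instrument
gamma = ols(institutions, mortality)
fitted = gamma[0] + gamma[1] * mortality

# Second stage: regress the outcome on instrumented institutions
beta_2sls = ols(log_income, fitted)[1]

print(f"OLS: {beta_ols:.2f}, 2SLS: {beta_2sls:.2f} (true effect = 1.5)")
```

The estimator is only as good as its assumptions: if the instrument affected income through any channel other than institutions, the second-stage coefficient would be contaminated in exactly the way the exclusion-restriction debates below describe.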
This research design has been enormously productive and enormously contested. Albouy's critique challenged the mortality data itself, arguing that many of the original estimates were drawn from non-representative military campaigns and that correcting the data weakened the instrument substantially. Glaeser and colleagues questioned whether the instrument truly captures institutions rather than human capital—settlers brought skills and knowledge, not just legal frameworks. If what matters is the composition of the population rather than the rules they established, the institutional interpretation is undermined.
Subsequent work has extended and refined the colonial-origins approach. Dell's study of the mita forced labor system in Peru used the geographic boundary of the institution's jurisdiction as a regression discontinuity, finding persistent negative effects on consumption and public goods provision centuries after the mita was abolished. Banerjee and Iyer compared land revenue systems imposed by the British in India—some areas got landlord-based systems, others got individual cultivator-based systems—and found significant differences in agricultural productivity and public goods investment that persisted well after independence.
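The boundary-discontinuity logic behind studies like Dell's can be sketched with invented data: outcomes vary smoothly with distance from an old institutional boundary except for a jump at the boundary itself, and a local linear fit on each side estimates that jump. The distances, bandwidth, and effect size here are all fabricated for illustration.

```python
# Sketch of a regression discontinuity at an institutional boundary
# (synthetic data): smooth geographic trend plus a jump at distance zero.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
dist = rng.uniform(-50, 50, size=n)           # km from boundary; > 0 = inside jurisdiction
inside = dist > 0
consumption = 10 + 0.02 * dist - 1.5 * inside + rng.normal(scale=0.5, size=n)

def local_linear(x, y, bandwidth=20.0):
    """Fit y = a + b*x within the bandwidth; the intercept a estimates
    the limit of y as x approaches the boundary from that side."""
    mask = np.abs(x) <= bandwidth
    X = np.column_stack([np.ones(mask.sum()), x[mask]])
    return np.linalg.lstsq(X, y[mask], rcond=None)[0][0]

left = local_linear(dist[~inside], consumption[~inside])   # limit from outside
right = local_linear(dist[inside], consumption[inside])    # limit from inside
jump = right - left
print(f"estimated jump at boundary: {jump:.2f} (true = -1.5)")
```

The design's credibility rests entirely on comparability near the boundary, which is precisely the assumption the spillover and migration concerns discussed later call into question.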
What makes the colonial-origins literature so valuable for quantitative institutional analysis is not that it has definitively proven institutions cause growth—it has not, as we will see—but that it established a methodological standard. It demonstrated that historical variation could be exploited econometrically, that instrumental variables could address endogeneity in institutional research, and that archival data on mortality, colonial administration, and land tenure could be coded and analyzed with the same rigor applied to price series and trade statistics. The bar was raised, and subsequent claims about institutional effects had to meet it.
Takeaway: Historical accidents that determined institutional forms—settler mortality, colonial boundaries, administrative assignments—provide some of the strongest empirical leverage for estimating institutional effects, precisely because they were not chosen based on the economic outcomes we now observe.
Causation Challenges: The Identification Problem That Won't Go Away
Even the best instrumental variable strategies face a fundamental challenge: institutions do not exist in isolation. They co-evolve with culture, geography, human capital, technology, and political power. Separating the causal contribution of institutions from these confounding factors is arguably the hardest identification problem in all of social science. The endogeneity runs in every direction. Rich societies can afford better institutions. Better-educated populations demand accountability. Favorable geography enables surplus, which enables state formation, which enables institutional development. Untangling these threads requires more than clever instruments.
The exclusion restriction—the assumption that the instrument affects the outcome only through the proposed channel—is almost never testable directly. In the settler mortality case, the assumption is that colonial-era disease environments affect modern income only through their effect on institutions. But disease environments also shaped population density, urbanization patterns, agricultural systems, and trade networks. Each of these alternative channels represents a potential violation of the exclusion restriction. Researchers can argue that these channels are quantitatively small or that they themselves operate through institutions, but these arguments are theoretical, not statistical.
Panel data approaches offer a partial escape from cross-sectional endogeneity. By tracking the same polity over time and exploiting within-unit variation in institutional quality, researchers can control for time-invariant confounders—geography, deep cultural traits, and other slow-moving variables. But institutions themselves are slow-moving, and the institutional changes that do occur are rarely random. Legal reforms happen because political conditions change; political conditions change because economic pressures mount. The within-unit variation that panel methods exploit may itself be endogenous to exactly the processes we are trying to study.
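The within transformation at the heart of the fixed-effects approach can be shown in a few lines on invented panel data: demeaning each polity's series removes any time-invariant confounder, so the slope is identified only from within-unit variation. The effect sizes below are made up for illustration.

```python
# Sketch of the within (fixed-effects) transformation on synthetic panel
# data: a time-invariant unit effect (geography, deep culture) biases the
# pooled slope; demeaning each unit's series removes it.
import numpy as np

rng = np.random.default_rng(3)
n_units, n_periods = 50, 20
unit_effect = rng.normal(scale=3.0, size=n_units)            # time-invariant confounder
inst = rng.normal(size=(n_units, n_periods)) + unit_effect[:, None]
growth = 0.5 * inst + 2.0 * unit_effect[:, None] + rng.normal(size=(n_units, n_periods))

def slope(x, y):
    """Simple bivariate OLS slope."""
    x, y = x.ravel(), y.ravel()
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

pooled = slope(inst, growth)                                 # biased by unit effects
within = slope(inst - inst.mean(axis=1, keepdims=True),
               growth - growth.mean(axis=1, keepdims=True))  # unit effects removed
print(f"pooled OLS: {pooled:.2f}, within estimator: {within:.2f} (true = 0.5)")
```

The sketch also illustrates the limitation noted above: the within estimator is only unbiased here because the simulated within-unit variation is random, which is exactly what real institutional change rarely is.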
Difference-in-differences designs, regression discontinuities at administrative boundaries, and synthetic control methods have all been applied to institutional questions with varying degrees of success. Each handles some threats to identification while remaining vulnerable to others. The regression discontinuity at the mita boundary, for instance, is compelling if we believe that communities just inside and just outside the boundary were otherwise comparable—but centuries of differential treatment may have generated spillovers, migration patterns, and cultural adaptations that blur the boundary's cleanness.
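Of the designs just listed, difference-in-differences is the simplest to sketch. With invented data: a reform hits some polities between two periods, groups may differ in levels, and under the parallel-trends assumption subtracting the control group's change removes the shared time trend. All magnitudes are fabricated.

```python
# Sketch of difference-in-differences on synthetic data: group levels may
# differ (that is allowed); what must not differ is the counterfactual trend.
import numpy as np

rng = np.random.default_rng(4)
n = 400
treated = np.repeat([0, 1], n // 2)
# Period 1: a fixed level gap between groups. Period 2: common trend +1.0
# for everyone, plus a treatment effect +0.6 for the treated group only.
y1 = 5.0 + 2.0 * treated + rng.normal(scale=0.5, size=n)
y2 = y1 + 1.0 + 0.6 * treated + rng.normal(scale=0.5, size=n)

change_treated = (y2 - y1)[treated == 1].mean()
change_control = (y2 - y1)[treated == 0].mean()
did = change_treated - change_control
print(f"DiD estimate: {did:.2f} (true effect = 0.6)")
```

The estimator's vulnerability is visible in the construction: if the treated group had been on a different trend anyway, that difference would load directly onto the estimate.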
The honest conclusion is that no single study has definitively established the causal effect of institutions on economic development. What we have instead is a body of evidence, accumulated across diverse settings, time periods, and identification strategies, that consistently points in the same direction: institutional quality matters, and it matters a great deal. The convergence of evidence across imperfect methods is itself informative. When different designs, each with different vulnerabilities, yield similar conclusions, our posterior probability that institutions causally affect development should update substantially—even if no individual study is airtight.
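The convergence-of-evidence argument has a simple Bayesian form. The likelihood ratios below are stylized numbers chosen for illustration, not estimates from the literature: if each study's findings are even modestly more likely under the hypothesis that institutions matter than under its negation, and the studies' errors are independent, the ratios multiply.

```python
# Sketch (stylized, invented numbers): Bayesian updating on the hypothesis
# H = "institutions causally affect development" across independent studies.
prior_odds = 1.0                           # agnostic prior: P(H) = 0.5
likelihood_ratios = [3.0, 2.0, 2.5, 1.8]   # e.g., IV, RD, panel, DiD designs

posterior_odds = prior_odds
for lr in likelihood_ratios:
    posterior_odds *= lr                   # independent evidence multiplies

posterior = posterior_odds / (1 + posterior_odds)
print(f"posterior P(H) = {posterior:.3f}")  # 3*2*2.5*1.8 = 27 -> 27/28 ≈ 0.964
```

The independence assumption is doing real work here—studies sharing a vulnerability (say, the same contaminated mortality data) should not have their ratios fully multiplied—but the qualitative point stands: many imperfect, dissimilar designs can jointly support a conclusion no one of them establishes.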
Takeaway: In institutional economics, no single empirical strategy can definitively establish causation. Confidence comes not from any one study but from the convergence of many imperfect approaches pointing in the same direction—a principle worth applying to any complex causal question.
Quantifying institutional quality across historical contexts remains one of the most methodologically demanding exercises in economic history. Every proxy carries measurement error. Every natural experiment has boundary conditions. Every identification strategy leaves some endogeneity unresolved. These are not reasons for despair—they are reasons for rigor.
The cumulative weight of evidence from interest rate analysis, colonial-origins research, regression discontinuities, and panel studies forms a mosaic that is far more persuasive than any single tile. The field has moved from vague assertions that "institutions matter" to precise, testable, and falsifiable claims about which institutions matter, how much they matter, and through what mechanisms they operate.
The frontier lies in better data—digitized archival records, machine-coded legal texts, georeferenced administrative boundaries—and in research designs that combine multiple identification strategies within the same study. The question is no longer whether institutions can be measured quantitatively. It is how precisely, and with what residual uncertainty, we can do so.