Occupational Classification: Making Historical Jobs Comparable

gray asphalt road between green trees under white sky during daytime

6 min read

Comparing historical occupations requires classification schemes that balance granularity against cross-temporal and cross-national comparability.

HISCO, PST, HISCAM, and HISCLASS encode different theoretical priorities, and the choice among them materially affects empirical findings.

Status ranking methods translate categorical occupations into continuous scores, enabling regression-based analysis of mobility and persistence.

Sectoral reclassification reveals that England's structural transition out of agriculture preceded industrialization, supporting the Industrious Revolution hypothesis.

Methodological transparency and sensitivity analysis across schemes are essential safeguards against findings that are artifacts of coding choices.

Consider a seemingly simple question: was a seventeenth-century English yeoman wealthier than a nineteenth-century factory foreman? To answer quantitatively, we must first solve a prior problem—how do we even categorize these occupations in ways that permit meaningful comparison across centuries and continents?

This is the foundational methodological challenge of quantitative social history. Census enumerators, parish registers, and guild records preserve millions of occupational titles, but the data arrive in a chaotic vernacular. A chandler in 1650 London sold candles; by 1850 the term could indicate a ship's supplier or a grocer. A weaver might be an independent artisan, a putting-out worker, or a factory operative—economically distinct roles under one linguistic umbrella.

The stakes are considerable. Our estimates of social mobility, structural change, inequality, and demographic transition all rest on how we aggregate individual occupational entries into analytical categories. Different classification choices can reverse conclusions about whether industrialization widened or narrowed opportunity. Recent decades have seen serious infrastructural investment in this problem—HISCO, SOCPO, HISCLASS—yielding schemes that enable cross-national datasets spanning three centuries. Understanding their logic, and their tradeoffs, is prerequisite to any serious empirical work in economic and social history.

Classification Systems and Their Tradeoffs

The dominant international standard is HISCO—the Historical International Standard Classification of Occupations—developed by van Leeuwen, Maas, and Miles in 2002. HISCO adapts the ILO's ISCO-68 framework to historical sources, coding roughly 1,600 occupational titles into a five-digit hierarchical structure. Its explicit goal is cross-national comparability: a Swedish smed, an English blacksmith, and a French forgeron all map to code 83.120.

Competing schemes reflect different analytical priorities. The PST system (Primary, Secondary, Tertiary), developed by Wrigley and colleagues for the Cambridge Group, privileges sectoral analysis and is optimized for tracking structural transformation. Armstrong's five-class scheme, derived from the 1951 UK Registrar-General's classification, remains popular for British social history despite its anachronistic origin.

Every scheme confronts the same fundamental tradeoff: granularity versus comparability. A highly detailed scheme preserves information about specific trades but fragments samples, producing empty cells and inflating statistical noise. Aggregation enables robust comparison but obscures meaningful distinctions—lumping a master goldsmith with a journeyman tinsmith discards most of what we want to measure.

Further complications arise from by-employment (individuals holding multiple occupations), gendered invisibility (women's work systematically underreported), and status ambiguity (does gentleman denote occupation, rank, or aspiration?). Source-driven coding decisions propagate through every downstream analysis; replication studies by Clark and others have shown that occupational coding choices alone can shift measured wage premiums by 15-30 percent.

The practical implication: methodological transparency is non-negotiable. Published datasets should document coding rules, ambiguity resolution, and the proportion of unclassifiable cases. Sensitivity analysis using multiple schemes is increasingly standard practice.

Takeaway
No classification is neutral—each encodes theoretical assumptions about what distinctions matter. The honest researcher treats scheme choice as a parameter to test, not a technical preliminary to ignore.

Status Ranking and Quantifying Social Hierarchy

Classification sorts occupations into categories; ranking orders them along a continuum of social standing. The analytical payoff is substantial: with numerical status scores, we can compute intergenerational mobility coefficients, measure assortative mating, and trace status persistence across centuries using standard regression techniques.

HISCAM, constructed by Lambert and colleagues, derives scores from the social distance between occupations observed in marriage and intergenerational linkage patterns—a methodological descendant of the Cambridge Scale. Its logic is sociological: occupations are proximate in status to the extent that their holders marry each other and reproduce each other's positions. Scores are standardized to a mean of 50 and range roughly 1-99.

HISCLASS, by contrast, is a twelve-category schema based on manual/non-manual division, skill level, supervisory authority, and sector. It is categorical rather than continuous, making it suited for class-analytic frameworks but less tractable for regression-based mobility estimation. SOCPO (Social Power) offers a middle path with five ordered levels based on economic and organizational power.

The choice among these depends on the theoretical question. Measuring long-run status persistence—the central concern of Clark's The Son Also Rises—requires continuous scores and benefits from HISCAM's fine gradations. Clark's surname-based estimates suggest intergenerational status correlations of 0.70-0.80, dramatically higher than conventional father-son estimates of 0.30-0.40, though this finding depends heavily on how status is operationalized.

Critically, ranking methods embed assumptions about what constitutes status—income, prestige, power, or marital endogamy—that need not align. A village schoolteacher and a prosperous butcher may rank similarly on income but diverge sharply on prestige. Triangulating across scoring systems is essential for robust inference.

Takeaway
Status is not a thing waiting to be measured but a construct produced by measurement choices. Different scoring methods illuminate different dimensions of the same underlying hierarchy.

Measuring Structural Change Across Centuries

Once occupations are classified consistently, their distributional shifts become a powerful lens on economic transformation. The canonical framework is Colin Clark's three-sector model: primary (extractive), secondary (manufacturing), tertiary (services). Tracking the movement of labor shares across these sectors provides the empirical backbone for theories of economic development.

The English case, reconstructed by Wrigley, Shaw-Taylor, and the Cambridge Group using PST coding of parish registers and censuses, reveals a striking pattern. Agricultural employment in England had already fallen to roughly 35-40 percent by 1700—far earlier than the classical Industrial Revolution narrative suggests. The structural transition to an industrial economy was thus substantially complete before the steam engine became economically dominant.

This finding, replicable across multiple coding schemes, forces revision of standard chronologies. It supports what economic historians now call the Industrious Revolution hypothesis: that reallocation of labor out of agriculture and into specialized production preceded and enabled mechanization, rather than following from it.

Comparative structural analysis extends the insight. HISCO-coded data from the Netherlands, Sweden, Belgium, and Catalonia allow direct quantitative comparison of transition paths. Divergences emerge: Sweden's agricultural share remained above 50 percent until the 1880s, while the Netherlands had de-agrarianized by the seventeenth century through specialization in trade and proto-industry. These are not minor variations but fundamentally different developmental trajectories.

Methodological caveats remain consequential. Rural by-employment means census snapshots overstate agricultural specialization in early periods. Women's domestic production is systematically undercounted in most sources, potentially biasing tertiary-sector estimates downward by substantial margins.

Takeaway
Structural change is slower, earlier, and more heterogeneous than textbook narratives suggest. The Industrial Revolution looks less like a rupture and more like the visible acceleration of a long prior reallocation.

Occupational classification appears on the surface to be clerical infrastructure—a tedious preliminary to the real analysis. In fact, it is where the most consequential interpretive decisions are made. The choice of HISCO over PST, HISCAM over HISCLASS, continuous versus categorical treatment—each shapes what patterns become visible and which remain obscured.

The past two decades have transformed the field. Shared schemes, digitized datasets, and public coding protocols now enable replication and cumulative progress of a kind unavailable to earlier generations of economic historians. The remaining frontiers are substantive: improving coverage of female labor, integrating by-employment, and extending harmonized classifications to non-European contexts where colonial categories distort the record.

For the serious researcher, the methodological posture is straightforward: treat classification as a modeling choice, report sensitivity to alternatives, and remain skeptical of findings that hinge on a single scheme. The numbers speak, but only through the categories we impose on them.