Performance Measurement: What Gets Measured Gets Managed—Badly


Performance metrics frequently undermine the goals they were designed to promote, a pattern visible across hospitals, schools, police departments, and corporations.

Goodhart's Law captures the core dynamic: once a measure becomes a target, organizations optimize the measure rather than the underlying outcome.

Gaming behaviors follow predictable patterns, including threshold gaming, selection effects, definitional drift, and full goal displacement.

Better system design uses multiple indicators, separates learning from accountability, rotates metrics, and treats measurement as a political question.

Measurement systems are tools of governance that distribute power and reshape organizational behavior, not neutral windows onto reality.

Every modern organization measures something. Hospitals track wait times, schools chase test scores, police departments count arrests, and corporations monitor quarterly returns. The logic seems airtight: you cannot improve what you do not measure. Management orthodoxy has elevated this principle to near-religious status.

Yet a curious pattern emerges across institutions. The more aggressively organizations pursue their metrics, the more frequently their actual missions seem to suffer. Hospitals that excel at wait-time targets sometimes discharge patients prematurely. Schools that maximize test scores often hollow out teaching in favor of test preparation. Police departments that hit arrest quotas may erode community trust.

This is not coincidence or poor implementation. It reflects something deeper about how measurement systems interact with human organizations operating under political and economic pressure. Understanding why measurement so often backfires—and what structural conditions produce these failures—reveals important truths about how power flows through bureaucracies and how policy goals get translated into operational reality.

Goodhart's Law in Practice

Charles Goodhart, a British economist writing about monetary policy, observed in 1975 that any observed statistical regularity tends to collapse once pressure is placed upon it for control purposes. In 1997, anthropologist Marilyn Strathern sharpened this into a more memorable form: when a measure becomes a target, it ceases to be a good measure. The principle now bears Goodhart's name and applies far beyond monetary policy.

The mechanism is straightforward but powerful. A metric initially correlates with something we care about because it captures one dimension of underlying performance. Patient satisfaction surveys correlate with care quality. Test scores correlate with learning. Arrest rates correlate with public safety. But once the metric becomes a formal target tied to budgets, promotions, or political legitimacy, the correlation begins to break down.

Why? Because organizations and individuals shift from producing the underlying outcome to producing the metric directly. The two activities overlap initially but diverge as optimization pressure intensifies. Energy that once went toward genuine improvement gets redirected toward measurement performance. The proxy detaches from what it was supposed to proxy for.

This is not a failure of effort or ethics; it is a structural feature of how measurement interacts with incentive systems. Even well-intentioned actors operating in good faith will gradually reshape their behavior around what is being counted. The map is mistaken for the territory, and the territory itself fades from view.

Takeaway
Every measurement system contains the seeds of its own corruption. The question is not whether your metrics will be gamed, but how quickly the gap between measure and reality will open up.

Gaming and Goal Displacement

Once metrics matter, organizations deploy remarkable creativity in optimizing them. The patterns are systematic enough to catalog. Threshold gaming occurs when actors focus effort precisely at the cutoff that triggers rewards—schools concentrating resources on students near the passing line while ignoring those clearly above or below. Selection effects appear when organizations choose easier cases to handle, like surgeons avoiding complex patients to preserve success rates.
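Threshold gaming is easy to see with a handful of numbers. The sketch below uses an invented class of ten scores, an assumed pass mark of 60, and a fixed budget of 20 extra points of tutoring. None of these figures come from real data, and the helper names are hypothetical. It compares two ways of spending the same budget: spreading the help evenly, or concentrating it on students just below the line.

```python
PASS_MARK = 60  # assumed cutoff that triggers the reward

def pass_rate(scores):
    """Share of students at or above the pass mark."""
    return sum(s >= PASS_MARK for s in scores) / len(scores)

def spread_evenly(scores, budget):
    """Give every student an equal share of the extra points."""
    share = budget / len(scores)
    return [s + share for s in scores]

def target_bubble(scores, budget, band=10):
    """Spend the budget only on students within `band` points below the
    pass mark, closest to the line first, giving each just enough to pass."""
    new_scores = list(scores)
    near_miss = [i for i, s in enumerate(scores) if PASS_MARK - band <= s < PASS_MARK]
    near_miss.sort(key=lambda i: PASS_MARK - scores[i])  # smallest gap first
    for i in near_miss:
        needed = PASS_MARK - scores[i]
        if needed > budget:
            break  # remaining candidates need even more points
        new_scores[i] += needed
        budget -= needed
    return new_scores

# Invented class of ten students and a fixed tutoring budget.
scores = [35, 42, 51, 55, 57, 58, 59, 63, 71, 84]
budget = 20

print("baseline:", pass_rate(scores))                          # → 0.3
print("even:    ", pass_rate(spread_evenly(scores, budget)))   # → 0.5
print("bubble:  ", pass_rate(target_bubble(scores, budget)))   # → 0.8
```

With these made-up numbers, both strategies add the same 20 points in total. The even split lifts the pass rate from 30% to 50%. Targeting the "bubble" lifts it to 80%, while the two lowest-scoring students, and everyone already passing, receive nothing. A pass-rate target rewards the second strategy even though it does less for the students who need the most help.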

More subtle is definitional drift, where organizations redefine categories to flatter their numbers. A hospital reclassifies admissions as observations to reduce reported readmission rates. A police department reclassifies felonies as misdemeanors to show falling crime. The reported reality shifts without any change in underlying conditions.

Then there is goal displacement—the deepest form of corruption. Here, the measurement does not merely capture the wrong things; it actively replaces the original purpose. The organization forgets what it was trying to do and begins to believe the metric is the mission. Teachers come to see test preparation as education itself. Police view arrest counts as public safety. The substitution becomes invisible because everyone participates in it.

These behaviors are rational responses to the incentive structures imposed from above. Blaming individuals misses the point. When careers, funding, and institutional survival depend on hitting numbers, sophisticated gaming is the predictable result. The dysfunction is built into the system, not the people.

Takeaway
Goal displacement is rarely conscious. By the time an organization realizes it has substituted the measurement for the mission, the substitution feels like common sense.

Measurement System Design

If measurement is unavoidable yet inherently corrupting, the question becomes how to design systems that fail more slowly and informatively. Several structural principles help. First, use multiple indicators that resist simultaneous gaming. A single metric is easily optimized; a balanced portfolio that includes inputs, processes, outputs, and outcomes creates tradeoffs that limit pure gaming strategies.

Second, separate measurement for learning from measurement for accountability. When the same numbers serve both functions, the accountability pressure contaminates the learning function. Frontline workers stop reporting honest data because it might be used against them. Distinct systems with different access rules can preserve diagnostic value even as accountability metrics get gamed.

Third, rotate metrics periodically and audit qualitatively. Static targets give organizations time to engineer around them. Periodic changes—combined with deep qualitative review of what the numbers actually represent—keep the underlying reality visible. This is expensive and politically uncomfortable, which is why it is rarely done.

Finally, recognize that measurement system design is itself a political question. Who chooses the metrics? Who defines the categories? Who has access to the raw data? These decisions allocate power and shape which interests get represented in organizational behavior. Treating measurement as a technical exercise obscures its political character and ensures that powerful stakeholders will quietly shape it to their advantage.

Takeaway
Good measurement systems are not those that resist gaming entirely—none do—but those that surface their own distortions quickly enough to be corrected before the gap becomes catastrophic.

Performance measurement is not neutral. It is a technology of governance that reshapes the behavior of the organizations it touches, often in ways that defeat its stated purposes. The faith that better metrics will produce better outcomes ignores how metrics interact with incentive structures and political pressures.

This does not mean abandoning measurement. Modern organizations cannot operate without it. But it means treating metrics with appropriate skepticism—as politically contested artifacts rather than objective windows onto reality. Every dashboard reflects choices about what matters, choices that benefit some stakeholders over others.

The deeper insight is that organizations are not machines to be optimized but political systems to be navigated. Measurement is one tool among many for distributing power and shaping behavior, and like all such tools, it produces unintended consequences that often dwarf its intended effects.