Why Blood Test Reference Ranges Don't Define Health and Disease



Reference intervals are statistically derived from population distributions, so about 5 percent of healthy individuals fall outside them on any single test by construction.

Laboratory methods, demographics, and reference populations vary between institutions, making universal cutoffs inherently imprecise.

Intraindividual biological variation is often much smaller than population variation, so personal baselines frequently reveal meaningful change invisible to population ranges.

Bayesian reasoning—combining pre-test probability with test performance—remains the appropriate framework for interpreting laboratory values.

Targeted, clinically driven testing outperforms reflexive comprehensive panels by reducing false positives and downstream iatrogenic harm.

A patient receives laboratory results with a single value flagged in red. Anxiety follows. Yet that flag represents neither diagnosis nor disease—it simply indicates a number falling outside a statistically constructed interval. The distinction matters enormously in clinical practice.

Reference ranges have become so embedded in medical culture that they are frequently treated as binary indicators of health versus pathology. Clinicians and patients alike can fall into the trap of chasing numbers rather than interpreting biology. This reductionist approach misrepresents what laboratory medicine actually offers.

Understanding how reference intervals are derived, what they statistically represent, and how they should function within clinical reasoning is essential for evidence-based interpretation. The goal is not to dismiss laboratory values but to contextualise them appropriately—as one data point among many in a comprehensive clinical picture rather than as verdicts delivered by machines.

How Reference Intervals Are Actually Constructed

The Clinical and Laboratory Standards Institute (CLSI) guideline EP28-A3c establishes the methodology most laboratories follow for deriving reference intervals. The standard approach involves sampling at least 120 ostensibly healthy individuals, measuring the analyte of interest, and defining the reference interval as the central 95 percent of observed values—typically bounded by the 2.5th and 97.5th percentiles.
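The percentile construction described above can be sketched in a few lines of Python. This is an illustration only: the function name `reference_interval`, the rank rule p(n + 1) with linear interpolation, and the simulated analyte values are assumptions made for this example rather than text taken from the CLSI guideline, and a real laboratory would also apply exclusion criteria and outlier checks.

```python
import random


def reference_interval(values, lower_pct=2.5, upper_pct=97.5, min_n=120):
    """Estimate a nonparametric central interval from reference values.

    Uses the rank position p/100 * (n + 1) and interpolates linearly
    between neighbouring order statistics (an assumed convention).
    """
    if len(values) < min_n:
        raise ValueError(f"expected at least {min_n} reference values")
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # 1-based rank, clamped so it always points inside the sample.
        rank = min(max(p / 100 * (n + 1), 1), n)
        whole = int(rank)
        frac = rank - whole
        if whole >= n:
            return ordered[-1]
        return ordered[whole - 1] + frac * (ordered[whole] - ordered[whole - 1])

    return percentile(lower_pct), percentile(upper_pct)


# Simulated "healthy" sample: a hypothetical analyte, mean 100, SD 10.
random.seed(0)
sample = [random.gauss(100, 10) for _ in range(120)]
low, high = reference_interval(sample)
outside = sum(1 for v in sample if v < low or v > high)
print(f"interval: {low:.1f} to {high:.1f}; healthy values outside it: {outside}")
```

Even though every simulated subject is healthy by design, a few of them land outside the interval built from their own data. That is a property of the construction, not of the subjects.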

This construction is fundamentally statistical rather than physiological. By construction, about 5 percent of the healthy reference population falls outside the range on any given test. Order twenty independent analytes on a healthy person and the expected number of flagged results is one; the chance of at least one flag is 1 - 0.95^20, or roughly 64 percent. This is mathematical inevitability, not emerging pathology.
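The arithmetic behind multi-test panels is easy to check. The sketch below assumes the analytes are statistically independent and that each has exactly a 5 percent chance of flagging in a healthy person; real analytes are often correlated, so treat the figures as an approximation. The function names are illustrative.

```python
def prob_at_least_one_flag(n_tests, outside_fraction=0.05):
    """Chance that a healthy person has one or more flagged results,
    assuming independent tests that each flag with the same probability."""
    return 1 - (1 - outside_fraction) ** n_tests


def expected_flags(n_tests, outside_fraction=0.05):
    """Average number of flagged results across many healthy people."""
    return n_tests * outside_fraction


for n in (1, 5, 10, 20, 40):
    print(
        f"{n:>2} tests: expected flags = {expected_flags(n):.2f}, "
        f"P(at least one flag) = {prob_at_least_one_flag(n):.0%}"
    )
```

With twenty tests the expected count is one flag, and the probability of seeing at least one is about 64 percent.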

Additionally, the reference population itself introduces limitations. Selection criteria for apparently healthy subjects vary between laboratories, as do analytical methods, instrumentation, and calibration. A ferritin of 12 ng/mL may fall within one laboratory's range and below another's. Demographic factors—age, sex, ethnicity, altitude, pregnancy status—further complicate universal cutoffs.

Some intervals are partitioned for these variables; many are not. Reference ranges for testosterone, creatinine, and haemoglobin demonstrate meaningful demographic stratification, while others apply blanket cutoffs that obscure clinically relevant subgroup differences.

Takeaway
A reference range describes where 95 percent of a sampled population falls—it does not describe where health ends and disease begins. Statistical abnormality and clinical abnormality are fundamentally different concepts.

The Case for Individual Biological Setpoints

Intraindividual biological variation is often substantially smaller than interindividual variation. Research by Fraser and colleagues on biological variability has demonstrated that for analytes such as TSH, creatinine, and serum calcium, an individual's values cluster tightly around a personal setpoint, while the population range spans considerably wider.

This has profound implications for interpretation. A TSH rising from 0.8 to 3.8 mIU/L remains within a typical reference range of 0.4 to 4.0 mIU/L, yet it is a nearly fivefold increase for that individual and may signal meaningful thyroid dysfunction. Without prior baselines, this trajectory is invisible to a clinician relying solely on population cutoffs.

The concept of the reference change value (RCV) formalises this insight. The RCV is the minimum difference between two sequential results in the same person that can be considered statistically significant once analytical imprecision and within-subject biological variation are taken into account. For many analytes, this threshold is crossed well before population reference limits are breached.
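One widely used form of the calculation is RCV = √2 × Z × √(CV_A² + CV_I²), where CV_A is the analytical coefficient of variation, CV_I is the within-subject biological coefficient of variation, and Z is 1.96 for a two-sided 95 percent threshold. The Python sketch below applies that formula. The CV figures in it (5 percent analytical, 18 percent within-subject) are placeholder assumptions chosen for illustration, not values from this article or from any specific assay.

```python
import math


def reference_change_value(cv_analytical, cv_within_subject, z=1.96):
    """Symmetric RCV in percent; both CVs are given in percent."""
    return math.sqrt(2) * z * math.sqrt(cv_analytical**2 + cv_within_subject**2)


def percent_change(previous, current):
    """Relative change between two sequential results, in percent."""
    return (current - previous) / previous * 100


# Placeholder CVs for illustration only (assumed, not measured).
rcv = reference_change_value(cv_analytical=5.0, cv_within_subject=18.0)
change = percent_change(previous=0.8, current=3.8)  # hypothetical TSH pair, mIU/L

print(f"RCV threshold: {rcv:.1f}%")
print(f"Observed change: {change:.0f}%")
print("Exceeds RCV" if abs(change) > rcv else "Within expected variation")
```

Under these assumed CVs, a rise from 0.8 to 3.8 mIU/L far exceeds the threshold even though both results sit inside a 0.4 to 4.0 mIU/L population range.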

Longitudinal monitoring—comparing a patient to their own historical values—frequently provides more sensitive and specific information than single cross-sectional comparisons. This principle underlies surveillance strategies in oncology, endocrinology, and nephrology, where trajectory often matters more than absolute value.

Takeaway
You are not the average of a reference population; you are a system with your own equilibrium. Meaningful change can occur entirely within normal limits and be missed when population norms replace individual baselines.

Integrating Laboratory Values with Clinical Context

Bayesian reasoning provides the appropriate framework for laboratory interpretation. The post-test probability of disease depends on the pre-test probability—derived from history, examination, and epidemiology—combined with the likelihood ratio of the test result. A mildly abnormal value in a low pre-test probability context frequently represents noise rather than signal.
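The odds form of Bayes' theorem makes this concrete: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. In the Python sketch below, the likelihood ratio of 5 and the two pre-test probabilities are hypothetical numbers chosen only to show how strongly context changes the answer.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio (odds form of Bayes)."""
    if not 0 <= pre_test_probability < 1:
        raise ValueError("pre-test probability must be in [0, 1)")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Same hypothetical abnormal result (LR = 5) in two clinical settings.
for label, pre in (("low-suspicion screening", 0.02), ("suggestive symptoms", 0.40)):
    post = post_test_probability(pre, likelihood_ratio=5)
    print(f"{label}: pre-test {pre:.0%} -> post-test {post:.0%}")
```

The identical result moves a 2 percent pre-test probability to roughly 9 percent, but a 40 percent pre-test probability to roughly 77 percent.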

Consider the incidentally discovered elevated ALT in an asymptomatic patient. The differential ranges from laboratory artefact and recent exercise to nonalcoholic fatty liver disease and hepatitis. Without clinical context, the isolated value is nearly uninterpretable. With context, it becomes actionable information that prompts either further workup or reassurance.

This principle extends to the problem of incidentalomas in laboratory medicine. The more tests ordered, the higher the probability of statistical outliers demanding explanation. Panels and screening batteries increase the likelihood of false positives, triggering cascades of follow-up testing, patient anxiety, and occasionally iatrogenic harm from unnecessary interventions.

Evidence-based practice therefore advocates for targeted testing driven by clinical questions rather than reflexive comprehensive panels. Each test ordered should have a plausible pre-test probability of yielding clinically actionable information, and each result should be interpreted in light of the specific patient before the clinician—not the abstract population from which the reference interval was derived.

Takeaway
A laboratory value is evidence, not verdict. Its meaning emerges only when interpreted against the specific clinical scenario that prompted the test.

Reference ranges serve a legitimate purpose: they provide statistical anchors that help clinicians flag values warranting attention. They do not, however, diagnose disease or certify health. Treating them as binary thresholds misrepresents both the statistics underlying their construction and the biology they purport to describe.

Evidence-based interpretation requires integrating population norms with individual baselines, clinical context, and the pre-test probability of the condition in question. This is the cognitive work that laboratory automation cannot replace.

As personalised medicine advances and longitudinal health data accumulates, the primacy of the population reference range will continue to erode in favour of more individualised frameworks. The future of laboratory interpretation lies not in better cutoffs, but in better context.