If you wanted to understand the history of a city, you could start at its founding and work forward. Or you could start with the people living there today and trace their family trees backward until they converge. Coalescent theory takes the second approach—but with genes instead of people.
Developed in the early 1980s by John Kingman, this mathematical framework turned population genetics on its head. Instead of modeling evolution forward from some ancestral population, it traces genetic lineages backward from a present-day sample until they merge into common ancestors. The shift in perspective proved remarkably powerful: only the ancestors of the sampled gene copies need to be tracked, not the entire population.
Coalescent theory has become one of the most important tools in modern evolutionary biology. It lets researchers reconstruct demographic histories, estimate population sizes from thousands of years ago, and understand why the evolutionary story told by one gene can differ dramatically from the story told by another. Here's how looking backward illuminates the path forward.
The Coalescent Process: Merging Lineages in Reverse
Imagine sampling two copies of a gene from a population today. Each copy was inherited from a parent in the previous generation, who inherited it from their parent, and so on. If you trace both copies backward, generation by generation, eventually they'll share a common ancestral copy. That moment of merging is called a coalescent event—the point where two lineages coalesce into one.
The rate at which lineages coalesce depends primarily on population size. In a small population, any two gene copies are more likely to have been inherited from the same individual in the recent past. In a large population, it takes longer on average for lineages to find a common ancestor. Specifically, a diploid population of N individuals carries 2N copies of each gene, so in an idealized randomly mating population two lineages trace back to the same parental copy with probability 1/(2N) in each generation. The expected time for two lineages to coalesce is therefore 2N generations. In a population of 10,000 individuals, two randomly chosen gene copies wait an average of 20,000 generations to share an ancestor.
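The 2N expectation can be checked with a small simulation. The sketch below assumes an idealized population in which each gene copy picks its parental copy uniformly at random from the 2N copies in the previous generation; the function names and the choice of N = 100 are illustrative, not taken from any particular library.

```python
import random


def pairwise_coalescence_time(n_diploid, rng):
    """Count generations back until two gene copies share a parental copy."""
    p_merge = 1.0 / (2 * n_diploid)  # 2N gene copies in a diploid population
    generations = 1
    while rng.random() >= p_merge:   # the two lineages picked different parents
        generations += 1
    return generations


def mean_coalescence_time(n_diploid, replicates=5_000, seed=1):
    """Average the waiting time over many independent pairs of gene copies."""
    rng = random.Random(seed)
    total = sum(pairwise_coalescence_time(n_diploid, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    # N = 100 keeps the run short; the expectation is 2N = 200 generations.
    print(mean_coalescence_time(100))
```

The simulated average should land close to 200 generations, with some scatter from run to run because individual waiting times vary widely around their mean.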
With more than two samples, the process works sequentially. If you start with k lineages, the most recent coalescent event reduces them to k−1. Then another event reduces that to k−2, and so on, until all lineages merge into a single most recent common ancestor, or MRCA. With k lineages there are k(k−1)/2 pairs that could merge, so the most recent coalescent events happen quickly, while the final coalescence between the last two lineages takes the longest.
This creates a characteristic tree shape. The branches near the tips are short and crowded, while the deepest branch—connecting the last two lineages—is disproportionately long. This isn't a quirk; it's a fundamental signature of the coalescent process in a constant-sized population, and deviations from this expected shape are precisely what reveal evolutionary forces at work.
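The tree shape described above follows from a standard result for a constant-sized population: while k lineages remain, any of the k(k−1)/2 pairs can merge, so the expected wait for the next merger is 2N divided by the number of pairs. The sketch below works through that arithmetic, assuming the sample is small relative to the population; the function names are illustrative.

```python
def expected_interval(k, n_diploid):
    """Expected generations during which exactly k lineages exist."""
    pairs = k * (k - 1) / 2        # lineage pairs that could merge
    return 2 * n_diploid / pairs   # each pair merges at rate 1/(2N) per generation


def expected_tmrca(sample_size, n_diploid):
    """Expected generations back to the most recent common ancestor of the sample."""
    return sum(expected_interval(k, n_diploid) for k in range(2, sample_size + 1))


def share_of_final_interval(sample_size, n_diploid):
    """Fraction of the expected tree height spent with only two lineages left."""
    return expected_interval(2, n_diploid) / expected_tmrca(sample_size, n_diploid)


if __name__ == "__main__":
    print(expected_interval(10, 10_000))        # first merger among 10 lineages
    print(expected_tmrca(10, 10_000))           # expected height of the whole tree
    print(share_of_final_interval(10, 10_000))  # share taken by the last two lineages
```

For a sample of 10 gene copies in a population of 10,000, the first merger is expected after about 444 generations, while the last two lineages are expected to persist for 20,000 of the 36,000 generations back to the MRCA.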
Takeaway: Evolution doesn't just push populations forward; it leaves a traceable signature when read in reverse. The time it takes gene copies to find a common ancestor is a direct reflection of the population's size.
Gene Trees Within Species Trees: When Histories Disagree
Here's something that initially troubled biologists: if you sequence the same gene in three closely related species, the gene tree—the branching pattern of that particular gene's history—doesn't always match the species tree. Species A and B might be each other's closest relatives, but for a particular gene, A might be more closely related to C. This isn't an error. It's a predictable consequence of how the coalescent process works within branching populations.
The phenomenon is called incomplete lineage sorting. When an ancestral population splits into two species, the gene copies within that population already have their own history, their own coalescent tree. If the ancestral population was large or the time between successive speciation events was short, some gene lineages may not have coalesced before the next split occurred. Those "unsorted" lineages can then sort into a pattern that contradicts the species-level relationships.
This isn't rare. Estimates from great ape genome comparisons suggest that in roughly 30% of the human genome, the genealogy conflicts with the species tree: gorilla sequences are closer to human or to chimpanzee sequences than those two are to each other. About half of that fraction, around 15% of the genome, groups humans more closely with gorillas than with chimpanzees, even though the species tree unambiguously places humans and chimps as closest relatives. The species relationship is real and is supported by the majority of the genome, but individual genes carry their own idiosyncratic histories shaped by the coalescent process operating within ancestral populations.
Recognizing this distinction was transformative. Early phylogenetic methods assumed that the history of a gene was the history of the species. Modern methods explicitly model the coalescent process occurring within the branches of the species tree, accounting for the fact that gene histories are embedded within, but not identical to, species histories. This produces far more accurate reconstructions of how species are actually related.
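How often incomplete lineage sorting produces a conflicting gene tree can be made quantitative in the simplest case. The sketch below uses the standard three-species result: if the two lineages from the sister species fail to coalesce during the T generations between the two splits, which happens with probability exp(−T/(2N)), all three lineages enter the deeper ancestral population and two of the three equally likely pairings contradict the species tree. It assumes one gene copy per species, a constant ancestral size N, and no gene flow; the function name is illustrative.

```python
import math


def discordance_probability(generations_between_splits, ancestral_n_diploid):
    """Probability that a three-species gene tree disagrees with the species tree."""
    # Time between the two speciation events, in units of 2N generations.
    t = generations_between_splits / (2 * ancestral_n_diploid)
    p_no_coalescence = math.exp(-t)    # sister lineages stay separate for the whole interval
    return (2.0 / 3.0) * p_no_coalescence  # two of three random pairings are discordant


if __name__ == "__main__":
    # Same interval between splits, small versus large ancestral population.
    print(discordance_probability(20_000, 1_000))
    print(discordance_probability(20_000, 50_000))
```

When the interval between splits equals 2N generations, about a quarter of gene trees are expected to be discordant; as the interval shrinks toward zero, or the ancestral population grows, the fraction approaches two thirds.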
Takeaway: A single gene tells its own story, not necessarily the species' story. Evolutionary relationships emerge not from any one gene's history but from the statistical weight of thousands of independent genealogies.
Inferring Demographic History: Reading the Past in Present-Day DNA
Because coalescent times depend on population size, the shape of a gene genealogy encodes demographic information. A population that was historically small will have gene trees where lineages coalesce rapidly—short internal branches compressed into a brief window of time. A population that was historically large will show the opposite: long branches reflecting the extended wait for common ancestors in a vast gene pool.
Researchers exploit this by sampling many genes and examining the distribution of coalescent times across the genome. A population bottleneck, a sharp reduction in size, leaves a distinctive signature: an excess of coalescent events clustered around the time of the bottleneck, because fewer individuals meant faster merging of lineages. Population expansions produce the reverse pattern: few lineages merge while the population is large, so genealogies have many long terminal branches that reach back to the time before the expansion.
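The bottleneck signature can be seen in a toy simulation. The sketch below traces pairs of gene copies backward under two hypothetical histories, one with a constant size of 500 and one that crashes to 25 between 100 and 150 generations ago, and compares how many pairs coalesce inside that window. The population sizes, dates, and function names are assumptions chosen only to keep the example fast.

```python
import random


def coalescence_time(size_at, rng):
    """Trace two gene copies back in time until they merge."""
    t = 1
    while rng.random() >= 1.0 / (2 * size_at(t)):  # merge chance is 1/(2N) at time t
        t += 1
    return t


def constant_history(t):
    return 500  # hypothetical diploid population size at every generation


def bottleneck_history(t):
    # Hypothetical crash to N = 25 between 100 and 150 generations ago.
    return 25 if 100 <= t < 150 else 500


def fraction_in_window(size_at, start, end, replicates=2_000, seed=7):
    """Share of simulated pairs whose coalescence falls in [start, end)."""
    rng = random.Random(seed)
    hits = sum(start <= coalescence_time(size_at, rng) < end for _ in range(replicates))
    return hits / replicates


if __name__ == "__main__":
    print(fraction_in_window(constant_history, 100, 150))
    print(fraction_in_window(bottleneck_history, 100, 150))
```

Under the constant history only a few percent of pairs coalesce in the window, while under the bottleneck history more than half do, which is the clustering of coalescent events described above.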
Methods like PSMC (Pairwise Sequentially Markovian Coalescent) can reconstruct population size changes over time from a single diploid genome. By comparing the two copies of each chromosome that every diploid individual carries, the method infers how long ago those copies coalesced at each position along the genome: stretches with few differences coalesced recently, and stretches with many differences coalesced long ago. A time window in which many regions coalesce implies a small population at that time; a window in which few coalesce implies a large one. Applied to individual human genomes, PSMC has revealed ancient population declines and recoveries, some of which have been linked to ice ages and warmer periods, a demographic autobiography written in DNA.
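The link between coalescence times and population size can be illustrated without the full method. The sketch below is not PSMC, which infers hidden coalescence times along a genome with a hidden Markov model; it assumes the pairwise coalescence times are already known and simply inverts the relationship that lineages merge at rate 1/(2N) per generation. The two-epoch history, the window boundaries, and the function names are all hypothetical.

```python
import random


def two_epoch_history(t):
    # Hypothetical history: N = 200 in the last 300 generations, N = 1000 before that.
    return 200 if t < 300 else 1_000


def simulate_times(size_at, replicates=3_000, seed=11):
    """Simulate independent pairwise coalescence times, in generations."""
    rng = random.Random(seed)
    times = []
    for _ in range(replicates):
        t = 1
        while rng.random() >= 1.0 / (2 * size_at(t)):
            t += 1
        times.append(t)
    return times


def estimate_size(times, start, end):
    """Estimate diploid N for the window [start, end) from coalescence times."""
    events = sum(start <= t < end for t in times)
    # Generations that each still-unmerged pair spent inside the window.
    exposure = sum(min(t, end - 1) - start + 1 for t in times if t >= start)
    if events == 0:
        return float("inf")        # no mergers observed: size cannot be bounded
    return exposure / (2 * events)  # merger rate is 1/(2N), so N = exposure / (2 * events)


if __name__ == "__main__":
    times = simulate_times(two_epoch_history)
    print(estimate_size(times, 1, 300))      # recent epoch
    print(estimate_size(times, 300, 1_300))  # older epoch
```

The two estimates should fall near 200 and 1,000 respectively. The older estimate is noisier because fewer pairs survive long enough to be informative about that epoch, a limitation that also applies to real inference of deep history.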
These approaches have rewritten our understanding of species we thought we knew well. We now know that cheetahs passed through a severe bottleneck roughly 10,000 years ago, explaining their famously low genetic diversity. We can detect ancient admixture events between human populations and trace the expansion of our species out of Africa. The coalescent framework turned every genome into a historical document, waiting to be read.
Takeaway: Your DNA doesn't just encode instructions for building a body; it records the demographic upheavals your ancestors survived. Coalescent theory is the key to reading that record.
Coalescent theory achieved something rare in science: it made a hard problem dramatically simpler by changing the direction of inquiry. Looking backward from a sample, rather than forward from an origin, revealed structure that forward-time models obscured.
The framework unified disparate observations—gene tree discordance, patterns of genetic diversity, signatures of ancient bottlenecks—under a single mathematical roof. It showed that the present-day genome is not a blank slate but a layered archive of population history.
The genome you carry is a document written by the coalescent process, recording population sizes, migrations, and splits stretching back hundreds of thousands of generations. The trick was learning to read it in reverse.