Repeat Expansion Diseases: When DNA Stutters Cause Neurodegeneration

Over fifty neurological diseases arise from the pathological expansion of short tandem DNA repeats, which grow unstable once they cross a critical structural threshold.

Expansion is driven not only by replication slippage but also by the genome's own DNA repair and recombination machinery, making it an ongoing somatic process that worsens throughout life.

Expanded repeats cause disease through multiple simultaneous mechanisms including toxic protein aggregation, RNA-mediated splicing dysregulation, epigenetic gene silencing, and unconventional RAN translation.

Somatic expansion rate in vulnerable tissues like the brain has emerged as a primary modifier of disease onset, fundamentally reframing these conditions as progressive mutational diseases rather than static genetic defects.

Emerging therapies targeting MSH3-dependent expansion, allele-selective antisense oligonucleotides, CRISPR-based repeat contraction, and structure-binding small molecules are converging on a multilayered treatment strategy.

The human genome is littered with short tandem repeats—stretches where a motif of two to six nucleotides is duplicated in succession, sometimes dozens or hundreds of times. Most of these repeat tracts are innocuous, even functional. But when the copying machinery stumbles and a tract expands beyond a critical threshold, the consequences can be devastating. Over fifty neurological and neuromuscular diseases, from Huntington's disease to fragile X syndrome to myotonic dystrophy, trace their origins to this single class of mutation: the pathological expansion of tandem repeats.
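To make the idea of a repeat tract concrete, here is a minimal Python sketch that counts the longest uninterrupted run of a motif in a DNA string. The function name `longest_repeat_tract` and the toy sequence are illustrative assumptions, not a real gene sequence or a clinical genotyping method, which would also have to handle interrupted repeats and sequencing error.

```python
import re


def longest_repeat_tract(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    # Each regex match is one uninterrupted tract; dividing its length by the
    # motif length gives the number of repeat units in that tract.
    return max(
        (len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)),
        default=0,
    )


# Hypothetical toy sequence: a short flank, 8 CAG units, another short flank.
toy = "ATGGCG" + "CAG" * 8 + "CCGCCA"
print(longest_repeat_tract(toy, "CAG"))
```

Counting only uninterrupted runs is a deliberate simplification that keeps the sketch short.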

What makes these diseases uniquely insidious is their dynamic nature. Unlike a point mutation that is inherited in a fixed state, expanded repeats are unstable across generations and even within the tissues of a single individual. A parent carrying a borderline expansion can transmit a dramatically longer tract to their child. This intergenerational growth underlies genetic anticipation, in which disease onset comes earlier and severity increases with each generation. The repeat doesn't just encode a defect. It encodes a tendency toward greater defect.

Understanding repeat expansion diseases demands thinking about DNA not merely as a static information archive but as a physical polymer whose structural properties—its ability to form hairpins, slip-strand structures, and R-loops—directly determine its mutational fate. The information doesn't just matter for what it says. It matters for how it folds. And that folding, in turn, determines whether the genome can faithfully propagate itself or whether it stutters into pathology.

Expansion Mechanisms: How the Genome Loses Count

The core problem underlying repeat expansion is deceptively simple: during DNA replication, the two strands of a repeat tract can misalign. When a nascent strand slips backward on the template, forming a loop of extra repeats that gets incorporated into the new molecule, the tract grows. This replication slippage is the foundational mechanism, but it alone cannot account for the massive expansions—from tens to thousands of repeats—observed in diseases like Friedreich's ataxia or fragile X syndrome.

Several additional pathways amplify the problem. During DNA repair, particularly through the mismatch repair (MMR) and base excision repair (BER) systems, processing of oxidized bases or small loops within repeat tracts can paradoxically introduce further expansions. The repair machinery, designed to restore fidelity, instead becomes an engine of instability. Studies in mouse models have shown that knocking out key MMR components like MSH3 can dramatically reduce somatic expansion, confirming that repair processes are not bystanders but active drivers of tract growth.

The physical chemistry of the repeat sequence itself is central to this instability. Trinucleotide repeats such as CAG, CTG, and CGG form stable secondary structures—hairpins stabilized by non-canonical base pairs, slipped-strand DNA, and in some cases R-loops where the nascent RNA transcript hybridizes with the template strand, displacing the non-template DNA. These structures stall replication forks, recruit repair enzymes inappropriately, and resist the proofreading exonuclease activity that would normally trim misaligned strands. The repeat tract is, in essence, a structural trap for the genome maintenance machinery.
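The hairpin argument can be illustrated with a small Python sketch that folds a repeat tract back on itself and counts how many positions form Watson-Crick pairs. It is a deliberately crude model: it assumes a loop-free hairpin, tries a single alignment, and ignores thermodynamics entirely. The function name `foldback_pairs` is hypothetical.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def foldback_pairs(motif: str, copies: int = 6) -> tuple[int, int]:
    """Fold a repeat tract in half and count (paired, mismatched) positions.

    Simplification: base i is paired with base n-1-i, so the hairpin has no
    unpaired loop and only one alignment is considered.
    """
    tract = motif * copies
    n = len(tract)
    paired = mismatched = 0
    for i in range(n // 2):
        if (tract[i], tract[n - 1 - i]) in WATSON_CRICK:
            paired += 1
        else:
            mismatched += 1
    return paired, mismatched


for motif in ("CAG", "CTG", "CGG", "GAA"):
    print(motif, foldback_pairs(motif))
```

Under these assumptions, CAG, CTG, and CGG tracts pair at two of every three stem positions, with the middle base of each unit left as a mismatch. A GAA tract forms no Watson-Crick pairs in this fold, which fits the text's description of GAA repeats adopting triplex structures rather than hairpins.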

Homologous recombination and transcription-coupled processes add further dimensions of instability. Actively transcribed repeat loci are more prone to expansion, likely because transcription opens the duplex and promotes R-loop formation. In germline cells, where massive intergenerational expansions occur, the combination of high transcriptional activity and meiotic recombination creates a perfect storm for runaway tract growth. The threshold phenomenon, in which tracts below a certain length are stable while those above it become ever more likely to expand as they lengthen, reflects a tipping point where secondary structure formation becomes kinetically favored.

This mechanistic understanding has a critical clinical implication: expansion is not a one-time mutational event but an ongoing somatic process. In Huntington's disease, for example, CAG tracts in striatal neurons continue to expand throughout life, and the rate of somatic expansion in the brain is now understood to be a primary modifier of disease onset. The genome doesn't just carry the mutation. It progressively worsens it, decade by decade, in precisely the tissues most vulnerable to its effects.
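A toy Python model can show how a length threshold combined with lifelong somatic expansion produces runaway growth. Everything in it is an assumption made for illustration: the threshold of 35 repeats, the yearly rate, and the rule that growth is proportional to the excess over the threshold. None of these values is fitted to patient or animal data.

```python
def simulate_tract(start_repeats: float, threshold: float = 35.0,
                   rate_per_year: float = 0.02, years: int = 50) -> list[float]:
    """Toy model of expected tract length, one entry per year.

    At or below `threshold` the tract is treated as perfectly stable. Above it,
    the tract gains `rate_per_year` repeats per year for every repeat of
    excess length. All parameter values are illustrative assumptions.
    """
    lengths = [float(start_repeats)]
    for _ in range(years):
        current = lengths[-1]
        excess = max(0.0, current - threshold)
        lengths.append(current + rate_per_year * excess)
    return lengths


for start in (30, 40, 50):
    final = simulate_tract(start)[-1]
    print(f"start={start} repeats -> after 50 years: {final:.1f}")
```

In this sketch a tract that starts below the threshold never changes, while one that starts above it grows faster each year because every gain enlarges the excess that drives the next gain. Real somatic expansion is stochastic and varies by cell type, which a deterministic average like this cannot capture.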

Takeaway
Repeat expansion is not a static inherited defect but an ongoing mutational process driven by the genome's own maintenance machinery—the very systems meant to preserve fidelity become engines of progressive instability once a structural threshold is crossed.

Toxic Mechanisms: Three Paths from Expansion to Cellular Destruction

The pathological consequences of expanded repeats are not monolithic. Depending on the gene context, repeat sequence, and location within the transcript, expanded tracts cause disease through at least three distinct but often overlapping molecular mechanisms.

The first and most classically understood is toxic protein gain-of-function. In polyglutamine diseases like Huntington's disease and several spinocerebellar ataxias, CAG expansions in coding regions produce proteins with abnormally long glutamine tracts. These polyQ stretches misfold, aggregate into intranuclear and cytoplasmic inclusions, sequester essential cellular proteins, and disrupt proteostasis networks. The expanded protein becomes a molecular sponge for critical transcription factors, chaperones, and ubiquitin-proteasome components.

The second mechanism operates at the RNA level. In myotonic dystrophy type 1, the CTG expansion lies in the 3' untranslated region of the DMPK gene. The expanded CUG-repeat RNA forms stable hairpin structures that accumulate in nuclear foci, sequestering RNA-binding proteins such as MBNL1, a key splicing regulator. The result is widespread splicing dysregulation across hundreds of transcripts, producing a multisystem disease that affects muscle, heart, brain, and endocrine organs. The pathology arises not from a single toxic protein but from a global disruption of RNA processing—a trans-dominant effect mediated entirely by an RNA structural element.

The third mechanism involves transcriptional silencing through chromatin remodeling. In fragile X syndrome, the CGG expansion in the 5' UTR of FMR1 triggers hypermethylation of the promoter and adjacent CpG island, heterochromatin formation, and complete transcriptional shutdown of the gene. The disease is thus a loss-of-function condition caused by epigenetic silencing, not by a toxic gene product. The expanded repeat converts an active gene into a permanently silent locus through a chromatin state that is remarkably resistant to reactivation. Similarly, in Friedreich's ataxia, GAA expansions in an intron of FXN form triplex DNA structures and R-loops that recruit repressive histone marks and reduce frataxin expression.

A more recently appreciated mechanism adds further complexity: repeat-associated non-AUG (RAN) translation. Expanded repeats can be translated in all reading frames without a canonical start codon, producing a cocktail of homopolymeric peptides—polyglutamine, polyalanine, polyserine, and others—that are individually toxic. RAN translation has been demonstrated in C9orf72-linked ALS/FTD, where a GGGGCC hexanucleotide expansion in a non-coding region generates dipeptide repeat proteins that impair nucleocytoplasmic transport, stress granule dynamics, and DNA repair. This mechanism dissolves the traditional distinction between coding and non-coding expansions: even repeats far from any open reading frame can produce toxic polypeptides.
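The reading-frame arithmetic behind RAN translation can be sketched in a few lines of Python. The codon assignments are taken from the standard genetic code, but the table covers only the codons that occur in these two repeats, and the function name `translate_frames` is hypothetical. The sketch reads the sense strand only and says nothing about how initiation without a start codon actually occurs.

```python
# Subset of the standard genetic code: only the codons found in CAG and
# GGGGCC repeat tracts. Any other motif would raise a KeyError.
CODON_TABLE = {
    "CAG": "Q", "AGC": "S", "GCA": "A",
    "GGG": "G", "GCC": "A", "CCG": "P", "GGC": "G", "CGG": "R",
}


def translate_frames(motif: str, copies: int = 6) -> dict[int, str]:
    """Translate a pure repeat tract in each of the three forward frames."""
    tract = motif * copies
    peptides = {}
    for frame in range(3):
        # Step through complete codons only, starting at the frame offset.
        codons = [tract[i:i + 3] for i in range(frame, len(tract) - 2, 3)]
        peptides[frame] = "".join(CODON_TABLE[c] for c in codons)
    return peptides


print(translate_frames("CAG"))
print(translate_frames("GGGGCC"))
```

For a CAG tract the three frames read as glutamine, serine, and alanine, matching the homopolymers named above. For GGGGCC the frames alternate two residues each (glycine-alanine, glycine-proline, glycine-arginine), which is why the C9orf72 products are dipeptide repeats.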

In most repeat expansion diseases, multiple toxic mechanisms operate simultaneously. Huntington's disease involves polyglutamine aggregation, toxic RNA species, and RAN translation products. This mechanistic convergence explains the therapeutic challenge: blocking one pathway may not be sufficient when several parallel routes to cellular dysfunction remain active. It also explains why these diseases are so clinically heterogeneous—the balance among toxic mechanisms varies by tissue, cell type, and expansion size, producing distinct patterns of neurodegeneration even within a single disease.

Takeaway
Expanded repeats don't cause disease through a single mechanism—they can poison cells through toxic proteins, dysregulated RNA processing, epigenetic gene silencing, and unconventional translation simultaneously, which is why these diseases are so resistant to simple therapeutic strategies.

Therapeutic Strategies: Contracting the Repeat or Neutralizing Its Effects

The recognition that somatic expansion is a primary disease driver has opened an entirely new therapeutic paradigm: preventing further expansion or actively contracting the repeat tract. The most advanced strategy in this space targets MSH3, the mismatch repair component essential for somatic CAG expansion in Huntington's disease. Antisense oligonucleotides (ASOs) and small interfering RNAs designed to reduce MSH3 expression are in preclinical and early clinical development. By dampening the repair pathway that drives expansion, these approaches aim to halt disease progression at its mutational root—an intervention upstream of all downstream toxic mechanisms.

Parallel efforts target the toxic gene products directly. Allele-selective ASOs that degrade the expanded huntingtin mRNA while sparing the normal allele have shown proof of concept. The non-selective ASO tominersen, by contrast, had dosing halted in its Phase III trial in 2021 after an independent review judged its benefit-risk profile unfavorable, underscoring the need for precise allele discrimination and optimized dosing and delivery. For myotonic dystrophy, ASOs and small molecules that disrupt CUG RNA hairpins or block MBNL1 sequestration aim to restore normal splicing patterns. In fragile X syndrome, epigenetic editing strategies using dCas9 fused to demethylases seek to reactivate the silenced FMR1 locus by removing the repressive methylation marks laid down by the expansion.

CRISPR-based approaches represent perhaps the most ambitious frontier. Programmable nucleases can, in principle, excise the expanded repeat tract entirely, restoring a normal or near-normal allele. Proof-of-concept studies in patient-derived cells and mouse models have demonstrated contraction of CAG, CTG, and GAA tracts using Cas9 or paired nickases flanking the expansion. However, delivery to the central nervous system, potential off-target editing, and the risk of large chromosomal rearrangements at repetitive loci remain formidable barriers to clinical translation. Base editing and prime editing variants that avoid double-strand breaks may offer safer alternatives for modulating repeat length.

An emerging class of interventions targets the structural biology of the repeat itself. Small molecules that selectively bind expanded CUG or CAG RNA hairpins can prevent protein sequestration and RAN translation without requiring genetic modification. Compounds identified through high-throughput screening and rational design, such as the dimeric ligand developed against CUG repeats in myotonic dystrophy, demonstrate that the abnormal structures formed by expanded repeats are druggable targets. This pharmacological approach sidesteps the delivery challenges of genetic medicines while addressing the toxic RNA mechanism directly.

The therapeutic landscape for repeat expansion diseases is thus converging on a multilayered strategy: prevent somatic expansion to slow disease progression, degrade or neutralize toxic RNA and protein species to reduce ongoing damage, and ultimately contract or excise the expansion to achieve a durable genetic correction. No single modality is likely to be sufficient. The diseases took decades to manifest through multiple parallel mechanisms, and their treatment will likely require combinations that address those mechanisms in concert. What has changed is that, for the first time, each layer of this strategy has candidate interventions in active development.

Takeaway
The most transformative therapeutic insight in repeat expansion diseases is that the mutation is not fixed at birth—it worsens over a lifetime—which means slowing or halting somatic expansion could be as impactful as correcting the original genetic defect.

Repeat expansion diseases challenge a fundamental assumption embedded in classical genetics: that a mutation, once inherited, remains constant. These disorders reveal that the genome is a dynamic system where the boundaries between genotype and phenotype blur across time and tissue. The mutation is the process, not merely the starting condition.

The convergence of mechanistic understanding—from replication slippage to RAN translation to somatic instability as a disease modifier—has transformed these conditions from genetic curiosities into a testing ground for the most advanced genetic medicines in development. ASOs, CRISPR editing, epigenetic reprogramming, and structure-targeting small molecules are all being directed at this single class of mutation.

What repeat expansion diseases ultimately teach us is that biological information is inseparable from biological structure. The sequence stutters because the polymer folds. The fold stalls the machinery. The stalled machinery worsens the stutter. Breaking any link in that cycle is now within reach.