For over three decades, TCP congestion control operated under a deceptively simple premise: packet loss signals congestion. Algorithms like Reno, NewReno, and CUBIC built elaborate machinery atop this assumption, treating every dropped segment as a whispered warning from the network. The approach worked remarkably well when buffers were small and bandwidths modest.
Modern networks have broken these assumptions in consequential ways. Deep buffers in switches, routers, and middleboxes absorb packets long before they drop, masking congestion signals until queues are already pathologically full. Meanwhile, high bandwidth-delay product paths—transcontinental links, satellite connections, cellular last-miles—punish loss-based algorithms with throughput collapse disproportionate to the underlying physical conditions.
What has emerged in response is not an incremental patch but a paradigm shift. Google's BBR (Bottleneck Bandwidth and Round-trip propagation time), along with a growing family of model-based and delay-sensitive algorithms, reconceives congestion control as an estimation problem rather than a reactive one. The question is no longer "when did we lose a packet?" but "what is the network's actual capacity, and how close are we operating to it?" This renaissance deserves careful examination—not merely for its engineering elegance, but for what it reveals about the costs of inherited assumptions in systems designed at scale.
The Bufferbloat Pathology
Bufferbloat is the condition in which excessive queueing in network buffers produces latencies that dwarf the underlying propagation delay. A path with a 20ms round-trip time can, under sustained load, exhibit effective RTTs exceeding 1000ms: a fiftyfold inflation arising not from physics but from buffering policy.
The mechanism is a direct consequence of loss-based congestion control's operating principle. Algorithms like CUBIC expand their congestion window until packet loss occurs. In a network with a bottleneck buffer of size B and bandwidth C, loss signals arrive only when the buffer is fully occupied, at which point the standing queue contributes B/C additional delay to every packet in flight.
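The B/C term can be made concrete with a small calculation (the buffer and link sizes below are illustrative, not drawn from the text):

```python
def standing_queue_delay_ms(buffer_bytes: float, bandwidth_bps: float) -> float:
    """Extra delay (ms) contributed by a fully occupied bottleneck buffer:
    the B/C term, buffer size divided by link rate."""
    return buffer_bytes * 8 * 1000 / bandwidth_bps

# A 10 MB buffer on a 100 Mbit/s link adds 800 ms of standing-queue delay,
# swamping a 20 ms propagation RTT.
delay_ms = standing_queue_delay_ms(10e6, 100e6)  # 800.0
```

At these (hypothetical) numbers, every packet traversing the saturated bottleneck pays the full 800ms, regardless of which flow it belongs to.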
The feedback loop is perverse. Larger buffers, deployed ostensibly to absorb bursts and prevent loss, actively worsen the problem by delaying the congestion signal that loss-based algorithms require. The sender interprets the absence of loss as permission to send faster, filling the oversized buffer and producing latency that ruins interactive applications sharing the path.
Queue management disciplines like CoDel and PIE address bufferbloat at the router by actively signaling congestion before buffers saturate. But these require deployment at every bottleneck—an uneven reality across the internet. The endpoint-centric alternative is to stop treating loss as the primary signal altogether.
The analytical insight is that buffer occupancy and throughput are decoupled above a certain threshold. Once the bottleneck link is saturated, additional in-flight data accrues only as queueing delay, contributing nothing to goodput. An algorithm that recognizes this can extract full bandwidth without inducing standing queues.
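That decoupling can be written as a simple single-bottleneck model of RTT versus in-flight data: flat at the propagation delay until the in-flight amount reaches the BDP, then rising linearly as the excess sits in the queue. A sketch (function and parameter names are mine, not from any implementation):

```python
def effective_rtt(inflight_bytes: float, btlbw_bps: float, rtprop_s: float) -> float:
    """RTT as a function of in-flight data under a single-bottleneck model:
    propagation delay plus whatever excess beyond the BDP waits in the queue."""
    bdp_bytes = btlbw_bps / 8 * rtprop_s           # pipe capacity in bytes
    queued = max(0.0, inflight_bytes - bdp_bytes)  # excess becomes standing queue
    return rtprop_s + queued * 8 / btlbw_bps

# Beyond one BDP in flight, throughput stays pinned at btlbw_bps while
# effective_rtt grows without bound -- added data buys only delay.
```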
Takeaway: A system that signals through failure will always lag reality. Engineering congestion control around loss is engineering around the symptom rather than the quantity of interest.
BBR's Model-Based Approach
BBR reframes congestion control as a problem of continuously estimating two network properties: the bottleneck bandwidth BtlBw and the round-trip propagation time RTprop. Together these define the bandwidth-delay product (BDP), which represents the optimal amount of in-flight data—enough to saturate the path without creating a queue.
The estimation is windowed and adversarially conservative. BBR tracks the maximum delivery rate observed over a recent interval (typically 10 RTTs) as its bandwidth estimate, and the minimum RTT observed over a longer window (typically 10 seconds) as its propagation delay estimate. The maximum filter on bandwidth captures peak achievable throughput; the minimum filter on RTT captures the unqueued baseline.
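The two filters can be sketched naively as a sliding-window structure (window lengths here are illustrative; the Linux implementation uses a more compact streaming min/max filter rather than retaining every sample):

```python
from collections import deque

class WindowedFilter:
    """Sliding-window extremum filter: retain timestamped samples inside the
    window and report the max (for bandwidth) or min (for RTT)."""
    def __init__(self, window_s: float, pick):
        self.window_s = window_s
        self.pick = pick              # max or min
        self.samples = deque()        # (timestamp, value) pairs

    def update(self, now: float, value: float) -> float:
        self.samples.append((now, value))
        # Expire samples older than the window.
        while now - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        return self.pick(v for _, v in self.samples)

# Bandwidth estimate: max delivery rate over roughly 10 RTTs;
# propagation estimate: min RTT over roughly 10 seconds.
btlbw_filter = WindowedFilter(window_s=0.4, pick=max)   # assumes a ~40 ms RTT
rtprop_filter = WindowedFilter(window_s=10.0, pick=min)
```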
Operation proceeds through explicit phases. During ProbeBW, BBR cyclically pulses its sending rate above the estimate to discover increased capacity, then drains any queue it may have created. Periodically it enters ProbeRTT, reducing in-flight data sharply to re-measure true propagation delay uncontaminated by queueing. This decoupling of probing from exploitation is the algorithm's structural innovation.
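In BBRv1, ProbeBW cycles the pacing gain through eight RTT-long phases: one above unity to probe, one below to drain, six at unity to cruise. Roughly (the helper name is mine):

```python
# BBRv1's ProbeBW pacing-gain cycle: probe for one RTT, drain for one,
# cruise for six.
PROBE_BW_GAINS = [1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

def probe_bw_pacing_rate(btlbw_bps: float, phase: int) -> float:
    """Pacing rate for the given ProbeBW phase index."""
    return PROBE_BW_GAINS[phase % len(PROBE_BW_GAINS)] * btlbw_bps
```

The drain gain (0.75) is chosen to remove the queue that the probe gain (1.25) may have created, so on average the flow sits at the bandwidth estimate.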
The pacing rate and congestion window are derived from the model rather than accumulated through additive-increase heuristics. BBR paces packets at the estimated bandwidth and caps in-flight data near the BDP. The result is that BBR operates near Kleinrock's optimal point—the knee where throughput is maximized and latency is minimized simultaneously—rather than at the cliff where buffers overflow.
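Under that model the control variables fall out directly. A sketch with illustrative path parameters (BBRv1's steady-state cwnd gain of 2 is my reading of the published design, hedged accordingly):

```python
def bdp_bytes(btlbw_bps: float, rtprop_s: float) -> float:
    """Bandwidth-delay product: the in-flight data that fills the pipe exactly."""
    return btlbw_bps / 8 * rtprop_s

# Illustrative path: 100 Mbit/s bottleneck, 40 ms propagation RTT.
btlbw, rtprop = 100e6, 0.04
pacing_rate_bps = 1.0 * btlbw              # pace at the bandwidth estimate
cwnd_bytes = 2 * bdp_bytes(btlbw, rtprop)  # cap inflight near a small
                                           # multiple of the BDP
```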
On paths with shallow buffers and random loss, where CUBIC interprets non-congestive loss as congestion and collapses, BBR's indifference to loss yields dramatic throughput improvements. Google reported YouTube throughput gains of 4% globally and up to 14% in regions with poor connectivity after BBR deployment—a material effect at that scale.
Takeaway: Control based on an explicit model of the plant outperforms control based on reactive signals, provided the model captures the quantities that actually matter.
Fairness and Coexistence
A congestion control algorithm does not operate in isolation; it shares bottlenecks with flows running other algorithms, and the fairness properties of this coexistence determine whether deployment is safe at scale. BBR's relationship with loss-based TCP is considerably more complex than its raw throughput numbers suggest.
The first-generation BBR (BBRv1) exhibited known fairness pathologies. In shallow-buffered environments, BBR would aggressively claim bandwidth that CUBIC flows, sensitive to loss, would cede. In deep-buffered environments, the dynamic inverted: CUBIC flows would fill buffers, and BBR—unable to distinguish buffer-induced RTT inflation from propagation delay in heavily queued conditions—could be starved or misled by inflated RTprop estimates.
Intra-protocol fairness also proved nontrivial. Multiple BBR flows sharing a bottleneck could settle into unequal equilibria where flows with slightly different synchronization of their ProbeBW cycles captured disproportionate bandwidth. The assumption that symmetric algorithms produce symmetric outcomes does not hold when estimation windows and probing phases interact.
BBRv2, and subsequently BBRv3, incorporate loss and ECN signals as auxiliary inputs, constraining the algorithm to respond when the network explicitly signals distress. The design philosophy shifts from pure model-based control toward a hybrid that retains BBR's insight about bandwidth estimation while respecting the contract that loss-based flows expect from their neighbors.
The deeper lesson is that congestion control is a commons problem. An algorithm optimized for single-flow performance in isolation can produce emergent behavior at aggregate scale that undermines the predictability the internet depends on. Fairness is not a property of an algorithm but of the ecosystem it inhabits.
Takeaway: Optimality in isolation is not optimality in aggregate. Any control algorithm deployed at scale must be analyzed as a participant in a game, not as a solitary agent.
The TCP congestion control renaissance is instructive beyond its immediate technical contributions. It demonstrates how assumptions baked into foundational systems—loss as the canonical congestion signal, buffers as benign absorbers of burstiness—can persist long after the conditions that justified them have changed.
Model-based approaches like BBR show that measurement and estimation, properly bounded by conservative filters and explicit probing, can replace reactive heuristics in domains once considered the exclusive province of empirical tuning. The principle generalizes: wherever a system reacts to failure signals, there may be a more direct measurement of the underlying quantity that yields better control.
The unfinished work lies in coexistence and deployment dynamics. As network conditions continue to evolve—higher bandwidths, lower latencies, more heterogeneous paths—the algorithms that govern flow control must evolve with them, and must do so while remaining good citizens of a shared substrate. The renaissance is not a conclusion but a trajectory.