For decades, congestion control has operated on a fundamental information asymmetry. Endpoints inject packets into the network and then wait to see what happens—interpreting lost packets or rising delays as indirect evidence of congestion somewhere along the path. It's a remarkably successful paradigm, but it's also a reactive one. TCP Reno and its loss-based descendants essentially treat the network as an opaque pipe, probing its capacity by pushing until something breaks.
That model is showing its age. As link speeds climb past 100 Gbps, as datacenter fabrics demand microsecond-level latency discipline, and as heterogeneous wireless paths defy simple capacity models, the cost of inference-by-loss grows steeper. A single dropped packet on a 100 Gbps link can represent milliseconds of wasted transmission opportunity. The gap between what endpoints guess about network state and what they could know has become a performance bottleneck in its own right.
The response is a shift toward richer signal integration. Explicit queue-depth notifications from switches, model-based estimation of bottleneck bandwidth and propagation delay, and most recently, reinforcement learning agents that discover control policies from data—each represents a different strategy for closing that information gap. What unites them is a recognition that congestion control is fundamentally a signal processing problem, and the quality of the control depends on the quality of the signals consumed.
Explicit Congestion Notification: Letting the Network Speak
Loss-based congestion control treats packet drops as a binary congestion signal. The problem is that a drop is a late signal—by the time a packet is discarded, the queue has already overflowed, latency has already spiked, and goodput has already degraded. Explicit Congestion Notification (ECN) changes this by allowing routers to mark packets with a Congestion Experienced (CE) codepoint in the IP header before queues overflow. The endpoint receives a graded, early warning rather than a catastrophic event.
DCTCP—Data Center TCP—demonstrated how transformative this shift could be. Rather than treating any ECN mark as a signal to halve the congestion window (as standard ECN-capable TCP does), DCTCP computes the fraction of marked packets over a window and reduces its sending rate proportionally. A small fraction of marks triggers a small reduction; a large fraction triggers an aggressive one. The result is a control loop that keeps queue occupancy tightly bounded around the marking threshold, delivering both high throughput and low, stable latency.
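The proportional control law above is compact enough to sketch directly. The following is a minimal illustration of the RFC 8257 update rule—an EWMA of the marked-packet fraction driving a proportional window reduction—with illustrative variable names and the RFC's suggested gain of 1/16; it is a sketch of the idea, not a transport implementation.

```python
def dctcp_update(cwnd: float, alpha: float, acked: int, marked: int,
                 g: float = 1.0 / 16) -> tuple[float, float]:
    """Update the mark-fraction estimate and reduce cwnd proportionally.

    Called once per observation window (roughly one RTT of ACKs).
    """
    frac = marked / acked if acked else 0.0
    alpha = (1 - g) * alpha + g * frac      # EWMA of the marked fraction
    if marked:                              # reduce only when marks arrived
        cwnd = cwnd * (1 - alpha / 2)       # proportional, not a blind halving
    return cwnd, alpha

# Persistent light marking (1 of 50 packets per window) produces a
# correspondingly gentle reduction; all-marked windows approach a halving.
cwnd, alpha = 100.0, 0.0
for _ in range(10):
    cwnd, alpha = dctcp_update(cwnd, alpha, acked=50, marked=1)
```

Note how the window shrinks by only a few percent under 2% marking—this graded response is exactly what keeps queue occupancy hovering near the switch's marking threshold.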
The implications for datacenter fabrics are profound. In environments running partition-aggregate workloads—where a single query fans out to hundreds of workers and the response time is gated by the slowest—tail latency dominates performance. DCTCP's ability to maintain shallow queues directly compresses that tail. Measurements in production environments have shown order-of-magnitude reductions in 99th-percentile queue delay compared to loss-based schemes operating on the same fabric.
Yet ECN-based control introduces its own design tensions. The marking threshold at switches becomes a critical tuning parameter—set it too low and you starve throughput; set it too high and you lose the latency benefit. Multi-tenant environments face fairness questions when ECN-aware flows compete with legacy loss-based flows, since the latter may dominate buffer space. And extending ECN semantics beyond the datacenter, into wide-area paths with heterogeneous equipment and longer feedback loops, remains an active research challenge.
More recent work pushes explicit signaling further. Protocols like HPCC (High Precision Congestion Control) embed per-hop telemetry—link utilization, queue depth, transmission delay—directly into packet headers using in-band network telemetry. The endpoint doesn't just learn that congestion exists; it learns where, how much, and on which specific link. This moves congestion control from inference to observation, collapsing the information gap almost entirely within controlled network domains.
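To make the per-hop idea concrete, here is a hedged sketch of how a sender can consume HPCC-style telemetry: compute a normalized load for each hop (rate share plus queued backlog, measured in bandwidth-delay products) and steer the sending rate toward a target utilization on the single most-loaded hop. The field names, units, and the target utilization eta are illustrative, loosely following the SIGCOMM '19 HPCC paper rather than any actual wire format.

```python
from dataclasses import dataclass

@dataclass
class HopTelemetry:
    link_bw: float      # link capacity, bytes/s
    tx_rate: float      # measured egress rate on the link, bytes/s
    queue_bytes: float  # instantaneous queue depth
    base_rtt: float     # propagation delay used to normalize the queue, s

def utilization(h: HopTelemetry) -> float:
    """Normalized load on one hop: rate share plus queued backlog (in BDPs)."""
    return h.tx_rate / h.link_bw + h.queue_bytes / (h.link_bw * h.base_rtt)

def next_rate(cur_rate: float, hops: list[HopTelemetry], eta: float = 0.95) -> float:
    """Multiplicatively steer the sending rate toward eta on the worst hop."""
    u_max = max(utilization(h) for h in hops)
    return cur_rate * eta / u_max

# Two hops: the second carries a small standing queue and dominates the decision.
hops = [
    HopTelemetry(link_bw=1.25e9, tx_rate=1.0e9, queue_bytes=0, base_rtt=50e-6),
    HopTelemetry(link_bw=1.25e9, tx_rate=1.2e9, queue_bytes=6250, base_rtt=50e-6),
]
rate = next_rate(1.2e9, hops)
```

Because the sender sees each hop individually, it backs off in proportion to the actual overload on the actual bottleneck link—observation rather than inference.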
Takeaway: The most effective congestion signals aren't the ones endpoints infer—they're the ones the network explicitly provides. Control quality tracks signal quality, and the richest signals come from making the network an active participant in the feedback loop.
Model-Based Control: Measuring the Path Instead of Probing It
Google's BBR (Bottleneck Bandwidth and Round-trip propagation time) represents a philosophical departure from both loss-based and ECN-based approaches. Instead of reacting to congestion signals—whether drops or marks—BBR attempts to estimate the two physical properties that define a network path's capacity: the bottleneck bandwidth and the round-trip propagation delay. The optimal operating point, in Kleinrock's sense, is to send at exactly the bottleneck rate with exactly one bandwidth-delay product of data in flight. BBR tries to find and hold that point.
The estimation machinery is elegant in principle. BBR maintains a windowed maximum of delivery rate samples (approximating bottleneck bandwidth) and a windowed minimum of round-trip time samples (approximating propagation delay). It cycles through phases—probing for more bandwidth by briefly increasing the sending rate, then draining any resulting queue—to keep its estimates fresh without persistently inflating queues. The control law doesn't reference loss or marks at all; it's driven entirely by the model.
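The two filters can be sketched in a few lines. This is an illustrative simplification—real BBR uses time-based windows (roughly ten round trips for the bandwidth filter, ten seconds for RTprop) rather than the fixed sample counts assumed here—but it shows why the max/min pairing works: queuing can only lower delivery-rate samples and only raise RTT samples, so the extremes approximate the path's physical parameters.

```python
from collections import deque

class PathModel:
    def __init__(self, bw_window: int = 10, rtt_window: int = 10):
        self.bw_samples = deque(maxlen=bw_window)    # recent delivery rates
        self.rtt_samples = deque(maxlen=rtt_window)  # recent RTT measurements

    def on_ack(self, delivery_rate: float, rtt: float) -> None:
        self.bw_samples.append(delivery_rate)
        self.rtt_samples.append(rtt)

    @property
    def btlbw(self) -> float:    # max filter: queuing only shrinks these samples
        return max(self.bw_samples)

    @property
    def rtprop(self) -> float:   # min filter: queuing only inflates these samples
        return min(self.rtt_samples)

    @property
    def bdp(self) -> float:      # target inflight: one bandwidth-delay product
        return self.btlbw * self.rtprop

# Feed a few (delivery_rate bytes/s, rtt seconds) samples from ACKs.
model = PathModel()
for rate, rtt in [(1.0e8, 0.020), (1.2e8, 0.025), (0.9e8, 0.018)]:
    model.on_ack(rate, rtt)
```

Pacing at `btlbw` with `bdp` bytes in flight is the operating point BBR's probe/drain cycling tries to hold.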
In practice, BBR delivered striking results on Google's wide-area backbone, achieving 2–25x improvements in throughput on paths previously bottlenecked by loss-based algorithms that couldn't distinguish between congestion loss and the random loss endemic to long-haul links. By decoupling its rate control from loss events, BBR could sustain near-optimal throughput on paths where traditional TCP would repeatedly collapse its window.
But model-based control confronts hard estimation problems. The windowed-max bandwidth estimate can be persistently inflated when multiple BBR flows share a bottleneck, because each flow's probing phase injects excess traffic that other flows measure as available bandwidth. BBRv2 and BBRv3 address this by incorporating loss and ECN signals back into the control loop—a pragmatic acknowledgment that pure model-based control, without any reactive component, struggles with multi-flow convergence and fairness. The bottleneck bandwidth of a shared link isn't a fixed physical constant; it's a dynamic quantity shaped by competing traffic.
The deeper lesson is that model-based and signal-based approaches aren't opposites—they're complements. BBR's evolution toward integrating ECN marks alongside its bandwidth-delay model reflects a convergence: the best congestion controllers will likely combine structural knowledge of the path (what the model provides) with operational feedback from the network (what signals provide). Neither alone is sufficient. The model provides a target; the signals correct for the model's inevitable errors.
Takeaway: Modeling the path gives you a target operating point; reacting to signals corrects for reality's deviations from the model. The most robust congestion controllers will fuse both—structural estimation and operational feedback—rather than relying on either alone.
Learning-Based Approaches: Control Policies from Data
If model-based control encodes human intuition about path properties into explicit estimation algorithms, reinforcement learning (RL) asks a more radical question: can we learn the control policy directly from data, without prescribing the model? Projects like Aurora, Orca, and Sage have demonstrated that RL agents, trained in simulation or through online interaction, can discover congestion control policies that match or exceed hand-designed algorithms on specific network configurations.
The typical architecture is a neural network that observes a state vector—recent delivery rates, RTT samples, loss indicators, sending rate history—and outputs an action, usually a multiplicative adjustment to the sending rate or congestion window. Training uses a reward function that encodes the design objective: some weighted combination of throughput, delay, and loss. The agent explores the action space through thousands of simulated episodes, gradually converging on a policy that maximizes cumulative reward.
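The observe → act → reward loop can be sketched without any ML framework. Here a linear model stands in for the neural network; the state features, weights, and reward coefficients are all illustrative placeholders, not values from any published system. What matters is the shape: a state vector in, a bounded rate multiplier out, and a scalar reward that trades throughput against delay and loss.

```python
import math

def policy(state: list[float], weights: list[float]) -> float:
    """Map the state vector to a bounded rate multiplier in (0.5, 2.0)."""
    z = sum(s * w for s, w in zip(state, weights))
    return 0.5 + 1.5 / (1 + math.exp(-z))   # squash the score into the action range

def reward(throughput: float, delay: float, loss: float,
           a: float = 1.0, b: float = 0.5, c: float = 10.0) -> float:
    """Weighted objective: reward throughput, penalize delay and loss."""
    return a * throughput - b * delay - c * loss

# One control step: observe recent signals, then scale the sending rate.
state = [0.8, -0.1, 0.0]   # e.g. normalized delivery rate, RTT gradient, loss rate
rate = 100.0 * policy(state, weights=[1.0, -2.0, -5.0])
```

Training replaces the hand-picked weights with ones that maximize cumulative reward over thousands of simulated episodes; the bounded action range is a common safety choice that keeps any single decision from being catastrophic.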
The results on controlled benchmarks are genuinely impressive. Aurora, for instance, demonstrated throughput and latency combinations that Pareto-dominated both Cubic and BBR on the specific topologies used during training. The learned policy discovered non-obvious behaviors—aggressive probing in certain delay regimes, conservative backoff in others—that no human designer had prescribed. It suggests there's unexplored policy space that traditional algorithm design hasn't reached.
The deployment challenge is generalization. A policy trained on a distribution of simulated network conditions may behave unpredictably on out-of-distribution paths—an uncommon buffer size, an unusual multiplexing pattern, a satellite link with 600 ms RTT. Unlike hand-designed algorithms whose failure modes are analytically understood, a neural network policy can fail in ways that are difficult to predict or diagnose. Safety constraints, fallback mechanisms, and robust training distributions become first-order design concerns, not afterthoughts.
There's also a systems-level question about where inference happens. Running a neural network forward pass on every ACK arrival imposes computational overhead that may be acceptable on a server but is problematic on embedded devices or high-frequency trading infrastructure. Distillation techniques—training a compact model to mimic a larger one—and hybrid architectures that use RL to tune parameters of a classical algorithm rather than replace it entirely are promising directions. The future likely isn't pure RL replacing Cubic everywhere; it's RL augmenting and personalizing classical control in contexts where the additional complexity is justified by measurable performance gains.
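The hybrid direction mentioned above can be made concrete with a small sketch: rather than replacing the controller, a learned component only scales the additive-increase step of a classical AIMD loop, with a hard clamp so the agent can tune the loop but never break it. The function name, gain range, and clamp bounds are illustrative assumptions, not drawn from any deployed system.

```python
def aimd_step(cwnd: float, loss: bool, learned_gain: float) -> float:
    """One AIMD update whose additive-increase step is scaled by a learned gain."""
    gain = min(max(learned_gain, 0.1), 4.0)  # clamp: the agent tunes, never breaks, the loop
    if loss:
        return cwnd / 2          # classical multiplicative decrease, untouched by learning
    return cwnd + gain           # learned scaling of the additive increase
```

Even a wildly wrong learned gain leaves the controller within a bounded distance of standard AIMD behavior, which makes the failure modes far easier to reason about than those of an end-to-end learned policy.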
Takeaway: Learned congestion control policies can discover strategies humans never designed, but their value is bounded by their ability to generalize safely to conditions they've never seen. The hardest problem isn't training the model—it's trusting it on unfamiliar paths.
Congestion control is converging from three directions simultaneously. Explicit network feedback collapses the information gap between endpoints and infrastructure. Model-based estimation provides structural understanding of path capacity. And learning-based approaches explore policy spaces that human intuition can't easily navigate.
The algorithms likely to define next-generation transport protocols won't belong cleanly to any single category. They'll fuse explicit signals with learned models, using network telemetry to ground RL policies and using RL to adapt classical algorithms to conditions that resist analytical modeling. The boundaries between these approaches are already blurring.
What's emerging is a richer conception of what congestion control is—not a fixed algorithm but an adaptive signal processing system, one that continuously refines its understanding of the network it traverses. The endpoints are finally learning to listen.