Every network engineer has stared at a latency number and wondered where the time actually went. A round-trip measurement of 85 milliseconds between two endpoints represents a composite of several distinct physical and computational processes, each consuming its own share of that budget. Understanding the breakdown isn't academic—it determines whether you can actually fix the problem or whether you're fighting physics.

The challenge is that latency isn't one thing. It's at least four things stacked on top of each other, and the dominant component shifts depending on the network scenario. A data center east-west path has a completely different latency profile than a transcontinental WAN link or a congested last-mile connection. Treating latency as a single number to minimize leads engineers to optimize the wrong layer.

This article dissects the core components of network latency, explains why excessive buffering is often worse than packet loss, and covers the measurement techniques that reveal what's actually happening on your network. The goal is to give you a framework for diagnosing where milliseconds go—and which ones you can reclaim.

The Four Delays: Quantifying What Eats Your Latency Budget

Network latency decomposes into four fundamental components: serialization delay, propagation delay, processing delay, and queuing delay. Serialization delay is the time required to push all the bits of a packet onto the wire. For a 1,500-byte packet on a 1 Gbps link, that's roughly 12 microseconds—negligible. On a 1 Mbps link, the same packet takes 12 milliseconds. This is why bandwidth upgrades on slow links produce dramatic latency improvements, while the same upgrade on fast links barely registers.
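The serialization arithmetic above is simple enough to sketch directly. This is a minimal illustration (function name is mine, not a standard API) showing how the same packet's on-wire time scales with link speed:

```python
# Serialization delay: time to clock all of a packet's bits onto the link.
def serialization_delay_s(packet_bytes: int, link_bps: float) -> float:
    return (packet_bytes * 8) / link_bps

# 1,500-byte packet on a 1 Gbps link: ~12 microseconds
fast = serialization_delay_s(1500, 1e9)
# Same packet on a 1 Mbps link: 12 milliseconds -- a 1000x difference
slow = serialization_delay_s(1500, 1e6)
print(f"1 Gbps: {fast * 1e6:.0f} us, 1 Mbps: {slow * 1e3:.0f} ms")
```

The thousand-fold gap is why a bandwidth upgrade transforms a slow access link but is invisible on a modern backbone.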

Propagation delay is physics: the speed of light in fiber is approximately 200,000 km/s, which translates to about 5 microseconds per kilometer. A 3,000 km cross-country path introduces roughly 15 milliseconds of one-way propagation delay. You cannot optimize this. You can only shorten the path, which is why CDN placement and peering strategy matter so much for latency-sensitive applications. Every unnecessary routing detour adds propagation time that no amount of hardware can eliminate.
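The propagation figures follow directly from the speed of light in glass. A quick sketch (constants and function name are illustrative):

```python
# Propagation delay: distance divided by signal speed in the medium.
FIBER_SPEED_KM_PER_S = 200_000  # ~2/3 the speed of light in vacuum

def propagation_delay_ms(distance_km: float) -> float:
    return distance_km / FIBER_SPEED_KM_PER_S * 1000

# 3,000 km cross-country path: ~15 ms one-way, before any other delay
print(f"{propagation_delay_ms(3000):.1f} ms one-way")
```

Note that real fiber routes are rarely straight lines, so actual path distance (and therefore propagation delay) typically exceeds the great-circle estimate.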

Processing delay covers the time a router or switch spends examining headers, performing lookups, applying ACLs, and forwarding the packet. Modern merchant silicon handles basic forwarding in low single-digit microseconds. But add complex policy, deep packet inspection, or encryption, and processing delay can jump by an order of magnitude. This is the component that makes hardware-accelerated forwarding and software-path processing feel like entirely different networks.

Queuing delay is the wildcard. It's zero when links are idle and can spike to hundreds of milliseconds when buffers fill under congestion. Unlike the other three components, queuing delay is variable—it depends on traffic patterns, buffer sizes, and scheduling algorithms at every hop. In many real-world networks, queuing delay dominates the latency experience during busy periods. It's also the component engineers have the most control over, which makes it the most important one to understand deeply.
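To see why queuing delay spikes under congestion, the textbook M/M/1 queue (Poisson arrivals, exponential service) is a useful, if idealized, model: mean queuing delay grows as utilization / (1 - utilization) times the service time. Real routers are not M/M/1 systems, but the shape of the curve is representative:

```python
# Hedged illustration via the M/M/1 queue model: delay explodes as
# utilization approaches 100%. Service time here is the ~12 us it takes
# to serialize a 1,500-byte packet at 1 Gbps, expressed in ms.
def mm1_queuing_delay_ms(utilization: float, service_time_ms: float) -> float:
    return utilization / (1 - utilization) * service_time_ms

service = 0.012  # ms per packet at 1 Gbps
for rho in (0.5, 0.9, 0.99):
    print(f"utilization {rho:.0%}: {mm1_queuing_delay_ms(rho, service):.3f} ms")
```

At 50% utilization the queue adds about one packet's worth of delay; at 99% it adds roughly a hundred. This nonlinearity is why a link that looks "only 90% utilized" on a five-minute average can feel broken to interactive users.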

Takeaway

When diagnosing latency, identify which component dominates before attempting optimization. Bandwidth upgrades fix serialization delay, path changes fix propagation delay, and queue management fixes queuing delay—but none of these solve each other's problems.

Bufferbloat: When More Buffer Creates More Pain

Bufferbloat is one of the most counterintuitive problems in network engineering. The instinct is reasonable: if packets are being dropped due to congestion, add more buffer memory to absorb the burst. Router and switch vendors followed this logic for years, shipping devices with enormous buffers. The result was networks that rarely dropped packets—but introduced latency spikes of seconds under load, because packets sat in queues waiting behind a deep backlog instead of being delivered or dropped promptly.

The core issue is that TCP's congestion control algorithms rely on packet loss as a signal. When a sender detects loss, it reduces its sending rate. With oversized buffers, the loss signal is delayed or eliminated entirely. The sender keeps ramping up, the buffer keeps filling, and round-trip times balloon. Users experience this as applications that feel responsive under light load but become unusable during peak traffic. The irony is that a network dropping packets at a reasonable threshold often performs better from the user's perspective than one that buffers everything.

Active Queue Management (AQM) algorithms like CoDel and FQ-CoDel address bufferbloat by managing queue depth proactively. CoDel doesn't look at queue length—it monitors how long packets have been sitting in the queue. If sojourn time exceeds a threshold (typically 5 ms), CoDel begins dropping packets to signal congestion before the buffer fills completely. FQ-CoDel adds flow fairness by isolating individual flows into separate queues, preventing a single bulk transfer from inflating latency for interactive traffic.
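The sojourn-time mechanism can be sketched in a few lines. This toy version only illustrates the timestamp-on-enqueue idea; real CoDel (RFC 8289) also tracks a measurement interval and a drop-rate schedule, so treat this as a teaching sketch rather than the algorithm:

```python
import time
from collections import deque

TARGET_SOJOURN_S = 0.005  # the ~5 ms threshold mentioned above

class ToyCoDelQueue:
    """Toy queue that judges congestion by how long packets wait,
    not by how many packets are queued."""

    def __init__(self):
        self.q = deque()  # entries are (enqueue_timestamp, packet)

    def enqueue(self, packet):
        self.q.append((time.monotonic(), packet))

    def dequeue(self):
        ts, packet = self.q.popleft()
        sojourn = time.monotonic() - ts
        if sojourn > TARGET_SOJOURN_S:
            return None  # drop: packet waited too long, signal congestion
        return packet
```

The key design choice is visible even in the sketch: a short queue on a slow link and a long queue on a fast link can have identical sojourn times, so time in queue is a far better congestion signal than queue length.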

The practical takeaway for network engineers is to audit buffer configurations on every forwarding device in latency-sensitive paths. Default buffer sizes on many platforms are set for throughput maximization, not latency control. Enabling AQM on congestion points—particularly on WAN edge routers and broadband head-ends—is often the single highest-impact latency optimization available. It's not glamorous work, but it directly addresses the most variable and user-visible component of the latency stack.

Takeaway

Buffers trade latency for throughput. In interactive applications, controlled packet loss at modest queue depths delivers a better user experience than deep buffers that mask congestion and delay the feedback signals that TCP needs to regulate itself.

Measuring Latency Honestly: Percentiles, Probes, and Clock Problems

Average latency is a lie—or at least a dangerous oversimplification. A path with a mean latency of 20 ms might deliver 95% of packets in 15 ms and the remaining 5% in 120 ms. For interactive applications, that tail latency defines the user experience. This is why percentile distributions matter: P50 (median), P95, P99, and P99.9 each tell you something different about your network's behavior. The gap between P50 and P99 reveals how much jitter and queuing variability exists in the path.
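The example distribution above is easy to reproduce numerically, and doing so shows how the mean hides the tail (the nearest-rank percentile function here is a simple illustration, not a library API):

```python
import random

# The article's example: 95% of samples near 15 ms, 5% near 120 ms.
random.seed(1)
samples = [15.0] * 950 + [120.0] * 50
random.shuffle(samples)

def percentile(data, p):
    # Nearest-rank method: the value at the p-th percentile position.
    s = sorted(data)
    k = max(0, int(round(p / 100 * len(s))) - 1)
    return s[k]

mean = sum(samples) / len(samples)
print(f"mean={mean:.2f} ms, P50={percentile(samples, 50)} ms, "
      f"P99={percentile(samples, 99)} ms")
```

The mean lands around 20 ms, yet P50 is 15 ms and P99 is 120 ms. A dashboard showing only the mean would report a healthy-looking path while one user in twenty waits eight times longer than the median.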

Active measurement sends synthetic probes—ICMP echo, TCP SYN/ACK, or UDP timestamps—between known endpoints at regular intervals. It's straightforward to implement and gives you a consistent baseline. But probes may be treated differently than real traffic by QoS policies, and they add load. Passive measurement instruments actual application traffic, capturing TCP handshake times or correlating packet captures at ingress and egress points. It reflects real user experience but requires more infrastructure and careful analysis to extract clean latency signals from noisy data.
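A minimal active probe of the TCP SYN/ACK variety can be built from a timed connect call. This is a bare sketch; a production prober would add timeouts per target, retries, jittered scheduling, and result storage. The demo connects to a throwaway local listener so it is self-contained:

```python
import socket
import time

def tcp_connect_rtt_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Measure TCP handshake time (roughly one RTT) to host:port."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000

# Demo against a local listener (port 0 = let the OS pick a free port).
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
rtt = tcp_connect_rtt_ms("127.0.0.1", port)
print(f"connect RTT: {rtt:.3f} ms")
server.close()
```

TCP connect probes have one advantage over ICMP worth noting: middleboxes and QoS policies are less likely to deprioritize them, so the result more closely resembles what application traffic experiences.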

One-way latency measurement introduces the hardest problem: clock synchronization. Round-trip time sidesteps this because the same clock timestamps both send and receive. One-way delay requires synchronized clocks at both endpoints, and even NTP typically only achieves accuracy within a few milliseconds. PTP (IEEE 1588) can reach sub-microsecond accuracy with hardware timestamping, but requires support throughout the network path. Without adequate clock sync, one-way measurements can show impossible negative latency or misleading asymmetry.
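The "impossible negative latency" failure mode falls out of simple arithmetic. If the receiver's clock runs behind the sender's by more than the true transit time, subtraction produces a negative delay:

```python
# Why unsynchronized clocks break one-way measurement: a receiver clock
# running 3 ms behind the sender turns a genuine 2 ms transit into -1 ms.
true_one_way_ms = 2.0
receiver_clock_offset_ms = -3.0  # receiver's clock is 3 ms behind

send_ts = 100.0  # timestamp taken on the sender's clock
recv_ts = send_ts + true_one_way_ms + receiver_clock_offset_ms
measured = recv_ts - send_ts
print(f"measured one-way delay: {measured} ms")  # -1.0 -- impossible
```

Since NTP's typical accuracy is on the order of milliseconds, clock error and genuine one-way delay are often the same magnitude on short paths, which is exactly when one-way measurement is least trustworthy without PTP.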

A practical measurement strategy combines active probes for continuous baseline monitoring with passive analysis during troubleshooting. Store latency data as histograms or percentile summaries, not just averages. Set alerting thresholds on P99 rather than mean—this catches the tail latency spikes that degrade application performance long before the average shifts. And always document your measurement methodology, because a latency number without context about how it was captured is nearly meaningless.
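Storing latency as bucketed histograms, as suggested above, can be sketched simply. Bucket boundaries and class names here are illustrative; the approximate percentile returns the upper edge of the bucket containing the requested rank, which is the usual trade-off of histogram storage (bounded memory for bounded precision):

```python
import bisect

BOUNDS_MS = [1, 2, 5, 10, 20, 50, 100, 200, 500]  # example bucket edges

class LatencyHistogram:
    """Fixed-bucket latency histogram with approximate percentile reads."""

    def __init__(self):
        self.counts = [0] * (len(BOUNDS_MS) + 1)  # last bucket = overflow
        self.total = 0

    def record(self, latency_ms: float):
        self.counts[bisect.bisect_left(BOUNDS_MS, latency_ms)] += 1
        self.total += 1

    def approx_percentile(self, p: float) -> float:
        # Upper edge of the bucket holding the p-th percentile sample.
        rank = p / 100 * self.total
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= rank:
                return BOUNDS_MS[i] if i < len(BOUNDS_MS) else float("inf")
        return float("inf")
```

The precision loss is the point: a handful of counters per path captures the shape of the distribution, including the tail, at a tiny fraction of the cost of retaining raw samples.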

Takeaway

Latency is a distribution, not a number. Measure and report percentiles, understand whether you're capturing round-trip or one-way delay, and never trust a one-way measurement without verifying clock synchronization accuracy between endpoints.

Latency is a composite problem, and treating it as one is the most common engineering mistake. Serialization, propagation, processing, and queuing each respond to different interventions. Knowing which dominates in your specific scenario is the prerequisite for effective optimization.

The principles are straightforward: shorten paths where propagation dominates, manage queues where buffering dominates, and measure with percentiles rather than averages so you see what your users actually experience. Bufferbloat remains one of the most underdiagnosed performance issues in production networks.

Every millisecond in your latency budget went somewhere specific. The engineering discipline is in accounting for each one—and accepting the ones you cannot change while aggressively optimizing the ones you can.