Ethernet has reinvented itself roughly every five years for three decades, each generation compressing more bandwidth into the same rack unit of space. But the leap toward 800 Gigabit Ethernet—and the 1.6 Terabit standard already taking shape in IEEE 802.3dj—represents something qualitatively different from prior doublings. We are approaching hard physical boundaries where copper traces behave less like wires and more like lossy waveguides, where photons must replace electrons closer and closer to the silicon, and where the energy budget for correcting transmission errors threatens to rival the energy spent on switching itself.

For network researchers and datacenter architects, this isn't an abstract scaling exercise. Every AI training cluster, every hyperscale fabric, every high-frequency trading interconnect depends on the assumption that Ethernet bandwidth will continue to grow economically. If that assumption breaks—if the cost-per-bit curve flattens or the power envelope becomes untenable—the architecture of modern computing changes with it.

This article examines three pressure points that will determine whether terabit Ethernet remains on its historical trajectory: the electrical serializer-deserializer (SerDes) lanes that form the physical foundation, the forward error correction schemes that paper over increasingly hostile channel conditions, and the co-packaged optics revolution that promises to redraw the boundary between switching silicon and photonic transport. Each represents a distinct engineering frontier, and each carries trade-offs that ripple through the entire datacenter stack.

SerDes and Lane Scaling: Pushing Electrons to Their Limits

At the lowest layer of every Ethernet link sits the SerDes—the serializer-deserializer circuit that converts parallel data from the switch ASIC into a high-speed serial stream for transmission, and reverses the process at the far end. The aggregate bandwidth of any Ethernet port is simply the product of per-lane speed and lane count. For 400GbE, the industry settled on eight lanes of 50 Gb/s each, using PAM4 modulation at roughly 26.5 Gbaud. The 800GbE generation doubles per-lane throughput to 100 Gb/s via roughly 53 Gbaud PAM4, while 1.6 TbE targets eight 200 Gb/s lanes at roughly 106 Gbaud or, alternatively, explores higher-order modulation to push even further.
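The lane arithmetic behind these generations is simple enough to sketch directly. The per-lane rates below are nominal payload rates; actual line rates run a few percent higher to carry FEC and coding overhead (e.g. 53.125 Gb/s on the wire for a "50G" lane):

```python
# Illustrative lane arithmetic for recent Ethernet generations.
# Per-lane rates are nominal payload rates, not line rates.

def port_bandwidth_gbps(lanes: int, gbps_per_lane: int) -> int:
    """Aggregate port bandwidth is simply lanes x per-lane rate."""
    return lanes * gbps_per_lane

generations = {
    "400GbE": (8, 50),   # 8 x 50G lanes (~26.5 GBd PAM4)
    "800GbE": (8, 100),  # 8 x 100G lanes (~53 GBd PAM4)
    "1.6TbE": (8, 200),  # 8 x 200G lanes (~106 GBd PAM4, per 802.3dj)
}

for name, (lanes, rate) in generations.items():
    total = port_bandwidth_gbps(lanes, rate)
    print(f"{name}: {lanes} lanes x {rate} Gb/s = {total} Gb/s")
```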

The challenge is that electrical channels degrade rapidly at these frequencies. A standard PCB trace operating at 100 Gbaud suffers insertion losses exceeding 30 dB over typical reach distances of 25 to 40 centimeters from ASIC to front-panel connector. At these loss levels, the received signal eye is functionally closed—meaning no amount of simple amplification can recover the data without sophisticated equalization. Modern SerDes employ multi-stage continuous-time linear equalizers (CTLE), decision feedback equalizers (DFE), and sometimes maximum-likelihood sequence estimation (MLSE) to reopen the eye, each stage consuming additional power and die area.
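To see why the electrical channel closes so quickly, a first-order loss model helps. The sketch below combines skin-effect loss (scaling with the square root of frequency) and dielectric loss (scaling linearly with frequency); the coefficients are assumed values for a generic low-loss laminate, illustrative stand-ins rather than measured figures for any particular material:

```python
import math

# First-order PCB trace insertion-loss model: skin-effect loss
# grows as sqrt(f), dielectric loss grows linearly with f.
# Coefficients are illustrative assumptions, not measured data.

def insertion_loss_db(freq_ghz: float, length_cm: float,
                      skin_coeff: float = 0.04,    # dB/cm/sqrt(GHz), assumed
                      diel_coeff: float = 0.015):  # dB/cm/GHz, assumed
    skin = skin_coeff * math.sqrt(freq_ghz) * length_cm
    dielectric = diel_coeff * freq_ghz * length_cm
    return skin + dielectric

# A 100 GBd signal has its Nyquist frequency near 50 GHz.
loss = insertion_loss_db(freq_ghz=50, length_cm=30)
print(f"~{loss:.1f} dB over 30 cm at Nyquist")
```

Even with optimistic coefficients, the model lands in the 30 dB range at 100 Gbaud reach distances, and the dielectric term—linear in frequency—dominates as baud rates climb.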

Lane count scaling offers a seemingly easier alternative—double the lanes instead of doubling the lane speed—but this approach collides with physical connector density and ASIC pin count limitations. A 1.6 TbE port using 50G lanes would require 32 lanes—32 differential pairs in each direction—for a single port, making a 51.2 Tbps switch with 32 such ports physically implausible from a packaging standpoint alone. The industry consensus, therefore, favors increasing per-lane rates, even though each generational step demands substantially more equalization complexity.
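Counting differential pairs makes the packaging argument concrete. A minimal sketch, counting one pair per lane in each direction (TX and RX):

```python
# Pin-count arithmetic for the lane-scaling trade-off.
# Each lane needs one differential pair per direction.

def diff_pairs(port_gbps: int, lane_gbps: int, ports: int = 1) -> int:
    lanes = port_gbps // lane_gbps
    return lanes * 2 * ports  # x2 for TX + RX

# One 1.6T port built from 50G lanes vs 200G lanes:
print(diff_pairs(1600, 50))   # 64 pairs (32 per direction)
print(diff_pairs(1600, 200))  # 16 pairs (8 per direction)

# A full 32-port, 51.2 Tbps switch face at 50G lanes:
print(diff_pairs(1600, 50, ports=32))  # 2048 pairs
```

Two thousand differential pairs must all escape the ASIC package and route to connectors; at 200G lanes the count drops by a factor of four, which is the packaging case for faster lanes.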

The transition from NRZ (two-level) to PAM4 (four-level) modulation at 50G lanes already cost approximately 9.5 dB of signal-to-noise margin, because at a fixed launch swing the PAM4 eye amplitude is one-third that of NRZ. Discussions around PAM6 or even PAM8 for future generations would erode margins further, potentially requiring a fundamental rethinking of the electrical channel. Some proposals advocate shortening the electrical reach to near zero—eliminating the PCB trace problem entirely by placing optics directly adjacent to the SerDes. This is not a refinement of existing architecture; it is a paradigm shift.
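The penalty follows from geometry: with a fixed launch swing, adjacent levels of an L-level PAM signal are separated by 1/(L−1) of the swing, so the idealized eye shrinks by 20·log10(L−1) dB relative to NRZ. A quick sketch:

```python
import math

# Idealized SNR penalty of L-level PAM relative to NRZ (2 levels):
# eye amplitude shrinks by a factor of (L - 1) at fixed swing,
# i.e. 20 * log10(L - 1) dB. Ignores real-world coding gains/losses.

def pam_penalty_db(levels: int) -> float:
    return 20 * math.log10(levels - 1)

for levels in (2, 4, 6, 8):
    print(f"PAM{levels}: {pam_penalty_db(levels):.1f} dB vs NRZ")
# PAM4 ~9.5 dB, PAM6 ~14.0 dB, PAM8 ~16.9 dB
```

The jump from PAM4 to PAM6 costs another ~4.4 dB on top of the 9.5 dB already paid—margin that must be clawed back with equalization, FEC, or a cleaner channel.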

What makes this moment distinctive is that SerDes design is no longer purely a signaling problem. It has become an energy problem. At 100 Gbaud, a single SerDes lane consumes roughly 6 to 8 picojoules per bit. Multiply across hundreds of lanes in a modern switch ASIC, and the SerDes complex alone can account for 30 to 40 percent of total chip power. Every additional decibel of channel loss that must be equalized translates directly into watts—watts that generate heat, require cooling, and reduce the thermal headroom available for the switching logic itself.
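A back-of-envelope check shows how quickly picojoules become watts. Treating the full switch bandwidth as crossing one SerDes hop—a simplifying assumption—the per-bit figures quoted above imply:

```python
# Back-of-envelope SerDes power for a 51.2 Tbps switch, using the
# 6-8 pJ/bit range quoted in the text. Illustrative arithmetic only:
# real complexes mix lane types and reaches with differing pJ/bit.

def serdes_power_w(total_tbps: float, pj_per_bit: float) -> float:
    # (Tb/s * 1e12 b/s) * (pJ/bit * 1e-12 J) -> the exponents cancel
    return total_tbps * pj_per_bit

for pj in (6, 8):
    print(f"{pj} pJ/bit -> {serdes_power_w(51.2, pj):.0f} W")
```

Hundreds of watts before a single packet is switched—which is why every decibel of avoidable channel loss matters.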

Takeaway

Aggregate bandwidth is the product of lane speed and lane count, but both dimensions face hard physical ceilings. The real constraint isn't how fast we can toggle bits—it's how much energy each recovered bit costs.

Forward Error Correction Trade-offs: The Tax on Every Bit

When the raw bit error rate (BER) of a channel degrades beyond what equalization alone can correct, Forward Error Correction (FEC) becomes the safety net. FEC encodes redundant information into the data stream so that the receiver can detect and correct errors without retransmission. For 400GbE, the IEEE standardized Reed-Solomon RS(544,514) KP4 FEC, which adds approximately 5.8 percent overhead and can correct a pre-FEC BER of roughly 2.4×10⁻⁴ down to a post-FEC BER below 10⁻¹³. This was a carefully chosen operating point—strong enough to handle PAM4 channel impairments, lean enough to keep latency and power within budget.
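The headline numbers for RS(544,514) fall directly out of the code parameters—544 ten-bit symbols per codeword, 514 of them payload:

```python
# Derived parameters of RS(544,514) "KP4" Reed-Solomon FEC.

n, k = 544, 514                 # codeword / payload symbols (10-bit)
overhead = (n - k) / k          # redundancy relative to payload
t = (n - k) // 2                # correctable symbol errors per codeword

print(f"overhead: {overhead:.1%}")                 # ~5.8%
print(f"corrects up to {t} symbol errors/codeword")  # 15
```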

At 800G and 1.6T, the channel conditions worsen materially. Higher baud rates and longer PAM4 reaches push pre-FEC BER toward 10⁻³ or worse, demanding either stronger FEC codes or concatenated coding schemes. The emerging approach for 1.6 TbE involves layered FEC architectures—an inner code at the physical medium attachment (PMA) layer combined with an outer code at the physical coding sublayer (PCS). This concatenation dramatically improves corrective power but introduces its own costs.

Latency is the most visible trade-off. KP4 FEC at 400G introduces roughly 50 to 100 nanoseconds of latency depending on implementation. Concatenated or iterative decoding schemes for terabit rates could push this toward 200 nanoseconds or more. For latency-sensitive workloads—remote direct memory access (RDMA) fabrics for AI training, for instance—this additional delay is not trivial. It inflates tail latency in collective communication patterns and can directly impact training iteration time at scale.
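Per-hop latency compounds along a path. A minimal sketch, assuming each switch hop pays the full decode latency and a three-tier Clos path crosses up to five switches:

```python
# FEC latency accumulation across a multi-tier fabric.
# Assumes every hop pays the full decode latency (a worst-case
# simplification); per-hop figures are from the ranges in the text.

def path_fec_latency_ns(hops: int, fec_ns: float) -> float:
    return hops * fec_ns

hops = 5  # leaf -> spine -> core -> spine -> leaf
print(path_fec_latency_ns(hops, 100))  # 400G-class KP4: 500 ns
print(path_fec_latency_ns(hops, 200))  # concatenated terabit FEC: 1000 ns
```

A full microsecond of FEC latency per traversal, before propagation, serialization, or queuing, is a meaningful slice of the budget for tightly synchronized collective operations.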

Power is the less visible but more consequential trade-off. FEC decoding is computationally intensive, particularly for soft-decision decoders that operate on probabilistic information rather than hard bit decisions. Estimates for next-generation concatenated FEC suggest decoding power on the order of 3 to 5 picojoules per bit—comparable to the SerDes equalization overhead itself. In a 51.2 Tbps switch processing aggregate traffic across all ports, those figures imply FEC decoding alone could consume well over a hundred watts, a significant fraction of the ASIC's total thermal design power.
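The decode-power arithmetic is worth making explicit. Taking the quoted 3 to 5 pJ/bit estimates at face value against a 51.2 Tbps aggregate (all received traffic must be decoded):

```python
# FEC decode power at aggregate switch bandwidth, using the
# 3-5 pJ/bit estimates quoted in the text. Illustrative only.

def fec_decode_power_w(tbps: float, pj_per_bit: float) -> float:
    # (Tb/s * 1e12 b/s) * (pJ/bit * 1e-12 J) -> exponents cancel
    return tbps * pj_per_bit

for pj in (3, 5):
    print(f"{pj} pJ/bit -> {fec_decode_power_w(51.2, pj):.0f} W")
```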

There is also an architectural subtlety worth noting. Stronger FEC shifts complexity from the analog domain (better SerDes, better channels) to the digital domain (more decoding logic, more redundancy). This is sometimes framed as a clean win because digital logic scales with Moore's Law while analog does not. But that framing understates the reality: at advanced process nodes below 5 nm, digital power density is itself becoming a binding constraint. FEC is not free—it is a tax levied on every bit that traverses the link, and at terabit scales the total tax bill becomes a first-order design consideration.

Takeaway

FEC is the engineering equivalent of borrowing against future complexity to solve a present constraint. The debt compounds: stronger codes mean more latency, more power, and more silicon area—costs that must be budgeted system-wide, not just at the link layer.

Co-Packaged Optics: Redrawing the Boundary Between Electrons and Photons

The conventional datacenter architecture places pluggable optical transceivers on the front panel of a switch, connected to the ASIC via electrical traces across the PCB. This design has been remarkably successful because it decouples the switch silicon lifecycle from the optics lifecycle—operators can upgrade transceivers independently, mix vendors, and swap failed modules without replacing the entire switch. But at 800G per port and beyond, this architecture confronts a thermodynamic wall.

The problem is straightforward: driving a 100 Gbaud PAM4 signal across 20-plus centimeters of PCB trace to a front-panel cage consumes substantial SerDes power and requires aggressive equalization, as discussed earlier. Co-packaged optics (CPO) eliminates this loss-dominated electrical channel by integrating the optical engine directly onto or immediately adjacent to the switch ASIC package. The electrical path from SerDes to optical modulator shrinks to millimeters, enabling lower-swing signaling—potentially NRZ instead of PAM4 for the electrical segment—and dramatically reducing per-bit energy consumption. Estimates suggest CPO can save 30 to 50 percent of the per-port power budget compared to equivalent pluggable solutions.

The engineering challenges, however, are substantial. Thermal management becomes more complex when high-power optical components share a package substrate with a switch ASIC dissipating 500 watts or more. Laser sources are temperature-sensitive; even modest thermal gradients can shift wavelengths and degrade performance in dense wavelength-division multiplexed (DWDM) configurations. Several CPO architectures address this by placing the laser source off-package—external laser source (ELS) designs—while keeping modulators and photodetectors co-packaged. This adds architectural complexity but preserves the core benefit of short electrical reach.

Serviceability is the operational concern that most worries network operators. In a pluggable world, a failed transceiver is a five-second hot-swap operation. In a CPO world, a failed optical engine could mean replacing the entire switch ASIC assembly—an expensive and disruptive event. The industry is exploring middle-ground approaches: near-packaged optics (NPO), where optical engines sit on the same substrate but remain individually replaceable, and on-board optics (OBO), which moves transceivers onto the PCB surface near the ASIC but retains some modularity. Each approach represents a different trade-off between power efficiency and operational flexibility.

What makes CPO architecturally significant beyond the datacenter is its potential to reshape network topology itself. If per-port power drops substantially, switch ASICs can support more ports at higher speeds within the same thermal envelope, enabling flatter, higher-radix fabrics with fewer tiers. This directly reduces hop count and latency in large-scale AI and HPC clusters. The optical integration trend is not merely a packaging optimization—it is a structural enabler for the next generation of network architectures.
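The topology leverage can be quantified with standard leaf-spine sizing: a two-tier fabric of radix-r switches, with each leaf devoting half its ports to hosts and half to spines, attaches at most r²/2 hosts—so doubling the radix quadruples the fabric's scale at a constant tier count. A minimal sketch:

```python
# Two-tier (leaf-spine) fabric scale as a function of switch radix.
# Assumes leaves split ports evenly between hosts and spines, and
# each spine dedicates one port per leaf.

def max_hosts_two_tier(radix: int) -> int:
    leaves = radix               # each spine port feeds one leaf
    hosts_per_leaf = radix // 2  # half the leaf ports face hosts
    return leaves * hosts_per_leaf

for r in (32, 64, 128):
    print(f"radix {r}: up to {max_hosts_two_tier(r)} hosts in two tiers")
```

If CPO's power savings let the same thermal envelope carry twice the radix, clusters that today need a third switching tier—and the extra hops that come with it—fit in two.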

Takeaway

Co-packaged optics isn't just about saving power at the port level. By collapsing the boundary between switching silicon and photonic transport, it makes fundamentally different network topologies viable—topology choices that were previously locked out by thermal constraints.

The path to terabit Ethernet is not a single breakthrough but a coordinated negotiation among three tightly coupled constraints: the energy cost of recovering high-speed electrical signals, the overhead of correcting errors in increasingly hostile channels, and the thermal reality of packaging more bandwidth into fixed physical space.

Each of these pressure points is solvable in isolation. The engineering challenge—and the intellectual interest—lies in solving them simultaneously within a power envelope that remains economically viable. The next Ethernet generation will be defined less by raw speed and more by energy per bit, the metric that ultimately determines whether bandwidth growth translates into deployable infrastructure.

For researchers and architects working at this frontier, the implication is clear: the era of straightforward Ethernet scaling through higher baud rates and wider pipes is ending. What follows will require co-design across analog, digital, photonic, and packaging disciplines—a convergence that will produce something recognizably Ethernet in protocol but fundamentally different in physical form.