Co-Packaged Optics: Untangling the AI Network

Why copper is out of room in GPU clusters, and how co-packaged optics moves light onto the switch package — power and reach tradeoffs, grounded.

A modern AI cluster is, at the physical layer, a wiring problem. Tens of thousands of GPUs only matter if they can talk to each other fast enough to act like one machine, and what stands in the way is no longer the silicon. It’s the copper and the optics that connect it. The interconnect is now both the bottleneck and the power hog, and the industry’s answer for the next decade is to move light onto the switch package itself.

This is a grounded look at co-packaged optics — what it is, the physics forcing the change, the real efforts from NVIDIA and Broadcom, and the tradeoffs that decide where copper survives and where light has to take over.

The problem: copper is out of road

Every bit that leaves a switch chip today travels as an electrical signal down a copper trace on a circuit board to a pluggable optical transceiver at the front panel, where it’s converted to light. That architecture worked for years. It’s now breaking on two fronts at once.

Signal integrity. As per-lane rates climbed to 224 Gb/s PAM4 and beyond, copper stopped cooperating. A trace of roughly ten centimeters from the ASIC to a front-panel module becomes a hostile channel at those speeds, demanding retimers, exotic PCB materials, and elaborate equalization just to recover the signal. Push the rate higher and the reach collapses further. Copper has a hard physical ceiling and we’re at it.

Power. This is the one that’s actually forcing the decision. Driving high-speed electrical signals across inches of board to reach pluggables burns serious energy — on the order of 10 to 15 picojoules per bit or more — and SerDes power scales superlinearly with both data rate and trace length. At cluster scale, the network’s optics start consuming a meaningful fraction of total facility power. In a world where every watt is contested between compute and cooling, spending it on shoving electrons down copper to a faceplate is increasingly indefensible.
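To get a feel for the scale, the arithmetic is just energy per bit times bit rate. The sketch below is a back-of-envelope calculation, not a measurement. It pairs the 10 to 15 pJ/bit range above with the 409.6 Tb/s switch bandwidth NVIDIA quotes for its Spectrum-X photonics platform; that pairing is an illustrative assumption made here, and the function name is hypothetical.

```python
# Back-of-envelope: interconnect power = energy per bit x bit rate.

PJ = 1e-12    # joules in one picojoule
TBPS = 1e12   # bits per second in one Tb/s


def interconnect_power_watts(energy_pj_per_bit: float, bandwidth_tbps: float) -> float:
    """Watts needed to move `bandwidth_tbps` at `energy_pj_per_bit`."""
    return energy_pj_per_bit * PJ * bandwidth_tbps * TBPS


# Assumption: apply the article's 10-15 pJ/bit range to a 409.6 Tb/s switch,
# purely to show the order of magnitude.
for pj in (10.0, 15.0):
    watts = interconnect_power_watts(pj, 409.6)
    print(f"{pj:.0f} pJ/bit at 409.6 Tb/s -> {watts:.0f} W")
```

Under those assumptions the path to the faceplate alone lands at roughly 4 to 6 kW per switch, which is why this term stops being a rounding error once it is multiplied across a fabric.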

[Figure: Macro view of a switch ASIC ringed by integrated optical engines and fiber pigtails]

The idea: bring the laser to the chip

Co-packaged optics (CPO) does the obvious thing once you accept copper’s limits: stop driving long electrical traces. Instead of converting electrical-to-optical at the front panel, CPO integrates the optical engines into the same package as the switch ASIC. The electrical path shrinks from roughly ten centimeters to millimeters; the rest of the journey happens in fiber, where it’s cheap, where it scales, and where reach isn’t a problem.

Shorten the electrical path and the superlinear SerDes power penalty largely evaporates. That’s the whole pitch: CPO is a power-efficiency play first and a bandwidth-scaling play second. You move the optical conversion next to the silicon, you stop paying to push high-rate signals across the board, and you get back the watts — and the faceplate density — that pluggables were costing you.

The technology underneath is silicon photonics: building optical components — modulators, waveguides, photodetectors — in a process compatible with semiconductor manufacturing. Notably, both major players are building their CPO on TSMC’s COUPE silicon photonics platform, which tells you the supply chain is consolidating around a shared foundry approach rather than fragmenting.

What’s actually shipping

This is no longer a research slide. Two companies dominate the switching-ASIC market and are dragging CPO into production (IDTechEx’s read of the race).

NVIDIA

NVIDIA has put CPO at the center of its networking roadmap under the silicon-photonics banner. Its Quantum-X InfiniBand switches are slated for early 2026, with Spectrum-X Ethernet photonics switches following in the second half of the year (NVIDIA’s silicon photonics page). NVIDIA quotes the Spectrum-X Ethernet photonics platform at up to 409.6 Tb/s of switch bandwidth, aimed squarely at million-GPU clusters.

The claims NVIDIA attaches are vendor claims — read them as direction, not measurement: it cites roughly 3.5x better power efficiency and 10x better resiliency versus the pluggable approach, the resiliency gain coming from eliminating the failure-prone pluggable connectors and their alignment. The resiliency angle is underrated: at hyperscale, a 10x reduction in optical-link failures is arguably worth more than the power savings, because a flapping link in a synchronous training job stalls thousands of GPUs at once.
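Why per-link reliability compounds at scale can be sketched with one line of probability. The example below assumes independent link failures and uses made-up inputs (the link count and per-link failure probability are hypothetical, not vendor or field data); only the 10x factor comes from the claim above.

```python
def p_job_hits_link_failure(p_link_fail: float, n_links: int) -> float:
    """Probability that at least one of n_links fails during a job window,
    assuming each link fails independently with probability p_link_fail."""
    return 1.0 - (1.0 - p_link_fail) ** n_links


# Hypothetical inputs, chosen only to illustrate the shape of the effect.
N_LINKS = 10_000          # optical links the job depends on (assumed)
P_PLUGGABLE = 1e-4        # per-link failure probability per job window (assumed)
P_CPO = P_PLUGGABLE / 10  # applying the vendor's claimed 10x improvement

baseline = p_job_hits_link_failure(P_PLUGGABLE, N_LINKS)
improved = p_job_hits_link_failure(P_CPO, N_LINKS)
print(f"pluggable: {baseline:.1%} chance of at least one link failure")
print(f"CPO claim: {improved:.1%} chance of at least one link failure")
```

With these illustrative inputs, the chance that a job window sees at least one link failure drops from about 63% to about 10%. The exact values are meaningless; the point is that a synchronous job depends on every link at once, so a per-link improvement shows up almost directly as fewer stalled jobs.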

Broadcom

Broadcom has been working on CPO since 2021, longer than most, and is now shipping its third-generation scale-out product, the Tomahawk 6 “Davisson” switch, which it positions at roughly 3.5x better power efficiency than pluggables. Broadcom’s strategy leans on its enormous merchant-switch footprint: it sells silicon to everyone building their own networks, so its CPO reaches the hyperscalers who design their own fabrics rather than buying NVIDIA’s end to end.

The strategic split is worth noting. NVIDIA sells CPO as part of a co-designed, vertically integrated AI factory. Broadcom sells it as merchant silicon to teams who want to build their own. Both bets are credible. They serve different buyers.

[Figure: Dense bundles of single-mode fiber converging into a switch faceplate]

The tradeoffs nobody should skip

CPO is not a free win, and the honest engineering view holds the costs in frame alongside the benefits.

Copper isn’t dead — it’s specialized

The most important nuance: copper still wins at very short reach. Inside a rack, for scale-up links between GPUs over a meter or two, copper is cheaper, simpler, and lower-power than optics — which is exactly why NVIDIA’s own rack-scale NVLink spine stays on copper where it can. The future isn’t “optics everywhere.” It’s a clean division of labor: copper for the shortest scale-up hops, optics for everything that has to reach across the row and the hall. Knowing where that line sits for your topology is the actual design decision.
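The division of labor can be written down as a rule over link lengths. This is a minimal sketch, assuming a single reach cutoff taken from the "meter or two" above; the constant, the function name, and the sample link lengths are all hypothetical and would need to be set per design.

```python
from collections import Counter

COPPER_MAX_REACH_M = 2.0  # assumed cutoff, from "a meter or two"; tune per design


def pick_medium(link_length_m: float, copper_max_m: float = COPPER_MAX_REACH_M) -> str:
    """Return 'copper' for short hops and 'optics' for anything longer."""
    if link_length_m < 0:
        raise ValueError("link length must be non-negative")
    return "copper" if link_length_m <= copper_max_m else "optics"


# Hypothetical link lengths in meters: in-rack scale-up hops, then row and hall runs.
links_m = [0.5, 1.2, 1.8, 5.0, 30.0, 120.0]
print(Counter(pick_medium(length) for length in links_m))
```

The cutoff is the part to revisit: as noted earlier, copper reach shrinks as per-lane rates climb, so the line moves with each generation.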

Serviceability gets harder

A pluggable transceiver that fails gets swapped in two minutes by a technician. An optical engine co-packaged with a multi-thousand-dollar switch ASIC does not. Integrate the optics and you’ve coupled the failure domains — a dead laser can mean replacing the whole package. This is the legitimate operational objection to CPO, and it’s why early deployments lean on redundancy and on detachable fiber connectors that keep some field-serviceability. The power math favors CPO; the maintenance math fights it, and that tension is unresolved.

It’s early

Early commercial deployments through 2026 and 2027 will be the first large-scale validation of the vendor claims. Until independent operators report real reliability and power numbers from production fleets, the 3.5x figures are projections supplied by the people selling the product. Plan accordingly.
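One way to plan around an unverified multiplier is to check how sensitive the savings are to it. The sketch below reads "Nx better power efficiency" as moving the same bits for 1/N of the power, which is an interpretation made here rather than a vendor definition; the 1000 W baseline and the lower gain values are hypothetical planning inputs.

```python
def power_saved_fraction(efficiency_gain: float) -> float:
    """Fraction of baseline interconnect power saved at a given efficiency gain.

    A gain of g is read as moving the same bits for 1/g of the power,
    so the saving is 1 - 1/g.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency gain must be positive")
    return 1.0 - 1.0 / efficiency_gain


BASELINE_W = 1000.0  # hypothetical pluggable-era interconnect power, in watts

# 3.5x is the vendor claim; the lower values are assumed haircuts for planning.
for gain in (1.5, 2.0, 2.5, 3.5):
    saved_w = BASELINE_W * power_saved_fraction(gain)
    print(f"{gain:.1f}x -> saves {saved_w:.0f} W per {BASELINE_W:.0f} W of baseline")
```

Under this reading the savings flatten as the multiplier grows: 2x saves half the baseline, while the full 3.5x saves about 71%. A realized gain well below the claim would still recover a large share of the power.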

What this means if you operate clusters, not fabs

Most teams reading this rent capacity rather than design switches. The architectural shift still reaches you.

  1. The network is a power line item now. When you model the cost of a training or large-scale inference build, interconnect power is no longer a rounding error. CPO-based fabrics change the facility math, and that flows into what your provider charges.
  2. Topology is where copper-versus-optics gets decided. For a data-platform or automation workload that scales across many nodes, how the fabric is built (where it stays copper, where it goes optical) sets your real bandwidth and your blast radius when a link fails. Ask your provider how their scale-up and scale-out fabrics are wired, not just how many GPUs you get.
  3. Resiliency is a performance feature. In synchronous workloads, one flapping optical link stalls the whole job. The resiliency improvements CPO promises matter to throughput, not just uptime. Weigh them as such.

Co-packaged optics is the clearest case in current hardware of physics dictating architecture. Copper ran out of room, the power bill came due, and the only answer was to move light onto the package. It’s early, the serviceability question is real, and the vendor numbers want independent confirmation. But the direction is not in doubt: the AI network is going optical, and the only question is how fast the rest of the stack catches up.

