Liquid Cooling the AI Datacenter

A standard data-center rack was engineered around a simple assumption: a few kilowatts of heat, carried away by moving air. That assumption held for thirty years and is now broken. An NVIDIA GB200 NVL72 rack pulls roughly 120 to 132 kW. Air cooling tops out somewhere around 8 to 25 kW per rack even with aggressive fans and containment. The AI rack is five to fifteen times past the physical ceiling of air. You do not optimize your way out of that gap. You change the working fluid.

This is a practical guide to how that change actually gets engineered — the thresholds, the two main approaches, the plumbing that connects them to the building, and what a retrofit really involves.

Why air gives out

Air is a poor heat-transport medium. Its volumetric heat capacity is more than three orders of magnitude below water’s: roughly 1.2 kJ/(m³·K) against about 4,200 kJ/(m³·K), a factor of around 3,500. To remove more heat with air you move more air, with faster fans, taller heat sinks, and wider aisles, and every one of those has a hard limit. Fan power scales with the cube of airflow, so pushing more CFM burns disproportionately more energy until the fans are eating the efficiency you were trying to save. Heat-sink fins hit a point of diminishing returns. And the chips themselves now concentrate heat into flux densities that no realistic air velocity can pull off the package fast enough.
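To put rough numbers on that gap, here is a minimal sketch of the heat-transport arithmetic. It uses textbook room-temperature properties for air and water and an assumed 15 K temperature rise across the rack; the function names are illustrative, not from any vendor tool.

```python
# Approximate fluid properties near room temperature (textbook values).
AIR_DENSITY = 1.2        # kg/m^3
AIR_CP = 1005.0          # J/(kg*K)
WATER_DENSITY = 998.0    # kg/m^3
WATER_CP = 4182.0        # J/(kg*K)


def volumetric_flow_m3s(heat_w: float, delta_t_k: float, density: float, cp: float) -> float:
    """Volume flow needed to carry heat_w at a given fluid temperature rise."""
    return heat_w / (density * cp * delta_t_k)


def fan_power_ratio(airflow_ratio: float) -> float:
    """Fan affinity law: shaft power scales with the cube of airflow."""
    return airflow_ratio ** 3


if __name__ == "__main__":
    heat_w = 120_000.0   # rack load quoted in the text
    delta_t = 15.0       # ASSUMPTION: temperature rise across the rack, in K
    air = volumetric_flow_m3s(heat_w, delta_t, AIR_DENSITY, AIR_CP)
    water = volumetric_flow_m3s(heat_w, delta_t, WATER_DENSITY, WATER_CP)
    print(f"air:   {air:.2f} m^3/s ({air * 2118.88:.0f} CFM)")
    print(f"water: {water * 60_000:.0f} L/min")
    print(f"volume ratio air/water: {air / water:.0f}x")
    print(f"doubling airflow costs {fan_power_ratio(2.0):.0f}x fan power")
```

The exact figures move with the assumed temperature rise, but the ratio between the two fluids does not: it is set by density and specific heat alone.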

The practical thresholds are well established. Below roughly 25 kW a rack, well-designed air still works. Between 40 and 60 kW, cooling feasibility starts to dominate site selection — it constrains where you can build before anything else does. Past 80 to 100 kW, liquid is not an optimization, it is mandatory. A single GB200 rack lands well beyond that line, which is why every serious AI build from 2025 onward is liquid by default. CoreWeave, for instance, designs all new facilities for liquid, supporting rack densities up to 130 kW.
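For planning spreadsheets, those bands can be encoded as a simple lookup. This is a sketch only: the text gives no guidance for the 25 to 40 kW and 60 to 80 kW gaps, so the way they are folded into neighbouring bands below is an assumption, as are the label names.

```python
def cooling_regime(rack_kw: float) -> str:
    """Map a rack power density to the rough regime described in the text."""
    if rack_kw < 0:
        raise ValueError("rack power cannot be negative")
    if rack_kw <= 25:
        return "air"                   # well-designed air still works
    if rack_kw < 40:
        return "air-marginal"          # ASSUMPTION: gap not covered by the text
    if rack_kw < 80:
        # The text names 40-60 kW; extending this band to 80 kW is an ASSUMPTION.
        return "cooling-constrained"   # cooling feasibility drives site selection
    return "liquid-mandatory"          # conservative end of the 80-100 kW line


if __name__ == "__main__":
    for kw in (8, 25, 50, 100, 130):
        print(f"{kw:>4} kW -> {cooling_regime(kw)}")
```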

Direct-to-chip cold plates

The mainstream answer for 2026 is single-phase direct-to-chip. A cold plate — a machined metal block with internal microchannels — bolts directly onto each GPU and CPU. Coolant, usually a treated water-glycol mix, flows through the plate, absorbs heat at the source, and carries it away. The air in the room now only has to handle the residual 10 to 20% from memory, NICs, and power delivery; the hot silicon is handled by liquid before the heat ever reaches the air.
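The residual share is worth running as arithmetic, because 10 to 20% of a very large number is still a large number. A minimal sketch, using only the rack powers and air fractions quoted in this article (the function name is illustrative):

```python
def residual_air_kw(rack_kw: float, air_fraction: float) -> float:
    """Heat left for the room air after the cold plates take their share."""
    if not 0.0 <= air_fraction <= 1.0:
        raise ValueError("air_fraction must be between 0 and 1")
    return rack_kw * air_fraction


if __name__ == "__main__":
    for rack_kw in (120.0, 132.0):
        for fraction in (0.10, 0.20):
            air_kw = residual_air_kw(rack_kw, fraction)
            print(f"{rack_kw:.0f} kW rack, {fraction:.0%} to air -> {air_kw:.1f} kW")
```

On these inputs the residual comes out between about 12 and 26 kW per rack, which is itself at the upper end of the 8 to 25 kW range quoted earlier for air. The room air system does not go away in a direct-to-chip hall.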

Direct-to-chip won the near term because it is the least disruptive path to high density. It keeps boards serviceable, keeps the familiar rack form factor, and reuses most of the data-hall layout. Schneider Electric and others call it the practical, scalable default for the bulk of high-density deployments, and the industry timeline has it becoming standard through 2025–2026. The engineering catch is the plumbing discipline: hundreds of quick-disconnect couplings, leak detection, flow balancing across manifolds, and water near live electronics. Done sloppily, you trade a thermal problem for a reliability problem.

There’s a design parameter most teams under-appreciate: coolant inlet temperature. The warmer you can run the supply coolant while still keeping silicon in spec, the more of the year you can reject heat with dry coolers instead of energy-hungry chillers — sometimes eliminating mechanical cooling entirely in a cool climate. Higher inlet temperature is counterintuitive but it is where a lot of the efficiency win actually lives. It also tightens the engineering tolerance: less thermal margin between coolant and junction temperature means flow rate, plate cleanliness, and pump reliability all matter more, not less. That trade — warmer coolant for free cooling, against thinner margin — is one of the first decisions to settle on any high-density design, and it ripples through pump sizing, plate selection, and the facility loop downstream.
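A rough way to see the effect of inlet temperature is to count the hours in a year when a dry cooler alone can reach the supply setpoint. The sketch below is illustrative only: the lumped approach temperature (dry cooler plus heat exchanger) and the synthetic sinusoidal climate are assumptions, and a real study would use measured hourly weather data for the site.

```python
import math
from typing import Iterable, List


def free_cooling_fraction(ambient_c: Iterable[float], supply_c: float, approach_k: float) -> float:
    """Fraction of hours where ambient + approach <= supply setpoint."""
    temps = list(ambient_c)
    if not temps:
        raise ValueError("need at least one ambient reading")
    free_hours = sum(1 for t in temps if t + approach_k <= supply_c)
    return free_hours / len(temps)


def synthetic_year(mean_c: float = 14.0, seasonal_k: float = 11.0, daily_k: float = 5.0) -> List[float]:
    """HYPOTHETICAL climate: seasonal plus daily sine waves, one value per hour."""
    return [
        mean_c
        + seasonal_k * math.sin(2 * math.pi * h / 8760)
        + daily_k * math.sin(2 * math.pi * h / 24)
        for h in range(8760)
    ]


if __name__ == "__main__":
    year = synthetic_year()
    approach_k = 8.0  # ASSUMPTION: lumped dry-cooler and heat-exchanger approach
    for supply_c in (25.0, 32.0, 40.0):
        share = free_cooling_fraction(year, supply_c, approach_k)
        print(f"supply {supply_c:.0f} C -> free cooling {share:.0%} of hours")
```

The shape of the result is what matters: each degree added to the supply setpoint converts more hours from chiller operation to dry-cooler operation, until the chiller is only needed on the hottest days or not at all.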

[Figure: Coolant distribution unit with quick-disconnect couplings and supply and return piping]

Immersion cooling

Immersion takes the obvious next step: drop the whole board into a tank of dielectric fluid that won’t conduct electricity, and let the liquid contact every component directly. Single-phase immersion circulates the fluid to a heat exchanger. Two-phase immersion uses a fluid engineered to boil at the chip surface, carrying heat away as latent heat of vaporization — extremely effective, and operationally fussier.

Immersion gives the highest density and removes fans entirely, which is why hyperscalers keep piloting it where maximum density and carbon efficiency outweigh the operational headache. But it is a bigger break from how data centers are built and serviced. You can’t slide a board out and swap a DIMM in thirty seconds; you’re lifting dripping hardware out of a tank. Fluid compatibility, serviceability, and — for two-phase — regulatory scrutiny of the working fluids all slow adoption. The realistic read for 2026: cold plate is the workhorse, immersion grows in the background and reaches the mainstream nearer 2027.

It is worth being clear that these aren’t mutually exclusive. Real high-density builds are hybrids: cold plates on the hottest silicon, residual air handling for the rest, and in some sites immersion tanks for specific workloads. The wrong question is “cold plate or immersion.” The right one is “what fraction of this rack’s heat goes to liquid, at what coolant temperature, rejected by what mechanism” — and that answer is per-site, set by climate, power availability, and how long the facility has to last.

The CDU and the facility loop

Whichever method touches the chip, the heat still has to leave the building, and that is where the Coolant Distribution Unit earns its place. The CDU is the boundary between two worlds. On one side is the technology loop — the clean, controlled, treated coolant that runs to the cold plates. On the other is the facility loop — the building’s water, going out to dry coolers, cooling towers, or a chiller plant.

The CDU’s job is to keep those two loops separate while transferring heat across a plate heat exchanger, and to control temperature, pressure, and flow on the technology side. Keeping the loops isolated is not a nicety — it protects sub-millimeter cold-plate channels from the particulates and chemistry of raw facility water that would foul them in weeks. For a GB200 NVL72, CDU capacity sits around 150 to 200 kW with flow on the order of 750 to 800 liters per minute and real headroom for balancing. Vendors like Vertiv have co-developed full power-and-cooling reference designs with NVIDIA precisely so operators aren’t engineering this loop from scratch.
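Heat load, flow, and coolant temperature rise are tied together by the sensible-heat balance Q = ṁ · c_p · ΔT, so any two fix the third. The sketch below assumes plain-water properties; a water-glycol mix has a lower specific heat and needs somewhat more flow for the same duty. The function names and the 10 K example are illustrative, not taken from a vendor data sheet.

```python
WATER_DENSITY_KG_PER_L = 1.0   # ASSUMPTION: plain water, approximate
WATER_CP_J_PER_KG_K = 4180.0   # ASSUMPTION: plain water, approximate


def required_flow_lpm(heat_kw: float, delta_t_k: float,
                      density: float = WATER_DENSITY_KG_PER_L,
                      cp: float = WATER_CP_J_PER_KG_K) -> float:
    """Flow in L/min needed to carry heat_kw at a given coolant temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    mass_flow_kg_s = heat_kw * 1000.0 / (cp * delta_t_k)
    return mass_flow_kg_s / density * 60.0


def coolant_delta_t_k(heat_kw: float, flow_lpm: float,
                      density: float = WATER_DENSITY_KG_PER_L,
                      cp: float = WATER_CP_J_PER_KG_K) -> float:
    """Coolant temperature rise in K for a given heat load and flow."""
    if flow_lpm <= 0:
        raise ValueError("flow_lpm must be positive")
    mass_flow_kg_s = flow_lpm / 60.0 * density
    return heat_kw * 1000.0 / (mass_flow_kg_s * cp)


if __name__ == "__main__":
    # Example: a 130 kW load with an assumed 10 K rise across the technology loop.
    print(f"{required_flow_lpm(130.0, 10.0):.0f} L/min for 130 kW at 10 K rise")
```

Running the same functions against a vendor's stated capacity and flow is a quick sanity check on the temperature rise that design implies.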

There is a water story here too. Done well, liquid cooling can dramatically cut water consumption versus evaporative air-cooling approaches — NVIDIA has claimed large water-efficiency gains for closed-loop Blackwell designs — but the figure depends heavily on whether heat is ultimately rejected by dry coolers or by evaporation. Don’t quote a vendor’s best-case number as your site’s number.

The CDU is also a single point of failure you have to design around. A direct-to-chip loop that loses circulation gives you seconds, not minutes, before silicon throttles or trips — there is far less thermal mass buffering a cold plate than there was buffering a room full of air. That changes the redundancy math. Operators run CDUs in N+1 configurations, dual pumps, and redundant power, and they instrument the loop heavily, because the cost of getting it wrong is no longer a warm aisle but a hard shutdown of a 120 kW rack mid-training-run.
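Both halves of that argument reduce to short calculations. The sketch below estimates ride-through as the time the coolant inventory takes to warm through the available margin, and checks whether a CDU group still covers the load with one unit out. Every input in the example (20 L of coolant, 15 K of margin, 200 kW units) is hypothetical, and the ride-through figure is optimistic because it assumes the heat spreads evenly through the whole coolant volume even though circulation has stopped.

```python
def ride_through_s(coolant_l: float, margin_k: float, heat_kw: float,
                   density_kg_per_l: float = 1.0, cp_j_per_kg_k: float = 4180.0) -> float:
    """Seconds for the coolant inventory to absorb heat_kw across margin_k."""
    if heat_kw <= 0:
        raise ValueError("heat_kw must be positive")
    stored_j = coolant_l * density_kg_per_l * cp_j_per_kg_k * margin_k
    return stored_j / (heat_kw * 1000.0)


def n_plus_one_ok(units: int, unit_kw: float, load_kw: float) -> bool:
    """True if the remaining units still carry the load after losing one."""
    return (units - 1) * unit_kw >= load_kw


if __name__ == "__main__":
    # HYPOTHETICAL inputs: 20 L in the rack loop, 15 K of margin, 120 kW load.
    print(f"ride-through: {ride_through_s(20.0, 15.0, 120.0):.1f} s")
    print(f"2 x 200 kW for 130 kW, N+1 ok: {n_plus_one_ok(2, 200.0, 130.0)}")
```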

[Figure: Immersion-cooling tank with server boards submerged in clear dielectric fluid]

The retrofit reality

Greenfield AI halls are designed liquid-first. The hard problem is the existing room. Retrofitting an air-cooled facility for 100 kW-plus racks is not a cooling upgrade; it is a civil and mechanical project. You need floor loading for filled tanks or heavy manifolds, pipework for the technology and facility loops, CDU floor space, pumps, leak detection wired into building management, and enough heat-rejection capacity outside to dump the new load. Many older sites simply can’t get enough power or water to the building to matter, and the cooling retrofit stalls behind the electrical one.

This is where the discipline of Operational Automation stops being a buzzword. At these power densities a slow leak, a fouled exchanger, or a drifting flow rate is a six-figure outage. The CDUs, leak sensors, flow meters, and inlet-temperature telemetry have to feed a control and alerting layer that catches drift before it becomes downtime — the same instrumentation-and-response thinking we bring to a Data Platform or a Hospital Management System, applied to pumps and plumbing. Cooling at this scale is a real-time control problem, and the operators who treat it that way are the ones who keep their fleets up.
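As an illustration of catching drift before it becomes downtime, here is a minimal sketch of a drift check on one telemetry channel such as loop flow. It smooths readings with an exponentially weighted moving average and flags when the smoothed value leaves a tolerance band around a commissioning baseline. The class name, parameters, and thresholds are all hypothetical; a production system would add per-sensor tuning, rate-of-change checks, and alert routing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DriftDetector:
    """Flags sustained drift of a smoothed reading away from a baseline."""
    baseline: float            # expected value, e.g. commissioned flow in L/min
    alpha: float = 0.1         # EWMA weight given to each new reading
    tolerance: float = 0.05    # allowed fractional deviation from baseline
    ewma: Optional[float] = None

    def __post_init__(self) -> None:
        if self.baseline == 0:
            raise ValueError("baseline must be non-zero")
        if not 0.0 < self.alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")

    def update(self, reading: float) -> bool:
        """Fold in one reading; return True if the smoothed value has drifted."""
        if self.ewma is None:
            self.ewma = reading
        else:
            self.ewma = self.alpha * reading + (1.0 - self.alpha) * self.ewma
        deviation = abs(self.ewma - self.baseline) / abs(self.baseline)
        return deviation > self.tolerance


if __name__ == "__main__":
    detector = DriftDetector(baseline=190.0)  # HYPOTHETICAL flow baseline, L/min
    readings = [190.0 - 0.5 * i for i in range(60)]  # slow, steady decline
    for minute, flow in enumerate(readings):
        if detector.update(flow):
            print(f"drift alert at sample {minute}: smoothed flow {detector.ewma:.1f} L/min")
            break
```

The smoothing is the design choice that matters here: a single noisy sample does not page anyone, but a slow decline of the kind a fouling exchanger or failing pump produces does.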

The summary is unglamorous and firm: above ~25 kW a rack air is running out of headroom, past 80 to 100 kW you are committed to liquid, the cold plate is the 2026 default, immersion is the higher-ceiling option still maturing, and the CDU-plus-facility loop is the unsung part that most retrofits underestimate. Plan the plumbing and the water before you plan the GPUs.

Sizing cooling for a high-density AI build or a retrofit? We design the thermal, power, and telemetry envelope end to end. Talk to our infrastructure team.
