The Power and Water Bill Behind the AI Capex Wave

China's $295B buildout, a $5.2B lease, Microsoft's $10B in Japan — and the energy, water, and siting constraints that decide where your workloads should actually run.

The headline numbers in 2026 are about money, but the binding constraints are physical. In a single week in June, Bloomberg reported China preparing a roughly $295 billion (2 trillion yuan) five-year national AI data-center buildout, Applied Digital signed a $5.2 billion, 15-year lease for 210 megawatts of capacity, and Microsoft’s $10 billion Japan commitment through 2029 kept rolling out. Notice the unit on the Applied Digital deal: not square feet, not server count — megawatts. That is the tell. The AI buildout is increasingly priced, leased, and rate-limited in units of power and water, and that changes how engineering teams should think about where and how they run workloads.

Capacity is now measured in watts, not racks

When a 210 MW lease is the headline, the industry has quietly admitted that the scarce resource is the grid connection, not the building. A single large AI campus now draws on the order of hundreds of megawatts — comparable to a small city — and the long pole is no longer pouring concrete. It is securing an interconnect agreement with a utility, which can take years.

That is the subtext of the state-directed plans. China’s blueprint, as reported, is explicitly about knitting scattered compute into a single interconnected grid operated by China Mobile and China Telecom, with domestic suppliers like Huawei targeted for at least 80% of the technology. Microsoft’s Japan plan pairs data-center capacity with SoftBank and Sakura Internet and a pledge to train over one million engineers and developers by 2030. In both cases the state or hyperscaler is buying three things at once: silicon, power, and people. As a workload owner you don’t control the first and third. You absolutely control how efficiently you consume the second.

PUE: the multiplier on every watt you draw

Power Usage Effectiveness (PUE) is the ratio of total facility energy to the energy that actually reaches the IT equipment. A PUE of 1.5 means every kilowatt of GPU load pulls another half-kilowatt for cooling, power conversion, and other overhead. Hyperscale facilities push toward 1.1–1.2; older enterprise rooms sit at 1.6 or worse. You rarely get to pick the facility’s PUE, but you should know it, because it silently multiplies your carbon and your bill. A job that consumes E kilowatt-hours at the accelerators draws E × PUE from the grid, and PUE is the part you inherit from your region and provider choice, not your code.
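To make the multiplier concrete, here is a minimal sketch of the arithmetic. The function names and the 10,000 kWh job size are our own illustrative assumptions; the PUE values are the ones quoted above.

```python
def grid_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy drawn from the grid for a given IT energy."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0 by definition")
    return it_energy_kwh * pue


def overhead_kwh(it_energy_kwh: float, pue: float) -> float:
    """Energy spent on cooling, conversion, and other non-IT overhead."""
    return grid_energy_kwh(it_energy_kwh, pue) - it_energy_kwh


if __name__ == "__main__":
    job_it_kwh = 10_000.0  # hypothetical job size, for illustration only
    for label, pue in [("hyperscale", 1.1), ("older enterprise room", 1.6)]:
        total = grid_energy_kwh(job_it_kwh, pue)
        extra = overhead_kwh(job_it_kwh, pue)
        print(f"{label}: {total:.0f} kWh from the grid, {extra:.0f} kWh overhead")
```

The same job, unchanged, pulls noticeably more from the grid in the less efficient room, which is why the facility figure belongs in any per-job cost or carbon estimate.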

There is a second-order risk hiding in the lease structures, too. Applied Digital’s deal is a take-or-pay contract: the tenant pays for the 210 MW whether or not it uses them. Those economics flow downhill. When a hyperscaler has committed to pay for a fixed block of power for fifteen years, idle capacity is pure loss, which is exactly why utilization discipline on your workloads is not just a sustainability nicety; it is what keeps a multibillion-dollar commitment from becoming stranded watts. The efficiency levers later in this post are the demand-side answer to a supply side that is now locked in for a decade or more.
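A rough way to see why idle capacity hurts under take-or-pay is to spread the commitment over the megawatt-hours actually used. The sketch below assumes, purely for illustration, that the $5.2 billion is paid evenly across the 15 years and covers exactly 210 MW around the clock; the real contract terms are not public in this post, and the function names are ours.

```python
HOURS_PER_YEAR = 8760  # ignores leap days


def committed_cost_per_mwh(total_usd: float, capacity_mw: float, years: float) -> float:
    """Lease cost per megawatt-hour of committed capacity (flat-payment assumption)."""
    return total_usd / (capacity_mw * years * HOURS_PER_YEAR)


def effective_cost_per_used_mwh(
    total_usd: float, capacity_mw: float, years: float, utilization: float
) -> float:
    """Cost per megawatt-hour actually consumed at a given average utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return committed_cost_per_mwh(total_usd, capacity_mw, years) / utilization


if __name__ == "__main__":
    for u in (1.0, 0.8, 0.5):  # illustrative utilization levels
        cost = effective_cost_per_used_mwh(5.2e9, 210, 15, u)
        print(f"utilization {u:.0%}: ${cost:,.0f} per used MWh")
```

Because the payment is fixed, halving utilization doubles the cost of every megawatt-hour that does useful work.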

Water is the constraint nobody put in the budget

Cooling hundreds of megawatts of silicon means rejecting heat, and the cheapest way to reject heat is to evaporate water. That is where the AI buildout collides with hydrology. Reporting in 2026 found that about two-thirds of 809 planned or in-development US data centers sit in drought-affected areas — southern Arizona, the Colorado River Basin, Texas. A large facility can consume up to roughly 5 million gallons of water a day, comparable to a town of 50,000 people, and a UN University study projects AI-related water use rising toward the basic annual needs of over a billion people by the end of the decade.

The metric here is Water Usage Effectiveness (WUE): liters of water consumed per kilowatt-hour of IT energy. It trades directly against PUE:

  • Evaporative cooling uses little electricity but a lot of water — low PUE, high WUE.
  • Air cooling uses no water on-site but more electricity — higher PUE, near-zero direct WUE.

There is no free lunch, only a choice about which scarce resource you spend. Worse, air cooling’s extra electricity has its own indirect water cost: thermoelectric power plants consume large volumes of water for cooling, so a facility that brags about zero on-site water can still drive heavy water use upstream at the power plant. Microsoft’s zero-water cooling design, introduced in 2024, eliminates evaporative loss but accepts a PUE penalty that has to be engineered back down. The honest framing for any team: you are not minimizing “footprint,” you are allocating between watts and water given where your facility sits.
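The trade-off can be sketched as two terms: water evaporated on-site (WUE times IT energy) plus water consumed upstream to generate the electricity (a grid water intensity times total facility energy). Every number below is a placeholder we chose for illustration, not a measurement of any real facility or grid.

```python
from typing import NamedTuple


class WaterFootprint(NamedTuple):
    onsite_l: float
    upstream_l: float
    total_l: float


def water_footprint(
    it_energy_kwh: float,
    pue: float,
    onsite_wue_l_per_kwh: float,
    grid_water_l_per_kwh: float,
) -> WaterFootprint:
    """Split a job's water use into on-site cooling and upstream generation."""
    onsite = it_energy_kwh * onsite_wue_l_per_kwh
    # Upstream water scales with total facility energy, so PUE matters here.
    upstream = it_energy_kwh * pue * grid_water_l_per_kwh
    return WaterFootprint(onsite, upstream, onsite + upstream)


if __name__ == "__main__":
    it_kwh = 10_000.0  # hypothetical job
    grid_water = 1.5   # hypothetical L/kWh for a thermoelectric-heavy grid
    # Hypothetical cooling profiles: (PUE, on-site WUE in L/kWh)
    profiles = {"evaporative": (1.15, 1.8), "air-cooled": (1.40, 0.0)}
    for name, (pue, wue) in profiles.items():
        fp = water_footprint(it_kwh, pue, wue, grid_water)
        print(f"{name}: on-site {fp.onsite_l:.0f} L, upstream {fp.upstream_l:.0f} L")
```

Which profile comes out ahead depends entirely on the grid water intensity you plug in, which is the point: the answer is local.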

What an engineering team actually controls

You can’t move the Colorado River and you can’t conjure a grid interconnect. But the demand side — how much compute your workloads pull, and when and where they pull it — is squarely yours. Treat efficiency as a first-class cost-and-sustainability lever, not an afterthought.

Region selection is a sustainability decision

The same training job has a different carbon and water footprint depending on which region you launch it in, because grid mix and climate differ. A region running on hydro or nuclear at a cool latitude is a fundamentally different machine from one on a gas-heavy grid in a hot, dry basin — even at identical PUE. When you pick a cloud region, you are implicitly picking a grid carbon intensity and a water-stress profile. Make that explicit. Most major clouds now publish per-region carbon data; cross-reference it against water-stress maps before you pin a workload.
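One way to make the choice explicit is to estimate emissions per region for the same job and rank the regions that pass a water-stress cutoff. This is a sketch: the region names, carbon intensities, and the 0-to-1 water-stress index are invented for illustration, and in practice you would load them from your provider's published data and a water-stress map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    pue: float
    grid_gco2_per_kwh: float  # grid carbon intensity
    water_stress: float       # hypothetical index: 0 = low, 1 = extreme


def job_emissions_kg(region: Region, it_energy_kwh: float) -> float:
    """Estimated CO2 for a job: IT energy x PUE x grid intensity."""
    return it_energy_kwh * region.pue * region.grid_gco2_per_kwh / 1000.0


def rank_regions(regions, it_energy_kwh: float, max_water_stress: float):
    """Drop regions above the water-stress cutoff, then sort by emissions."""
    eligible = [r for r in regions if r.water_stress <= max_water_stress]
    return sorted(eligible, key=lambda r: job_emissions_kg(r, it_energy_kwh))


# Hypothetical regions with identical PUE but different grids and climates.
REGIONS = [
    Region("gas-desert", pue=1.2, grid_gco2_per_kwh=450.0, water_stress=0.9),
    Region("hydro-north", pue=1.2, grid_gco2_per_kwh=30.0, water_stress=0.1),
]

if __name__ == "__main__":
    for r in rank_regions(REGIONS, it_energy_kwh=10_000.0, max_water_stress=1.0):
        print(f"{r.name}: {job_emissions_kg(r, 10_000.0):.0f} kg CO2")
```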

Carbon-aware and time-shifted scheduling

Grid carbon intensity swings hour to hour as wind and solar come on and off. Carbon-aware scheduling exploits that: defer flexible, non-urgent batch work (nightly retraining, large embedding backfills, evaluation sweeps) to the hours or regions where the grid is cleanest. The pattern is mature enough to be boring:

  • Treat training and batch inference as interruptible and deferrable wherever the business allows.
  • Pull a live carbon-intensity signal for your region and gate job submission on it.
  • Co-locate flexible work with renewable availability instead of running everything at a flat 100% around the clock.

Interactive, latency-bound inference can’t time-shift — but a large fraction of an AI platform’s compute is batch, and batch is exactly what bends to scheduling.
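The gating pattern for batch work can be sketched in a few lines. The sketch assumes you already have an hourly carbon-intensity forecast as a plain list of numbers; where that signal comes from (a grid operator, a cloud provider, a third-party feed) is left open, and the function names and thresholds are hypothetical.

```python
from typing import Sequence


def best_start_hour(forecast_gco2_per_kwh: Sequence[float], duration_hours: int) -> int:
    """Return the start index of the contiguous window with the lowest total intensity."""
    n = len(forecast_gco2_per_kwh)
    if duration_hours <= 0 or duration_hours > n:
        raise ValueError("duration must be between 1 and the forecast length")
    window = sum(forecast_gco2_per_kwh[:duration_hours])
    best_sum, best_start = window, 0
    # Slide the window one hour at a time, keeping the earliest best start.
    for start in range(1, n - duration_hours + 1):
        window += forecast_gco2_per_kwh[start + duration_hours - 1]
        window -= forecast_gco2_per_kwh[start - 1]
        if window < best_sum:
            best_sum, best_start = window, start
    return best_start


def should_submit_now(
    current_gco2_per_kwh: float,
    threshold_gco2_per_kwh: float,
    hours_until_deadline: float,
    duration_hours: float,
) -> bool:
    """Gate a deferrable job: run if the grid is clean enough or the deadline forces it."""
    if hours_until_deadline <= duration_hours:
        return True  # no slack left, so the business deadline wins
    return current_gco2_per_kwh <= threshold_gco2_per_kwh


if __name__ == "__main__":
    forecast = [400, 380, 200, 150, 180, 420]  # hypothetical gCO2/kWh per hour
    print("start the 2-hour job at hour offset", best_start_hour(forecast, 2))
```

The deadline check matters as much as the threshold: a carbon gate that can starve a job indefinitely will get switched off the first time it delays a release.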

Efficiency: the cheapest megawatt is the one you don’t draw

Before optimizing where compute runs, cut how much you need. These are the levers with the best ratio of effort to saved watts:

  • Right-sizing. The most common waste we see in client fleets is over-provisioned GPUs running models that fit comfortably on smaller, cheaper, lower-power parts. Match the accelerator to the model’s actual memory and latency profile, not to a benchmark.
  • Quantization. Serving in int8 or fp8 instead of fp16 can roughly halve memory and energy per token with negligible quality loss on many models. This is one of the highest-leverage sustainability moves available, and it ships as a serving-config change.
  • Batching. Continuous batching keeps accelerators near full utilization instead of idling between requests. Idle GPUs still draw power while producing nothing, which raises your energy per useful token; the fix is throughput discipline, not more hardware.
  • Caching and distillation. Cache deterministic results, reuse embeddings, and distill a fine-tuned small model where a frontier model is overkill. Every request you don’t send to a giant model is energy and water you never spend.
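Right-sizing and quantization interact, and a back-of-envelope check shows how. The sketch below uses the standard storage sizes (2 bytes per parameter for fp16, 1 byte for int8 or fp8) to estimate weight memory, then picks the lowest-power part that fits. The accelerator catalog and the 1.3x headroom for KV cache and activations are hypothetical placeholders, not real products or a tuned figure.

```python
from dataclasses import dataclass
from typing import Optional

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int8": 1.0}


@dataclass(frozen=True)
class Accelerator:
    name: str
    memory_gb: float
    board_power_w: float


# Hypothetical catalog, for illustration only.
CATALOG = [
    Accelerator("small-24gb", 24.0, 150.0),
    Accelerator("mid-48gb", 48.0, 300.0),
    Accelerator("large-80gb", 80.0, 700.0),
]


def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Memory for model weights alone, using 1 GB = 1e9 bytes."""
    return params_billion * BYTES_PER_PARAM[dtype]


def smallest_fit(
    params_billion: float, dtype: str, headroom: float = 1.3
) -> Optional[Accelerator]:
    """Lowest-power accelerator whose memory covers weights plus headroom."""
    needed_gb = weight_memory_gb(params_billion, dtype) * headroom
    fits = [a for a in CATALOG if a.memory_gb >= needed_gb]
    return min(fits, key=lambda a: a.board_power_w) if fits else None


if __name__ == "__main__":
    for dtype in ("fp16", "int8"):
        pick = smallest_fit(13.0, dtype)  # hypothetical 13B-parameter model
        print(dtype, "->", pick.name if pick else "does not fit on one device")
```

In this toy catalog, quantizing the weights is what lets the model drop a hardware tier, so the two levers compound rather than merely add.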

None of these are exotic. They are the same moves that cut your cloud bill — which is the point. In an AI platform, the cost curve and the carbon-and-water curve are nearly the same curve, so an efficiency program pays for itself twice.

How we frame siting and scheduling in client work

When we scope an AI platform for a client, we treat power and water as design inputs from the first whiteboard, alongside latency and data residency. A workload with hard residency requirements — say a clinical model that must stay in one jurisdiction — has limited regional freedom, so efficiency and cooling choice carry the whole load. A globally distributed, latency-tolerant batch pipeline has the opposite profile: enormous freedom to chase clean grids and cool climates, and it should. The interesting question is rarely “which cloud is cheapest per GPU-hour.” It’s “given my residency and latency constraints, where does this workload do the least damage to a grid and a watershed for the same result?”
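That question can be written down as a constrained choice: filter candidate sites by residency and latency first, then minimize a combined carbon-and-water score among whatever survives. The sketch below is one possible formulation, not a standard method; the site data, the equal weighting, and the 0-to-1 water-stress index are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Site:
    name: str
    jurisdiction: str
    rtt_ms: float             # round-trip latency to the users that matter
    grid_gco2_per_kwh: float
    water_stress: float       # hypothetical index: 0 = low, 1 = extreme


def choose_site(
    sites: Iterable[Site],
    allowed_jurisdictions: Optional[Set[str]] = None,
    max_rtt_ms: Optional[float] = None,
    water_weight: float = 0.5,
) -> Optional[Site]:
    """Apply hard constraints, then pick the lowest weighted carbon/water score."""
    eligible = [
        s for s in sites
        if (allowed_jurisdictions is None or s.jurisdiction in allowed_jurisdictions)
        and (max_rtt_ms is None or s.rtt_ms <= max_rtt_ms)
    ]
    if not eligible:
        return None
    # Normalize carbon against the dirtiest eligible grid so both terms sit in 0..1.
    worst = max(s.grid_gco2_per_kwh for s in eligible) or 1.0

    def score(s: Site) -> float:
        return (1 - water_weight) * s.grid_gco2_per_kwh / worst + water_weight * s.water_stress

    return min(eligible, key=score)


# Hypothetical candidate sites.
SITES = [
    Site("eu-hydro", "EU", 120.0, 40.0, 0.1),
    Site("us-desert", "US", 20.0, 420.0, 0.9),
    Site("us-north", "US", 35.0, 250.0, 0.3),
]

if __name__ == "__main__":
    clinical = choose_site(SITES, allowed_jurisdictions={"US"}, max_rtt_ms=50.0)
    batch = choose_site(SITES)  # no residency or latency constraint
    print("residency-bound:", clinical.name, "| latency-tolerant batch:", batch.name)
```

The residency-bound workload only gets to choose among the sites its constraints leave standing, while the unconstrained batch pipeline is free to take the cleanest option overall.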

The capex headlines will keep getting bigger — state-scale plans, multibillion-dollar leases, single-country commitments measured in tens of billions. But the constraint underneath them is shifting from capital to megawatts and gallons, and those are physical, local, and slow to build. The teams that come out ahead won’t be the ones who provisioned the most compute. They’ll be the ones who treated every watt and every liter as something to be earned by their code — and built region selection, carbon-aware scheduling, and ruthless efficiency into the platform before the utility bill, or the drought, forced the issue.