AI Infrastructure 2026: Blackwell, Neoclouds, Stargate, Power

When OpenAI, Oracle, and SoftBank announced five new Stargate sites in September 2025, the combined Stargate and partner pipeline had already reached nearly 7 gigawatts of planned capacity and over $400 billion of investment over the following three years, closing in on the original $500 billion, 10-gigawatt commitment announced in January 2025. That one sentence captures most of what is interesting about AI infrastructure right now. It is not really a chip story anymore. It is a power story, a real-estate story, a permitting story, and a financing story, with the chips as a constraint that everyone is already pricing in.

This is the practical state of AI infrastructure heading into the back half of 2026 — the silicon, the racks, the neoclouds, the energy stack underneath, and where the on-prem rebuild fits in.

Blackwell shipping reality and the Rubin step-up

Blackwell finally arrived at volume through 2025. HPE shipped its first Grace Blackwell systems in February 2025. GB200 NVL72 racks began landing at Microsoft, Oracle, AWS, and Meta in late 2024 and early 2025, with mass production ramping across Q2 and Q3. Analyst forecasts for 2025 cabinet shipments were cut roughly in half during the year — from a 50,000-80,000 unit range down to 25,000-35,000 units — which the bears read as a stumble and the bulls read as a supply-chain ceiling rather than a demand problem.

What actually ships in a Blackwell rack

The GB200 NVL72 is the rack that matters. It pairs 72 B200 GPUs with 36 Grace CPUs across 18 compute blades and nine NVSwitch blades, all liquid-cooled and presented to software as a single logical accelerator over an NVLink fabric. Per-rack power lands at roughly 132 kilowatts, and that number has reshaped everything downstream. Most existing colocation floors were designed for 10-20 kilowatts per rack, so the Blackwell rollout is not a “swap-the-cards” upgrade; it is a facilities rebuild.
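
To make the density gap concrete, here is a minimal Python sketch of the power arithmetic. It assumes a 15 kW legacy rack (the midpoint of the 10-20 kW range above) and a hypothetical 2 MW hall; the 132 kW figure is the approximate NVL72 draw, and the functions consider power only, not cooling capacity or floor loading.

```python
# Back-of-envelope rack density check. Only the ~132 kW NVL72 figure comes
# from the text; the 15 kW legacy density and the 2 MW hall are hypothetical.
NVL72_RACK_KW = 132.0   # approximate GB200 NVL72 per-rack draw
LEGACY_RACK_KW = 15.0   # assumed midpoint of a 10-20 kW colocation design


def legacy_rack_equivalents(rack_kw: float, legacy_kw: float = LEGACY_RACK_KW) -> float:
    """How many legacy-density racks' worth of power one dense rack draws."""
    return rack_kw / legacy_kw


def racks_supported(hall_it_kw: float, rack_kw: float = NVL72_RACK_KW) -> int:
    """Whole racks a hall's IT power budget can feed (power only, no cooling limit)."""
    if rack_kw <= 0:
        raise ValueError("rack_kw must be positive")
    return int(hall_it_kw // rack_kw)


if __name__ == "__main__":
    hall_kw = 2_000.0  # hypothetical hall originally sized for ~133 legacy racks
    print(f"{legacy_rack_equivalents(NVL72_RACK_KW):.1f} legacy racks of power per NVL72")
    print(racks_supported(hall_kw), "NVL72 racks vs",
          racks_supported(hall_kw, LEGACY_RACK_KW), "legacy racks on the same feed")
```

Under those assumptions one NVL72 draws as much power as 8.8 legacy racks, and a feed that once served 133 racks now serves 15, with all of the heat concentrated in a handful of liquid-cooled positions. That concentration, more than the chips themselves, is what forces the facilities work.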

Vera Rubin in production at CES 2026

At CES 2026, Jensen Huang confirmed that Vera Rubin NVL72 is already in full production. The Rubin platform pairs an Arm-based Vera CPU with a Rubin GPU and targets roughly 5x inference and 3.5x training gains over Blackwell on Nvidia’s published numbers. The NVL72 configuration carries 72 Rubin GPUs, 20.7 TB of HBM4, 36 Vera CPUs, and 54 TB of LPDDR5x. The whole rack is 100% liquid-cooled, and Nvidia claims installation time drops from two hours on Blackwell to roughly five minutes on Rubin. General availability sits in the second half of 2026.

The practical read is that anyone signing a Blackwell purchase order in mid-2026 is buying compute that will share datacenter floors with a meaningfully faster successor within twelve months. That is not a reason to wait — Rubin supply will be hyperscaler-priority for at least a year — but it is a reason to be deliberate about depreciation schedules.
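
One way to be deliberate about depreciation is to see how much the assumed useful life moves the capex cost of each busy GPU-hour. The sketch below uses straight-line recovery with a hypothetical $50,000 all-in capex per GPU and 70% utilization. It ignores power, operating cost, financing, and resale value, so treat the outputs as relative rather than absolute.

```python
# Straight-line capex recovery per utilized GPU-hour. The $50,000 all-in
# per-GPU capex and 70% utilization are hypothetical inputs, not quotes.
HOURS_PER_YEAR = 365 * 24


def capex_per_gpu_hour(capex_per_gpu: float, life_years: float, utilization: float) -> float:
    """Capex recovered per busy GPU-hour; ignores power, opex, financing, resale."""
    if life_years <= 0:
        raise ValueError("life_years must be positive")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    busy_hours = life_years * HOURS_PER_YEAR * utilization
    return capex_per_gpu / busy_hours


if __name__ == "__main__":
    for life in (3, 4, 5):
        rate = capex_per_gpu_hour(50_000, life, 0.70)
        print(f"{life}-year schedule: ${rate:.2f} per busy GPU-hour")
```

Stretching the schedule from three to five years lowers the recovered cost from about $2.72 to about $1.63 per busy GPU-hour under these inputs. If a faster successor shortens the period in which Blackwell capacity commands premium pricing, the longer schedule can flatter the economics.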

The AMD and accelerator alternative

The other side of the silicon market is no longer empty. AMD’s MI300X gained real workload share at Microsoft, Meta, and Oracle through 2024 and 2025, the MI325X followed with system availability in early 2025, and the CDNA 4-based MI350X and MI355X began shipping in the second half of 2025. OpenAI publicly committed to MI300X capacity on Azure for inference workloads, which gave the rest of the market cover to evaluate the chip seriously rather than treating it as a hobby procurement.

Specialty silicon also moved past the “interesting demo” phase. Cerebras WSE-3 wafer-scale systems landed in production inference deployments, particularly for long-context workloads where the on-wafer memory bandwidth dominates. Groq’s LPU inference accelerators built a real business around the latency-per-token argument for chat and agent workloads. SambaNova continued to win specific enterprise inference contracts. None of these displaces Nvidia at the training tier, but the inference tier is genuinely contested for the first time since the AI buildout began.

The fab and supply-chain layer

TSMC’s N2 (2nm) ramp through 2025 and into 2026 is the upstream constraint that everyone in the stack ultimately depends on. Rubin sits on N3 variants while early N2 capacity gets allocated to mobile and the very highest-margin parts. The Arizona Fab 21 site has been progressing in phases: the first fab is in volume production on N4, the second fab, slated for N3, has had its schedule brought forward, and the third fab broke ground in April 2025, with state and federal incentives attached to the site.

The German ESMC Dresden plant, TSMC’s joint venture with Bosch, Infineon, and NXP, broke ground in 2024 and is targeting first production around late 2027 on 28/22nm planar and 16/12nm FinFET processes. Those nodes serve automotive and industrial chips rather than AI silicon, so the European AI training pipeline still depends on Taiwan and Arizona output for the foreseeable future.

The neocloud thesis after the CoreWeave IPO

CoreWeave priced its IPO on 27 March 2025 at $40 a share — below the indicated $47-$55 range — raising roughly $1.5 billion and putting an initial market value near $23 billion on the company. The debut closed flat at the IPO price, then rallied sharply in the days that followed. The public market quickly absorbed what the private market had already been pricing in: that there is a distinct asset class of GPU-cloud providers whose entire facility footprint was built specifically for the Blackwell power envelope, and that this asset class trades differently from hyperscaler stock.

The wider neocloud field through 2026 includes Lambda Labs, RunPod, Vast.ai, Together AI, Crusoe Energy, Nebius, Genesis Cloud, and a long tail of smaller specialists. The internal differentiation matters more than the category label.

CoreWeave sits at the enterprise end with multi-year contracts, Nvidia as both supplier and investor, and the deepest GB200 NVL72 footprint outside the hyperscalers.
Lambda Labs runs a research-first workflow with reserved and on-demand pricing and has carved out a real position for teams that can tolerate capacity swings in exchange for lower friction.
RunPod is the cheapest self-serve option for spot and serverless GPU work — the right pick for fine-tuning experiments and bursty inference.
Vast.ai is the marketplace model, pooling supply from many providers, useful for cost-sensitive experimentation and offline jobs.
Together AI focuses on the inference and fine-tuning API surface rather than raw rentals, competing on price per token.
Crusoe Energy is the energy-arbitrage play — building datacenters next to stranded gas and renewable sites where power is structurally cheap.
Nebius is what remained of Yandex N.V. after the 2024 sale of its Russian businesses, and it built a Finland-anchored AI-cloud footprint (its first datacenter is in Mäntsälä) with a European regulatory posture.

The spot-versus-reserved economics tell the rest of the story. Spot H100 capacity through 2025 and into 2026 has at times traded under $2 per GPU-hour on marketplace platforms, while reserved B200 capacity has held in the $4-$6 range depending on commitment length and region. The arbitrage opportunity sits in matching workload class to capacity class — long-running training jobs to reserved, batch and exploratory work to spot, latency-sensitive inference to dedicated.
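
The matching rule in the previous paragraph can be written down as a small Python heuristic. The 1,000 GPU-hour cutoff for a "long-running" job and the 20% rework fraction for spot preemption are hypothetical parameters chosen for illustration; a real policy would also weigh data locality, interconnect needs, and contract terms.

```python
# Sketch of the workload-to-capacity matching rule described above. The
# 1,000 GPU-hour cutoff and the rework fraction are hypothetical knobs.
from dataclasses import dataclass
from enum import Enum


class Capacity(Enum):
    RESERVED = "reserved"
    SPOT = "spot"
    DEDICATED = "dedicated"


@dataclass
class Workload:
    name: str
    gpu_hours: float          # expected size of the job
    latency_sensitive: bool   # serves interactive traffic
    preemptible: bool         # can checkpoint and resume cheaply


def capacity_for(w: Workload, long_run_gpu_hours: float = 1_000.0) -> Capacity:
    """Latency-sensitive -> dedicated; long or non-preemptible -> reserved; rest -> spot."""
    if w.latency_sensitive:
        return Capacity.DEDICATED
    if w.gpu_hours >= long_run_gpu_hours or not w.preemptible:
        return Capacity.RESERVED
    return Capacity.SPOT


def effective_spot_rate(spot_rate: float, rework_fraction: float) -> float:
    """Spot price per useful GPU-hour when a fraction of paid hours is redone after preemption."""
    if not 0.0 <= rework_fraction < 1.0:
        raise ValueError("rework_fraction must be in [0, 1)")
    return spot_rate / (1.0 - rework_fraction)


if __name__ == "__main__":
    jobs = [
        Workload("pretraining run", 500_000, False, True),
        Workload("hyperparameter sweep", 300, False, True),
        Workload("chat endpoint", 10_000, True, False),
    ]
    for job in jobs:
        print(f"{job.name}: {capacity_for(job).value}")
    # $2/GPU-hour spot with 20% of paid hours lost to preemption and restart.
    print(f"effective spot rate: ${effective_spot_rate(2.0, 0.20):.2f}")
```

The effective spot figure ($2.50 here) is the number to compare against reserved pricing, and even then only after normalizing for the throughput difference between an H100 and a B200.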

Stargate and the hyperscaler buildout

The Stargate Project announced in January 2025 is structured as a separate company. SoftBank and OpenAI lead the partnership, with SoftBank carrying financial responsibility and OpenAI carrying operational responsibility; Oracle and MGX are the other initial equity funders. The September 2025 expansion added five US sites: Shackelford County, Texas; Doña Ana County, New Mexico; Lordstown, Ohio; Milam County, Texas; and a Midwest site in Wisconsin developed by Oracle in partnership with Vantage. The flagship Abilene, Texas, datacenter was already operational at the time of that announcement, making it the first live Stargate site.

Outside Stargate, the hyperscaler buildout continues at scale. Anthropic’s compute partnerships deepened through 2025: Amazon’s total investment in the company reached $8 billion, AWS remains its primary training partner with a Trainium-plus-Nvidia footprint, and a November 2025 agreement with Microsoft and Nvidia added Azure capacity alongside it. Meta continued its in-house Grand Teton successor builds. Google’s sixth-generation Trillium and seventh-generation Ironwood TPUs moved into mainline Gemini training and serving.

The power stack underneath

The Three Mile Island Unit 1 restart — rebranded as the Crane Clean Energy Center — is the most-cited piece of the new AI power story. Constellation Energy and Microsoft announced the 20-year power purchase agreement in September 2024, covering roughly 835 megawatts of carbon-free baseload power delivered to the PJM Interconnection. The facility had shut down in 2019 for financial reasons, and the revival is backed by a $1.6 billion project budget and a $1 billion Department of Energy loan, with operations now targeted for 2027 — a year ahead of the original 2028 plan.
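
For a sense of scale, the sketch below converts Crane’s roughly 835 MW into NVL72-class racks. It treats the whole output as if it fed dense racks directly, which a grid-delivered PPA does not do, and it assumes a facility PUE of 1.2 and a 132 kW rack; networking, storage, and other non-GPU load are ignored.

```python
# Convert a firm power supply into dense-rack capacity. The 835 MW figure is
# from the text; the 1.2 PUE and 132 kW rack draw are planning assumptions.
def racks_on_supply(supply_mw: float, rack_kw: float = 132.0, pue: float = 1.2) -> int:
    """Whole racks the supply can carry after facility overhead (cooling, losses)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    it_kw = supply_mw * 1_000.0 / pue   # power left for IT load
    return int(it_kw // rack_kw)


def gpus_on_supply(supply_mw: float, gpus_per_rack: int = 72, **kwargs: float) -> int:
    """GPU count implied by racks_on_supply, at 72 GPUs per NVL72 rack."""
    return racks_on_supply(supply_mw, **kwargs) * gpus_per_rack


if __name__ == "__main__":
    print(racks_on_supply(835.0), "NVL72-class racks")
    print(gpus_on_supply(835.0), "GPUs")
```

Under those assumptions the output covers about 5,271 racks, or roughly 380,000 GPUs, which is why the Crane agreement matters and why a 10-gigawatt program needs many like it.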

That deal opened the door to a cascade of other agreements. Google signed a power purchase agreement with Kairos Power for small modular reactor capacity targeting later in the decade. Amazon’s arrangement with Talen Energy at the Susquehanna nuclear plant started as a 960-megawatt campus deal and was restructured in 2025 into a front-of-meter agreement for up to 1.92 gigawatts. Oracle, Meta, and others have publicly signaled smaller SMR and behind-the-meter PPAs that have not all closed yet.

The constraint that all of this is responding to is grid interconnection. ERCOT in Texas has emerged as both the most welcoming and the most strained AI-datacenter grid in North America — Stargate’s Texas concentration is rational because permitting moves faster there, and constrained because the grid headroom is already a planning concern. The PJM Interconnection, MISO, and the Western Interconnection all have multi-year interconnection queues that mean a new datacenter site is not really a 2027 project — it is a 2029 or 2030 project unless an existing generation source can be redirected.
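
The timeline argument reduces to a max() over two dates: a site goes live only when both the building and its grid connection are ready. This Python sketch uses hypothetical years and a hypothetical four-year queue wait, and it simplifies redirected generation to "ready when construction is"; actual queue waits differ by region and by project.

```python
# Sketch of why interconnection queues set the go-live date. All years and
# queue waits below are hypothetical; real waits vary by region and project.
from dataclasses import dataclass


@dataclass
class SitePlan:
    construction_done: int           # year shell, cooling, and fit-out could finish
    queue_entry: int                 # year the interconnection request is filed
    queue_wait_years: int            # assumed study-plus-upgrade wait in that region
    redirected_supply: bool = False  # existing generation can be reassigned to the site


def energization_year(plan: SitePlan) -> int:
    """A site goes live only when both the building and its grid connection are ready."""
    if plan.redirected_supply:
        grid_ready = plan.construction_done  # simplification: no new queue position needed
    else:
        grid_ready = plan.queue_entry + plan.queue_wait_years
    return max(plan.construction_done, grid_ready)


if __name__ == "__main__":
    queued = SitePlan(construction_done=2027, queue_entry=2025, queue_wait_years=4)
    redirected = SitePlan(2027, 2025, 4, redirected_supply=True)
    print("queued site:", energization_year(queued))
    print("redirected supply:", energization_year(redirected))
```

With those inputs a site that could be built by 2027 does not energize until 2029 unless existing supply is redirected, which matches the 2029-2030 framing above.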

The on-prem comeback

The most interesting infrastructure trend of 2026 is not the megasite buildout — it is the on-prem rebuild. Dell and Nvidia’s AI Factory reference architecture turned the GB200 NVL72 deployment into something a regulated enterprise can buy and rack in its own facility. HPE’s Cray-derived AI factory line did the same with a tighter coupling to Slingshot networking. Nvidia’s own DGX SuperPOD product is now an off-the-shelf hyperscale-class cluster that ships into customer datacenters.

The driver is regulation and data gravity. Banks, hospitals, defense contractors, and large public-sector buyers have data they cannot move, latency requirements they cannot meet over a round trip to a public-cloud region, and audit demands that make a shared, multi-tenant deployment harder to certify than a wholly owned one. The hyperscalers built dedicated-region products to meet some of this. The on-prem AI factory pattern meets the rest.

Where pdpspectra fits

This is where we spend most of our time. We help teams pick the right slice of the stack — neocloud reserved capacity for a training run, on-prem GB200 for a regulated inference deployment, hyperscaler for elastic batch — and then operate it end-to-end. The work spans network design, liquid-cooling readiness reviews, scheduler and queue design, MLOps on top of the raw compute, and the FinOps discipline that keeps a $4-per-GPU-hour reserved cluster from quietly costing $9 per GPU-hour because of idle nodes. If you are sizing a 2027 AI infrastructure plan and want a partner who has seen the GB200 procurement curve from the customer side, our cloud infrastructure practice is the right starting point.
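
The idle-node problem is simple arithmetic: the effective rate is what you paid divided by the GPU-hours that actually ran work, so a $4 list rate becomes $9 at roughly 44% utilization. The sketch below assumes a hypothetical 1,024-GPU reserved cluster billed for a 30-day month.

```python
# Effective price of useful compute on a reserved cluster. The 1,024-GPU size
# and 30-day month are hypothetical; $4 and $9 are the figures from the text.
def effective_rate(list_rate: float, paid_gpu_hours: float, busy_gpu_hours: float) -> float:
    """What was paid, spread over the GPU-hours that actually ran work."""
    if not 0 < busy_gpu_hours <= paid_gpu_hours:
        raise ValueError("busy hours must be positive and no more than paid hours")
    return list_rate * paid_gpu_hours / busy_gpu_hours


def break_even_utilization(list_rate: float, effective: float) -> float:
    """Utilization at which a list rate turns into a given effective rate."""
    return list_rate / effective


if __name__ == "__main__":
    paid = 1_024 * 30 * 24          # GPU-hours billed in a 30-day month
    busy = paid * 4 // 9            # GPU-hours that actually ran jobs
    print(f"effective: ${effective_rate(4.0, paid, busy):.2f} per useful GPU-hour")
    print(f"utilization behind $4 -> $9: {break_even_utilization(4.0, 9.0):.0%}")
```

Running the numbers backward like this is a quick way to set a utilization floor before signing a reserved commitment.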

Nvidia Blackwell Shipping in 2026 — the rack-level architecture deep dive
GPU Rental Economics: CoreWeave, Lambda, and the Field — the neocloud cost picture
Datacenter Power Constraints in 2026 — the grid and PPA story

What the next twelve months look like

The pattern through the rest of 2026 is reasonably clear. Blackwell volume continues to land, Rubin starts shipping in the second half, the neoclouds keep eating market share at the inference tier while the hyperscalers hold the training tier, the on-prem AI factory pattern keeps expanding into regulated industries, and the power-and-permitting layer continues to be the binding constraint. The companies winning are the ones treating infrastructure planning as a multi-year capital project rather than a procurement line item.

If your 2027 plan touches AI compute at any meaningful scale — training, inference, or both — we would like to help you avoid the predictable mistakes. Reach out through our contact page and we will scope a working session against the specific workloads you are sizing for.