GPU Rental in 2026: CoreWeave, Lambda, RunPod, and the Neocloud Math

The GPU rental market matured in 2025. Honest economics on CoreWeave, Lambda, RunPod, Vast, Together, Crusoe — and when the neocloud thesis beats AWS p5/p6.

The GPU rental market of 2023 was an exotic side bet. By mid-2026 it is a real second tier of compute infrastructure, one that several Fortune 500 AI teams and most well-funded model labs route meaningful workloads through. CoreWeave’s March 2025 IPO at a roughly 23-billion-dollar valuation stamped the category as legitimate. Lambda Labs raised at a multi-billion-dollar valuation. RunPod, Vast.ai, Together AI, Fireworks, Crusoe Energy, Genesis Cloud, Nebius, Voltage Park, and a long tail of regional neoclouds are now in serious procurement conversations alongside AWS, GCP, and Azure for training and inference workloads.

We have placed real training and inference workloads on most of these in the last twelve months. Here is what actually matters when the term sheets are gone.

The neocloud thesis in one paragraph

Hyperscalers are vertically integrated. They sell you GPUs bundled with their networking, storage, IAM, observability, marketplace and a 30 to 50 percent gross-margin premium that funds the rest of the business. Neoclouds strip that down to the metal: NVIDIA reference designs, InfiniBand fabric, an object store, basic Kubernetes, and a sales team that returns your call the same day. The thesis is that for compute-bound training and high-throughput inference, the bundle is the cost, and unbundling delivers 30 to 60 percent savings. It works when your workload is GPU-dominated and self-contained. It breaks down when you need the rest of the cloud sitting next to it.

The price curve for H100, H200, B100, B200

Rough on-demand market pricing as of late May 2026, in USD per GPU-hour, US regions:

| GPU | AWS p5/p6 list | CoreWeave on-demand | Lambda on-demand | RunPod Community | Together / Fireworks burst |
| --- | --- | --- | --- | --- | --- |
| H100 80GB SXM | 4.10 to 4.90 | 2.20 to 2.79 | 2.49 to 2.99 | 1.79 to 2.49 | 2.10 to 2.79 |
| H200 141GB SXM | 5.49 to 6.30 | 2.99 to 3.79 | 3.29 to 3.99 | 2.79 to 3.49 | 3.10 to 3.79 |
| B100 (Blackwell) | 6.50 to 7.50 (limited) | 3.99 to 4.99 | 4.29 to 4.99 | 3.49 to 4.49 | not standard |
| B200 (Blackwell) | 8.50 to 9.50 (preview/limited) | 4.79 to 5.79 | 4.99 to 5.99 | 3.99 to 4.99 | 4.49 to 5.49 |

Three honest caveats on this table. First, AWS p5 and p6 list pricing is the on-demand sticker — almost nobody on serious volume pays it. EDP and reserved discounts pull effective rates down 30 to 55 percent. Second, neocloud headline rates almost always assume a multi-month reservation or a credit-card on-demand pool that runs out of supply quickly. Third, B100 and B200 supply through every channel was still rationed through Q1 2026, so quoted rates are softer than they look.

The honest takeaway: for a one-year H100 commit, the neocloud rate is in the 1.60 to 2.10 range; the hyperscaler reserved rate is in the 2.40 to 3.20 range. That gap funds the neocloud thesis.
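To make the discount arithmetic concrete, here is a minimal Python sketch that applies the 30 to 55 percent discount band to the AWS H100 list range and then sizes the yearly gap for a fleet. The function names, the use of range midpoints, and the 256-GPU fleet size are illustrative assumptions, not figures from any vendor contract.

```python
HOURS_PER_YEAR = 24 * 365


def effective_rate(list_rate: float, discount: float) -> float:
    """Hourly rate after a fractional discount (0.30 means 30 percent off)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return list_rate * (1.0 - discount)


def annual_gap(hyperscaler_rate: float, neocloud_rate: float, gpus: int) -> float:
    """Yearly spend difference for a fleet running around the clock."""
    return (hyperscaler_rate - neocloud_rate) * gpus * HOURS_PER_YEAR


# AWS H100 list range from the table, discounted by the 30 to 55 percent band.
low = effective_rate(4.10, 0.55)   # cheapest list price, deepest discount
high = effective_rate(4.90, 0.30)  # priciest list price, shallowest discount
print(f"Discounted AWS H100: {low:.2f} to {high:.2f} per GPU-hour")

# Midpoints of the one-year commit ranges quoted above (using midpoints is an assumption).
neocloud_mid = (1.60 + 2.10) / 2
hyperscaler_mid = (2.40 + 3.20) / 2
gap = annual_gap(hyperscaler_mid, neocloud_mid, 256)
print(f"Gap on 256 GPUs: {gap:,.0f} per year")
```

The point of the sketch is that the comparison worth running is discounted rate against committed rate, never list against list.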

The spot versus reserved curve

For training, the curve looks something like this in 2026:

  • Spot or Community tier H100s on RunPod, Vast.ai, or Crusoe: 0.99 to 1.79 per hour, interruption rates of 5 to 20 percent per day depending on region and time of day.
  • One-month reserved H100 at CoreWeave or Lambda: 1.99 to 2.49 per hour, no interruption, guaranteed InfiniBand topology.
  • Six-to-twelve-month reserved H100 at a neocloud: 1.49 to 1.99 per hour, with contracts that increasingly include minimum take-or-pay terms.
  • Three-year AWS Savings Plan against p5: effective 2.40 to 2.80 per hour, depending on commitment shape.

The right shape depends on how interruption-tolerant the workload is. Training that checkpoints every 5 to 15 minutes can absorb spot. Long-context fine-tuning runs without checkpointing cannot. Production inference effectively cannot.
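A rough way to test whether a job can absorb spot is to price in the work lost to interruptions. The sketch below is a simplified model under stated assumptions: the daily interruption rate applies per node, any single node loss stalls the whole synchronous job, each interruption loses on average half a checkpoint interval plus a fixed restart time, and checkpoint write overhead is ignored. The 8-node job and 10-minute restart are hypothetical inputs.

```python
MINUTES_PER_DAY = 24 * 60


def useful_fraction(interruptions_per_node_day: float, nodes: int,
                    checkpoint_minutes: float, restart_minutes: float) -> float:
    """Share of paid time that produces retained training progress."""
    expected_interruptions = interruptions_per_node_day * nodes
    # On average an interruption lands halfway through a checkpoint interval.
    lost_per_interruption = checkpoint_minutes / 2 + restart_minutes
    lost_minutes = expected_interruptions * lost_per_interruption
    return max(0.0, 1.0 - lost_minutes / MINUTES_PER_DAY)


def effective_spot_rate(spot_rate: float, interruptions_per_node_day: float,
                        nodes: int, checkpoint_minutes: float,
                        restart_minutes: float) -> float:
    """Spot price per hour of useful work rather than per hour billed."""
    fraction = useful_fraction(interruptions_per_node_day, nodes,
                               checkpoint_minutes, restart_minutes)
    if fraction == 0.0:
        raise ValueError("job never makes progress under these assumptions")
    return spot_rate / fraction


# Top of the spot price range and the worst interruption rate from the list above.
frequent = effective_spot_rate(1.79, 0.20, nodes=8,
                               checkpoint_minutes=15, restart_minutes=10)
daily = effective_spot_rate(1.79, 0.20, nodes=8,
                            checkpoint_minutes=MINUTES_PER_DAY, restart_minutes=10)
print(f"Checkpoint every 15 min: {frequent:.2f} per useful GPU-hour")
print(f"Checkpoint once a day:   {daily:.2f} per useful GPU-hour")
```

Under these assumptions frequent checkpointing keeps spot below the one-month reserved range, while once-a-day checkpointing pushes the effective rate well above it.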

Who is actually good at what

CoreWeave. Post-IPO, the most enterprise-credible of the neoclouds. Strong InfiniBand topology, multiple US and EU regions, a real Kubernetes story, contract terms an enterprise procurement team will sign without rewriting. Where it wins: training clusters of 256 to 4,096 GPUs where fabric quality matters and you want a single throat to choke. Where it disappoints: the price advantage versus a deeply discounted hyperscaler EDP shrinks at the very top of the volume curve.

Lambda Labs. The original developer-friendly GPU cloud. Reservations are clean, the on-demand pool is smaller, the API and CLI are pleasant. Where it wins: research teams and mid-sized labs that want H100s today, not in eight weeks. Where it disappoints: enterprise-level incident-response support is thinner than CoreWeave’s.

RunPod. Strongest community-tier story. Community Cloud is effectively a spot market across third-party datacenters; Secure Cloud is a more conventional managed offering. Where it wins: independent developers, small teams, inference workloads tolerant of variable host quality. Where it disappoints: not a fit for regulated workloads that need single-tenant isolation.

Vast.ai. Genuine spot market across thousands of providers globally. Cheapest H100 hours you will find, often by a wide margin. Where it wins: fine-tuning, batch inference, experimentation. Where it disappoints: it will not pass an enterprise security review without serious effort.

Together AI and Fireworks. Compute platforms wrapped around managed inference. You usually do not rent raw GPUs — you rent serverless inference or dedicated endpoints. Where they win: production inference for OSS models without operating vLLM yourself. Where they disappoint: not the right shape for training.

Crusoe Energy. A GPU cloud powered by stranded gas and, increasingly, by renewable sites. Strong sustainability story, real datacenter footprint, growing enterprise traction. Where it wins: large training reservations where the energy story matters to the buyer.

Genesis Cloud and Nebius. European-anchored options. Where they win: EU data-residency workloads where data egress and jurisdictional control matter. Nebius in particular has scaled meaningfully since the Yandex restructuring.

When the neocloud thesis breaks

It breaks when your workload needs the rest of the cloud sitting next to the GPU.

If your training data lives in S3 with cross-account roles, your feature store is on Redshift, your model registry is in SageMaker, and your serving fleet is on EKS — the egress, the IAM bridging, the data engineering work to move the dataset to a neocloud and back can erase the savings on a single multi-week training run. For workloads that fit cleanly inside one neocloud’s storage and network — pull dataset once, train for weeks, push weights back out — the math is excellent. For workloads tightly coupled to hyperscaler-native data and platform services, the math gets harder.
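The data-gravity argument reduces to a break-even check, sketched below in Python. Every input is an assumption chosen for illustration: the 0.95 per GPU-hour rate gap is the difference between the midpoints of the one-year commit ranges quoted earlier, the 0.09 per GB egress rate is a placeholder to replace with your negotiated rate, and the 15,000 migration cost stands in for the IAM bridging and data engineering work.

```python
def net_savings(gpus: int, hours: float, rate_gap: float,
                dataset_tb: float, egress_per_gb: float,
                migration_cost: float) -> float:
    """Neocloud savings on one run after paying to move the data out."""
    gross = gpus * hours * rate_gap              # GPU bill difference
    egress = dataset_tb * 1000 * egress_per_gb   # one-time dataset transfer
    return gross - egress - migration_cost


HOURS_PER_WEEK = 24 * 7

# Hypothetical job: 64 GPUs and a 200 TB dataset that lives in S3.
for weeks in (3, 12):
    result = net_savings(gpus=64, hours=weeks * HOURS_PER_WEEK, rate_gap=0.95,
                         dataset_tb=200, egress_per_gb=0.09,
                         migration_cost=15_000)
    print(f"{weeks:>2} weeks: net savings {result:,.0f}")
```

With these placeholder inputs the three-week run loses money on the move and the twelve-week run clears it comfortably, which is the shape of the tradeoff described above.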

It also breaks at the very top of volume. A hyperscaler EDP at 8-figure annual commit pulls H100 effective rates into a range that is competitive with neocloud reserved rates, and the bundle is genuinely useful at that scale. The neocloud sweet spot is the middle: too big for credit-card on-demand, too small for hyperscaler EDP negotiating leverage.

The Blackwell transition

B100 and B200 supply through 2025 was rationed first to hyperscalers and the largest labs. Neoclouds got meaningful B200 capacity starting in late Q4 2025 and through Q1 2026. By mid-2026 supply has normalized: any serious neocloud customer can reserve B200s on a 3-to-6-month lead time. The practical question for most teams is whether B200’s roughly 2 to 3x throughput on the right workloads is worth the 1.6 to 1.9x price multiple, and whether their software stack actually exploits it. vLLM, TensorRT-LLM, and SGLang have all had production Blackwell support for a while; PyTorch is mature; many in-house training stacks still need real tuning work to see the full uplift.
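The price-versus-throughput question comes down to one ratio: cost per unit of work scales with the price multiple divided by the throughput multiple you actually realize. The short Python sketch below runs that ratio over the ranges quoted above, plus a hypothetical 1.5x case for an untuned stack. The 1.5x figure and the function name are assumptions for illustration.

```python
def relative_cost_per_unit(price_multiple: float, throughput_multiple: float) -> float:
    """Cost per unit of work on the new GPU relative to the baseline GPU.

    Below 1.0 the new GPU is cheaper per unit of work; above 1.0 it is dearer.
    """
    if throughput_multiple <= 0:
        raise ValueError("throughput_multiple must be positive")
    return price_multiple / throughput_multiple


for price in (1.6, 1.9):
    # 1.5x models a stack that has not been tuned for Blackwell (assumption).
    for throughput in (1.5, 2.0, 3.0):
        ratio = relative_cost_per_unit(price, throughput)
        verdict = "Blackwell cheaper" if ratio < 1.0 else "baseline cheaper"
        print(f"price {price}x, throughput {throughput}x -> {ratio:.2f} ({verdict})")
```

Across the quoted 2 to 3x throughput range the ratio stays below 1.0, so the decision hinges almost entirely on whether the stack reaches that range.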

How we place workloads in 2026

For client engagements in the last twelve months we have used roughly this decision tree.

  • Hosted inference APIs (Bedrock, OpenAI, Anthropic, Together, Fireworks, DeepInfra) for anything that is not strongly volume-bound or data-residency-bound. Cheaper and operationally simpler than self-hosting up to a real volume floor.
  • Hyperscaler reserved GPUs (AWS p5, GCP A3, Azure ND H100 v5) when the workload is tightly coupled to that cloud’s data and platform services and the EDP discount is real.
  • Neocloud reserved GPUs (CoreWeave, Lambda, Crusoe) for training clusters of 64 to 1,024 GPUs that can live cleanly inside the neocloud’s environment for weeks at a time. This is the sweet spot.
  • Spot and community tiers (RunPod, Vast.ai, hyperscaler spot under Karpenter) for interruption-tolerant batch fine-tuning and experimentation. Spot strategies for ML deserve their own treatment — see spot instances for ML training.
  • On-prem GPU only for the most regulated cases where even in-country cloud is not acceptable.
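The list above can be sketched as a small Python function. This is one possible encoding, not a definitive rule set: the `Workload` fields, the `place` function, and above all the order in which the checks run are assumptions, since the list does not rank the tiers against each other.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    kind: str                                # "training" or "inference"
    gpus: int = 0                            # cluster size, if self-hosted
    needs_on_prem: bool = False              # even in-country cloud is unacceptable
    volume_or_residency_bound: bool = False  # hosted APIs ruled out on cost or residency
    coupled_to_hyperscaler: bool = False     # data and platform live in one cloud
    has_real_edp_discount: bool = False
    interruption_tolerant: bool = False


def place(w: Workload) -> str:
    """Return a placement tier; the check order is an assumption."""
    if w.needs_on_prem:
        return "on-prem GPU"
    if w.kind == "inference" and not w.volume_or_residency_bound:
        return "hosted inference API"
    if w.coupled_to_hyperscaler and w.has_real_edp_discount:
        return "hyperscaler reserved GPUs"
    if w.interruption_tolerant:
        return "spot or community tier"
    if w.kind == "training" and 64 <= w.gpus <= 1024:
        return "neocloud reserved GPUs"
    return "no clear fit: evaluate case by case"


print(place(Workload(kind="inference")))
print(place(Workload(kind="training", gpus=256)))
print(place(Workload(kind="training", gpus=16, interruption_tolerant=True)))
```

The fall-through case is deliberate: workloads outside these buckets, such as a 2,000-GPU training run with no hyperscaler coupling, need the workload-specific math rather than a lookup.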

For the broader cost question across all of this, see our take on GPU cost across spot, reserved, and on-demand and the related FinOps cloud cost optimization piece.

What the market gets wrong

Two persistent misreadings.

The first is treating neoclouds as a single category. CoreWeave and Vast.ai are barely in the same business. The enterprise-grade neocloud (CoreWeave, Lambda, Crusoe) and the marketplace neocloud (Vast.ai, RunPod Community) serve different buyers and have different operational realities.

The second is treating the price gap as durable margin. How much of the margin NVIDIA captures, datacenter buildout cost, and power contracts all move year to year. The 30 to 60 percent neocloud advantage versus hyperscaler list looks different against deeply negotiated EDPs, and looks different again as hyperscalers respond with sharper AI-specific pricing. We expect the gap to compress through 2026 and 2027, not disappear.

Where pdpspectra fits

Our cloud infrastructure and ML and MLOps practices place training and inference workloads across hyperscalers and neoclouds. We help clients evaluate the actual workload-specific math, negotiate the reservation shape, and operate the resulting fleet — including the unglamorous work of monitoring, capacity planning, and contract review.

Related reading: GPU cost across spot, reserved, and on-demand, sub-100ms inference with vLLM, Triton, and TGI, and Mixture-of-Experts inference economics.


The GPU rental market is real infrastructure now, not a side experiment. Talk to our team about placing your training or inference workload.