Spot Instance Strategies 2026: Karpenter and Kueue

Spot capacity moved from “exotic optimization” to “expected default” between 2022 and 2025. Karpenter’s spot consolidation, Kueue’s batch scheduling, and the maturity of interrupt-handling patterns mean that any reasonably modern Kubernetes platform can run 40 to 70 percent of its compute on spot without making the platform-engineering team miserable. The headline savings are real — 60 to 91 percent off on-demand depending on instance shape and region — and the operational cost has dropped sharply.

This is the 2026 spot playbook as we deploy it at clients across training, inference, batch, and stateless services.

[Figure: Spot capacity flow]

What spot is and is not in 2026

The basics, refreshed for current behavior:

AWS Spot. Spare EC2 capacity, typically sold at 60 to 90 percent off on-demand. Two-minute interruption notice via the instance metadata service or EventBridge (formerly CloudWatch Events). Since the pricing change at the end of 2017, prices are set by AWS and adjust gradually with longer-term supply and demand rather than fluctuating in real time, which makes the planning math far easier. The Spot Fleet and EC2 Fleet APIs let you specify mixed instance types, mixed AZs, and weighted allocation strategies.
GCP Spot VMs. Introduced in 2021 as the successor to preemptible VMs and the production-grade spot offering. 60 to 91 percent off on-demand, no minimum runtime guarantee and no 24-hour maximum runtime (which legacy preemptible VMs had), and a 30-second termination notice (the same window preemptible VMs had).
Azure Spot VMs. Spare Azure capacity, eviction by price or capacity, 30-second eviction notice. Discounts vary heavily by region and instance family — sometimes excellent, sometimes flat.

In all three clouds, spot is now stable enough that production workloads run on it routinely. The interruption rate is the constraint, not the capacity availability per se.
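To make the two-minute AWS notice concrete, here is a minimal sketch of reading it from inside an instance. It assumes IMDSv2 and the `spot/instance-action` metadata path; the function names are hypothetical, and the GCP and Azure notices arrive through different endpoints that are not shown.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"  # link-local metadata address on EC2


def fetch_instance_action(timeout=1.0):
    """Return the raw instance-action JSON, or None if no interruption is scheduled.

    Only meaningful on an EC2 instance; elsewhere the request fails with URLError.
    """
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        action_req = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # 404 means no interruption is pending
            return None
        raise


def seconds_remaining(action_json, now):
    """Seconds between `now` and the scheduled stop/terminate time in the notice."""
    notice = json.loads(action_json)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    return (deadline.replace(tzinfo=timezone.utc) - now).total_seconds()
```

In practice a node-level agent polls this every few seconds and starts a drain as soon as the response is not `None`; the parsing helper is split out so it can be exercised without network access.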

Karpenter and the spot consolidation pattern

Karpenter (AWS, expanding to GCP and Azure through 2024-2025) changed the operational reality of spot on Kubernetes. The pattern that actually works:

Provision a Karpenter NodePool with a wide diverse spot allocation: 30+ instance types across multiple AZs and instance families.
Use the karpenter.sh/capacity-type: spot requirement on the pool.
Set Karpenter’s consolidation policy to WhenEmptyOrUnderutilized with a consolidateAfter delay of 30 to 60 seconds for stateless workloads, longer for stateful.
Use Pod Disruption Budgets (PDBs) on every workload that should survive consolidation events.
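Handle interrupt notices so spot nodes drain gracefully: on AWS, either Karpenter’s built-in interruption handling (fed by an SQS queue) or the AWS Node Termination Handler for node groups Karpenter does not manage; on GCP/Azure, the equivalent.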

The result: Karpenter constantly rebalances the fleet onto the cheapest healthy spot capacity, consolidates underused nodes, and drains gracefully when AWS reclaims capacity. Steady-state spot share of 50 to 70 percent across stateless workloads is achievable without a dedicated SRE babysitting it.
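The NodePool side of the steps above can be sketched as a manifest built in Python and emitted as JSON, which `kubectl` accepts. This assumes the `karpenter.sh/v1` schema and the AWS provider's label keys; the pool name, the `default` EC2NodeClass, and the chosen instance categories are illustrative assumptions, and the consolidation delay maps to the `consolidateAfter` field.

```python
import json


def spot_nodepool(name="spot-general", consolidate_after="30s"):
    """Build a spot-only Karpenter NodePool manifest (karpenter.sh/v1 schema assumed)."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": "default",  # hypothetical EC2NodeClass name
                    },
                    "requirements": [
                        # Spot only for this pool.
                        {"key": "karpenter.sh/capacity-type",
                         "operator": "In", "values": ["spot"]},
                        # Broad families rather than a short list of types, for diversity.
                        {"key": "karpenter.k8s.aws/instance-category",
                         "operator": "In", "values": ["c", "m", "r"]},
                        {"key": "karpenter.k8s.aws/instance-generation",
                         "operator": "Gt", "values": ["4"]},
                    ],
                }
            },
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": consolidate_after,
            },
        },
    }


manifest = spot_nodepool()
print(json.dumps(manifest, indent=2))
```

Leaving the zone requirement out lets the pool span every AZ the node class's subnets cover. Piping the output into `kubectl apply -f -` is one way to apply it; a stateful pool would pass a longer `consolidate_after`.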

The unglamorous detail that determines whether this works: instance type diversity. A NodePool that allows only m5.large and c5.large will see far more interruptions than one that allows 30+ instance types across 4 AZs. The EC2 Fleet allocation strategy price-capacity-optimized (which modern Karpenter uses for spot) is the right pick: it weights toward instance pools with deeper spare capacity, not just the cheapest pool.
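A toy calculation shows why diversity matters. It assumes nodes are spread evenly across (instance type, AZ) pools and that one reclaim event hits a single pool; real allocation is uneven, so treat this as an illustration of the trend rather than a prediction. The single-AZ figure for the narrow pool is also an assumption.

```python
def worst_single_pool_loss(instance_types, zones):
    """Fraction of the fleet lost if one (instance type, AZ) pool is reclaimed,
    under the simplifying assumption of an even spread across pools."""
    pools = instance_types * zones
    return 1.0 / pools


narrow = worst_single_pool_loss(instance_types=2, zones=1)
wide = worst_single_pool_loss(instance_types=30, zones=4)
print(f"narrow pool: {narrow:.1%} of nodes at risk per reclaim")
print(f"wide pool:   {wide:.1%} of nodes at risk per reclaim")
```

With two pools, one reclaim can take half the fleet; with 120 pools the same event touches under one percent of nodes, which is what lets PDBs and replacement capacity keep up.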

Kueue for batch workloads

Kueue, the Kubernetes-native job queueing system that graduated to v1.0 in 2024, is the right shape for batch and training workloads on spot. It handles:

Workload prioritization across teams (cohorts and ClusterQueues with fair-sharing).
Quota management (you cannot starve high-priority work).
Gang scheduling for multi-pod jobs (all-or-nothing scheduling).
Workload preemption when higher-priority work arrives.

For a multi-team ML training cluster on spot, Kueue is the missing piece between Karpenter (which gives you nodes) and the actual Job (which needs to run somewhere). Combined with Kubernetes Jobs, Argo Workflows, or Volcano, Kueue makes a spot-heavy batch cluster operationally tractable.
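The wiring between a Job and Kueue can be sketched as follows. The manifests are built in Python and printed as a JSON `List`; the `kueue.x-k8s.io/v1beta1` API version, the queue and namespace names, the quota figures, and the container image are all assumptions for illustration, so check them against the Kueue release you run.

```python
import json

KUEUE_API = "kueue.x-k8s.io/v1beta1"  # assumed; use whatever version your install serves

# A flavor that pins admitted workloads to spot nodes via the Karpenter label.
resource_flavor = {
    "apiVersion": KUEUE_API,
    "kind": "ResourceFlavor",
    "metadata": {"name": "spot"},
    "spec": {"nodeLabels": {"karpenter.sh/capacity-type": "spot"}},
}

# Cluster-wide quota for one team, sharing a cohort with other teams.
cluster_queue = {
    "apiVersion": KUEUE_API,
    "kind": "ClusterQueue",
    "metadata": {"name": "team-a-cq"},
    "spec": {
        "cohort": "ml-platform",
        "namespaceSelector": {},
        "resourceGroups": [{
            "coveredResources": ["cpu", "memory"],
            "flavors": [{
                "name": "spot",
                "resources": [
                    {"name": "cpu", "nominalQuota": 256},
                    {"name": "memory", "nominalQuota": "1Ti"},
                ],
            }],
        }],
        "preemption": {
            "withinClusterQueue": "LowerPriority",
            "reclaimWithinCohort": "Any",
        },
    },
}

# Namespaced entry point that the team's Jobs reference.
local_queue = {
    "apiVersion": KUEUE_API,
    "kind": "LocalQueue",
    "metadata": {"name": "team-a", "namespace": "team-a"},
    "spec": {"clusterQueue": "team-a-cq"},
}

# A plain batch Job; the queue-name label hands admission over to Kueue.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "train-demo",
        "namespace": "team-a",
        "labels": {"kueue.x-k8s.io/queue-name": "team-a"},
    },
    "spec": {
        "suspend": True,  # Kueue unsuspends the Job once quota is available
        "parallelism": 4,
        "completions": 4,
        "template": {"spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",
                "image": "registry.example.com/train:latest",  # hypothetical image
                "resources": {"requests": {"cpu": "8", "memory": "32Gi"}},
            }],
        }},
    },
}

bundle = {"apiVersion": "v1", "kind": "List",
          "items": [resource_flavor, cluster_queue, local_queue, job]}
print(json.dumps(bundle, indent=2))
```

The division of labor is the point: Kueue decides when the Job may start and against whose quota, and only then do pending pods appear for Karpenter to provision spot nodes for.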

The interrupt-handling patterns that actually work

The single most important thing to internalize: spot interrupts happen, and your workload either tolerates them or it does not. The patterns that work:

Stateless web services. Multiple replicas behind a load balancer, PDB allowing at most 25 percent disruption at a time, readiness gates so traffic does not route to draining pods. Karpenter consolidation events are handled fine. The interruption budget at the workload level is the key knob.
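A minimal sketch of the 25 percent budget for a stateless service, assuming the standard `policy/v1` PodDisruptionBudget and an `app` label as the selector (the label key and names are illustrative). The helper reflects the documented Kubernetes behavior of rounding a percentage `maxUnavailable` up to a whole pod.

```python
import json
import math


def pdb_manifest(app, max_unavailable="25%"):
    """PodDisruptionBudget limiting voluntary disruption for pods labeled app=<app>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": {"app": app}},
        },
    }


def allowed_disruptions(replicas, percent):
    """Pods that may be voluntarily disrupted at once; percentages round up."""
    return math.ceil(replicas * percent / 100)


print(json.dumps(pdb_manifest("checkout-api"), indent=2))
for replicas in (3, 8, 10):
    print(replicas, "replicas ->", allowed_disruptions(replicas, 25), "pod(s) at a time")
```

The rounding matters for small deployments: at three replicas a 25 percent budget still lets one pod go, which is a third of capacity, so very small services may want an absolute number instead.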

Stateful services on spot. Sometimes works — usually does not. Postgres on spot is a bad idea; Redis cache on spot is fine; Kafka brokers on spot are subtle but workable with careful disruption budgets and slow rebalance settings. The rule of thumb: if the cost of repopulating state exceeds 2 to 3 minutes of operator attention, do not use spot.

Training jobs with checkpointing. The pattern OpenAI and the major labs use: checkpoint every 5 to 15 minutes, on interrupt notice flush a final checkpoint and exit cleanly, restart picks up from the last checkpoint. Modern training frameworks (PyTorch with TorchElastic, JAX with checkpoint sharding) make this approachable.

Inference on spot. Trickier. Production user-facing inference cannot tolerate a 2-minute drain on every interrupted instance. The compromise we typically deploy: 60 to 70 percent of inference fleet on spot, 30 to 40 percent on-demand, with the on-demand floor sized to absorb the worst-case simultaneous spot drain. Aggregate cost is roughly 40 to 50 percent off on-demand.
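The aggregate figure follows from simple arithmetic: only the spot share of the fleet is discounted. The sketch below uses the spot shares quoted above and assumes spot discounts of 65 and 70 percent for the relevant instance types, which are illustrative values rather than quoted prices.

```python
def blended_discount(spot_share, spot_discount):
    """Fleet-wide discount versus all-on-demand when only the spot share is discounted."""
    return spot_share * spot_discount


# Spot shares from the text; the spot discounts are assumptions.
low = blended_discount(spot_share=0.60, spot_discount=0.65)
high = blended_discount(spot_share=0.70, spot_discount=0.70)
print(f"roughly {low:.0%} to {high:.0%} off on-demand")
```

The same function shows why the on-demand floor is the expensive part: every ten points of fleet moved from spot to on-demand gives back about a tenth of the spot discount.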

Batch inference and bulk processing. Excellent fit. No interactive latency requirement; restartable; idempotent. We routinely run batch embedding generation, document processing pipelines, and analytics ETL almost entirely on spot.

OpenAI’s checkpointing approach, generalized

The checkpoint-and-restart pattern that came out of the major model labs is now standard for any serious training on spot:

Checkpoint cadence sized to the cost of replay: if a 15-minute replay is acceptable, checkpoint every 15 minutes. If 5 minutes is the limit, checkpoint every 5 minutes.
Checkpoint storage on a durable object store (S3, GCS, Azure Blob), written in parallel shards as PyTorch Distributed Checkpoint or DeepSpeed do.
On spot interrupt notice (the 2-minute window on AWS, 30 seconds on GCP and Azure), trigger an emergency final checkpoint, then drain.
Resumption: pull the latest checkpoint, validate, resume from the closest stable step.
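The steps above can be sketched as a small training loop. This is a framework-free illustration under stated assumptions: a local JSON file stands in for the sharded object-store write, the interrupt arrives as a SIGTERM delivered when the node drains, and `train`, `save_checkpoint`, and `load_checkpoint` are hypothetical names rather than any library's API.

```python
import json
import os
import signal
import tempfile
import threading
import time

stop = threading.Event()  # set when the drain begins


def install_sigterm_handler():
    """Call once from the main thread; the node drain delivers SIGTERM to the pod."""
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())


def save_checkpoint(path, state):
    """Write to a temp file and rename, so a kill mid-write never corrupts the latest checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"step": 0}
    with open(path) as f:
        return json.load(f)


def train(path, total_steps, checkpoint_every_s=600.0, step_fn=lambda step: None):
    """Run until done or interrupted, checkpointing on a timer and on interrupt."""
    state = load_checkpoint(path)
    last_save = time.monotonic()
    while state["step"] < total_steps:
        step_fn(state["step"])  # one unit of training work
        state["step"] += 1
        due = time.monotonic() - last_save >= checkpoint_every_s
        if due or stop.is_set():
            save_checkpoint(path, state)
            last_save = time.monotonic()
        if stop.is_set():
            return state  # exit cleanly; the restarted pod resumes from `path`
    save_checkpoint(path, state)
    return state
```

The emergency checkpoint only helps if the write finishes inside the notice window and the pod's termination grace period, so for large models the periodic checkpoint remains the real safety net.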

For long pretraining runs (weeks of compute), this works. For smaller fine-tuning runs (hours), the overhead of checkpoint infrastructure can outweigh the spot savings — there is a crossover.
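One hedged way to put numbers on the cadence trade-off is the classic first-order model often attributed to Young: overhead is the time spent writing checkpoints plus the expected replay after an interrupt. This is a swapped-in textbook approximation, not a method described in this piece, and the 60-second write time and 6-hour mean time between interruptions below are illustrative assumptions.

```python
import math


def overhead_fraction(interval_s, write_s, mtbi_s):
    """First-order overhead: checkpoint write time per interval plus expected replay.

    interval_s: seconds between checkpoints
    write_s:    seconds one checkpoint write stalls training
    mtbi_s:     mean seconds between interruptions
    """
    return write_s / interval_s + interval_s / (2.0 * mtbi_s)


def young_interval(write_s, mtbi_s):
    """Interval that minimizes overhead_fraction under this model."""
    return math.sqrt(2.0 * write_s * mtbi_s)


write_s, mtbi_s = 60.0, 6 * 3600.0  # assumed values
best = young_interval(write_s, mtbi_s)
for interval in (300.0, 900.0, best):
    pct = overhead_fraction(interval, write_s, mtbi_s)
    print(f"checkpoint every {interval / 60:.1f} min -> {pct:.1%} overhead")
```

Comparing that overhead with the spot discount gives a rough sense of where the crossover sits: on a short run with slow checkpoint writes, the overhead and the engineering effort can eat most of the saving.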

Training vs inference spot economics

Rough late-Q1 2026 numbers for a 70B-class workload:

Workload	On-demand baseline cost	Spot share	Effective discount
Pretraining cluster (10k GPU-hour run)	100 percent reference	80 to 90 percent on spot	65 to 75 percent off
Fine-tuning run (200 GPU-hours)	100 percent reference	60 to 80 percent on spot	50 to 65 percent off
Production inference (24/7 traffic)	100 percent reference	50 to 70 percent on spot	30 to 45 percent off
Batch inference (overnight bulk)	100 percent reference	90 to 100 percent on spot	70 to 85 percent off

The training-versus-inference gap is real. Training tolerates interrupts cleanly with checkpointing; production inference does not, so the on-demand floor cuts into the discount.

The cross-cloud reality

Spot economics differ between clouds:

AWS Spot has the deepest market, most diverse instance types, and the most mature interrupt-handling tooling. Karpenter is AWS-first. The right default for serious spot strategy at scale.
GCP Spot VMs are often the cheapest in absolute terms but have a smaller market in some regions. The interrupt rate can be slightly higher than AWS for popular instance types.
Azure Spot VMs vary heavily by region. Excellent in some regions for specific instance families; less consistent than AWS.

Cross-cloud spot strategies are rare in practice — the operational cost of straddling multiple clouds usually outweighs the cost arbitrage.

[Figure: Karpenter spot consolidation]

Where spot does not help

Three workload shapes where spot is the wrong call:

Workloads where the cost of an interrupt exceeds the savings. If draining a pod requires 30 minutes of human attention and the savings are 200 dollars per month, do not use spot.
Workloads with hard latency SLOs that cannot tolerate replica disruption. Trading systems, real-time bidding, low-latency inference for paying customers.
Workloads where the instance state is genuinely unique. Single-replica databases, leader-only services without proper failover. Spot will burn you.

How we deploy spot in 2026

For client engagements, the typical shape:

Stateless web and API services: 60 to 80 percent on spot via Karpenter with diverse instance pools, PDBs, and graceful shutdown.
Batch and async workloads: 90 to 100 percent on spot via Kueue and Karpenter, with replay-tolerant job design.
ML training: 70 to 90 percent on spot with checkpointing, depending on run length and replay tolerance.
ML inference: 50 to 70 percent on spot with an on-demand floor sized to worst-case simultaneous drain.
Stateful databases and caches: on-demand or reserved, with rare exceptions for restartable cache layers.

The aggregate effect across a typical platform: 35 to 55 percent off the steady-state compute bill versus all-on-demand. That is independent of the reserved-commitment savings — they stack.

For the broader cost question, see our FinOps reserved instances piece, the GPU rental market piece, and the related spot instances for ML training take.

Where pdpspectra fits

Our DevOps and CI/CD practice and our cloud infrastructure practice design and operate spot-heavy Kubernetes platforms. We help clients build the Karpenter, Kueue, and interrupt-handling patterns that let them run 50 to 70 percent on spot without operational pain.

Spot is the lowest-effort 35 to 55 percent compute discount available. Talk to our team about deploying it on your platform.