Kubernetes Resource Quotas and LimitRanges: Cost Control That Works

Default K8s clusters silently overspend. Resource quotas and limit ranges, applied early, cut waste 30-50%.

Kubernetes clusters silently overspend by default. The defaults — no resource requests, no limits, no quotas — let any pod consume any available resource. In a single-team cluster this rarely matters; in a multi-team cluster it produces consistent over-provisioning, noisy neighbors, and cost surprises that nobody can attribute. Resource quotas and limit ranges, applied early, typically cut cluster waste by 30-50% without changing application behavior.

This post walks through what to set, when to set it, and the operational discipline that makes the policies stick.

The three layers of control

Kubernetes provides three complementary mechanisms for resource governance.

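ResourceQuotas operate at the namespace level. They cap the total CPU, memory, storage, and object counts a namespace can consume. For CPU and memory, the total is the sum of what pods declare in their requests and limits, not their live usage. A team's namespace gets a quota. Once the quota is reached, the API server rejects new pods at admission instead of leaving them pending. For a Deployment, the ReplicaSet controller simply fails to create the extra replicas. The quota forces conversations about prioritization within the team rather than letting them consume cluster-wide capacity.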
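That admission check can be modelled in a few lines of Python. This is a simplified sketch, not the API server's code. It assumes CPU is already normalised to millicores and memory to MiB (real quotas take quantity strings such as `500m` or `2Gi`), and the function name `admit_pod` is hypothetical.

```python
from typing import Dict, List

# Hypothetical simplified shape: quota resource name -> integer amount,
# e.g. "requests.cpu" in millicores, "requests.memory" in MiB.
Resources = Dict[str, int]


def admit_pod(hard: Resources, existing_pods: List[Resources], new_pod: Resources) -> bool:
    """Return True if new_pod fits under the namespace quota.

    Sums the *declared* requests of pods already in the namespace, adds the
    new pod, and compares per resource. Observed usage plays no part.
    """
    for resource, limit in hard.items():
        if resource not in new_pod:
            # A pod that omits a compute resource the quota tracks is rejected.
            return False
        used = sum(pod.get(resource, 0) for pod in existing_pods)
        if used + new_pod[resource] > limit:
            return False  # the real API server answers "exceeded quota" (403)
    return True


hard = {"requests.cpu": 4000, "requests.memory": 8192}      # 4 CPUs, 8Gi
running = [{"requests.cpu": 1500, "requests.memory": 2048}] * 2
print(admit_pod(hard, running, {"requests.cpu": 500, "requests.memory": 1024}))   # True
print(admit_pod(hard, running, {"requests.cpu": 1500, "requests.memory": 1024}))  # False
```

The second pod is rejected even if the two running pods are idle, because they already declare 3000m of the 4000m CPU budget.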

LimitRanges operate at the pod or container level within a namespace. They enforce minimum and maximum resource specifications, and they apply defaults when developers forget to set requests or limits. The most useful application is the default request — pods that don’t specify what they need get a sensible default rather than scheduling with zero declared resources.
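The following is a simplified model of how a container's resources come out of admission in a namespace with a container-level LimitRange. The function name and the integer units (millicores, MiB) are assumptions made for illustration. The behaviour it encodes follows the Kubernetes documentation. An explicit limit with no request is copied into the request. Otherwise a missing request takes `defaultRequest` and a missing limit takes `default`. A request that ends up above its limit gets the pod rejected.

```python
from typing import Dict

Quantities = Dict[str, int]  # e.g. {"cpu": 100, "memory": 128}: millicores / MiB


def apply_limitrange_defaults(requests: Quantities, limits: Quantities,
                              default_request: Quantities,
                              default_limit: Quantities) -> Dict[str, Quantities]:
    """Simplified model of LimitRange defaulting for one container."""
    requests, limits = dict(requests), dict(limits)  # don't mutate caller's dicts
    for resource in sorted(set(default_request) | set(default_limit)):
        if resource not in requests:
            if resource in limits:
                # Explicit limit but no request: the request is copied from the limit.
                requests[resource] = limits[resource]
            elif resource in default_request:
                requests[resource] = default_request[resource]
        if resource not in limits and resource in default_limit:
            limits[resource] = default_limit[resource]
        if resource in requests and resource in limits and requests[resource] > limits[resource]:
            raise ValueError(f"{resource}: request exceeds limit, pod is rejected")
    return {"requests": requests, "limits": limits}


default_req = {"cpu": 100, "memory": 128}
default_lim = {"cpu": 400, "memory": 512}
print(apply_limitrange_defaults({}, {}, default_req, default_lim))
# {'requests': {'cpu': 100, 'memory': 128}, 'limits': {'cpu': 400, 'memory': 512}}
```

One consequence is worth knowing before choosing defaults. A container that requests 700m CPU with no limit, in a namespace whose default CPU limit is 400m, is rejected rather than silently capped.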

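Pod-level requests and limits are the actual resource declarations, set per container in each pod spec. Requests are what the scheduler reserves when it places the pod. Limits are enforced at runtime: CPU above the limit is throttled, and memory above the limit ends in an OOM kill. ResourceQuotas and LimitRanges enforce policy on top of these declarations, but the pod spec is where a workload states what it needs.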
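Requests and limits together also determine a pod's QoS class. The QoS class affects how the node treats the pod under memory pressure: BestEffort pods are the first candidates for OOM kills and eviction. The sketch below encodes the documented classification rules for CPU and memory. The function name and the `(requests, limits)` tuple shape are hypothetical, and the input is assumed to be the values after defaulting.

```python
from typing import Dict, List, Tuple

# (requests, limits) per container, after defaulting; integer units assumed.
Container = Tuple[Dict[str, int], Dict[str, int]]


def qos_class(containers: List[Container]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort."""
    tracked = ("cpu", "memory")
    # No container declares any CPU or memory request or limit.
    if all(r not in req and r not in lim for req, lim in containers for r in tracked):
        return "BestEffort"
    # Every container sets both, and request equals limit for CPU and memory.
    if all(r in req and r in lim and req[r] == lim[r]
           for req, lim in containers for r in tracked):
        return "Guaranteed"
    return "Burstable"


print(qos_class([({}, {})]))                                                 # BestEffort
print(qos_class([({"cpu": 100, "memory": 128}, {"cpu": 400, "memory": 512})]))  # Burstable
print(qos_class([({"cpu": 500, "memory": 256}, {"cpu": 500, "memory": 256})]))  # Guaranteed
```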

What to set on day one

For any meaningful multi-tenant cluster, four policies should be in place from cluster creation:

Default CPU and memory requests via LimitRange. Typically 100m CPU and 128Mi memory. Pods that don’t declare requests get the default rather than nothing. This prevents the “I forgot to set requests so my pod scheduled on a full node and got OOMKilled when traffic arrived” failure mode.

Default CPU and memory limits via LimitRange. Typically 4x the default request, which is 400m CPU and 512Mi memory with the defaults above. This keeps a runaway container that declares no limits from consuming an entire node.

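Per-namespace ResourceQuota for total CPU, memory, and persistent volume claim count. The numbers depend on the team and the workload, but having any quota is better than having none. Once a quota covers CPU or memory, every new pod must declare those values or receive them from a LimitRange default; otherwise the API server rejects it. That is why this policy and the LimitRange defaults belong together.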

Object count quotas to prevent runaway pod creation. A team that accidentally creates 10,000 pods through a misconfigured operator should hit the quota and fail fast rather than consuming the cluster.
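The four day-one policies fit in two objects. Below is a minimal Python sketch that emits them as a JSON `List`, which `kubectl apply -f` accepts. The namespace `team-a`, the object names, and every quota figure are placeholders to be sized per team. The LimitRange defaults follow the 100m/128Mi request and 4x limit described above.

```python
import json
from typing import Any, Dict


def day_one_policies(namespace: str, cpu_request_m: int = 100,
                     memory_request_mi: int = 128, limit_factor: int = 4) -> Dict[str, Any]:
    """Build a LimitRange and a ResourceQuota for one namespace."""
    limit_range = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "default-resources", "namespace": namespace},
        "spec": {"limits": [{
            "type": "Container",
            # Filled in at admission for containers that omit requests/limits.
            "defaultRequest": {"cpu": f"{cpu_request_m}m",
                               "memory": f"{memory_request_mi}Mi"},
            "default": {"cpu": f"{cpu_request_m * limit_factor}m",
                        "memory": f"{memory_request_mi * limit_factor}Mi"},
        }]},
    }
    resource_quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "team-quota", "namespace": namespace},
        "spec": {"hard": {
            # Placeholder totals: size these from the namespace's own numbers.
            "requests.cpu": "8",
            "requests.memory": "16Gi",
            "limits.cpu": "32",
            "limits.memory": "64Gi",
            "persistentvolumeclaims": "10",
            # Object-count caps so a misbehaving operator fails fast.
            "pods": "200",
            "count/jobs.batch": "100",
        }},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [limit_range, resource_quota]}


if __name__ == "__main__":
    print(json.dumps(day_one_policies("team-a"), indent=2))
```

Saved as, say, `day_one.py`, it could be applied with `python day_one.py > day_one.json && kubectl apply -f day_one.json`.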

What we typically see at clients

Across our cluster audits, the same patterns come up again and again.

Pods with massive requests and limits (multiple CPUs, several gigabytes of memory) that actually use about 5% of what they request. The pod author copied the spec from somewhere and never measured actual usage. Because the scheduler reserves the full request whether or not it is used, capacity for those workloads has been over-provisioned by 10-20x for years.

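Pods with no requests at all. The scheduler treats them as needing nothing, so they schedule fine in a quiet cluster and can land on nodes that are already fully committed. They start failing under load: a pod that declares neither requests nor limits is in the BestEffort QoS class, which makes it the first candidate for OOM kills and eviction.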

Teams that have never seen their own resource consumption because they have no visibility into namespace-level usage. The quota conversation is the catalyst for actually measuring.

Operators and controllers that create thousands of objects in failure modes. Without object quotas, these can take down the cluster’s etcd or API server.
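The first two patterns are easy to flag mechanically once per-pod request and usage data sit side by side. Below is a minimal sketch, assuming you have already exported each pod's CPU request and its p95 CPU usage in millicores (for example from Prometheus). The `PodSample` type, the 10% threshold, and the pod names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PodSample:
    namespace: str
    name: str
    cpu_request_m: Optional[int]  # None (or 0) when the pod declares no CPU request
    cpu_p95_m: int                # observed p95 CPU usage, millicores


def audit(pods: List[PodSample], waste_threshold: float = 0.10) -> List[str]:
    """Flag pods with no CPU request, or whose p95 usage is a small share of it."""
    findings = []
    for p in pods:
        if not p.cpu_request_m:
            findings.append(f"{p.namespace}/{p.name}: no CPU request declared")
        elif p.cpu_p95_m / p.cpu_request_m < waste_threshold:
            factor = p.cpu_request_m / max(p.cpu_p95_m, 1)
            findings.append(f"{p.namespace}/{p.name}: requests {factor:.0f}x its p95 CPU usage")
    return findings


pods = [
    PodSample("payments", "api", 2000, 100),   # uses 5% of its request
    PodSample("batch", "worker", None, 350),   # no request at all
    PodSample("web", "frontend", 500, 300),    # reasonably sized
]
for line in audit(pods):
    print(line)
# payments/api: requests 20x its p95 CPU usage
# batch/worker: no CPU request declared
```

The same comparison works for memory. The p95 is used here rather than the mean so that right-sizing does not cut into normal peaks.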

The migration to quotas in an existing cluster

Applying quotas to a running multi-tenant cluster requires care. Sudden enforcement breaks workloads. The pattern that works:

Step 1: Audit current consumption. For each namespace, measure actual CPU, memory, and PVC usage over a representative period (typically 30 days), and record the requests and limits its pods currently declare. metrics-server only exposes current usage, so the 30-day view has to come from Prometheus or cloud-provider native tools.

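Step 2: Set quotas slightly above each namespace's current total of declared requests and limits, with explicit headroom. A quota counts what pods declare, not what they use. A quota sized to observed usage would therefore immediately block new pods in any over-provisioned namespace. With the declared totals as the baseline, the team can keep operating; the quota becomes a ceiling rather than an immediate constraint.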

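Step 3: Add LimitRanges with defaults that match observed reasonable usage. Existing pods are unaffected; new pods that don't specify requests get the defaults. Roll these out no later than the quotas. Once a quota tracks CPU and memory, the API server rejects any new pod that leaves them unset and has no LimitRange default to fall back on.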

Step 4: Communicate the policies. Teams need to know the quotas exist before they hit them.

Step 5: Tighten quotas over time as teams right-size. The savings come from this iteration rather than the initial setup.
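The sizing arithmetic in Steps 2 and 5 can be sketched as two small functions. The 20% headroom and the rule of tightening by at most 25% per review are illustrative assumptions, not a prescribed policy. Integer millicores keep the rounding exact.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def initial_quota(declared_m: int, headroom_pct: int = 20) -> int:
    """Step 2: start above what the namespace already declares (quotas count requests)."""
    return ceil_div(declared_m * (100 + headroom_pct), 100)


def tightened_quota(current_m: int, declared_m: int,
                    headroom_pct: int = 20, max_step_pct: int = 25) -> int:
    """Step 5: move toward declared + headroom, by at most max_step_pct per review.

    Never raises the quota; an increase is a separate conversation with the team.
    """
    target = ceil_div(declared_m * (100 + headroom_pct), 100)
    floor = current_m * (100 - max_step_pct) // 100
    return min(current_m, max(target, floor))


quota = initial_quota(12_000)                 # namespace declares 12 CPUs today
print(quota)                                  # 14400
for _ in range(4):
    quota = tightened_quota(quota, 5_000)     # team has right-sized to 5 CPUs
    print(quota)                              # 10800, 8100, 6075, 6000
```

Capping each step gives teams a few review cycles to notice the tighter ceiling before it binds.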

The policy engines

For more sophisticated governance — different quota templates for different team tiers, conditional policies, custom defaults — policy engines layer on top of native Kubernetes resource governance.

OPA Gatekeeper and Kyverno are the dominant choices. Both can enforce policies that go beyond what ResourceQuotas and LimitRanges express natively. Common policies: require resource requests on all pods, require specific labels for cost attribution, prevent pods from running as root, enforce image registry policies.
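Gatekeeper policies are written in Rego and Kyverno policies in YAML. The Python function below is neither engine's syntax. It only illustrates the logic of two of the common policies listed above: every container declares CPU and memory requests, and the pod carries a cost-attribution label. The label key `team` and the function name are hypothetical.

```python
from typing import Any, Dict, List

REQUIRED_REQUESTS = ("cpu", "memory")
REQUIRED_LABELS = ("team",)  # hypothetical cost-attribution label key


def violations(pod: Dict[str, Any]) -> List[str]:
    """Return human-readable policy violations for a pod manifest (as a dict)."""
    problems = []
    labels = pod.get("metadata", {}).get("labels") or {}
    for key in REQUIRED_LABELS:
        if key not in labels:
            problems.append(f"missing label '{key}'")
    spec = pod.get("spec", {})
    # Check init containers as well as regular containers.
    for container in spec.get("initContainers", []) + spec.get("containers", []):
        requests = (container.get("resources") or {}).get("requests") or {}
        for resource in REQUIRED_REQUESTS:
            if resource not in requests:
                problems.append(f"container '{container.get('name')}' has no {resource} request")
    return problems


pod = {
    "metadata": {"name": "api", "labels": {"app": "api"}},
    "spec": {"containers": [{"name": "api", "resources": {"requests": {"cpu": "250m"}}}]},
}
for problem in violations(pod):
    print(problem)
```

An admission controller running this logic would deny the pod above for the missing `team` label and the missing memory request.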

For most teams, native quotas plus a policy engine for the edge cases covers the operational reality.

The cost attribution layer

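Resource quotas plus consistent labeling make cost attribution tractable. Each namespace maps to a team or product and carries labels that say so. Cost tools can then roll each pod's cost up to its namespace and, through the namespace labels, to a team. Kubernetes does not copy namespace labels onto pods, so pod-level attribution labels have to be set in the workload templates or enforced with a policy engine. Cloud cost data joined with these Kubernetes labels produces per-team cost reports.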
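The namespace-to-team join is simple once per-pod costs exist. Below is a minimal sketch, assuming a cost tool has already produced `(namespace, cost)` rows for a billing period. The namespace names, label values, and amounts are hypothetical. Costs from namespaces without a `team` label land in an explicit `unattributed` bucket rather than disappearing.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cost_by_team(pod_costs: Iterable[Tuple[str, float]],
                 namespace_labels: Dict[str, Dict[str, str]],
                 label_key: str = "team") -> Dict[str, float]:
    """Aggregate per-pod costs to teams via each namespace's labels."""
    totals: Dict[str, float] = defaultdict(float)
    for namespace, cost in pod_costs:
        team = namespace_labels.get(namespace, {}).get(label_key, "unattributed")
        totals[team] += cost
    return dict(totals)


rows = [("checkout", 120.0), ("checkout", 30.5), ("search", 80.0), ("scratch", 12.0)]
labels = {"checkout": {"team": "payments"}, "search": {"team": "discovery"}}
print(cost_by_team(rows, labels))
# {'payments': 150.5, 'discovery': 80.0, 'unattributed': 12.0}
```

A growing `unattributed` total is a useful signal that labeling discipline is slipping.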

Tools like Kubecost, OpenCost (CNCF), or the cloud-vendor native tools (AWS Cost Explorer’s Kubernetes view, GCP’s cost insights) make this practical without building it yourself.

Where pdpspectra fits

Our DevOps practice audits Kubernetes clusters for cost, performance, and security. Resource quota work is a routine part of cluster optimization engagements.

Related reading: the Kubernetes production patterns post, the FinOps cloud cost optimization post, and the Karpenter vs Cluster Autoscaler post.


Cluster cost discipline starts with quotas. Talk to our team about your platform.