Network Policies on K8s: From Default-Deny to Workable
Default-deny network policies are the gold standard, yet they are rarely deployed. These are the patterns that get you there without breaking apps.
Default-deny network policies on Kubernetes, where no pod-to-pod traffic is allowed except what’s explicitly permitted, are the gold standard for cluster security. They prevent lateral movement, contain blast radius, and match the broader zero-trust posture most security teams want. They’re also rarely deployed because they’re operationally hard to get right. This post walks through the patterns that make default-deny workable.
Why default-deny is the gold standard
The security argument is straightforward:
Lateral movement prevention. When an attacker compromises one pod, they shouldn’t be able to reach the rest of the cluster. Default-deny network policies enforce this.
Blast radius containment. When a component fails or behaves badly, the impact is contained. Cascading failures are reduced.
Auditability. Network policies are declarative and reviewable. Auditors and security reviewers can verify what’s allowed.
Compliance. Major compliance frameworks (PCI DSS, HIPAA, SOC 2, ISO 27001) expect network segmentation or treat it as a standard control.
The argument against is operational: default-deny breaks things until you’ve identified every legitimate flow.
Why teams don’t actually deploy default-deny
Several common reasons:
Application discovery is a lot of work. You need to know every legitimate flow before you can allow it, and large clusters with many existing workloads have many flows to enumerate.
Breaking things is expensive. A misconfigured policy can break production. Teams are appropriately cautious.
Network policy implementation varies. Calico, Cilium, Antrea, and other CNI implementations differ in behavior and tooling. Cross-cluster consistency is hard.
Egress is harder than ingress. Egress traffic to external services (databases, APIs, and so on) requires significant policy work and ongoing maintenance as endpoints change.
DNS traffic in particular. CoreDNS traffic is often missed; when it is blocked, name resolution fails and applications break in ways that are hard to trace back to the policy.
The workable progression
A workable path to default-deny:
Phase 1: Observation. Deploy network policy enforcement tooling in observation mode. Tools such as Cilium Hubble and Calico Flow Visualizer provide visibility into actual pod-to-pod traffic. Run for at least a week, and longer if you have jobs that run less often, so that recurring flow patterns are captured.
Phase 2: Per-namespace default-deny in non-production. Deploy default-deny in a non-production environment, namespace by namespace. Allow only the flows the observation phase identified. Address breakages.
Phase 3: Per-namespace default-deny in production. Once non-production is stable, roll out to production namespace by namespace. Start with low-risk namespaces (development, staging that mirrors prod, and similar), then move to higher-risk namespaces.
Phase 4: Cluster-wide default-deny. Once all namespaces are running default-deny, cluster-wide default-deny becomes a hardening step. Standard NetworkPolicy objects are namespaced, so this step usually relies on CNI-specific resources such as Calico’s GlobalNetworkPolicy or Cilium’s CiliumClusterwideNetworkPolicy.
The progression takes months for large clusters. The patience pays off.
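As a minimal sketch of the per-namespace default-deny used in Phases 2 and 3, the Python below builds a standard `networking.k8s.io/v1` NetworkPolicy as a dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The function name, the policy name `default-deny-all`, and the namespace `payments-staging` are illustrative assumptions, not names from any particular cluster.

```python
import json


def default_deny(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod in `namespace` and allows nothing."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # "default-deny-all" is an illustrative name, not a required one.
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Declaring both types with no ingress or egress rules
            # isolates the selected pods in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # "payments-staging" is a hypothetical non-production namespace.
    print(json.dumps(default_deny("payments-staging"), indent=2))
```

Network policies are additive, so the allow rules identified during observation are applied as separate policies alongside this one rather than by editing it. Generating the manifest per namespace keeps the rollout to one namespace at a time.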
The specific policy patterns
Several patterns we’ve found effective:
Allow DNS explicitly. Pods need to reach CoreDNS (typically in kube-system). Allow this at the namespace policy level.
Allow monitoring scrapes. Prometheus or an equivalent needs to reach metrics endpoints. Allow this from a specific source label only.
Allow ingress controller access. If you’re using ingress controllers, allow them to reach backend pods.
Allow service mesh sidecars. If you’re running Istio, Linkerd, or a similar mesh, allow the sidecar communication patterns.
Tier policies by namespace purpose. Production namespaces get strict policies; development namespaces can be more permissive. The risk profile justifies different posture.
Egress to specific endpoints, not “all egress.” A pod that needs to reach api.stripe.com gets a policy allowing that specific egress. Don’t open all egress.
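Two of the patterns above, DNS and monitoring scrapes, can be sketched as standard NetworkPolicy manifests built in Python. Several values here are assumptions to check against your own cluster: the `k8s-app: kube-dns` pod label (the conventional CoreDNS label in many distributions), a `monitoring` namespace, the `app.kubernetes.io/name: prometheus` label, and the metrics port. The `kubernetes.io/metadata.name` namespace label is set automatically on recent Kubernetes versions.

```python
import json


def allow_dns_egress(namespace: str) -> dict:
    """Allow every pod in `namespace` to query cluster DNS in kube-system."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [{
                    # Both selectors in one entry must match (logical AND).
                    "namespaceSelector": {"matchLabels": {
                        "kubernetes.io/metadata.name": "kube-system"}},
                    # Assumed label; verify what your DNS pods carry.
                    "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                }],
                # DNS uses UDP and falls back to TCP for large responses.
                "ports": [
                    {"protocol": "UDP", "port": 53},
                    {"protocol": "TCP", "port": 53},
                ],
            }],
        },
    }


def allow_metrics_scrape(namespace: str,
                         monitoring_namespace: str = "monitoring",
                         metrics_port: int = 9090) -> dict:
    """Allow scrapes from Prometheus pods only, on one port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-metrics-scrape", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{
                    "namespaceSelector": {"matchLabels": {
                        "kubernetes.io/metadata.name": monitoring_namespace}},
                    # Hypothetical label for the scraper pods.
                    "podSelector": {"matchLabels": {
                        "app.kubernetes.io/name": "prometheus"}},
                }],
                "ports": [{"protocol": "TCP", "port": metrics_port}],
            }],
        },
    }


if __name__ == "__main__":
    for policy in (allow_dns_egress("shop"), allow_metrics_scrape("shop")):
        print(json.dumps(policy, indent=2))
```

Putting the namespace and pod selectors in the same `to` or `from` entry matters: as separate entries they would be ORed, which would allow traffic to or from every pod in the selected namespace.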
The CNI choice matters
Network policy implementation depends on CNI:
Calico: mature network policy support, widely deployed in large enterprises.
Cilium: eBPF-based, strong visibility and feature depth, increasingly the modern default.
Antrea: VMware-anchored, most common in the VMware ecosystem.
AWS VPC CNI with Calico for policies: a common AWS-specific pattern.
Azure CNI with Calico: a comparable pattern.
GKE Network Policy with Calico: comparable.
Cilium has gained significant momentum in 2024-2026; many net-new deployments default to it.
The observability requirements
Operating default-deny requires good visibility:
Real-time flow logging. Tools such as Cilium Hubble and Calico Flow Visualizer provide pod-to-pod traffic visibility.
Denied-traffic alerting. When traffic is denied by policy, you want visibility to investigate. Sometimes it’s legitimately denied (good); sometimes it’s a missing policy (needs fix).
Application-team visibility. Application teams need to understand network policy impact on their services. Self-service visibility tools matter.
Audit and compliance reporting. Compliance requirements for network segmentation call for evidence of which flows are allowed and which are denied.
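Denied-traffic triage can be sketched as a simple aggregation: group denied flows by source, destination, and port, then review the most frequent ones first, since a repeated denial between two known workloads is more likely a missing policy than an attack. The record shape below (`source`, `destination`, `port`, `verdict`) is hypothetical; real flow exports from Hubble or Calico use their own field names and would need mapping into it.

```python
from collections import Counter


def summarize_denied(flows):
    """Count denied flows per (source, destination, port), most frequent first."""
    counts = Counter(
        (f["source"], f["destination"], f["port"])
        for f in flows
        if f["verdict"] == "DENIED"
    )
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample records, not real tool output.
    sample = [
        {"source": "shop/cart", "destination": "kube-system/coredns",
         "port": 53, "verdict": "DENIED"},
        {"source": "shop/cart", "destination": "kube-system/coredns",
         "port": 53, "verdict": "DENIED"},
        {"source": "shop/cart", "destination": "shop/db",
         "port": 5432, "verdict": "FORWARDED"},
        {"source": "dev/scratch", "destination": "shop/db",
         "port": 5432, "verdict": "DENIED"},
    ]
    for (src, dst, port), n in summarize_denied(sample):
        print(f"{n:>4}  {src} -> {dst}:{port}")
```

A summary like this is also a reasonable starting point for application-team visibility, since it shows each team which of its own flows are being dropped.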
The egress challenges
Egress to external services is the hardest part:
External APIs change. When api.example.com changes IPs, policies based on IP need updating. FQDN-based policies, which Cilium supports natively, help.
External egress through a proxy. Many enterprises route external egress through a forward proxy. Network policies allow egress to the proxy; the proxy handles external destinations.
Cluster egress via NAT or a NAT gateway. Cloud-specific patterns give the cluster fixed egress IPs that external services can allowlist.
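The forward-proxy pattern can be sketched with a standard NetworkPolicy that allows egress only to the proxy's address and port. The CIDR `10.0.5.10/32`, port `3128`, and the namespace name are placeholder assumptions; substitute the values for your own proxy. If the proxy runs inside the cluster, a `podSelector` or `namespaceSelector` would replace the `ipBlock`.

```python
import json


def allow_egress_to_proxy(namespace: str, proxy_cidr: str, proxy_port: int) -> dict:
    """Allow every pod in `namespace` to reach the forward proxy and nothing else."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-egress-proxy", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                # ipBlock matches destination addresses by CIDR.
                "to": [{"ipBlock": {"cidr": proxy_cidr}}],
                "ports": [{"protocol": "TCP", "port": proxy_port}],
            }],
        },
    }


if __name__ == "__main__":
    # Placeholder proxy address and port.
    policy = allow_egress_to_proxy("shop", "10.0.5.10/32", 3128)
    print(json.dumps(policy, indent=2))
```

With this in place the policy no longer needs to track external IP changes; the proxy's own allowlist decides which external hostnames are reachable.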
What we typically see at clients
Common patterns:
No network policies at all. Most clusters default to no policies. Default-allow is the actual posture, which is a significant security gap.
Policies in some namespaces. Selective deployment in security-sensitive namespaces; rest of cluster is default-allow.
Default-deny in non-production only. The most common partial deployment: non-prod has default-deny, prod is default-allow.
Full default-deny deployments. Increasingly common in regulated industries; rarer elsewhere.
Where pdpspectra fits
Our DevOps practice builds production Kubernetes platforms with appropriate network segmentation.
Related reading: the cluster autoscaler post, the GitOps post, and the Kubernetes secrets post.
Default-deny is the gold standard. Talk to our team about your Kubernetes security posture.