Docker in Production: Patterns That Stop Costing You Money

Most production Docker setups have the same three flaws. This guide covers the multi-stage builds, image hygiene, and runtime patterns that scale beyond hello-world.

Docker is one of those tools where the gap between “tutorial Docker” and “production Docker” is enormous. A Dockerfile that starts with FROM python:3.12 and works on your laptop can, in production, grow to 2GB once dependencies are installed, take minutes to pull, fail strict security scans, and burn ECR data transfer on every deploy. The patterns that fix this are well known but routinely skipped.

We audit Docker setups across hospital platforms, banking workloads, and SaaS clients. The same three flaws appear in 80% of cases. Here are the patterns we apply to fix them.

Pattern 1: Multi-stage builds, always

The single biggest image-size win is multi-stage builds. Build in a stage with all your tooling, then copy only the artifacts into a lean runtime stage.

# Build stage: full toolchain and dev dependencies
FROM node:20-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages are copied forward
RUN npm prune --omit=dev

# Runtime stage: minimal
FROM node:20-slim AS runtime
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package.json ./
EXPOSE 3000
CMD ["node", "dist/server.js"]

The runtime image only contains what’s needed at runtime. No build tools, no source files, no test dependencies. Typical savings: 60-80% off the image size.

For compiled languages (Go, Rust), the runtime stage can be FROM scratch or FROM gcr.io/distroless/static — final images measured in megabytes, not gigabytes.

Pattern 2: Pin base images by digest, not tag

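FROM node:20 is a moving target: the tag is re-pointed whenever a new 20.x release or base-OS rebuild is published, so two builds of the same Dockerfile can start from different images. Today’s node:20 might be node:20.11.0; tomorrow’s might be node:20.11.1, with changes you never tested.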

Better:

FROM node:20.11.1-slim

Best:

FROM node:20.11.1-slim@sha256:abc123def456...

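Pinning by SHA digest guarantees the exact same base image bit-for-bit. A version tag such as 20.11.1-slim narrows the drift but can still be re-pushed when the underlying OS packages are rebuilt; a digest cannot change. Reproducible builds, no surprises.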

Tools like Renovate / Dependabot can auto-update these pins via PRs so you stay current without losing reproducibility.
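A pin policy is easy to check mechanically. The sketch below is one possible approach, not a full Dockerfile parser: the function name unpinned_base_images is hypothetical, and it assumes one instruction per line with no line continuations. It lists every external base image that lacks an @sha256 digest, skipping scratch and references to earlier build stages.

```python
from typing import List, Set


def unpinned_base_images(dockerfile_text: str) -> List[str]:
    """Return external base images in FROM lines that lack an @sha256 digest."""
    stage_names: Set[str] = set()
    unpinned: List[str] = []
    for line in dockerfile_text.splitlines():
        tokens = line.strip().split()
        if not tokens or tokens[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=linux/amd64.
        args = [t for t in tokens[1:] if not t.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "FROM builder" refers to an earlier stage, and scratch has no digest.
        is_external = image != "scratch" and image.lower() not in stage_names
        if is_external and "@sha256:" not in image:
            unpinned.append(image)
        # Remember the alias from "FROM image AS alias" for later stages.
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2].lower())
    return unpinned
```

Run against the multi-stage example from Pattern 1, this would report node:20-slim for both stages; a CI step could fail the build whenever the returned list is non-empty.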

Pattern 3: Layer caching that actually works

A Dockerfile that does COPY . . early causes every code change to bust the dependency cache, forcing a full reinstall every build. Common pattern; routinely costs 5+ minutes per CI build.

Right order:

# 1. Copy ONLY the dependency manifests
COPY package*.json ./
# Cached unless package.json or package-lock.json changes
RUN npm ci

# 2. THEN copy the rest of the code
COPY . .
RUN npm run build

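Same pattern for Python (COPY requirements.txt, pip install, then COPY .) and Go (COPY go.mod go.sum, go mod download, then COPY .). Rust needs one extra step because cargo build fails without source files: COPY Cargo.toml Cargo.lock, build the dependencies against a stub src/main.rs (or use cargo-chef), then COPY src and build again.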

This single ordering change typically cuts CI Docker build time by 70-90% on the cache hit path, provided your CI runners actually persist the layer cache between builds.
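To see why the ordering matters, here is a toy model of layer caching. It is a deliberate simplification and not Docker's actual implementation: each step's cache key is assumed to depend on the previous step's key, the instruction text, and the contents of any files the step copies in. The names layer_keys, reused_layers, GOOD_ORDER and BAD_ORDER are made up for this illustration.

```python
import hashlib
from typing import Dict, List, Tuple

Step = Tuple[str, List[str]]  # (instruction text, files the instruction copies in)


def layer_keys(steps: List[Step], files: Dict[str, str]) -> List[str]:
    """Compute one cache key per step from the parent key, instruction, and inputs."""
    keys: List[str] = []
    parent = ""
    for instruction, inputs in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instruction.encode())
        for name in sorted(inputs):
            digest.update(name.encode())
            digest.update(files[name].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def reused_layers(steps: List[Step], before: Dict[str, str], after: Dict[str, str]) -> int:
    """Count the leading steps whose cache key is unchanged between two builds."""
    reused = 0
    for old, new in zip(layer_keys(steps, before), layer_keys(steps, after)):
        if old != new:
            break  # every later layer is rebuilt once one key changes
        reused += 1
    return reused


GOOD_ORDER: List[Step] = [
    ("COPY package*.json ./", ["package.json"]),
    ("RUN npm ci", []),
    ("COPY . .", ["package.json", "src/server.js"]),
    ("RUN npm run build", []),
]
BAD_ORDER: List[Step] = [
    ("COPY . .", ["package.json", "src/server.js"]),
    ("RUN npm ci", []),
    ("RUN npm run build", []),
]
```

In this model, editing only src/server.js leaves the first two layers of GOOD_ORDER reusable (so npm ci is skipped), while BAD_ORDER reuses none, which is the full-reinstall behaviour described above.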

Pattern 4: Run as a non-root user

By default, Docker runs containers as root. If the container is compromised, the attacker has root inside it, which makes every further step (including container escape) easier. Many container scanners flag this as a security finding.

FROM node:20-slim
RUN useradd -m -u 1001 app
# Use the numeric UID: Kubernetes can only verify runAsNonRoot against a numeric USER
USER 1001
WORKDIR /home/app
# ... rest of Dockerfile

Or use a distroless base image with a nonroot variant: FROM gcr.io/distroless/nodejs20-debian12:nonroot.

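Combined with runAsNonRoot: true in the Kubernetes security context, this is enforced at deploy time: the kubelet refuses to start a container that would run as UID 0, regardless of what the Dockerfile says. The check only works when the image’s USER is numeric or runAsUser is set in the manifest; with a named user the kubelet cannot verify the UID and rejects the container.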
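The numeric-UID requirement is easy to miss, so it is worth checking in review tooling. The following is a minimal sketch with hypothetical helper names (final_stage_user, is_numeric_non_root); it assumes one instruction per line and only looks at USER instructions in the Dockerfile itself, so a base image that already defaults to a non-root user (such as a distroless nonroot variant) would need separate handling.

```python
from typing import Optional


def final_stage_user(dockerfile_text: str) -> Optional[str]:
    """Return the last USER value set in the final build stage, or None."""
    user: Optional[str] = None
    for line in dockerfile_text.splitlines():
        tokens = line.strip().split()
        if not tokens:
            continue
        keyword = tokens[0].upper()
        if keyword == "FROM":
            # A new stage starts from its base image's default user.
            user = None
        elif keyword == "USER" and len(tokens) > 1:
            user = tokens[1]
    return user


def is_numeric_non_root(user: Optional[str]) -> bool:
    """True only for a numeric, non-zero UID such as "1001" or "1001:1001"."""
    if user is None:
        return False
    uid = user.split(":")[0]
    return uid.isdigit() and int(uid) != 0
```

A Dockerfile ending in USER app would fail this check even though the user is not root, which mirrors how runAsNonRoot treats named users.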

Pattern 5: .dockerignore like you mean it

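A missing .dockerignore sends your node_modules, .git, dist, *.log files, and so on to the builder as part of the build context, and a COPY . . then bakes them into the image. That bloats the image, breaks caching, and can leak secrets such as .env files.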

.git
node_modules
dist
build
*.log
.env
.env.*
.idea
.vscode
coverage
.pytest_cache
__pycache__
*.pyc
README.md
docs/
tests/

Audit your build context size: docker build --progress=plain reports it in the “transferring context” line. If it’s >100MB, you’re missing things.
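You can also estimate the context size before building. The sketch below is an approximation only: it uses Python's fnmatch rather than Docker's real .dockerignore matcher, so it does not support ! exceptions or ** and a pattern like *.log matches at any depth here. The function name context_size is hypothetical.

```python
import fnmatch
import os
from typing import Iterable


def _ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """Approximate .dockerignore matching with fnmatch (no '!' or '**' support)."""
    rel_path = rel_path.replace(os.sep, "/")
    return any(fnmatch.fnmatch(rel_path, p.rstrip("/")) for p in patterns)


def context_size(root: str, patterns: Iterable[str] = ()) -> int:
    """Sum the sizes in bytes of files under root that the patterns do not exclude."""
    patterns = list(patterns)
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune ignored directories in place so os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames
            if not _ignored(os.path.normpath(os.path.join(rel_dir, d)), patterns)
        ]
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if not _ignored(rel, patterns):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Comparing context_size(".") with context_size(".", patterns) for the patterns in your .dockerignore gives a rough idea of how much the ignore file is saving.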

Pattern 6: Health checks

Containers without health checks can “appear running” while serving no traffic. Add one to every image. The example assumes curl is installed in the image, which is not the case for slim or distroless bases:

HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD curl -fsS http://localhost:3000/health || exit 1

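Kubernetes ignores the Dockerfile HEALTHCHECK and uses its own probe model (livenessProbe, readinessProbe, startupProbe). For Docker Compose and plain docker run, the HEALTHCHECK is what matters. ECS does not read the image’s HEALTHCHECK either: the same command has to be declared in the healthCheck block of the container definition in the task definition.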

For Kubernetes, define probes in your Pod spec — and make them meaningful (not just “the HTTP server is alive”). See our PyTorch in production piece for examples of meaningful readiness probes.
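As an illustration of a check that means more than “the HTTP server is alive”, here is a minimal sketch of a /health endpoint using only the Python standard library. The make_handler factory and the shape of the checks mapping are assumptions made for this example; in a real service each check would test something the app depends on, such as a database connection or a loaded model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict


def make_handler(checks: Dict[str, Callable[[], bool]]):
    """Build a handler whose /health returns 200 only if every check passes."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            results = {}
            for name, check in checks.items():
                try:
                    results[name] = bool(check())
                except Exception:
                    # A check that raises counts as unhealthy.
                    results[name] = False
            status = 200 if all(results.values()) else 503
            body = json.dumps(results).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            # Keep probe traffic out of the application logs.
            pass

    return HealthHandler


if __name__ == "__main__":
    # Hypothetical wiring: replace the lambda with a real dependency check.
    HTTPServer(("0.0.0.0", 3000), make_handler({"ready": lambda: True})).serve_forever()
```

Because a failing check produces a 503, the curl -f in the HEALTHCHECK above exits non-zero, and an HTTP readiness probe pointed at the same path would mark the Pod as not ready.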

Pattern 7: Don’t use :latest for anything that matters

# Don't:
image: my-app:latest

# Do:
image: my-app:1.2.3
# or:
image: my-app@sha256:abc...

:latest is how a “just redeploy” silently ships a different version from the one you tested. Pin version tags or, better, image digests in production manifests.

CI should push immutable tags — typically a combination of git SHA + semver. Never overwrite an existing tag.
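One way to build such a tag is sketched below. The helper names (image_tag, ensure_unused) and the version-shortsha layout are assumptions for illustration, not a standard. Note that Docker tags cannot contain +, so semver build metadata in the form 1.2.3+sha has to be written with a different separator.

```python
import re
from typing import Iterable

SEMVER = re.compile(r"\d+\.\d+\.\d+")
GIT_SHA = re.compile(r"[0-9a-f]{7,40}")


def image_tag(version: str, git_sha: str) -> str:
    """Combine a semver version and a git commit SHA into one tag."""
    if not SEMVER.fullmatch(version):
        raise ValueError("not a MAJOR.MINOR.PATCH version: %r" % version)
    if not GIT_SHA.fullmatch(git_sha):
        raise ValueError("not a hex git SHA: %r" % git_sha)
    # Use '-' as the separator because '+' is not allowed in Docker tags.
    return "%s-%s" % (version, git_sha[:7])


def ensure_unused(tag: str, existing_tags: Iterable[str]) -> str:
    """Refuse to reuse a tag that has already been pushed."""
    if tag in set(existing_tags):
        raise RuntimeError("tag %r already exists; refusing to overwrite" % tag)
    return tag
```

In CI, existing_tags would come from your registry's tag listing; many registries (ECR among them) can also enforce tag immutability on the server side, which is a stronger guarantee than a client-side check.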

Pattern 8: Read-only root filesystem

Containers don’t usually need to write to their root filesystem. Mark it read-only:

# Kubernetes (Pod spec excerpt; the container name is illustrative)
spec:
  containers:
    - name: app
      securityContext:
        readOnlyRootFilesystem: true
      # Mount specific writable volumes if needed
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}

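Compromises that try to drop a binary into /usr/local/bin or modify /etc fail. Any volume you mount as writable (such as the /tmp emptyDir above) stays writable, so mount as few as you can. This doesn’t stop every attack, but it raises the bar.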

Pattern 9: Logging to stdout/stderr, not files

The 12-factor app principle. Containers write logs to stdout/stderr; the orchestrator (Docker, Kubernetes, ECS) handles capturing and shipping them.

# Right:
print("Application started")

# Wrong:
with open('/var/log/app.log', 'a') as f:
    f.write("Application started\n")

Apps that write to log files inside containers fill the writable layer, complicate log shipping, and break the read-only filesystem pattern above.
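Beyond a bare print, a small amount of setup gives you structured, line-oriented logs that most collectors can parse. This is a minimal sketch using the standard logging module; the JsonFormatter class, the configure_logging helper, and the chosen field names are assumptions for the example rather than a required format.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def configure_logging(stream=None, level: int = logging.INFO) -> logging.Logger:
    """Send the "app" logger's output to stdout (or the given stream) as JSON lines."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.handlers = [handler]
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    configure_logging().info("Application started")
```

StreamHandler flushes after every record, so lines reach the orchestrator promptly. Plain print calls are block-buffered when stdout is not a terminal, which is why Python containers often set PYTHONUNBUFFERED=1.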

Anti-patterns we rip out

A few patterns we routinely audit out:

  • Mounting docker.sock into a container. Anyone who can reach the socket controls the Docker daemon, which is effectively root on the host. If you need to build images inside a container, use a daemonless or remote builder (Kaniko, Buildah, BuildKit remote).
  • Hardcoding secrets in environment variables in the Dockerfile. ENV API_KEY=... ships the key in the image layer history forever. Use runtime secret injection (K8s secrets or Vault; see our Vault piece). For secrets needed only at build time, use BuildKit’s RUN --mount=type=secret.
  • apt-get install without --no-install-recommends, or without rm -rf /var/lib/apt/lists/* afterwards. This leaves package indexes and recommended-but-unused packages in the image, which can add tens to hundreds of MB.
  • Multiple RUN apt-get install commands. Each RUN is a layer, and files deleted in a later layer still count toward image size. Combine update, install, and cleanup into one RUN.
  • Building images from full distros when distroless or alpine would work. ubuntu:22.04 is about 80MB before you’ve added anything; gcr.io/distroless/static is about 2MB.
  • Not setting WORKDIR. The working directory defaults to /, so relative COPY destinations land in the filesystem root next to system directories. Use WORKDIR /app (or similar).
  • Running tests in the production image. Tests should run in CI, not in the image you deploy. Multi-stage build keeps test deps out of runtime.
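Some of these anti-patterns can be caught with a few lines of scripting. As one example, the sketch below flags ENV and ARG assignments whose names look like secrets. It is a heuristic with assumed name patterns (KEY, SECRET, TOKEN, PASSWORD, PASSWD), the function name hardcoded_secrets is hypothetical, and it only handles the NAME=value form on a single line; dedicated secret scanners cover far more cases.

```python
import re
from typing import List

# Heuristic: variable names containing these words are treated as secrets.
SECRET_NAME = re.compile(r"(KEY|SECRET|TOKEN|PASSWORD|PASSWD)", re.IGNORECASE)
ASSIGNMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=(\S+)")


def hardcoded_secrets(dockerfile_text: str) -> List[str]:
    """Return names of secret-looking variables given a value in ENV or ARG lines."""
    findings: List[str] = []
    for line in dockerfile_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].upper() not in ("ENV", "ARG"):
            continue
        for name, _value in ASSIGNMENT.findall(parts[1]):
            if SECRET_NAME.search(name):
                findings.append(name)
    return findings
```

An ARG declared without a default (for example ARG NPM_TOKEN) is not flagged here, although values passed via --build-arg can still end up in image history, so build-time secret mounts remain the safer route.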

Image scanning in CI

Add Trivy, Grype, or Snyk to your CI pipeline:

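# GitHub Actions
- name: Scan image for vulnerabilities
  # Pin to a release tag or commit SHA in real pipelines; @master is a moving target
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: my-app:${{ github.sha }}
    format: sarif
    # SARIF output includes every severity unless this is set
    limit-severities-for-sarif: true
    severity: HIGH,CRITICAL
    exit-code: '1'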

Fail the build on high/critical CVEs. Don’t just generate reports nobody reads. Patching CVEs at the CI gate is far cheaper than patching them in production.
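If your CI system has no ready-made scanner integration, the gate can be a short script over the scanner's JSON report. The sketch below assumes the report layout produced by trivy image --format json, with a top-level Results list whose entries carry a Vulnerabilities list and a Severity field on each item; verify this against the output of your Trivy version. The function names are hypothetical.

```python
from typing import Dict, Iterable


def count_blocking(report: dict, blocking: Iterable[str] = ("HIGH", "CRITICAL")) -> Dict[str, int]:
    """Count vulnerabilities per blocking severity in a parsed JSON report."""
    counts = {severity: 0 for severity in blocking}
    # Both keys may be missing or null when nothing was found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity")
            if severity in counts:
                counts[severity] += 1
    return counts


def gate_exit_code(report: dict) -> int:
    """Return 1 if any blocking vulnerability is present, else 0."""
    return 1 if any(count_blocking(report).values()) else 0
```

A CI step would load the report with json.load and pass the result of gate_exit_code to sys.exit, so a single HIGH or CRITICAL finding fails the job.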

What we deploy by default

For client work, every Dockerfile we ship has:

  • Multi-stage build (build stage + minimal runtime stage)
  • Pinned base image versions (often by digest)
  • Layer caching ordered for dependency stability
  • Non-root user
  • .dockerignore that excludes everything that isn’t needed
  • Health check (Dockerfile HEALTHCHECK + Kubernetes probes)
  • Read-only root filesystem (where the app permits it)
  • Logs to stdout/stderr
  • Trivy scan in CI, fail on high/critical CVEs

Plus, in Kubernetes:

  • runAsNonRoot: true in security context
  • readOnlyRootFilesystem: true
  • Resource requests + limits (never deploy without these)
  • Pod security standards (restricted profile for new workloads)
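The Kubernetes items in this baseline can be checked against a manifest before it is applied. The sketch below works on a single container entry that has already been parsed into a dict (for example with a YAML library, which is not shown); the function name audit_container and the wording of the findings are assumptions for this example. It only inspects container-level settings, so a runAsNonRoot set on the Pod-level securityContext would need an extra lookup.

```python
from typing import Any, Dict, List


def audit_container(container: Dict[str, Any]) -> List[str]:
    """Return baseline violations for one container entry from a Pod spec."""
    findings: List[str] = []
    security = container.get("securityContext") or {}
    if security.get("runAsNonRoot") is not True:
        findings.append("securityContext.runAsNonRoot is not true")
    if security.get("readOnlyRootFilesystem") is not True:
        findings.append("securityContext.readOnlyRootFilesystem is not true")
    resources = container.get("resources") or {}
    for section in ("requests", "limits"):
        # An absent or empty section both count as missing.
        if not resources.get(section):
            findings.append("resources.%s is missing" % section)
    return findings
```

An empty list means the container meets the checks covered here. Admission controllers and Pod Security Standards enforce the same kind of rule cluster-side; a script like this just gives earlier feedback in CI.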

For hospital and banking platforms, these aren’t optional — they’re the baseline that compliance reviews look for.

The thing Docker doesn’t solve

Docker packages your app. It doesn’t solve:

  • Image distribution at scale: that’s your container registry’s problem (ECR, GCR, Harbor).
  • Orchestration: that’s Kubernetes / ECS / Nomad.
  • Secret management: see our Vault piece.
  • Network policy: that’s a service mesh or Kubernetes NetworkPolicy concern.
  • Runtime security: that’s Falco, Wiz, Aqua, etc.

Docker is one component in a production stack. The patterns above make sure that component pulls its weight; the rest is other tools’ jobs.

The pattern of patterns

Production Docker is mostly discipline applied to small choices. None of the patterns above are novel: they’re in the Docker docs, the OWASP container security guide, and a thousand blog posts. Teams that ship reliable Docker workloads apply them consistently; teams that skip them get surprises in production.

The image size, the CVE count, the build time, the security posture — all of these are downstream of the basic patterns. Get them right early and the rest of the operational story gets easier.

