Vault Production Secrets Management

Secrets management is one of those problems where the wrong answer is invisible until it isn’t. Hardcoded API keys in code, env vars in CI logs, credentials in Slack — every team has done all of these at least once. HashiCorp Vault is the heavyweight answer to “do it right.” It’s also genuinely complicated to operate, and for many teams a lighter option (cloud-native secrets, External Secrets Operator, SOPS) is the right starting point.

We’ve deployed Vault for hospital platforms with HIPAA requirements, banking workloads that need NRB compliance, and enterprise platforms with strict secrets-at-rest mandates. Here’s how we actually decide when to use it and what to avoid.

What Vault is for

Strip the marketing: Vault is a service that stores secrets, encrypts data at rest with a master key, and gates access via flexible auth methods. The killer features beyond “secret store”:

Dynamic secrets: Vault can generate database credentials, AWS IAM credentials, and SSH certificates on the fly, each with a TTL. A DB user such as “app-foo-2026-05-26T12:34:56” exists for one hour and is then automatically revoked.
Transit engine: Vault can encrypt data on behalf of apps without giving them the encryption keys. Apps send plaintext, get ciphertext back. Key never leaves Vault.
PKI: Vault can be a private CA. Apps request short-lived certificates; Vault issues them; expiry is automatic.
Audit logging: Every secret access is logged with auth identity, timestamp, and request details.
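To make the Transit flow concrete, here is a minimal Python sketch of the request an app would send to Vault's HTTP API. It assumes the Transit engine is mounted at the default transit/ path; the address, token, and key name in the usage note are hypothetical. The functions only build the request and parse a response, so nothing here talks to a real server.

```python
import base64
import json
import urllib.request


def build_transit_encrypt_request(vault_addr, token, key_name, plaintext):
    """Build (but do not send) a Transit encrypt request.

    Transit expects the plaintext base64-encoded inside a JSON body.
    Assumes the engine is mounted at the default "transit/" path.
    """
    body = {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/transit/encrypt/{key_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def extract_ciphertext(response_body):
    """Return the ciphertext string (e.g. "vault:v1:...") from a response body."""
    return json.loads(response_body)["data"]["ciphertext"]
```

An app would pass the request to urllib.request.urlopen and store only the returned ciphertext. The encryption key itself never appears in the exchange.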

If you’re using Vault as “just a key-value store,” you’re using a sledgehammer to drive a nail.

When you actually need Vault

You probably need Vault if:

You’re in a regulated environment (healthcare, finance, government) where audit logging of every credential access is a compliance requirement
You need dynamic database credentials so dev environments can’t accidentally use prod credentials
You’re issuing certificates internally and need them rotated automatically
You have a multi-tier secrets policy (“this team can read this secret, but not that one”) that goes beyond what cloud-native ACLs handle
You’re truly multi-cloud and need one secrets system that works across providers

You probably DON’T need Vault if:

You’re on AWS, where AWS Secrets Manager + Parameter Store + IAM Roles handle most use cases more simply
You’re on GCP — Secret Manager + Workload Identity is cleaner
You’re a small team without dedicated platform engineering capacity
Your “secrets management” need is “encrypted env vars in CI” — SOPS or Doppler is enough

The deployment patterns

Pattern 1: Vault as the system-of-record, cloud-native consumers

The most common pattern we deploy:

Vault Enterprise (or self-hosted OSS) stores all canonical secrets
External Secrets Operator (ESO) in Kubernetes syncs Vault secrets into K8s Secret objects
Workloads consume K8s secrets via standard mount or env var
Vault audit logs ship to your SIEM (Splunk, Datadog, etc.)

This pattern keeps the application code unchanged (still reads from env vars or files) but centralizes the source of truth in Vault.
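As a rough illustration of the sync step, the sketch below builds an ExternalSecret manifest as a Python dict. Everything named here is an assumption for illustration: the store name, the Vault path, the key mapping, and in particular the apiVersion, which should match whichever ESO CRD version your cluster has installed.

```python
import json


def external_secret_manifest(name, namespace, store_name, vault_path, keys,
                             refresh_interval="1h"):
    """Return an ExternalSecret manifest as a plain dict.

    `keys` maps each key in the resulting K8s Secret to the property
    inside the Vault secret stored at `vault_path`.
    """
    return {
        # Assumption: adjust to the CRD version installed in your cluster.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": refresh_interval,
            # Assumption: a cluster-wide store pointing at Vault already exists.
            "secretStoreRef": {"name": store_name, "kind": "ClusterSecretStore"},
            "target": {"name": name},
            "data": [
                {"secretKey": k8s_key,
                 "remoteRef": {"key": vault_path, "property": vault_property}}
                for k8s_key, vault_property in keys.items()
            ],
        },
    }


def to_json(manifest):
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

The workload then mounts the K8s Secret named by target.name and never needs to know Vault exists.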

Pattern 2: Vault Agent sidecar

Apps run with a vault-agent sidecar that handles authentication and writes secrets to a shared file/volume. The app reads from the file.

Useful when:

The app needs to refresh secrets without restarting (vault-agent can re-render templates)
You’re outside Kubernetes (VMs, bare metal)
The team is comfortable with the sidecar pattern
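On the app side, "refresh without restarting" can be as simple as re-reading the rendered file when it changes. The following is a minimal sketch of that idea, not part of Vault itself; the class name and the file path in the usage note are hypothetical.

```python
import os


class FileSecret:
    """Serve a secret from a file and re-read it when the file changes."""

    def __init__(self, path):
        self._path = path
        self._mtime_ns = None
        self._value = None

    def get(self):
        # Compare modification times so the file is only read after a change.
        mtime_ns = os.stat(self._path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self._path, encoding="utf-8") as handle:
                self._value = handle.read().strip()
            self._mtime_ns = mtime_ns
        return self._value
```

An app would create something like FileSecret("/vault/secrets/db-password") once and call get() whenever it opens a new connection, picking up whatever the sidecar last rendered.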

Pattern 3: Direct Vault API from app

Apps authenticate to Vault directly and read secrets. Skip the K8s Secret intermediary.

Useful when:

You’re using dynamic secrets (which K8s Secrets can’t easily represent)
The app needs fine-grained per-request auth (e.g., AWS IAM credentials scoped to a specific operation)
Performance is non-critical (each Vault read is a network roundtrip)
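When an app reads dynamic database credentials directly, it also has to track the lease that comes back with them. Here is a small sketch of that bookkeeping, assuming the response shape of the database secrets engine's creds endpoint (lease_id, lease_duration, and a data object with username and password). The refresh-at-half-TTL rule is our illustrative choice, not a Vault requirement.

```python
import json
from dataclasses import dataclass


@dataclass
class DbLease:
    username: str
    password: str
    lease_id: str
    issued_at: float   # seconds, from whatever clock the caller uses
    ttl_seconds: int

    def should_refresh(self, now, fraction=0.5):
        """True once `fraction` of the lease TTL has elapsed."""
        return now - self.issued_at >= self.ttl_seconds * fraction


def parse_db_creds(response_body, issued_at):
    """Turn a database creds response body into a DbLease."""
    payload = json.loads(response_body)
    return DbLease(
        username=payload["data"]["username"],
        password=payload["data"]["password"],
        lease_id=payload["lease_id"],
        issued_at=issued_at,
        ttl_seconds=payload["lease_duration"],
    )
```

Caching the lease this way also limits the network roundtrips mentioned above: the app only goes back to Vault when should_refresh turns true.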

Pattern we avoid: long-lived secrets, no rotation

Stuffing static API keys into Vault and never rotating them. Defeats most of Vault’s value. If you’re not using Vault for dynamic secrets or short-lived credentials, a cheaper KV store would do the same job.

Vault operational surface

Vault is a stateful distributed system. Things to know:

Storage backend. Vault stores its encrypted data somewhere. Integrated Storage (Raft) is the modern default — Vault manages its own data. Old setups use Consul as a backend; we don’t recommend new deployments do this.

Unseal. Vault starts “sealed”: the root key (historically called the master key) that protects its encryption key is itself protected by an unseal key, and Shamir’s Secret Sharing splits that unseal key into shares. Until a threshold of those shares is provided, Vault rejects requests. Production setups use Auto-Unseal via a cloud KMS (AWS KMS, GCP KMS, Azure Key Vault), so Vault unseals itself at startup using a KMS key.
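For intuition about Shamir’s Secret Sharing, here is a toy Python version over a prime field. It illustrates the threshold idea only; Vault’s own implementation differs in its details, and the prime and function names are our choices for the sketch.

```python
import secrets

# Toy field: a Mersenne prime, so secrets must be integers below 2**127 - 1.
PRIME = 2**127 - 1


def split_secret(secret, shares, threshold):
    """Split `secret` into `shares` points; any `threshold` of them rebuild it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for this toy field")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        result = 0
        for coeff in reversed(coeffs):  # Horner's rule
            result = (result * x + coeff) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine_shares(points):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With split_secret(value, 5, 3), any three of the five shares passed to combine_shares return the original value, while fewer than three reveal nothing useful about it.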

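HA and clustering. Vault in Raft mode runs as an odd-numbered cluster (typically 3 or 5 nodes): one leader, with followers replicating from it. If the cluster loses quorum (a majority of nodes), it cannot elect a leader or commit writes, and you have a real incident.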
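The arithmetic behind the odd-number advice is short enough to write down. This sketch uses the standard majority rule for Raft-style consensus; the function names are ours.

```python
def quorum_size(nodes):
    """Smallest majority of a cluster with `nodes` voting members."""
    return nodes // 2 + 1


def failures_tolerated(nodes):
    """How many nodes can fail while a majority remains."""
    return nodes - quorum_size(nodes)
```

A 3-node cluster tolerates one failure and a 5-node cluster tolerates two. A 4-node cluster still tolerates only one, which is why the extra even node buys nothing.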

Backup. Run vault operator raft snapshot save on a schedule, and practice restoring from the snapshots.
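A scheduled job for this can be a thin wrapper around the CLI. The sketch below assumes the vault binary is on PATH and that VAULT_ADDR and VAULT_TOKEN are set in the job's environment; the backup directory and file naming scheme are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def snapshot_command(backup_dir, now=None):
    """Build the CLI command for a timestamped Raft snapshot file."""
    now = now or datetime.now(timezone.utc)
    target = Path(backup_dir) / f"vault-raft-{now:%Y%m%dT%H%M%SZ}.snap"
    return ["vault", "operator", "raft", "snapshot", "save", str(target)]


def take_snapshot(backup_dir):
    """Run the snapshot and return the path of the file it wrote.

    Raises subprocess.CalledProcessError if the CLI exits non-zero,
    so a failed backup fails the job loudly.
    """
    command = snapshot_command(backup_dir)
    subprocess.run(command, check=True)
    return Path(command[-1])
```

Uploading the file and pruning old snapshots would follow from here, as would a periodic restore drill against a scratch cluster.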

Upgrade story. Vault ships new versions every few months. Major version bumps have specific upgrade paths. Read the changelog every time.

Audit log volume. Audit logs are written for every API call. On a busy cluster, this is a lot of data. Plan storage and rotation.

The auth method that matters

Vault has 15+ auth methods (LDAP, OIDC, GitHub, AWS, Kubernetes, AppRole, etc.). For production, three combinations cover most cases:

Kubernetes auth + ServiceAccount: pods authenticate via their K8s service account token. Vault validates against the K8s API. No long-lived credentials in pods.
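The Kubernetes login exchange looks roughly like this in Python. It is a sketch against Vault's HTTP API, assuming the auth method is mounted at the default kubernetes/ path; the role name is hypothetical, and SA_TOKEN_PATH is the standard in-pod location of the service account token.

```python
import json
import urllib.request

# Standard location of the projected service account token inside a pod.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_k8s_login_request(vault_addr, role, jwt, mount="kubernetes"):
    """Build (but do not send) a Kubernetes auth login request."""
    body = {"role": role, "jwt": jwt}
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_login_response(response_body):
    """Return (client_token, lease_duration_seconds) from a login response."""
    auth = json.loads(response_body)["auth"]
    return auth["client_token"], auth["lease_duration"]
```

In a pod, the app would read the JWT from SA_TOKEN_PATH, send the request, and use the returned short-lived token for later calls. Nothing long-lived is baked into the image or the manifest.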

AWS IAM auth: EC2 instances or Lambda functions authenticate via their instance/role identity. No credentials in app config.

OIDC for humans: humans log in via your SSO (Okta, Google, GitHub) → Vault. No personal Vault credentials.

AppRole for legacy systems only. Long-lived role ID + secret ID combo. Convenient but a worse security posture than dynamic auth. Reserve for systems that can’t use Kubernetes or AWS native auth.

Comparison with cloud-native options

Need	Vault	AWS Native	GCP Native
KV store	KV v2 engine	Secrets Manager	Secret Manager
Encryption-as-a-service	Transit engine	KMS + envelope encryption	KMS + envelope encryption
Dynamic DB credentials	Database secrets engine	RDS IAM auth (limited)	Cloud SQL IAM auth (limited)
Dynamic cloud credentials	AWS / GCP / Azure secrets engines	Native (IAM roles)	Native (Workload Identity)
Internal PKI	PKI secrets engine	Private CA	Certificate Authority Service
Multi-cloud	Yes (designed for it)	AWS-only	GCP-only
Audit log granularity	Excellent	Good (CloudTrail)	Good (Audit Logs)
Operational cost	Real (you operate it)	Managed	Managed

For pure-AWS workloads, AWS-native is usually the right answer. Lower ops cost; tighter IAM integration; cheaper. We reach for Vault when there’s a real reason: multi-cloud, on-prem requirements, dynamic credentials beyond what RDS IAM offers, or compliance audits that specifically prefer Vault.

Patterns that bite

A few things we’ve seen go wrong in production Vault deployments:

Lost unseal keys. Without unseal keys (or auto-unseal KMS access), a sealed Vault is permanently sealed. There is no recovery. Document where unseal keys live. Test restore.

Tokens that never expire. Vault tokens have TTLs. Forgetting to set sensible TTLs means long-lived tokens accumulate. Always set explicit TTLs and renewal windows.
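One way to catch this is to inspect token lookup output during reviews. The sketch below assumes the shape of a token lookup response, where data.expire_time is null for a non-expiring token and data.ttl is the remaining lifetime in seconds. The 24-hour ceiling is an arbitrary example, not a recommendation from Vault.

```python
import json


def token_findings(lookup_body, max_ttl_seconds=24 * 3600):
    """Return a list of human-readable problems with a token's lifetime."""
    data = json.loads(lookup_body)["data"]
    findings = []
    if data.get("expire_time") is None:
        findings.append("token never expires")
    elif data.get("ttl", 0) > max_ttl_seconds:
        findings.append(f"remaining TTL {data['ttl']}s exceeds {max_ttl_seconds}s")
    return findings
```

An empty list means the token's lifetime is within the ceiling you chose.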

Audit log disk full. When audit devices are enabled, Vault refuses to complete a request unless at least one of them can record it. With a single file audit device, a full disk takes down the cluster. Monitor disk; rotate aggressively.

KV v1 instead of KV v2. KV v2 supports versioning, soft-delete, metadata. KV v1 doesn’t. Use v2 for all new mounts.
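The versioning shows up directly in the KV v2 API: reads go through a data/ segment under the mount, and the payload is nested one level deeper next to its metadata. Here is a small sketch of both details; the mount and secret names in the usage note are hypothetical.

```python
import json


def kv2_read_path(mount, secret_path, version=None):
    """Build the API path for reading a KV v2 secret, optionally pinned to a version."""
    path = f"/v1/{mount.strip('/')}/data/{secret_path.strip('/')}"
    return path if version is None else f"{path}?version={version}"


def unwrap_kv2(response_body):
    """Return (secret_dict, version) from a KV v2 read response body."""
    data = json.loads(response_body)["data"]
    return data["data"], data["metadata"]["version"]
```

For example, kv2_read_path("secret", "billing/db", version=3) asks for the third version of that secret rather than the latest one.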

Policies that grant * on secret/*. “Admin” policies copy-pasted from tutorials. Audit your policies regularly; remove wildcards.
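A regular policy audit can start with a crude scan like the one below. This is a heuristic sketch, not a real HCL parser: it assumes policies written as simple path "..." { capabilities = [...] } blocks and flags any glob path that grants a write-type capability. The WRITE_CAPS set is our choice of what counts as risky.

```python
import re

# Matches: path "some/path" { ... } (no nested braces expected in policies).
PATH_BLOCK = re.compile(r'path\s+"([^"]+)"\s*\{([^}]*)\}')
WRITE_CAPS = {"create", "update", "patch", "delete", "sudo"}


def risky_paths(policy_hcl):
    """Return (path, write_capabilities) for glob paths that allow writes."""
    findings = []
    for path, body in PATH_BLOCK.findall(policy_hcl):
        caps = set(re.findall(r'"(\w+)"', body))
        granted = sorted(caps & WRITE_CAPS)
        if path.endswith("*") and granted:
            findings.append((path, granted))
    return findings
```

Anything it reports deserves a human look: either narrow the path or drop the capabilities the team does not actually use.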

No backup testing. Backups that haven’t been restored don’t exist.

What we deploy by default

For a new platform that genuinely needs centralized secrets management:

Pure-AWS workload? AWS Secrets Manager + Parameter Store + IAM Roles. Skip Vault.
Pure-GCP workload? Secret Manager + Workload Identity. Skip Vault.
Multi-cloud, regulated, or dynamic-credentials-heavy? Vault Enterprise or OSS with:
- 3-node Raft cluster
- Auto-unseal via cloud KMS
- Kubernetes auth method for in-cluster apps
- AWS IAM auth for EC2/Lambda
- OIDC for humans
- External Secrets Operator in K8s for the K8s consumers
- Audit log shipping to SIEM
- Daily Raft snapshots to S3 with retention
- Disaster recovery: cross-region replication (Vault Enterprise) or warm standby restore plan (OSS)

For hospital management systems and banking platforms we deploy, Vault is often the right call because HIPAA/NRB audits like to see “secrets accessed by user X at time Y for operation Z.” Vault’s audit log answers that natively.

The pattern of patterns

Vault is real infrastructure with real operational cost. If you can solve your secrets problem with cloud-native primitives, do that — it’s cheaper and simpler. Reach for Vault when you have a specific reason: multi-cloud, dynamic credentials, regulated audit, or on-prem requirements.

The teams that operate Vault well treat it like the critical infrastructure it is — clustered, backed up, monitored, audited. The teams that struggle are the ones that deployed it because “secrets management” was on a checklist, without designing for the operational surface.

Vault solves real problems and adds real ops surface. Pick deliberately. If you’re sizing secrets infrastructure for a regulated platform, our DevOps and CI/CD team has deployed both Vault and cloud-native alternatives. Tell us about the constraints.