Why Your Kubernetes Secrets Strategy Will Fail at 3 AM
It's 3 AM and Vault is down. Your on-call engineer gets paged because deployments are failing: pods are stuck in ContainerCreating, blocking a critical hotfix. Meanwhile, another team's services keep humming along despite the same outage. The difference isn't luck. It's how secrets get into pods.
Both teams use Vault. Both followed the documentation. But one team chose External Secrets Operator, which syncs secrets periodically and caches them as native Kubernetes secrets. The other chose the Secrets Store CSI Driver, which fetches secrets on-demand when pods start. When Vault went down, ESO's cached secrets kept working. CSI's synchronous fetches failed, and pods couldn't start. This isn't about which tool is better; it's about understanding the failure mode you've chosen before it matters.
The Two Patterns That Matter
External secrets management in Kubernetes has consolidated around two dominant patterns. External Secrets Operator runs as a controller in your cluster, periodically syncing secrets from Vault (or AWS Secrets Manager, Azure Key Vault, etc.) into native Kubernetes Secret objects. The Secrets Store CSI Driver takes a different approach: it mounts secrets directly into pods as volumes, fetching them from the external manager when pods start.
Both work fine when your secret manager is healthy. The difference is what happens when it isn't. ESO decouples secret fetching from pod lifecycle: the controller syncs independently, and pods consume cached Kubernetes Secrets. CSI couples them tightly: pods can't start until secrets are fetched. This architectural difference determines how your applications behave during an outage.
| Pattern | Existing Pods | New Pods | Recovery |
|---|---|---|---|
| ESO | ✓ Running | ✓ Start (cached) | Automatic |
| CSI Driver | ✓ Running | ✗ Blocked | May need intervention |
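To make the table concrete, here is a toy in-memory model of a pod start under each pattern during an outage. It is only a sketch: `FakeVault`, `start_pod_eso`, and `start_pod_csi` are hypothetical names for illustration, not part of either tool.

```python
class SecretManagerDown(Exception):
    """Raised when the external secret manager cannot be reached."""


class FakeVault:
    """In-memory stand-in for an external secret manager (hypothetical)."""

    def __init__(self, secrets):
        self.secrets = dict(secrets)
        self.available = True

    def read(self, key):
        if not self.available:
            raise SecretManagerDown(key)
        return self.secrets[key]


def start_pod_eso(cluster_secrets, key):
    """ESO-style start: read the native Secret already stored in the cluster."""
    return cluster_secrets[key]


def start_pod_csi(vault, key):
    """CSI-style start: fetch from the external manager at mount time."""
    return vault.read(key)


vault = FakeVault({"db-password": "s3cr3t"})
# The last successful ESO sync copied the value into the cluster.
cluster_secrets = {"db-password": vault.read("db-password")}

vault.available = False  # the outage begins

eso_result = start_pod_eso(cluster_secrets, "db-password")
try:
    csi_result = start_pod_csi(vault, "db-password")
except SecretManagerDown:
    csi_result = "blocked"

print(eso_result, csi_result)
```

The ESO-style start never touches the external manager, so it succeeds with the cached value. The CSI-style start depends on it directly and fails.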
Init containers offer a third path (DIY scripts that fetch secrets before your main container starts), but they require you to implement retry logic, fallback sources, and monitoring yourself. For most organizations, ESO or CSI covers the use case without that operational burden. (We cover init container patterns with fallback logic in our full guide.)
ESO: Graceful Degradation
External Secrets Operator works by watching ExternalSecret custom resources in your cluster. When you create an ExternalSecret, the controller fetches the referenced secrets from your external manager and creates (or updates) a native Kubernetes Secret. Your pods consume that Secret normally, via environment variables or volume mounts, completely unaware that it originated from Vault.
The controller runs its reconciliation loop on a configurable interval (typically 15-30 minutes). Each cycle, it checks whether the external secret has changed and updates the Kubernetes Secret if needed. This decoupling is ESO's key advantage: the Kubernetes Secret persists in etcd independent of the external manager's availability.
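As a rough illustration, an ExternalSecret with a 15-minute refresh interval might look like the manifest below, built here as a Python dict and serialized to JSON (which `kubectl apply -f` accepts alongside YAML). The resource names, store name, and remote key are placeholders, and the `apiVersion` depends on the ESO release installed in your cluster, so treat this as a sketch rather than a drop-in template.

```python
import json

# Placeholder names: "db-credentials", "vault-backend", and "myapp/db"
# are hypothetical and must match your own SecretStore and Vault layout.
external_secret = {
    "apiVersion": "external-secrets.io/v1beta1",  # varies by ESO version
    "kind": "ExternalSecret",
    "metadata": {"name": "db-credentials", "namespace": "default"},
    "spec": {
        # How often the controller re-checks the external manager.
        "refreshInterval": "15m",
        # Which SecretStore (connection + auth config) to read through.
        "secretStoreRef": {"name": "vault-backend", "kind": "SecretStore"},
        # The native Kubernetes Secret the controller creates or updates.
        "target": {"name": "db-credentials"},
        "data": [
            {
                "secretKey": "password",  # key inside the Kubernetes Secret
                "remoteRef": {"key": "myapp/db", "property": "password"},
            }
        ],
    },
}

manifest_json = json.dumps(external_secret, indent=2)
print(manifest_json)
```

Pods then reference the `db-credentials` Secret as they would any other; nothing in the pod spec mentions Vault.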
When Vault goes down, ESO's controller logs errors and keeps retrying. But the Kubernetes Secret it already created remains unchanged. Existing pods keep running with their last-synced values. Here's the part that surprises people: new pods can also start. They mount the Kubernetes Secret normally, unaware that ESO is failing to sync. The Secret itself is the cache.
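The behavior described above can be sketched as a single reconcile step. This is a simplified model, not ESO's actual controller code; `SyncState` and `reconcile` are hypothetical names, and the real controller tracks considerably more state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SyncState:
    """Hypothetical model of one ExternalSecret and the Secret it manages."""
    secret_value: Optional[str] = None      # what pods would read
    last_synced_at: Optional[float] = None  # time of last successful sync
    last_error: Optional[str] = None        # most recent sync failure, if any


def reconcile(state: SyncState, fetch: Callable[[], str], now: float) -> bool:
    """Run one sync cycle. Returns True if the cached Secret changed."""
    try:
        remote = fetch()
    except Exception as exc:
        # Fetch failed: record the error and leave the cached value alone.
        state.last_error = str(exc)
        return False
    changed = remote != state.secret_value
    state.secret_value = remote
    state.last_synced_at = now
    state.last_error = None
    return changed


def failing_fetch() -> str:
    raise ConnectionError("vault unreachable")


state = SyncState()
first_changed = reconcile(state, lambda: "v1", now=0.0)      # healthy sync
second_changed = reconcile(state, failing_fetch, now=900.0)  # outage
print(state.secret_value, state.last_synced_at, state.last_error)
```

After the failed cycle, the cached value is still `v1` and only the error field and the unchanged sync timestamp reveal that anything is wrong.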
The downside is silent staleness. If you rotate a database password in Vault but ESO can't sync for two hours, your pods run with the old password. They work fine until something restarts them after the old password has been revoked. This is why monitoring sync status matters. An ExternalSecret that hasn't synced for several refresh intervals indicates a problem, even if your applications seem healthy.
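One way to express that alerting rule is a small check on the time since the last successful sync. The sketch below assumes you can obtain that timestamp (for example from the ExternalSecret's status or from controller metrics); the `is_stale` helper and the threshold of three missed intervals are illustrative choices, not something ESO defines.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_sync: datetime, refresh_interval: timedelta,
             now: datetime, max_missed: int = 3) -> bool:
    """True when more than `max_missed` refresh intervals have passed
    since the last successful sync."""
    return now - last_sync > refresh_interval * max_missed


now = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
interval = timedelta(minutes=15)

# Last synced two hours ago: well past 3 x 15 minutes.
stale = is_stale(now - timedelta(hours=2), interval, now)
# Last synced 20 minutes ago: one missed cycle at most.
fresh = is_stale(now - timedelta(minutes=20), interval, now)
print(stale, fresh)
```

Tolerating a few missed cycles avoids paging on a single transient failure while still catching a sustained outage.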
The cached Secret is encrypted in etcd only if you've configured encryption at rest on your cluster. The external manager's encryption doesn't carry over: once ESO syncs a secret into Kubernetes, it's subject to your cluster's encryption configuration.
For most workloads (web applications, APIs, microservices), 15-30 minutes of staleness is acceptable. Connection pools cache connections anyway, so this staleness window rarely causes immediate failures. ESO's graceful degradation keeps services running through outages, which is usually the right tradeoff.
CSI Driver: Loud Failures
The Secrets Store CSI Driver takes the opposite approach. Instead of syncing secrets to Kubernetes Secret objects, it mounts them directly into pods as volumes. When a pod starts, the CSI driver intercepts the volume mount, authenticates to Vault using the pod's service account, fetches the secrets, and presents them as files in the container's filesystem.
The pod cannot start until this volume mount succeeds. If Vault is unavailable, the mount fails. The pod stays in ContainerCreating with events showing MountVolume.SetUp failed. There is no graceful degradation: the pod simply does not start.
This has cascading implications that aren't obvious until you experience them:
- New deployments block entirely: no pods can start
- In-progress rolling updates stall because replacement pods can't become ready
- Horizontal pod autoscaler scales up pods that immediately get stuck
- A node reboot during an incident restarts pods that canât fetch their secrets
Existing pods continue running because they already have their secrets mounted. But anything that needs to start fresh is blocked until Vault recovers.
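A back-of-the-envelope model shows how these effects combine. The function below is a deliberately simplified sketch (hypothetical name, no scheduler or readiness logic): pods on rebooted nodes are lost, and replacements or scale-ups only start if Vault is reachable.

```python
from typing import Dict, Set


def simulate_csi_outage(pods_per_node: Dict[str, int], desired_replicas: int,
                        rebooted: Set[str], vault_available: bool) -> Dict[str, int]:
    """Count ready vs. stuck pods for a CSI-mounted workload."""
    # Pods on nodes that did not reboot keep their mounted secrets.
    surviving = sum(n for node, n in pods_per_node.items() if node not in rebooted)
    # Everything else must start fresh, which requires a secret fetch.
    needed = max(desired_replicas - surviving, 0)
    started = needed if vault_available else 0
    return {"ready": surviving + started, "stuck": needed - started}


nodes = {"node-a": 2, "node-b": 2, "node-c": 2}

# Autoscaler wants 8 replicas, node-c reboots, Vault is down.
during_outage = simulate_csi_outage(nodes, 8, {"node-c"}, vault_available=False)
# Same events with Vault healthy.
healthy = simulate_csi_outage(nodes, 8, {"node-c"}, vault_available=True)
print(during_outage, healthy)
```

In this example the workload drops from six ready pods to four during the outage, with four more stuck, even though nothing about the application itself changed.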
This failure mode is loud, which is actually its advantage for certain use cases. Payment processing systems might prefer failing visibly over running with potentially stale credentials. Compliance requirements sometimes prohibit caching secrets in Kubernetes at all. If you need to guarantee that every pod startup uses fresh credentials from the authoritative source, CSI's blocking behavior is a feature, not a bug.
CSI's failure mode can cascade quickly during incidents. A Vault outage combined with a node failure means pods that were running fine suddenly can't restart. Plan for this with pre-deployment health checks or consider combining CSI with a fallback Kubernetes Secret.
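A pre-deployment health check could be as simple as probing Vault's `/v1/sys/health` endpoint before starting a rollout. The sketch below assumes an unauthenticated HTTP probe is allowed from your CI runner; `probe_vault` and `deploy_allowed` are hypothetical helper names, and treating only status 200 (initialized, unsealed, active) as healthy is an assumed policy you may want to relax for standby nodes.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe_vault(base_url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status of Vault's health endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/sys/health",
                                    timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (e.g. sealed, standby) still report Vault's state.
        return err.code
    except OSError:
        # Connection refused, DNS failure, or timeout.
        return None


def deploy_allowed(status: Optional[int]) -> bool:
    """Assumed policy: roll out only when Vault reports active and unsealed."""
    return status == 200
```

In a pipeline this might run as `deploy_allowed(probe_vault(vault_url))`, with `vault_url` set to your Vault address, failing the job early instead of leaving pods stuck in ContainerCreating.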
Choosing Your Failure Mode
The decision between ESO and CSI comes down to two questions: Can your application tolerate minutes of staleness? And can your operations tolerate blocked deployments during secret manager outages?
Figure: Pattern decision tree.
The diagram includes init containers as an option for custom fallback logic, useful when you need behavior that ESO and CSI don't provide out of the box. Our full guide covers implementation patterns including multi-source fallbacks and sidecar refresh.
For most organizations, ESO is the right default. It's operationally simpler, GitOps-friendly (ExternalSecrets are declarative resources you commit to version control), and its failure mode keeps services running. Reserve CSI for specific applications with strict compliance requirements or real-time credential needs.
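One possible reading of the two questions, written as code. This is an interpretation for illustration, not a transcription of the decision tree: in particular, mapping "neither is tolerable" to init containers with custom fallback is an assumption based on the description above.

```python
def choose_pattern(tolerates_staleness: bool,
                   tolerates_blocked_deploys: bool) -> str:
    """Map the two questions to a secret injection pattern (assumed mapping)."""
    if tolerates_staleness:
        # Cached Secrets keep pods starting through an outage.
        return "ESO"
    if tolerates_blocked_deploys:
        # Fresh credentials on every start, at the cost of blocked pods.
        return "CSI"
    # Neither tradeoff is acceptable: custom fallback logic is needed.
    return "init container with custom fallback"


web_api = choose_pattern(tolerates_staleness=True, tolerates_blocked_deploys=False)
payments = choose_pattern(tolerates_staleness=False, tolerates_blocked_deploys=True)
print(web_api, payments)
```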
The mistake isn't choosing either pattern. It's not understanding which failure mode you've chosen. The team that slept through the 3 AM Vault outage didn't get lucky; they understood that ESO's cached secrets would keep their services running. The team that got paged made a valid choice too; for their payment system, blocking on fresh credentials was the right call. Their runbooks reflected it.
Whatever you choose, document it. When the next outage happens, your incident responders shouldn't be learning your secret injection architecture for the first time.
Start with ESO for the majority of workloads, with a 15-minute refresh interval, and monitor sync status with alerts on stale ExternalSecrets. Add CSI for specific high-security applications where staleness is unacceptable. This gives you operational simplicity with escape hatches for edge cases.