Robust Kubernetes storage requires selecting the correct pattern based on data lifecycle requirements and hardening the configuration against production failure modes. Below are the four primary patterns with implementation details.
Pattern 1: Ephemeral & Caching
Use for temporary data, scratch space, or sidecar communication. An emptyDir survives container restarts but is deleted when the pod is removed from its node.
Implementation:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-pod
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: scratch
          mountPath: /tmp/cache
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory
        sizeLimit: 512Mi
```
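Rationale: medium: Memory backs the volume with tmpfs for very low latency. sizeLimit caps how much the volume can hold. Because tmpfs pages count toward the container's memory usage, keep the cap within the container's memory limit to avoid OOM kills. Ideal for ephemeral Redis caches or build artifacts.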
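The relationship between sizeLimit and container memory can be sketched as follows. This is a minimal variant of the pod above, not a tuned configuration: the 1Gi request and limit are assumed values chosen only to leave headroom above the 512Mi tmpfs cap.

```yaml
# Sketch: cache-pod with an explicit memory limit (the 1Gi figures are assumptions).
apiVersion: v1
kind: Pod
metadata:
  name: cache-pod
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          memory: 1Gi
        limits:
          # Must cover the application's working set plus up to 512Mi of tmpfs data.
          memory: 1Gi
      volumeMounts:
        - name: scratch
          mountPath: /tmp/cache
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory
        sizeLimit: 512Mi
```

If the limit were set below sizeLimit, the container could be killed before the volume cap is ever reached, so the two values are best chosen together.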
Pattern 2: Dedicated Stateful (RWO)
The standard for databases, queues, and single-writer workloads. Ensures strong consistency by limiting access to one node.
Implementation:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-rwo
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values:
          - us-east-1a
          - us-east-1b
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: premium-rwo
  resources:
    requests:
      storage: 100Gi
```
Rationale:
- WaitForFirstConsumer: Delays volume provisioning until the pod is scheduled, ensuring the PV is created in the same zone as the pod.
- reclaimPolicy: Retain: Prevents accidental data deletion when the PVC is removed. Manual intervention is required to delete the PV and underlying volume.
- allowedTopologies: Restricts provisioning to specific zones, avoiding cross-zone attachment failures.
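For a single-writer database, one way to consume the premium-rwo class is a StatefulSet whose volumeClaimTemplates create the claim per replica. The sketch below is illustrative only: the workload name, the postgres image, the mount path, and the db-credentials Secret are hypothetical, and a headless Service named db is assumed to exist.

```yaml
# Sketch: single-replica database using the premium-rwo StorageClass.
# Names, image, and the db-credentials Secret are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db            # assumes a headless Service called "db"
  replicas: 1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: premium-rwo
        resources:
          requests:
            storage: 100Gi
```

Because the class uses WaitForFirstConsumer, the volume is provisioned only after the pod is scheduled, so it lands in the zone the scheduler picked.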
Pattern 3: Shared Read-Only
Use for static assets, configuration injection, or code bases shared across replicas.
Implementation:
Leverage ConfigMap, Secret, or PersistentVolumeClaim with ReadOnlyMany.
```yaml
volumes:
  - name: static-assets
    persistentVolumeClaim:
      claimName: asset-pvc
      readOnly: true
```
Rationale: Read-only mounts prevent accidental modification and allow safe sharing across pods on the same or different nodes without locking overhead.
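A fuller sketch of this pattern pairs a ReadOnlyMany claim with a multi-replica Deployment. The shared-files StorageClass, the 10Gi size, and the web Deployment are assumptions for illustration; the class must be backed by a driver that supports ReadOnlyMany, and populating the volume with assets is out of scope here.

```yaml
# Sketch: ROX claim shared by several replicas.
# "shared-files" is a hypothetical StorageClass; size and names are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: asset-pvc
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: shared-files
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - name: static-assets
              mountPath: /usr/share/nginx/html
              readOnly: true       # read-only at the container mount as well
      volumes:
        - name: static-assets
          persistentVolumeClaim:
            claimName: asset-pvc
            readOnly: true
```

Setting readOnly on both the volume source and the container mount makes the intent explicit at each level.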
Pattern 4: Distributed High Availability
For workloads requiring shared write access with strong consistency or high throughput across many pods. Requires a distributed CSI driver (e.g., Rook Ceph, Portworx, OpenEBS).
Implementation:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: distributed-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: cephfs
  resources:
    requests:
      storage: 200Gi
```
Rationale: Distributed file systems provide POSIX compliance and high availability. However, application logic must handle file locking if multiple writers modify the same files. Use only when the storage driver supports cluster-wide locking or the application handles concurrency.
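One way to sidestep the locking concern is to have each pod write only to its own file. The sketch below assumes the cephfs StorageClass and distributed-pvc claim shown above exist; the Deployment name, busybox image, and write loop are hypothetical and only illustrate the per-writer layout.

```yaml
# Sketch: three replicas sharing one RWX volume, each writing to its own file
# so that no two writers touch the same path. Names and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-writer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: shared-writer
  template:
    metadata:
      labels:
        app: shared-writer
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          command:
            - sh
            - -c
            - 'while true; do date > "/data/${POD_NAME}.txt"; sleep 5; done'
          volumeMounts:
            - name: shared
              mountPath: /data
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: distributed-pvc
```

Workloads that genuinely need several writers on the same file still require application-level or filesystem-level locking, as noted above.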
Architecture Decisions
- CSI Driver Selection: Evaluate drivers based on feature support. Not all drivers support VolumeSnapshots, RWX, or ReadWriteOncePod. Verify the driver capabilities matrix before selection.
- Topology Management: In multi-zone clusters, always use WaitForFirstConsumer. Static provisioning is error-prone and should be avoided unless required for legacy data migration.
- Security Context: Always define fsGroup in the pod security context to ensure correct file ownership inside the container, especially for non-root users.
- Snapshot Strategy: Integrate VolumeSnapshotClass for all stateful workloads. Snapshots provide faster recovery than backups and are essential for point-in-time recovery.
Pitfall Guide
1. Ignoring volumeBindingMode
Mistake: Using Immediate binding in multi-zone clusters.
Impact: The PVC binds to a volume in Zone A before the pod is scheduled. If the pod can only run in Zone B, the volume cannot be attached across zones and the pod stays Pending with FailedScheduling (volume node affinity conflict).
Fix: Always set volumeBindingMode: WaitForFirstConsumer for zone-aware storage classes.
2. RWX for Databases
Mistake: Mounting an RWX volume to multiple database pods for "high availability."
Impact: Database engines expect exclusive block access. Concurrent writes from multiple nodes bypass lock managers, leading to immediate data corruption and split-brain.
Fix: Use RWO for primary databases. For read replicas, use replication mechanisms, not shared storage.
3. The Retain Policy Trap
Mistake: Deleting a PVC without knowing its reclaim policy: assuming the data is gone under Retain, or losing data unexpectedly under Delete.
Impact: With Retain, the PV moves to the Released phase and keeps a reference to the deleted PVC, so it is not reused automatically and the data is orphaned. The storage provider continues charging for the volume. With Delete, removing the PVC also removes the underlying volume.
Fix: Document reclaim policies. Implement automated cleanup scripts for Retain PVs, or use Delete only for ephemeral state.
4. Neglecting fsGroup and Permissions
Mistake: Assuming the volume mounts with correct permissions for the container user.
Impact: Containers running as non-root users receive Permission denied errors when accessing mounted paths.
Fix: Set securityContext.fsGroup in the pod spec. When the driver's fsGroupPolicy allows it, the kubelet changes group ownership of the volume contents to the specified GID at mount time.
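A minimal sketch of this fix, assuming the db-pvc claim from Pattern 2; the pod name, UID/GID values, and busybox image are hypothetical. fsGroupChangePolicy: OnRootMismatch is included to skip the recursive ownership change when the volume root already matches, which can shorten mount time on large volumes.

```yaml
# Sketch: non-root pod that can write to a mounted volume via fsGroup.
# UID/GID values, pod name, and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: app-nonroot
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000                         # volume contents become group-owned by GID 2000
    fsGroupChangePolicy: OnRootMismatch   # skip the recursive change if the root already matches
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "touch /data/ok && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-pvc
```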
5. Missing Snapshot Classes
Mistake: Deploying stateful workloads without a configured VolumeSnapshotClass.
Impact: Inability to perform fast backups or rollbacks. Recovery relies on slow external backups, increasing RTO.
Fix: Deploy VolumeSnapshotClass resources and validate snapshot support with the CSI driver. Automate snapshot creation via CronJobs or operators.
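As an illustration, a snapshot and a restore from it might look like the following. It assumes the db-pvc claim and premium-rwo class from Pattern 2, the prod-snapshot-class VolumeSnapshotClass from the configuration template in this document, and a cluster with the snapshot CRDs and controller installed; the snapshot and restored-claim names are hypothetical.

```yaml
# Sketch: take a snapshot of db-pvc, then restore it into a new claim.
# Object names db-pvc-snap-1 and db-pvc-restored are hypothetical.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-pvc-snap-1
spec:
  volumeSnapshotClassName: prod-snapshot-class
  source:
    persistentVolumeClaimName: db-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-pvc-restored
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: premium-rwo
  dataSource:
    name: db-pvc-snap-1
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  resources:
    requests:
      storage: 100Gi      # at least the size of the source volume
```

Running a restore like this periodically is a simple way to confirm that snapshots are actually usable, not just created.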
6. Overlooking ReadWriteOncePod
Mistake: Using standard RWO when strict single-pod access is required.
Impact: Standard RWO allows the volume to be mounted by multiple pods on the same node. If a malicious or buggy pod shares the node, it can access the volume.
Fix: Use accessModes: ["ReadWriteOncePod"] for sensitive workloads to enforce one-pod-per-volume semantics.
7. CSI Sidecar Resource Limits
Mistake: Not setting resource requests/limits on CSI controller sidecars.
Impact: CSI sidecars (e.g., csi-provisioner, csi-attacher, csi-snapshotter) may be evicted or throttled under node pressure, causing volume operations to hang or fail.
Fix: Configure resource limits for all CSI controller pods. Monitor CSI metrics for latency and errors.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Relational Database | RWO Block + Snapshot | Strong consistency; fast point-in-time recovery. | High (IOPS provisioning) |
| Static Web Assets | ROX File or ConfigMap | Shared read access; low latency not critical. | Low |
| Build Cache | emptyDir (Memory) | Ultra-low latency; ephemeral; no persistence needed. | Low (Memory usage) |
| ML Training Jobs | RWX High-Perf (e.g., Lustre) | Parallel read/write across many pods. | Medium-High |
| Message Queue | RWO + Replication | Single writer per partition; replication for HA. | Medium |
| Log Aggregation | Ephemeral + Centralized | Pods write to local disk; sidecar ships logs. | Low |
Configuration Template
Production-Ready StorageClass and PVC Template:
```yaml
# StorageClass with topology and reclaim policy
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prod-rwo-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  # Provisioned IOPS = size (GiB) x iopsPerGB. gp3 has a 3000 IOPS floor,
  # so check small volumes against the driver documentation.
  iopsPerGB: "50"
  throughput: "125"
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values:
          - ${CLUSTER_ZONE_A}
          - ${CLUSTER_ZONE_B}
---
# SnapshotClass for backup
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: prod-snapshot-class
driver: ebs.csi.aws.com
deletionPolicy: Retain
---
# PVC using the class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-pvc
  labels:
    app: my-app
    env: production
spec:
  accessModes:
    - ReadWriteOncePod
  storageClassName: prod-rwo-gp3
  resources:
    requests:
      storage: 50Gi
```
Quick Start Guide
- Verify CSI Driver: Run kubectl get csidrivers to confirm your storage provider's CSI driver is installed.
- Apply StorageClass: Save the template above, replace placeholders, and apply it with kubectl apply -f storage-class.yaml.
- Create PVC: Create a PVC manifest referencing the StorageClass and apply it. With WaitForFirstConsumer, kubectl get pvc shows the claim as Pending until a pod uses it.
- Mount in Pod: Add the volume definition to your pod spec, mount it, and deploy the pod. Verify that the PVC status changes to Bound.
- Validate: Exec into the pod and write a test file. Delete the pod, recreate it, and verify the file persists. Test snapshot creation if configured.
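The Validate step can also be run without an interactive shell. The pod below is a sketch that assumes the app-data-pvc claim from the template; the pod name, busybox image, and canary file path are hypothetical. On first run it writes a marker file; after the pod is deleted and recreated, it reports the marker it finds.

```yaml
# Sketch: one-shot pod that checks whether data on app-data-pvc survives pod recreation.
# Pod name, image, and the /data/canary path are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: storage-check
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: busybox:1.36
      command:
        - sh
        - -c
        - |
          if [ -f /data/canary ]; then
            echo "found existing canary:"
            cat /data/canary
          else
            date > /data/canary
            echo "wrote new canary"
          fi
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data-pvc
```

Apply it, read the result with kubectl logs storage-check, delete the pod, and apply it again; the second run should report the existing canary. Because the claim uses ReadWriteOncePod, remove this pod before starting the real workload.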