Stateful Backup and Restore
Stateful workloads need more than a Deployment rollback — you must protect data, configuration, and often ordering semantics (StatefulSet identity).
Volume snapshots (CSI)
VolumeSnapshot objects capture a point-in-time copy of a CSI-backed PVC. Good for:
- RPO in minutes when snapshots are frequent and consistent enough for the app.
- Fast clone for restore tests.
Check that the CSI driver behind your storage class supports snapshots (a matching VolumeSnapshotClass must exist) and whether its snapshots are crash-consistent only.
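As a minimal sketch of the objects involved, the Python below builds a VolumeSnapshot manifest and a PVC that is provisioned from that snapshot, then prints both as JSON, which `kubectl apply -f -` accepts. The names `pg-data`, `pg-data-snap`, `csi-snapclass`, `fast-ssd`, the `db` namespace, and the `20Gi` size are hypothetical placeholders; substitute whatever your cluster and CSI driver actually provide.

```python
import json


def volume_snapshot(name, namespace, pvc_name, snapshot_class):
    """Build a VolumeSnapshot manifest for an existing PVC in the same namespace."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def pvc_from_snapshot(name, namespace, snapshot_name, storage_class, size):
    """Build a PVC whose contents are provisioned from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            # dataSource tells the provisioner to clone from the snapshot.
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
        },
    }


# Hypothetical names, for illustration only.
snap = volume_snapshot("pg-data-snap", "db", "pg-data", "csi-snapclass")
clone = pvc_from_snapshot("pg-data-restore-test", "db", "pg-data-snap", "fast-ssd", "20Gi")

# A v1 List lets kubectl apply both objects from one JSON document.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [snap, clone]}, indent=2))
```

The clone PVC must live in the same namespace as the snapshot, and it will only bind once the snapshot reports that it is ready to use.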
Velero
Velero backs up Kubernetes objects (and optionally PV data via CSI snapshots or file-system backup with Kopia or restic) to object storage.
Typical use:
- Namespace- or label-scoped backup schedules.
- Cluster migration — restore resources into a new cluster with adjusted storage classes.
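Both uses can be sketched as manifests. The Python below builds a Velero `Schedule` scoped by namespace and label, plus the ConfigMap that Velero's restore reference describes for remapping storage classes during a restore. It assumes Velero runs in its default `velero` namespace; the schedule name, cron expression, label, and storage class names are hypothetical, and the field names should be checked against the Velero version you run.

```python
import json

VELERO_NS = "velero"  # assumption: Velero is installed in its default namespace


def backup_schedule(name, cron, namespaces, match_labels=None, ttl="720h0m0s"):
    """Build a Velero Schedule that backs up the given namespaces on a cron."""
    template = {
        "includedNamespaces": list(namespaces),
        "ttl": ttl,               # how long each backup is retained
        "snapshotVolumes": True,  # also capture PV data where supported
    }
    if match_labels:
        template["labelSelector"] = {"matchLabels": dict(match_labels)}
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": VELERO_NS},
        "spec": {"schedule": cron, "template": template},
    }


def storage_class_mapping(mapping):
    """Build the ConfigMap Velero reads to swap storage classes on restore."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "change-storage-class-config",
            "namespace": VELERO_NS,
            "labels": {
                "velero.io/plugin-config": "",
                "velero.io/change-storage-class": "RestoreItemAction",
            },
        },
        "data": dict(mapping),  # old storage class name -> new storage class name
    }


# Hypothetical values, for illustration only.
schedule = backup_schedule("db-nightly", "0 2 * * *", ["db"], {"backup": "true"})
mapping = storage_class_mapping({"gp2": "premium-rwo"})

print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [schedule, mapping]}, indent=2))
```

The schedule belongs in the source cluster; the ConfigMap belongs in the target cluster and must exist before the restore starts.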
Application-consistent backups
For databases, crash-consistent snapshots may not be enough under write load.
Patterns:
- Logical dumps (pg_dump, mysqldump) on a schedule to object storage — portable, slower restore.
- Quiesce hooks — pause writers or flush buffers before snapshot (operator-specific).
- File system freeze — only when coordinated with storage (enterprise arrays); rarely DIY in Kubernetes without vendor support.
Document RPO/RTO per workload class (for example “15 min RPO for Postgres via WAL + nightly snapshot”).
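One way to check a documented RPO against reality is to look at the timestamps of the recovery points you actually hold. The sketch below treats the worst-case data loss as the longest gap between consecutive recovery points, including the gap from the newest point to now. The function names and the sample timestamps are illustrative assumptions, not part of any backup tool.

```python
from datetime import datetime, timedelta


def worst_case_gap(recovery_points, now):
    """Longest interval between consecutive recovery points, up to `now`."""
    points = sorted(recovery_points)
    if not points:
        raise ValueError("no recovery points")
    edges = points + [now]
    return max(later - earlier for earlier, later in zip(edges, edges[1:]))


def meets_rpo(recovery_points, now, rpo):
    """True when a failure at any moment would lose at most `rpo` of data."""
    return worst_case_gap(recovery_points, now) <= rpo


# Illustrative data: nightly snapshots versus WAL segments archived every 5 minutes.
start = datetime(2024, 1, 1)
now = start + timedelta(hours=24, minutes=10)
nightly = [start, start + timedelta(hours=24)]
wal = [start + timedelta(minutes=5 * i) for i in range(291)]
rpo = timedelta(minutes=15)

print("nightly only:", worst_case_gap(nightly, now), meets_rpo(nightly, now, rpo))
print("with WAL:    ", worst_case_gap(wal, now), meets_rpo(wal, now, rpo))
```

With nightly snapshots alone the worst gap is a full day, so a 15 minute RPO only holds once the WAL archive supplies the intermediate recovery points.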
Restore drills
Quarterly at minimum:
- Restore to an isolated namespace or test cluster.
- Verify data checksums or row counts and application smoke tests.
- Time the procedure — compare to RTO target.
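The drill steps above can be sketched as a small harness that times a restore, compares file checksums against a manifest recorded at backup time, and reports the result against the RTO target. Everything here is a hypothetical stand-in: `restore` is any callable that performs your real restore, and the demo simply copies a directory. Checksums suit static exports such as dumps; for a live database, row counts or application-level checks are usually the better comparison.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def run_drill(restore, restored_root, expected, rto_seconds):
    """Time restore() only, then verify the restored files against `expected`."""
    started = time.monotonic()
    restore()
    elapsed = time.monotonic() - started
    actual = checksum_manifest(restored_root)
    mismatched = sorted(
        key for key in expected.keys() | actual.keys()
        if expected.get(key) != actual.get(key)
    )
    return {
        "elapsed_seconds": elapsed,
        "within_rto": elapsed <= rto_seconds,
        "mismatched": mismatched,
    }


# Demo with a directory copy standing in for the real restore procedure.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    target = Path(tmp) / "restored"
    source.mkdir()
    (source / "table.csv").write_text("id,name\n1,alice\n")
    expected = checksum_manifest(source)
    report = run_drill(lambda: shutil.copytree(source, target), target, expected, rto_seconds=900)

print(report)
```

Recording `elapsed_seconds` from each drill gives a trend to compare against the RTO target over time, and a non-empty `mismatched` list names the files that are missing, unexpected, or different.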
Related
- Storage — PVCs, StorageClasses, and binding behavior.
- Workload types — StatefulSet identity and volumes.
- Operators — Operator-driven backup examples.
- Architecture review answers — Prompts this page deepens.