Stateful Backup and Restore

First Published By Atif Alam

Stateful workloads need more than a Deployment rollback — you must protect data, configuration, and often ordering semantics (StatefulSet identity).

VolumeSnapshot objects capture a point-in-time copy of a CSI-backed PVC. Good for:

  • RPO in minutes when snapshots are frequent and consistent enough for the app.
  • Fast clone for restore tests.

Check that your CSI driver supports snapshots and that a matching VolumeSnapshotClass exists, and find out whether its snapshots are crash-consistent only.
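As a minimal sketch of what a snapshot request looks like, the Python below builds a VolumeSnapshot manifest and prints it as JSON. The object names, namespace, and snapshot class (`pgdata-snap-001`, `databases`, `pgdata-postgres-0`, `csi-snapclass`) are hypothetical; substitute the values from your own cluster.

```python
import json


def volume_snapshot_manifest(name, namespace, pvc_name, snapshot_class):
    """Build a VolumeSnapshot manifest (snapshot.storage.k8s.io/v1) as a dict."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Must name a VolumeSnapshotClass served by the PVC's CSI driver.
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# All four values below are hypothetical placeholders.
manifest = volume_snapshot_manifest(
    name="pgdata-snap-001",
    namespace="databases",
    pvc_name="pgdata-postgres-0",
    snapshot_class="csi-snapclass",
)
print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f -` accepts JSON as well as YAML, the printed output can be piped straight to it.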

Velero backs up Kubernetes objects (and optionally PV data, via CSI snapshots or file-system backup with restic or Kopia) to object storage.

Typical use:

  • Namespace- or label-scoped backup schedules.
  • Cluster migration — restore resources into a new cluster with adjusted storage classes.
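A namespace- and label-scoped schedule could be expressed roughly as follows. This sketch builds a Velero `Schedule` manifest as a Python dict; it assumes Velero is installed in the `velero` namespace, and the schedule name, cron expression, namespace, label, and retention period are illustrative placeholders.

```python
import json


def velero_schedule_manifest(name, cron, namespaces, match_labels=None, ttl="720h0m0s"):
    """Build a Velero Schedule manifest (velero.io/v1) as a dict."""
    template = {"includedNamespaces": list(namespaces), "ttl": ttl}
    if match_labels:
        # Restrict the backup to objects carrying these labels.
        template["labelSelector"] = {"matchLabels": dict(match_labels)}
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumption: Velero runs in its conventional "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {"schedule": cron, "template": template},
    }


# Hypothetical example: back up labelled objects in "databases" every night at 02:00.
schedule = velero_schedule_manifest(
    name="databases-nightly",
    cron="0 2 * * *",
    namespaces=["databases"],
    match_labels={"backup": "enabled"},
)
print(json.dumps(schedule, indent=2))
```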

For databases, crash-consistent snapshots may not be enough under write load.

Patterns:

  • Logical dumps (pg_dump, mysqldump) on a schedule to object storage — portable, slower restore.
  • Quiesce hooks — pause writers or flush buffers before snapshot (operator-specific).
  • File system freeze — only when coordinated with storage (enterprise arrays); rarely DIY in Kubernetes without vendor support.
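For the logical-dump pattern, a scheduled job needs little more than a timestamped `pg_dump` invocation. The sketch below only builds the argument list; the database name and output directory are hypothetical, and connection settings, credentials, and the upload to object storage are left out.

```python
from datetime import datetime, timezone


def pg_dump_command(dbname, out_dir, now=None):
    """Return the argv for a custom-format pg_dump with a UTC-timestamped file name."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    out_file = f"{out_dir}/{dbname}-{stamp}.dump"
    return [
        "pg_dump",
        "--format=custom",      # compressed archive, restored with pg_restore
        f"--file={out_file}",
        f"--dbname={dbname}",
    ]


# Hypothetical database and directory; a fixed timestamp keeps the example reproducible.
cmd = pg_dump_command(
    "orders", "/backups", now=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute the dump and raise if `pg_dump` exits with a non-zero status.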

Document RPO/RTO per workload class (for example “15 min RPO for Postgres via WAL + nightly snapshot”).
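One way to keep those targets from going stale is to store them as data and check them automatically. In the sketch below, the 15-minute Postgres RPO comes from the example above; the other workload class and all RTO values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Objective:
    rpo: timedelta  # maximum tolerated data loss
    rto: timedelta  # maximum tolerated time to restore


# Only the Postgres RPO is taken from the text; everything else is hypothetical.
OBJECTIVES = {
    "postgres": Objective(rpo=timedelta(minutes=15), rto=timedelta(hours=1)),
    "object-cache": Objective(rpo=timedelta(hours=24), rto=timedelta(hours=4)),
}


def rpo_breached(workload_class, last_recovery_point, now):
    """True if the newest recovery point is older than the class's RPO."""
    return now - last_recovery_point > OBJECTIVES[workload_class].rpo


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(rpo_breached("postgres", now - timedelta(minutes=10), now))
print(rpo_breached("postgres", now - timedelta(minutes=20), now))
```

The first check is within the 15-minute target and the second is outside it, so a monitoring job could alert on the second case.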

Run a restore drill at least quarterly:

  1. Restore to an isolated namespace or test cluster.
  2. Verify data checksums or row counts and application smoke tests.
  3. Time the procedure — compare to RTO target.
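The outcome of the steps above can be recorded and judged mechanically. This is a rough sketch rather than a full drill harness: the `DrillResult` fields, table names, row counts, and the one-hour RTO are all hypothetical, and gathering the row counts and running the smoke tests is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrillResult:
    source_row_counts: dict    # table name -> rows in the source at backup time
    restored_row_counts: dict  # table name -> rows after the restore
    smoke_tests_passed: bool
    elapsed: timedelta         # wall-clock time of the whole restore procedure


def evaluate_drill(result, rto_target):
    """Return a list of problems found; an empty list means the drill passed."""
    problems = []
    for table, expected in sorted(result.source_row_counts.items()):
        actual = result.restored_row_counts.get(table)
        if actual != expected:
            problems.append(f"{table}: expected {expected} rows, restored {actual}")
    if not result.smoke_tests_passed:
        problems.append("application smoke tests failed")
    if result.elapsed > rto_target:
        problems.append(f"restore took {result.elapsed}, RTO target is {rto_target}")
    return problems


# Hypothetical drill: one table came back short and the restore overran the target.
drill = DrillResult(
    source_row_counts={"orders": 1000, "customers": 250},
    restored_row_counts={"orders": 1000, "customers": 249},
    smoke_tests_passed=True,
    elapsed=timedelta(minutes=90),
)
for problem in evaluate_drill(drill, rto_target=timedelta(hours=1)):
    print(problem)
```

Keeping the result of each drill alongside the documented targets makes it easier to see whether restore times are drifting over the quarters.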