Databases Overview
This section is the cloud-agnostic home for database reliability and on-call patterns for data stores. It complements the cloud-specific pages (AWS databases and Azure databases), which cover provisioning and vendor features. The pages here cover what breaks at 3 AM and how to recover.
The focus is operations for relational stores (PostgreSQL, MySQL, SQL Server, Oracle, and managed equivalents). NoSQL is touched only where the on-call patterns differ.
Why a Dedicated Section
Most cloud DB documentation answers “how do I provision an instance?” The questions that matter on-call are different:
- The pool is exhausted; what is the right knob to turn?
- A read replica is 30 seconds behind; do users see stale data, or worse?
- A failover happened during a deploy; is the new primary the right one?
- A schema migration locked a hot table; can we abort safely?
- We need to restore last night’s backup; how long will it take, and have we ever tested this?
These are vendor-agnostic patterns that show up on RDS, Cloud SQL, Aurora, on-prem Postgres, and Azure SQL alike.
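The restore question above ("how long will it take?") can be roughed out with simple arithmetic before a drill gives a measured answer. The sketch below is illustrative only: the function name, its parameters, and the example figures are assumptions rather than anything defined in this section, and it ignores provisioning time, index rebuilds, and cache warm-up. Treat the result as a lower bound and replace it with timings from a real restore test.

```python
def estimate_restore_minutes(
    backup_gib: float,
    throughput_mib_s: float,
    log_replay_minutes: float = 0.0,
) -> float:
    """Rough lower bound on restore time, in minutes.

    backup_gib         -- size of the backup to copy back (hypothetical input)
    throughput_mib_s   -- sustained restore throughput you have actually measured
    log_replay_minutes -- extra time to replay logs up to the recovery point
    """
    if backup_gib < 0 or log_replay_minutes < 0:
        raise ValueError("sizes and durations must be non-negative")
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")

    # Convert GiB to MiB, then divide by throughput to get copy time in seconds.
    copy_seconds = backup_gib * 1024 / throughput_mib_s
    return copy_seconds / 60 + log_replay_minutes


if __name__ == "__main__":
    # Made-up example: a 600 GiB backup at 200 MiB/s plus 15 minutes of log replay.
    minutes = estimate_restore_minutes(600, 200, log_replay_minutes=15)
    print(f"Estimated restore time: at least {minutes:.1f} minutes")
```

If the estimate already exceeds the recovery time objective, that is worth knowing before the incident rather than during it.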
Topics in This Section
- RDBMS Reliability and On-Call: connection pools, replication lag, failover and split-brain awareness, backup and restore drills, schema migration risk, and observability for the database boundary.
More pages may be added over time (NoSQL on-call, caching, queueing), but the entry point is the RDBMS page above.
Related Sections
| Topic | Where to Go |
|---|---|
| Cloud DB provisioning (AWS) | AWS databases |
| Cloud DB provisioning (Azure) | Azure databases |
| Metrics, dashboards, alerts | Observability, Alerting |
| Reliability targets | SLOs, SLIs, error budgets |
| Service readiness gates | Service readiness checklist |
| Stateful workloads on Kubernetes | Stateful backup and restore |