Databases Overview

First published Apr 29, 2026 by Atif Alam

This section is the cloud-agnostic home for database reliability and data on-call patterns. It complements the cloud-specific pages — AWS databases and Azure databases — which cover provisioning and vendor features. Pages here cover what breaks at 3 AM and how to recover.

The focus is operations for relational stores (PostgreSQL, MySQL, SQL Server, Oracle, and managed equivalents). NoSQL is touched only where the on-call patterns differ.

Why a Dedicated Section

Most cloud DB documentation answers “how do I provision an instance?” — but the questions that matter on-call are different:

The pool is exhausted; what is the right knob to turn?
A read replica is 30 seconds behind; do users see stale data, or worse?
A failover happened during a deploy; is the new primary the right one?
A schema migration locked a hot table; can we abort safely?
We need to restore last night’s backup; how long will it take, and have we ever tested this?

These are vendor-agnostic patterns that show up on RDS, Cloud SQL, Aurora, on-prem Postgres, and Azure SQL alike.
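
To make one of these questions concrete, the replica-lag case can be framed as a routing decision: if a read cannot tolerate more staleness than the replica's current lag, it should go to the primary. The following is a minimal sketch under stated assumptions, not a vendor API: `Replica`, `pick_read_target`, and the `max_staleness_seconds` budget are hypothetical names introduced for illustration, and the lag values are assumed to come from whatever monitoring already reports them.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of a read replica as seen by the application."""
    name: str
    lag_seconds: float  # replication lag, assumed to be supplied by monitoring


def pick_read_target(
    replicas: Iterable[Replica],
    max_staleness_seconds: float,
    primary: str = "primary",
) -> str:
    """Return the least-lagged replica within the staleness budget.

    Falls back to the primary when no replica is fresh enough, so callers
    never read data older than they declared acceptable.
    """
    fresh = [r for r in replicas if r.lag_seconds <= max_staleness_seconds]
    if not fresh:
        return primary
    return min(fresh, key=lambda r: r.lag_seconds).name


if __name__ == "__main__":
    # A replica 30 seconds behind is unusable for a read with a 5-second budget.
    print(pick_read_target([Replica("replica-1", 30.0)], max_staleness_seconds=5.0))
```

Falling back to the primary keeps reads correct at the cost of extra primary load, so in practice the staleness budget would likely be chosen per query type rather than globally.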

Topics in This Section

RDBMS Reliability and On-Call — Connection pools, replication lag, failover and split-brain awareness, backup and restore drills, schema migration risk, and observability for the database boundary.

More pages may be added over time (NoSQL on-call, caching, queueing), but the RDBMS page above is the entry point.

Related Sections

Topic	Where to Go
Cloud DB provisioning (AWS)	AWS databases
Cloud DB provisioning (Azure)	Azure databases
Metrics, dashboards, alerts	Observability, Alerting
Reliability targets	SLOs, SLIs, error budgets
Service readiness gates	Service readiness checklist
Stateful workloads on Kubernetes	Stateful backup and restore
