Etcd and Control Plane Health
etcd is the Kubernetes control plane’s source of truth for API objects. The API server is the only component that talks to etcd directly for normal reads/writes; controllers and the scheduler watch the API.
Symptoms of etcd impairment
- API latency spikes and timeouts on writes or large list/watch operations.
- Stale watches: controllers stop reconciling promptly, and kubectl may show surprising delays.
- Leader election flaps for controller-manager and scheduler if the API path to etcd is unstable.
- In the worst case, loss of quorum — the API becomes read-only or unavailable depending on failure mode.
Backups and quorum
- Take consistent snapshots on a schedule supported by your distro (often snapshot API + defrag policy from vendor docs).
- Quorum loss is an emergency — restore from backup only with a tested runbook; never “guess” at etcd data files.
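As a minimal sketch of what a scheduled snapshot job might run on a self-managed cluster, the helper below only builds an `etcdctl snapshot save` command line with a UTC-timestamped target file. The function name, the backup directory, and the certificate paths in the usage note are assumptions for illustration; use the endpoint and paths your distro documents.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def snapshot_command(backup_dir, endpoint, cacert, cert, key, now=None):
    """Return an argv list for `etcdctl snapshot save` (hypothetical helper).

    Nothing is executed here; the caller decides how and where to run it.
    """
    now = now or datetime.now(timezone.utc)
    # One file per run, named by UTC time so snapshots sort chronologically.
    target = PurePosixPath(backup_dir) / f"etcd-{now:%Y%m%dT%H%M%SZ}.db"
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
        "snapshot",
        "save",
        str(target),
    ]
```

For example, `snapshot_command("/var/backups/etcd", "https://127.0.0.1:2379", "/etc/kubernetes/pki/etcd/ca.crt", "/etc/kubernetes/pki/etcd/server.crt", "/etc/kubernetes/pki/etcd/server.key")` yields a command a cron job or systemd timer could run. Those paths follow a kubeadm-style layout and are assumed here, not taken from this page.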
Raft, Consensus, and What Split-Brain Means
Kubernetes stores API objects in etcd, which implements Raft consensus across (typically) three or five members:
- Leader election: one member accepts writes; followers replicate the log. If the leader fails, followers elect a new leader after a timeout.
- Quorum: a majority of members (floor(n/2) + 1) must agree for a write to commit. A five-member cluster tolerates two failures and loses quorum at three. An even member count raises the quorum size without tolerating any additional failures, which is why odd counts are standard.
- Split-brain (informal): in generic distributed systems, people mean two partitions each thinking they are authoritative. etcd avoids two live writers for the same cluster: without a majority, the minority partition becomes read-only or unavailable for writes rather than accepting divergent state.
- The real-world danger is not “magic split-brain” but operational mistakes: restoring an old backup into a live cluster, forking membership, or losing quorum during unplanned network partitions. Follow vendor restore procedures exactly.
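The quorum arithmetic above can be sketched in a few lines. The function names are illustrative; the only fact used is that a Raft write needs a strict majority of members.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"members={n} quorum={quorum(n)} tolerates={fault_tolerance(n)}")
```

Comparing `fault_tolerance(3)` with `fault_tolerance(4)` shows why even sizes are rarely chosen: the fourth member raises the quorum from two to three but the cluster still survives only one failure.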
On managed clusters (for example EKS), you do not operate etcd members, but API latency, watch storms, and large objects still stress the same control-plane path. For in-cluster API metrics and watch churn, see EKS troubleshooting cheat sheet — Symptom 8: API Server / Control Plane Slow.
Proving control-plane components are healthy
| Component | Practical signal |
|---|---|
| API server | kubectl get --raw /readyz?verbose and /livez; successful CRUD on a dummy ConfigMap |
| etcd | Managed: cloud health; self-managed: member list, metrics (etcd_server_has_leader, disk fsync latency) |
| Scheduler | Pending pods get Scheduled events; scheduler logs without leader errors |
| Controller manager | Replica counts match Deployments; Node lifecycle works; logs show stable leadership |
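To turn the API server signal in the table into something scriptable, the sketch below scans the text returned by the verbose readiness endpoint and lists the checks that did not pass. It assumes the usual verbose layout, where passing checks are prefixed with `[+]` and failing ones with `[-]`; the function name and the sample text are illustrative, not captured from a real cluster.

```python
def failed_checks(readyz_verbose: str) -> list:
    """Return the names of checks marked as failed in verbose health output."""
    failed = []
    for line in readyz_verbose.splitlines():
        line = line.strip()
        if line.startswith("[-]"):
            # "[-]etcd failed: reason withheld" -> "etcd"
            failed.append(line[3:].split(" ", 1)[0])
    return failed


# Illustrative sample, shaped like verbose readiness output.
SAMPLE = """\
[+]ping ok
[+]log ok
[-]etcd failed: reason withheld
[+]poststarthook/start-apiextensions-informers ok
readyz check failed
"""

if __name__ == "__main__":
    print(failed_checks(SAMPLE))
```

An empty list means every listed check passed. A failing `etcd` entry points at the API server's path to etcd rather than at the API server process itself.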
Control-plane certificate rotation
Always follow your vendor runbook (kubeadm, OpenShift, EKS control plane, etc.). General pattern:
- Backup etcd (if you own it) and export a known-good kubeconfig.
- Rotate apiserver → kubelet and kubelet client certs in the documented order for your stack.
- Restart control plane static pods or systemd units as required; watch node connectivity.
- Re-distribute kubeconfigs to admins if client CA or front-proxy certs changed.
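Before and after a rotation it helps to know how long each certificate has left. As a rough sketch, the helper below converts an expiry string, in the form printed by `openssl x509 -noout -enddate`, into days remaining using Python's standard `ssl.cert_time_to_seconds`. The function name is hypothetical, and which certificate files to inspect depends on your platform.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter time (negative if expired).

    `not_after` looks like "Jan  5 09:34:43 2018 GMT", optionally with the
    "notAfter=" prefix that openssl prints.
    """
    not_after = not_after.strip()
    prefix = "notAfter="
    if not_after.startswith(prefix):
        not_after = not_after[len(prefix):]
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch
    now = time.time() if now is None else now
    return (expires - now) / 86400
```

A monitoring job could call this for each control-plane certificate and alert when the result drops below whatever threshold your runbook sets.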
For kubeconfig TLS troubleshooting on the client side, see Kubeconfig and authentication.
Related
- Architecture — Component diagram and baseline request path.
- Scheduling and placement — Scheduler health signals alongside etcd.
- Cluster upgrades — Upgrade order and platform certificate inventory.
- Architecture review answers — Prompts this page deepens.