Production Platform Checklist
This page helps platform, SRE, and production teams evaluate whether Kubernetes architecture and delivery practices are clear, safe, and maintainable.
Layered platform model
Think in layers with explicit boundaries:
- Foundation: cloud/network/account/project boundaries, cluster lifecycle, node pools.
- Shared platform services: ingress, cert management, secrets, observability, policy.
- Workloads: namespaces, deployments/stateful sets, runtime configuration, storage.
- Delivery and reconciliation: CI pipelines, Helm/Kustomize/raw YAML, GitOps sync.
The exact tools vary, but unclear boundaries tend to create ownership gaps and slow down incident response.
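One way to make the layer boundaries checkable is to record them as data and look for components nobody owns. The following is a minimal sketch, not a prescribed format: the `Layer` class, the `ownership_gaps` helper, the component selection, and all team names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    name: str
    components: list[str]
    # Maps component -> owning team; team names here are hypothetical.
    owners: dict[str, str] = field(default_factory=dict)


def ownership_gaps(layers: list[Layer]) -> list[tuple[str, str]]:
    """Return (layer, component) pairs that have no owner recorded."""
    return [
        (layer.name, component)
        for layer in layers
        for component in layer.components
        if not layer.owners.get(component)
    ]


# Illustrative inventory following the four layers listed above.
LAYERS = [
    Layer("foundation", ["cluster lifecycle", "node pools"],
          {"cluster lifecycle": "platform-team", "node pools": "platform-team"}),
    Layer("shared platform services", ["ingress", "cert management", "observability"],
          {"ingress": "platform-team", "observability": "sre-team"}),
    Layer("workloads", ["namespaces", "runtime configuration"],
          {"namespaces": "platform-team", "runtime configuration": "app-team"}),
    Layer("delivery and reconciliation", ["CI pipelines", "GitOps sync"],
          {"CI pipelines": "app-team", "GitOps sync": "platform-team"}),
]

for layer_name, component in ownership_gaps(LAYERS):
    print(f"no owner: {layer_name} / {component}")
```

In this made-up inventory, cert management has no owner, which is the kind of gap that usually only surfaces during an incident.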
Practitioner checklist
- Scope is documented (clusters, environments, regions, tenants/accounts/projects).
- Ownership is explicit for cluster, add-ons, namespace policies, and workload teams.
- Layer boundaries are documented and visible in repos/runbooks.
- Tool choice is intentional per layer (Helm, operators, raw YAML/Kustomize, GitOps controller).
- Blast radius controls exist (namespace isolation, rollout strategy, PDBs, quotas/policies).
- Change path is clear (review process, promotion order, rollback triggers, approvals if required).
- Incident paths are practical (alerts, dashboards, logs/traces, runbooks, escalation route).
- Drift detection exists between desired and actual state (especially with GitOps).
- Teams track outcome metrics such as deploy failure rate, rollback time, and incident recurrence.
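To illustrate the drift detection item above, here is a minimal sketch that compares a desired manifest with the observed state, both represented as nested dictionaries. The `find_drift` function and the sample values are hypothetical. GitOps controllers do this with far more nuance, since a live cluster adds defaulted and status fields that a naive diff like this one would report as noise.

```python
def find_drift(desired: dict, actual: dict, path: str = "") -> list[str]:
    """Recursively compare two nested dicts and describe each difference."""
    drift = []
    for key in sorted(set(desired) | set(actual)):
        here = f"{path}.{key}" if path else str(key)
        if key not in actual:
            drift.append(f"missing in cluster: {here}")
        elif key not in desired:
            drift.append(f"unmanaged in cluster: {here}")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            # Descend into nested objects such as `spec`.
            drift.extend(find_drift(desired[key], actual[key], here))
        elif desired[key] != actual[key]:
            drift.append(
                f"changed: {here} (desired={desired[key]!r}, actual={actual[key]!r})"
            )
    return drift


# Hypothetical example: someone scaled and paused a Deployment by hand.
desired = {"spec": {"replicas": 3, "image": "app:1.4.2"}}
actual = {"spec": {"replicas": 5, "image": "app:1.4.2", "paused": True}}

for entry in find_drift(desired, actual):
    print(entry)
```

Reporting the three cases separately (missing, unmanaged, changed) helps decide whether to re-sync from Git or to capture a manual change back into the repository.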
Tooling decision hints
- Use Helm for reusable packaging and release parameterization.
- Use operators when systems need continuous Day 2 automation logic.
- Use GitOps when you want Git as the source of truth and continuous reconciliation.
- Combine them when needed: package with Helm, reconcile with GitOps, automate lifecycle with operators.
See Helm, operators, and GitOps for deeper examples.
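The hints above can be written down as a small decision helper. This is only a sketch of the reasoning, not a rule engine: the `suggest_tools` function and its parameter names are hypothetical, and the fallback to raw YAML/Kustomize when none of the needs apply is an assumption rather than something this page prescribes.

```python
def suggest_tools(
    reusable_packaging: bool,
    day2_automation: bool,
    git_source_of_truth: bool,
) -> list[str]:
    """Map the needs described in the hints to the tools they point at."""
    tools = []
    if reusable_packaging:
        tools.append("Helm")  # packaging and release parameterization
    if day2_automation:
        tools.append("operator")  # continuous Day 2 automation logic
    if git_source_of_truth:
        tools.append("GitOps")  # Git as source of truth, continuous reconciliation
    # Assumption: with none of these needs, plain manifests are enough.
    return tools or ["raw YAML/Kustomize"]


# A stateful system that is packaged, Git-managed, and needs lifecycle automation.
print(suggest_tools(True, True, True))

# A simple internal service with no special requirements.
print(suggest_tools(False, False, False))
```

The needs are independent, so the function returns a combination rather than a single winner, matching the advice to combine the tools when needed.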
Further reading
- Kubernetes Architecture
- Helm, operators, and GitOps
- GitOps
- Production Patterns
- Deployment Strategies
- EKS Terraform Cluster
- Service readiness checklist
- QA and reliability guide
Optional depth: