Kubernetes Overview
Kubernetes (often shortened to K8s) is an open-source container orchestration platform.
It automates deploying, scaling, and managing containerized applications across clusters of machines.
Why Kubernetes?
- Containers alone aren’t enough. A single `docker run` works for one machine, but production needs scheduling across many nodes, self-healing, load balancing, rolling updates, and secret management.
- Kubernetes handles that orchestration. You describe the desired state (e.g. “run 3 replicas of this container”) and Kubernetes continuously works to make reality match.
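The desired-state idea can be illustrated with a toy reconcile loop. This is a minimal sketch in plain Python, not Kubernetes code: the function `reconcile`, the `web-N` pod names, and the in-memory pod list are all hypothetical stand-ins for what real controllers do against the cluster API.

```python
import itertools


def reconcile(desired_replicas, running, name_counter):
    """Return a pod list adjusted to match the desired replica count.

    Hypothetical helper: real controllers create and delete pods through
    the Kubernetes API rather than editing a Python list.
    """
    pods = list(running)
    # Too few pods: start new ones until the count matches.
    while len(pods) < desired_replicas:
        pods.append(f"web-{next(name_counter)}")
    # Too many pods: stop the extras.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods


counter = itertools.count(1)
pods = reconcile(3, [], counter)     # initial rollout: web-1, web-2, web-3
pods.remove("web-2")                 # simulate a crashed pod
pods = reconcile(3, pods, counter)   # next pass restores the count
print(pods)                          # ['web-1', 'web-3', 'web-4']
```

The caller never says "start a pod"; it only states the target count, and each pass closes whatever gap exists between that target and the observed state.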
Key Terminology
- Cluster — A set of machines (nodes) managed by Kubernetes.
- Node — A single machine in the cluster (physical or virtual). Runs containers.
- Pod — The smallest deployable unit; one or more containers that share network and storage.
- Service — A stable network endpoint that routes traffic to a set of pods.
- Deployment — Declares the desired state for pods (image, replicas, update strategy). Kubernetes creates and manages the pods.
- Namespace — A virtual partition inside a cluster for isolating resources.
- kubectl — The CLI tool for interacting with a Kubernetes cluster.
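To show how a few of these terms relate, the following is a toy model of a Service picking its pods. It assumes the usual label-selector behavior (a Service targets pods in its own namespace whose labels match its selector); the `Pod` and `Service` classes, the `endpoints` method, and the sample names are hypothetical and are not the Kubernetes client API.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    namespace: str
    labels: dict = field(default_factory=dict)


@dataclass
class Service:
    name: str
    namespace: str
    selector: dict

    def endpoints(self, pods):
        """Names of pods in this namespace whose labels include every selector pair."""
        return [
            p.name
            for p in pods
            if p.namespace == self.namespace
            and all(p.labels.get(k) == v for k, v in self.selector.items())
        ]


all_pods = [
    Pod("web-1", "shop", {"app": "web"}),
    Pod("web-2", "shop", {"app": "web"}),
    Pod("db-1", "shop", {"app": "db"}),
    Pod("web-1", "staging", {"app": "web"}),  # same name, different namespace
]
svc = Service("web", "shop", {"app": "web"})
print(svc.endpoints(all_pods))  # ['web-1', 'web-2']
```

The pod in the `staging` namespace is skipped even though its labels match, which is the isolation a Namespace provides in this simplified picture.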
Topics in This Section
Order matches the sidebar: foundations → networking → storage and workloads → delivery → production → security → review pages → examples and EKS.
- Architecture — Control plane, worker nodes, and how the pieces fit together.
- Scheduling and placement — Predicate vs priority scheduling, PriorityClass, and Pending triage vocabulary.
- etcd and control plane health — etcd role, health signals, snapshots, and control-plane certificate rotation.
- Core Objects — Pods, Deployments, Namespaces, Labels, and YAML anatomy.
- Manifests — YAML structure, the four top-level fields, spec vs status, and applying manifests.
- Networking — Services, Ingress, Gateway API vs Ingress, CNI comparison, DNS, network policies, and CoreDNS failure modes.
- Services and endpoints — ClusterIP datapath, EndpointSlices, kube-proxy modes, and debugging.
- Network policies — Default deny, allowlists, metadata egress, and CNI compatibility.
- Sidecar Pattern — When to use sidecars, sidecar vs library trade-offs, and production rollout patterns.
- Ingress Controllers — NGINX, Traefik, and AWS Load Balancer Controller with TLS/mTLS and cert-manager patterns.
- Istio — Service mesh architecture, VirtualService, mTLS, and troubleshooting with istioctl.
- Storage — Volumes, PVCs, StorageClass binding modes, topology, and Secrets.
- Workload Types — Deployments vs StatefulSets vs DaemonSets vs Jobs and CronJob patterns.
- Stateful backup and restore — Snapshots, Velero, restore drills, and RPO/RTO.
- Helm — Package management for Kubernetes.
- Helm Templating — Go template syntax, value injection methods, and where values come from in production.
- Helm vs operators vs GitOps — When to use Helm charts, operators, and Argo CD/Flux, and how they work together.
- Production Platform Checklist — Layered platform checks for ownership, blast radius, delivery guardrails, and drift.
- Production Patterns — Health checks, resource limits, noisy-neighbor mitigation, PDBs, draining, and rolling updates.
- Autoscaling on EKS — HPA, VPA, KEDA, Cluster Autoscaler, and Karpenter; custom metrics via Prometheus or CloudWatch.
- Production Scenarios — Scenario-based practice for production reasoning and mitigation planning.
- Cluster upgrades — Control plane vs nodes, add-on smoke tests, CRD compatibility, and certificate inventory.
- Operators — CRDs, custom controllers, the reconcile loop, and building your own operator.
- Kubectl Reference — Common commands grouped by task.
- Kubeconfig and authentication — Client TLS, OIDC, and EKS `get-token` patterns.
- RBAC — Roles, bindings, blast radius, and `kubectl auth can-i`.
- Pod Security Standards — What PSA blocks vs allows, choosing Baseline or Restricted, rollout, exemptions, and Restricted-friendly manifests.
- Admission controllers — Mutating/validating webhooks, failurePolicy, and policy engines.
- Multi-tenancy and policy — Namespaces, quotas, shared ingress fairness, and audit logging.
- Multi-cluster management — Cluster API, Karmada, and GitOps fleet patterns across clusters.
- Troubleshooting and Debugging — A practical production triage flow across workloads, cluster signals, and networking.
- Kubernetes architecture review questions — High-signal prompts for architecture reviews, hiring calibration, and production readiness checks.
- Kubernetes architecture review answers — Concise reference answers for the discussion prompts by domain, with links to deeper guides.
- Examples — File-oriented layouts (base workloads, Istio, kubectl vs Argo CD).
- EKS (AWS) — Amazon EKS overview and a production-oriented cluster with Terraform (VPC, private API, node groups, add-ons). Uses the AWS networking and VPC connectivity guides for prerequisites.
- Migrating workloads from EC2 to EKS — Phased cutover, DNS and traffic shift, data, and rollback framing.
- Prometheus Adapter for HPA on EKS — Custom metrics from PromQL for HPA.
- Container Insights for HPA on EKS — CloudWatch external metrics for HPA.
- EKS troubleshooting cheat sheet — Symptom-driven EKS debugging runbook for networking, autoscaling, and on-call triage.