Autoscaling on EKS
Production autoscaling on Amazon EKS spans three orthogonal layers. Each layer answers a different question; you usually need more than one.
- HPA / VPA / KEDA → pod count and size
- CAS / Karpenter → node count and type
- EC2 / Fargate → underlying compute supply

Pod autoscaling without node autoscaling still leaves pods Pending when the cluster is full.
This guide is EKS-leaning but uses portable Kubernetes APIs where possible. For symptom-driven commands during incidents, see EKS troubleshooting cheat sheet — Autoscaling.
Hands-on labs in Kubernetes Examples cover pod-level HPA on k3s only — not Cluster Autoscaler or Karpenter (single-node labs have no node pool to grow). Node autoscaling sections below target EKS production clusters.
End-to-end flow
```mermaid
flowchart TB
  traffic[Traffic or queue depth rises]
  hpa[HPA increases desired replicas]
  sched[Scheduler binds or leaves pods Pending]
  nodes[CAS or Karpenter adds nodes]
  ready[Pods Running and serving]
  traffic --> hpa --> sched
  sched -->|capacity available| ready
  sched -->|no fitting nodes| nodes --> ready
```
Metrics prerequisites
HPA needs a metrics pipeline. Pick the path that matches your signal.
| Signal | Typical source | EKS / lab guide |
|---|---|---|
| CPU / memory utilization | metrics-server (metrics.k8s.io) | EKS add-on; lab: HPA on k3s |
| PromQL / app metrics | Prometheus + prometheus-adapter (custom.metrics.k8s.io) | Prometheus Adapter for HPA on EKS; lab: Prometheus Adapter HPA on k3s |
| CloudWatch metrics | Container Insights + CloudWatch metrics adapter (external.metrics.k8s.io) | Container Insights for HPA on EKS |
metrics-server on EKS
Install or verify the Metrics Server EKS add-on (or equivalent). Confirm the API is healthy:

```bash
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes
kubectl top pods -A
```

Pods need CPU and memory requests set for resource utilization metrics. Without requests, HPA often shows `current: <unknown>`.
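As a minimal sketch of what "requests set" looks like on a scaled workload, the Deployment below declares CPU and memory requests on its container. The name my-app matches the HPA example in this guide; the image, labels, and request values are placeholders assumed for illustration, not sizing recommendations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: registry.example.com/my-app:1.0.0  # hypothetical image
          resources:
            requests:
              cpu: 250m       # HPA CPU utilization is computed against this value
              memory: 256Mi
            limits:
              memory: 512Mi
```

With a 250m request and a 70% utilization target, the HPA aims for roughly 175m of average CPU usage per pod.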
Custom metrics: Prometheus vs CloudWatch
| | Prometheus Adapter | CloudWatch / Container Insights |
|---|---|---|
| Setup | Prometheus (in-cluster or AMP) + adapter Helm chart | Observability EKS add-on + CloudWatch metrics adapter |
| Query model | PromQL rules exposed as Kubernetes metrics | CloudWatch metric names / dimensions |
| Cost | Infra for Prometheus; no per-metric API pricing like CloudWatch custom metrics | CloudWatch ingestion and custom metric charges — see AWS cost management |
| Best fit | Kubernetes-native apps, rich app metrics, multi-cloud portability | AWS-centric ops, ALB/SQS/RDS signals alongside cluster metrics |
Background on CloudWatch: AWS Monitoring.
Horizontal Pod Autoscaler (HPA)
Horizontal Pod Autoscaler changes replica count on a workload (usually a Deployment) based on metrics.
Minimal CPU HPA
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Requires metrics-server. Set requests on containers. See also Production patterns: HPA.
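The replica count the HPA aims for follows the formula given in the upstream Kubernetes documentation. As a worked example, assume 4 running replicas averaging 90% CPU utilization against the 70% target above (the 4 and the 90% are illustrative numbers, not measurements):

$$
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil = \left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
$$

The controller skips the change when the ratio is within its tolerance of 1.0 (10% by default), and the result is always clamped to minReplicas and maxReplicas.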
Metric types (autoscaling/v2)
| type | Use for |
|---|---|
| Resource | CPU or memory vs requests |
| Pods | Average per-pod custom metric |
| Object | Metric for another object (e.g. Ingress) |
| External | Cluster-external signal (CloudWatch, queue length via adapter) |
Install steps for non-resource metrics live in the dedicated guides linked above — not duplicated here.
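To illustrate the non-resource types, here is a sketch of an HPA that combines a Pods metric with an External metric. The metric names http_requests_per_second and sqs_queue_visible_messages, the queue label, and the target values are hypothetical; the real names depend on what your adapter exposes.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second    # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "100"               # average per pod
    - type: External
      external:
        metric:
          name: sqs_queue_visible_messages  # hypothetical external metric
          selector:
            matchLabels:
              queue: my-queue
        target:
          type: AverageValue
          averageValue: "30"                # metric value divided by replica count
```

When several metrics are listed, the HPA computes a desired replica count for each one and uses the largest.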
Scale behavior (anti-thrash)
Tune spec.behavior when replicas oscillate under bursty load:

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # act on the highest recommendation from the last 5 minutes
    policies:
      - type: Percent
        value: 50                    # remove at most 50% of replicas per minute
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0    # react to load increases immediately
    policies:
      - type: Pods
        value: 2                     # add at most 2 pods per minute
        periodSeconds: 60
```

HPA troubleshooting
| Symptom in kubectl describe hpa | Likely cause | Fix |
|---|---|---|
| unable to get metrics | metrics-server or adapter down | Fix APIService; check adapter pods and RBAC |
| `current: <unknown>` | Missing requests | Set resources.requests on containers |
| Desired equals current under load | Target too high or wrong metric | Lower target or fix metric name in adapter rules |
| Rapid scale up/down | No stabilization | Add spec.behavior windows |
| Replicas rise, pods Pending | No node capacity | Cluster Autoscaler / Karpenter; see below |
NetworkPolicies can block metrics paths while workloads still run — see Network policies and the HPA on k3s troubleshooting table.
```bash
kubectl get hpa -A
kubectl describe hpa <name> -n <namespace>
```

KEDA (event-driven scaling)
KEDA scales workloads from external event sources (SQS, Kafka, cron, Prometheus queries, etc.) via ScaledObject (or ScaledJob). For queue depth and schedules, KEDA is often simpler than wiring a custom metrics adapter yourself.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue
        awsRegion: us-east-1
        queueLength: "5"
```

KEDA creates and manages an HPA for the target workload and handles scaling to and from zero itself, so do not attach a separate HPA to the same Deployment. Full install walkthroughs are out of scope here; see KEDA documentation.
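For the schedule case, a cron trigger might look like the sketch below. The object name, timezone, schedule, and replica numbers are assumptions chosen for illustration; check the trigger fields against the KEDA version you run.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-business-hours   # hypothetical name
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 8 * * 1-5       # weekdays at 08:00
        end: 0 18 * * 1-5        # weekdays at 18:00
        desiredReplicas: "10"    # replicas requested between start and end
```

Outside the window the workload falls back to minReplicaCount unless another trigger asks for more. To combine a schedule with queue depth, add the cron entry as a second item under triggers of one ScaledObject rather than creating two ScaledObjects for the same Deployment.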
Vertical Pod Autoscaler (VPA)
VPA adjusts CPU/memory requests (and sometimes limits) per container, often by evicting and recreating pods in Auto mode.
| Mode | Behavior |
|---|---|
| Off | Recommendations only |
| Initial | Apply recommendations on pod creation |
| Auto | Evict/recreate pods to apply new resource values |
Do not run HPA and VPA on the same signal (for example both on CPU).
| Combination | Guidance |
|---|---|
| HPA on CPU + VPA on CPU | Avoid — fighting controllers |
| HPA on custom metric + VPA on CPU | Common pattern — VPA right-sizes; HPA scales count |
| HPA on CPU + VPA Off | Use VPA recommendations in CI or quarterly right-sizing |
VPA is not a managed EKS add-on in all setups; treat install and upgrades as platform ownership. No dedicated install page in this library yet.
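A recommendation-only VPA is the lowest-risk way to start. The sketch below assumes the VPA CRDs and recommender are already installed; the object name and the minAllowed / maxAllowed bounds are illustrative assumptions.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"          # recommendations only, no evictions
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["cpu", "memory"]
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

Recommendations appear under status.recommendation on the object, so they can be read with kubectl describe vpa my-app-vpa and fed into a right-sizing review.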
Cluster Autoscaler on EKS
Cluster Autoscaler (CAS) adjusts Auto Scaling group or managed node group sizes when pods cannot schedule or nodes are underutilized.
| | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioning unit | ASG / node group | Direct EC2 launches |
| Instance selection | Pre-defined by group | Per-pod fit |
| Scale-up speed | Usually slower | Usually faster |
| Best fit | Regulated, fixed instance families | Dynamic, cost-optimized pools |
Requirements: IAM permissions, correct ASG tags (k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name>), min/max sizes on the group, and a backlog of unschedulable pods that would fit on a new node from the group.
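Scale-down blockers: restrictive PDBs, the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation, pods using local storage, pods not managed by a controller, and pods whose scheduling constraints leave no other node to move to. DaemonSet pods do not block scale-down by default.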
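The tag-based discovery described above is wired up through Cluster Autoscaler flags. The fragment below is a sketch of the container section of the cluster-autoscaler Deployment only; the cluster name my-cluster and the choice of expander are assumptions, and the image is omitted because it should match your cluster's Kubernetes minor version.

```yaml
# Fragment of the cluster-autoscaler Deployment pod spec (container section only)
containers:
  - name: cluster-autoscaler
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Discover ASGs carrying both tags; my-cluster is a placeholder cluster name
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
      - --expander=least-waste
      - --balance-similar-node-groups
      # Allow removal of nodes whose pods use local storage such as emptyDir
      - --skip-nodes-with-local-storage=false
```

Setting --skip-nodes-with-local-storage=false removes one of the scale-down blockers listed above cluster-wide, so use it only when losing emptyDir data on eviction is acceptable.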
```bash
kubectl logs -n kube-system deployment/cluster-autoscaler --tail=200
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
```

Karpenter on EKS
Karpenter provisions nodes directly against NodePools and EC2NodeClasses (CRDs), often reacting faster than CAS and choosing instance types per pending pod.
Mental model:
- NodePool — constraints (arch, capacity type, limits).
- EC2NodeClass — subnets, security groups, AMI, IAM instance profile.
- NodeClaim — one provisioned node instance.
If pods stay Pending and Karpenter is silent: check NodePool requirements, pod nodeSelector / affinity / taints, subnet IPs, EC2 quotas, and NodeClass IAM.
```bash
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=200
kubectl get nodepool,ec2nodeclass,nodeclaim
```

For infrastructure-as-code context, see Create an EKS cluster with Terraform; Karpenter may be added separately from that baseline.
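To make the NodePool / EC2NodeClass split concrete, here is a minimal pair, assuming the Karpenter v1 APIs. The cluster name my-cluster, the IAM role name, the discovery tag values, and the CPU limit are hypothetical and need to match your own environment.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
  limits:
    cpu: "200"                            # cap on total vCPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole-my-cluster      # hypothetical IAM role name
  amiSelectorTerms:
    - alias: al2023@latest                # pin a specific version in production
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```

If a pending pod's nodeSelector or affinity asks for something outside these requirements (for example arm64), Karpenter will not launch a node for it, which is a common cause of silent Pending pods.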
Fargate and autoscaling
AWS Fargate profiles run pods without EC2 worker nodes you manage. HPA still applies to Fargate-backed Deployments. Cluster Autoscaler and Karpenter do not add Fargate capacity; limits come from profile selectors, namespace configuration, and account/region quotas.
Plan Fargate capacity explicitly; pair HPA with realistic maxReplicas and quota headroom.
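Profile selectors decide which pods land on Fargate. As a sketch, assuming the cluster is managed with an eksctl config file (other tooling such as Terraform expresses the same selectors differently), a profile could look like this; the cluster name, profile name, namespace, and label are hypothetical.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
fargateProfiles:
  - name: fp-apps
    selectors:
      # Only pods in this namespace that carry this label run on Fargate
      - namespace: apps
        labels:
          compute: fargate
```

Pods that match no profile selector are scheduled onto EC2 nodes instead, or stay Pending if the cluster has none.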
Designing a complete stack
| Workload | Pod layer | Node layer |
|---|---|---|
| Stateless HTTP API | HPA on CPU or RPS (Prometheus/ALB metric) | CAS or Karpenter |
| Queue consumer | KEDA on queue depth | Karpenter for burst shapes |
| Batch / cron | KEDA cron or fixed Job parallelism | Optional burst pool |
| Steady low traffic | minReplicas ≥ 2, conservative max | Smaller default node group |
Capacity planning still matters: load-test before launch, align with SLOs, and define who approves spend when maxReplicas or node pools grow. See Production patterns — Capacity planning.
PDBs protect availability during scale-down and node drains — define them before aggressive autoscaling. See Production patterns — PDB.
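A minimal PDB for the my-app Deployment used in this guide might look like the sketch below; the app: my-app label is an assumption about how the pods are labelled.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1        # at most one pod evicted voluntarily at a time
  selector:
    matchLabels:
      app: my-app
```

With minReplicas of 2, this lets node drains proceed one pod at a time. A budget that allows zero disruptions (for example minAvailable equal to the replica count) blocks drains and node scale-down entirely.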
If HPA increases replicas but pods stay Pending, work through Production scenarios — Scenario 2.
Operations checklist
| Check | Why |
|---|---|
| metrics-server (or metric adapters) healthy | HPA has signals |
| Container requests set | Resource metrics work |
| HPA minReplicas / maxReplicas reviewed | Cost and availability bounds |
| spec.behavior tuned for bursty apps | Reduces thrash |
| Node autoscaler installed and within ASG/node limits | Pending pods get capacity |
| PDBs and safe-to-evict annotations | Clean scale-down and drains |
| Alerts on pending pods, HPA at max, adapter errors | Early warning |
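The pending-pod and HPA-at-max alerts from the checklist can be sketched as Prometheus rules. This assumes the Prometheus Operator (for the PrometheusRule CRD) and kube-state-metrics are installed; the rule names, namespace, durations, and severities are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: autoscaling-alerts
  namespace: monitoring
spec:
  groups:
    - name: autoscaling
      rules:
        - alert: PodsPendingTooLong
          expr: sum by (namespace) (kube_pod_status_phase{phase="Pending"}) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pods Pending for 10 minutes in {{ $labels.namespace }}"
        - alert: HPAAtMaxReplicas
          expr: |
            kube_horizontalpodautoscaler_status_current_replicas
              >= kube_horizontalpodautoscaler_spec_max_replicas
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "HPA {{ $labels.horizontalpodautoscaler }} has been at maxReplicas for 15 minutes"
```

Adapter error alerts depend on which adapter you run and what it exports, so they are not sketched here.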
See also
- HPA on k3s: CPU HPA lab
- Prometheus Adapter for HPA on EKS
- Container Insights for HPA on EKS
- Production patterns — Probes, requests, limits, PDBs
- Scheduling and placement — Why pods stay Pending
- EKS troubleshooting cheat sheet
- Saturation and monitoring frameworks
- Service readiness checklist