Autoscaling on EKS

First published May 18, 2026, by Atif Alam

Production autoscaling on Amazon EKS spans three layers. Each layer answers a different question, and you usually need more than one.

HPA / VPA / KEDA       → pod count and size
CAS / Karpenter        → node count and type
EC2 / Fargate          → underlying compute supply

Pod autoscaling without node autoscaling still leaves pods Pending when the cluster is full.

This guide is EKS-leaning but uses portable Kubernetes APIs where possible. For symptom-driven commands during incidents, see EKS troubleshooting cheat sheet — Autoscaling.

Hands-on labs in Kubernetes Examples cover pod-level HPA on k3s only — not Cluster Autoscaler or Karpenter (single-node labs have no node pool to grow). Node autoscaling sections below target EKS production clusters.

End-to-end flow

flowchart TB
  traffic[Traffic or queue depth rises]
  hpa[HPA increases desired replicas]
  sched[Scheduler binds or leaves pods Pending]
  nodes[CAS or Karpenter adds nodes]
  ready[Pods Running and serving]
  traffic --> hpa --> sched
  sched -->|capacity available| ready
  sched -->|no fitting nodes| nodes --> ready

Metrics prerequisites

HPA needs a metrics pipeline. Pick the path that matches your signal.

Signal	Typical source	EKS / lab guide
CPU / memory utilization	metrics-server (`metrics.k8s.io`)	EKS add-on; lab: HPA on k3s
PromQL / app metrics	Prometheus + prometheus-adapter (`custom.metrics.k8s.io`)	Prometheus Adapter for HPA on EKS; lab: Prometheus Adapter HPA on k3s
CloudWatch metrics	Container Insights + CloudWatch metrics adapter (`external.metrics.k8s.io`)	Container Insights for HPA on EKS

metrics-server on EKS

Install or verify the Metrics Server EKS add-on (or equivalent). Confirm the API is healthy:

kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes
kubectl top pods -A

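HPA computes utilization as a percentage of each container's request, so every container in the target pods needs a CPU request (and a memory request if you target memory). Without requests, HPA typically reports current: <unknown>.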
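A minimal sketch of the container fragment this refers to. The container name, image, and request values are placeholders chosen for illustration; size real requests from observed usage.

```yaml
# Fragment of a Deployment pod template (spec.template.spec).
# Names and values are illustrative assumptions, not recommendations.
containers:
  - name: my-app
    image: registry.example.com/my-app:1.0.0   # hypothetical image
    resources:
      requests:
        cpu: 250m        # HPA CPU utilization is measured against this value
        memory: 256Mi    # needed if the HPA targets memory utilization
      limits:
        memory: 512Mi
```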

Custom metrics: Prometheus vs CloudWatch

	Prometheus Adapter	CloudWatch / Container Insights
Setup	Prometheus (in-cluster or AMP) + adapter Helm chart	Observability EKS add-on + CloudWatch metrics adapter
Query model	PromQL rules exposed as Kubernetes metrics	CloudWatch metric names / dimensions
Cost	Compute and storage for Prometheus (or AMP usage charges); no CloudWatch custom-metric charges	CloudWatch ingestion and custom metric charges; see AWS cost management
Best fit	Kubernetes-native apps, rich app metrics, multi-cloud portability	AWS-centric ops, ALB/SQS/RDS signals alongside cluster metrics

Background on CloudWatch: AWS Monitoring.

Horizontal Pod Autoscaler (HPA)

Horizontal Pod Autoscaler changes replica count on a workload (usually a Deployment) based on metrics.

Minimal CPU HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Requires metrics-server. Set requests on containers. See also Production patterns — HPA.
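For reference, the Kubernetes documentation describes the core HPA calculation as a ratio of the observed metric to the target. The worked numbers below (4 replicas averaging 90% CPU against the 70% target from the manifest above) are illustrative, not measurements.

```latex
\[
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
\]
```

The result is then clamped to the minReplicas and maxReplicas bounds. By default the controller skips scaling when the ratio is within about 10% of 1.0, so small deviations from the target do not change the replica count.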

Metric types (`autoscaling/v2`)

`type`	Use for
`Resource`	CPU or memory vs requests
`Pods`	Average per-pod custom metric
`Object`	Metric for another object (e.g. Ingress)
`External`	Cluster-external signal (CloudWatch, queue length via adapter)

Install steps for non-resource metrics live in the dedicated guides linked above — not duplicated here.
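As a rough illustration of how the non-resource types look in a manifest, the fragment below shows a `Pods` and an `External` entry side by side. The metric names, label selector, and target values are hypothetical; they must match whatever your adapter actually exposes.

```yaml
# Fragment of spec.metrics in an autoscaling/v2 HPA.
# Metric names and targets are assumptions for illustration only.
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second      # served via custom.metrics.k8s.io
      target:
        type: AverageValue
        averageValue: "100"                 # target average per pod
  - type: External
    external:
      metric:
        name: sqs_queue_visible_messages    # served via external.metrics.k8s.io
        selector:
          matchLabels:
            queue: my-queue
      target:
        type: AverageValue
        averageValue: "30"                  # target backlog per replica
```

When several metrics are listed, HPA computes a desired replica count for each and uses the largest.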

Scale behavior (anti-thrash)

Tune spec.behavior when replicas oscillate under bursty load:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent
        value: 50
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Pods
        value: 2
        periodSeconds: 60
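To read the settings above with concrete numbers: starting from 20 replicas, the scale-down policy allows removing at most 10 pods (50%) in any 60-second period, and only once the 300-second stabilization window agrees that fewer replicas are needed. The variant below is a sketch of combining two scale-up policies; the values are illustrative assumptions.

```yaml
# Sketch: two scale-up policies, with selectPolicy choosing between them.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    selectPolicy: Max          # default; Min would pick the more conservative policy
    policies:
      - type: Pods
        value: 4               # allow up to 4 additional pods per period
        periodSeconds: 60
      - type: Percent
        value: 100             # or allow doubling, whichever permits more
        periodSeconds: 60
```

With `Max`, small deployments can still add 4 pods at a time while large ones can double, which tends to suit bursty traffic.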

HPA troubleshooting

Symptom in `kubectl describe hpa`	Likely cause	Fix
`unable to get metrics`	metrics-server or adapter down	Fix APIService; check adapter pods and RBAC
`current: <unknown>`	Missing requests	Set `resources.requests` on containers
Desired equals current under load	Target too high or wrong metric	Lower target or fix metric name in adapter rules
Rapid scale up/down	No stabilization	Add `spec.behavior` windows
Replicas rise, pods Pending	No node capacity	Cluster Autoscaler / Karpenter; see below

NetworkPolicies can block metrics paths while workloads still run — see Network policies and the HPA on k3s troubleshooting table.

kubectl get hpa -A
kubectl describe hpa <name> -n <namespace>

KEDA (event-driven scaling)

KEDA scales workloads from external event sources (SQS, Kafka, cron, Prometheus queries, etc.) via ScaledObject (or ScaledJob). For queue depth and schedules, KEDA is often simpler than wiring a custom metrics adapter yourself.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue
        awsRegion: us-east-1
        queueLength: "5"

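Under the hood, KEDA creates and manages an HPA for the target workload and feeds it metrics from the trigger; KEDA itself handles activation between zero and one replica. Do not define a separate HPA for the same Deployment, or the two will conflict. Full install walkthroughs are out of scope here; see KEDA documentation.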
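For the schedule case mentioned above, a cron trigger can sit in the same `triggers` list as the queue trigger. This is a sketch only: the timezone, schedule, and replica count are assumptions.

```yaml
# Fragment of spec.triggers in a ScaledObject; values are illustrative.
triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: "0 8 * * 1-5"      # hold the replica floor from 08:00, Monday to Friday
      end: "0 18 * * 1-5"       # release it at 18:00
      desiredReplicas: "10"
```

Keeping both triggers in one ScaledObject matters because KEDA expects a single ScaledObject per target workload.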

Vertical Pod Autoscaler (VPA)

VPA adjusts CPU/memory requests (and sometimes limits) per container. In Auto mode it applies new values by evicting and recreating pods.

Mode	Behavior
`Off`	Recommendations only
`Initial`	Apply recommendations on pod creation
`Auto`	Evict/recreate pods to apply new resource values

Do not run HPA and VPA on the same signal (for example both on CPU).

Combination	Guidance
HPA on CPU + VPA on CPU	Avoid — fighting controllers
HPA on custom metric + VPA on CPU	Common pattern — VPA right-sizes; HPA scales count
HPA on CPU + VPA `Off`	Use VPA recommendations in CI or quarterly right-sizing
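A minimal sketch of the recommendation-only setup from the last row, assuming the VPA CRDs and recommender are installed. The object name and the min/max bounds are placeholders.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa              # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"           # recommend only; never evict pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["cpu", "memory"]
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

The recommendations then appear in the object's status, for example via `kubectl describe vpa my-app-vpa`.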

VPA is not available as a managed EKS add-on in all setups, so plan for the platform team to own its installation and upgrades. No dedicated install page in this library yet.

Cluster Autoscaler on EKS

Cluster Autoscaler (CAS) adjusts Auto Scaling group or managed node group sizes when pods cannot schedule or nodes are underutilized.

	Cluster Autoscaler	Karpenter
Provisioning unit	ASG / node group	Direct EC2 launches
Instance selection	Pre-defined by group	Per-pod fit
Scale-up speed	Usually slower	Usually faster
Best fit	Regulated, fixed instance families	Dynamic, cost-optimized pools

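Requirements: IAM permissions, the auto-discovery ASG tags (k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name>), and min/max sizes on the group that leave room to scale. Scale-up is triggered by pods that are Pending because no existing node fits them.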
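The tags are what the autoscaler's auto-discovery flag matches on. A sketch of the relevant container arguments, assuming a cluster named `my-cluster` (a placeholder) and an otherwise standard deployment:

```yaml
# Fragment of the cluster-autoscaler container spec; my-cluster is hypothetical.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
  - --balance-similar-node-groups
  - --expander=least-waste
```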

Scale-down blockers: restrictive PDBs, the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation, pods using local storage, pods not managed by a controller, and kube-system pods without a PDB.
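For reference, the annotation goes on the pod template, not on the Deployment's own metadata. A sketch:

```yaml
# Fragment of a Deployment's spec.template; blocks CAS from removing the node
# while a pod carrying this annotation is running on it.
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

Use it sparingly: every annotated pod pins its node and can keep an otherwise idle node alive.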

kubectl logs -n kube-system deployment/cluster-autoscaler --tail=200
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml

Karpenter on EKS

Karpenter launches EC2 instances directly, guided by NodePool and EC2NodeClass custom resources. It often reacts faster than CAS and chooses instance types to fit the pending pods.

Mental model:

NodePool — constraints (arch, capacity type, limits).
EC2NodeClass — subnets, security groups, AMI, IAM instance profile.
NodeClaim — one provisioned node instance.
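The first two objects can be sketched as below. This assumes the Karpenter `v1` APIs (older releases use `v1beta1` with slightly different fields); the role name, discovery tag value, and CPU limit are placeholders.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
  limits:
    cpu: "200"                     # cap on total vCPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole-my-cluster   # hypothetical IAM role name
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```

NodeClaims are created by Karpenter itself in response to pending pods, so you inspect them but do not author them.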

If pods stay Pending and Karpenter is silent: check NodePool requirements, pod nodeSelector / affinity / taints, subnet IPs, EC2 quotas, and NodeClass IAM.

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=200
kubectl get nodepool,ec2nodeclass,nodeclaim

For infrastructure-as-code context, see Create an EKS cluster with Terraform — Karpenter may be added separately from that baseline.

Fargate and autoscaling

AWS Fargate profiles run pods without EC2 worker nodes you manage. HPA still applies to Fargate-backed Deployments. Cluster Autoscaler and Karpenter do not add Fargate capacity — limits come from profile selectors, namespace configuration, and account/region quotas.

Plan Fargate capacity explicitly; pair HPA with realistic maxReplicas and quota headroom.

Designing a complete stack

Workload	Pod layer	Node layer
Stateless HTTP API	HPA on CPU or RPS (Prometheus/ALB metric)	CAS or Karpenter
Queue consumer	KEDA on queue depth	Karpenter for burst shapes
Batch / cron	KEDA cron or fixed Job parallelism	Optional burst pool
Steady low traffic	minReplicas ≥ 2, conservative max	Smaller default node group

Capacity planning still matters: load-test before launch, align with SLOs, and define who approves spend when maxReplicas or node pools grow. See Production patterns — Capacity planning.

PDBs protect availability during scale-down and node drains — define them before aggressive autoscaling. See Production patterns — PDB.
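A minimal PDB sketch, assuming the workload's pods carry an `app: my-app` label (an assumption; match your own selector):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb               # hypothetical name
spec:
  maxUnavailable: 1              # allow one voluntary eviction at a time
  selector:
    matchLabels:
      app: my-app
```

Prefer a budget that always leaves room for at least one eviction. A `minAvailable` equal to the replica count blocks node drains and scale-down entirely.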

If HPA increases replicas but pods stay Pending, work through Production scenarios — Scenario 2.

Operations checklist

Check	Why
metrics-server (or metric adapters) healthy	HPA has signals
Container requests set	Resource metrics work
HPA `minReplicas` / `maxReplicas` reviewed	Cost and availability bounds
`spec.behavior` tuned for bursty apps	Reduces thrash
Node autoscaler installed and within ASG/node limits	Pending pods get capacity
PDBs and `safe-to-evict` annotations	Clean scale-down and drains
Alerts on pending pods, HPA at max, adapter errors	Early warning
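The alerting row could be sketched as Prometheus rules like the following. This assumes Prometheus scrapes kube-state-metrics; the alert names, durations, and severity labels are illustrative choices.

```yaml
# Prometheus rule file sketch; thresholds and names are assumptions.
groups:
  - name: autoscaling
    rules:
      - alert: HPAAtMaxReplicas
        expr: |
          kube_horizontalpodautoscaler_status_desired_replicas
            >= kube_horizontalpodautoscaler_spec_max_replicas
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has been at maxReplicas for 15 minutes"
      - alert: PodsPendingTooLong
        expr: sum by (namespace) (kube_pod_status_phase{phase="Pending"}) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pods in {{ $labels.namespace }} have been Pending for 10 minutes"
```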