Autoscaling on EKS

First published by Atif Alam

Production autoscaling on Amazon EKS spans three orthogonal layers. Each layer answers a different question; you usually need more than one.

HPA / VPA / KEDA → pod count and size
CAS / Karpenter → node count and type
EC2 / Fargate → underlying compute supply

Pod autoscaling without node autoscaling still leaves pods Pending when the cluster is full.

This guide is EKS-leaning but uses portable Kubernetes APIs where possible. For symptom-driven commands during incidents, see EKS troubleshooting cheat sheet — Autoscaling.

Hands-on labs in Kubernetes Examples cover pod-level HPA on k3s only — not Cluster Autoscaler or Karpenter (single-node labs have no node pool to grow). Node autoscaling sections below target EKS production clusters.

flowchart TB
  traffic[Traffic or queue depth rises]
  hpa[HPA increases desired replicas]
  sched[Scheduler binds or leaves pods Pending]
  nodes[CAS or Karpenter adds nodes]
  ready[Pods Running and serving]
  traffic --> hpa --> sched
  sched -->|capacity available| ready
  sched -->|no fitting nodes| nodes --> ready

HPA needs a metrics pipeline. Pick the path that matches your signal.

| Signal | Typical source | EKS / lab guide |
| --- | --- | --- |
| CPU / memory utilization | metrics-server (metrics.k8s.io) | EKS add-on; lab: HPA on k3s |
| PromQL / app metrics | Prometheus + prometheus-adapter (custom.metrics.k8s.io) | Prometheus Adapter for HPA on EKS; lab: Prometheus Adapter HPA on k3s |
| CloudWatch metrics | Container Insights + CloudWatch metrics adapter (external.metrics.k8s.io) | Container Insights for HPA on EKS |

Install or verify the Metrics Server EKS add-on (or equivalent). Confirm the API is healthy:

kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes
kubectl top pods -A

HPA computes resource utilization as a percentage of each container's request, so pods need CPU and memory requests set. Without requests, HPA often shows current: <unknown>.
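As a minimal sketch of what "requests set" looks like, the Deployment below declares CPU and memory requests on its container. The name my-app matches the HPA example in this guide; the image, labels, and resource values are placeholders chosen for illustration, not recommendations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  # replicas is omitted so the HPA owns the replica count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0   # placeholder image
          resources:
            requests:
              cpu: 250m       # HPA CPU utilization is measured against this value
              memory: 256Mi   # needed if an HPA targets memory utilization
            limits:
              memory: 512Mi
```

With these values, a 70% CPU utilization target corresponds to roughly 175m of average CPU usage per pod.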

| | Prometheus Adapter | CloudWatch / Container Insights |
| --- | --- | --- |
| Setup | Prometheus (in-cluster or AMP) + adapter Helm chart | Observability EKS add-on + CloudWatch metrics adapter |
| Query model | PromQL rules exposed as Kubernetes metrics | CloudWatch metric names / dimensions |
| Cost | Infra for Prometheus; no per-metric API pricing like CloudWatch custom metrics | CloudWatch ingestion and custom metric charges; see AWS cost management |
| Best fit | Kubernetes-native apps, rich app metrics, multi-cloud portability | AWS-centric ops, ALB/SQS/RDS signals alongside cluster metrics |
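To make "PromQL rules exposed as Kubernetes metrics" concrete, here is a sketch of one prometheus-adapter rule. It assumes the application exports a counter named http_requests_total with namespace and pod labels; that metric name is an assumption for illustration, and where the rules list sits in your Helm values depends on the chart version.

```yaml
# prometheus-adapter configuration fragment
rules:
  # Discover every http_requests_total series that carries namespace and pod labels
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}   # map the Prometheus label to the Kubernetes resource
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"                  # exposed as http_requests_per_second
    # Convert the counter into a per-second rate over a 2-minute window
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Once the adapter loads a rule like this, the metric should be listed under custom.metrics.k8s.io and can be referenced by name from an HPA.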

Background on CloudWatch: AWS Monitoring.

Horizontal Pod Autoscaler changes replica count on a workload (usually a Deployment) based on metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Requires metrics-server. Set requests on containers. See also Production patterns — HPA.

| type | Use for |
| --- | --- |
| Resource | CPU or memory vs requests |
| Pods | Average per-pod custom metric |
| Object | Metric for another object (e.g. Ingress) |
| External | Cluster-external signal (CloudWatch, queue length via adapter) |

Install steps for non-resource metrics live in the dedicated guides linked above — not duplicated here.
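For orientation, the sketch below shows how Pods and External metrics are declared in an autoscaling/v2 HPA. It assumes an adapter already serves both metrics; the HPA name, metric names, selector label, and target values are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa-custom   # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Pods: average of a per-pod custom metric (custom.metrics.k8s.io)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # hypothetical; must match an adapter rule
        target:
          type: AverageValue
          averageValue: "100"
    # External: a signal not tied to any Kubernetes object (external.metrics.k8s.io)
    - type: External
      external:
        metric:
          name: sqs_queue_visible_messages   # hypothetical metric name
          selector:
            matchLabels:
              queue: my-queue                # hypothetical label
        target:
          type: AverageValue
          averageValue: "30"
```

When several metrics are listed, HPA computes a desired replica count for each one and uses the largest.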

Tune spec.behavior when replicas oscillate under bursty load:

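behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # act on the highest recommendation from the last 5 minutes
    policies:
      - type: Percent
        value: 50                     # remove at most 50% of current replicas
        periodSeconds: 60             # per 60-second period
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load increases immediately
    policies:
      - type: Pods
        value: 2                      # add at most 2 pods
        periodSeconds: 60             # per 60-second period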

| Symptom in kubectl describe hpa | Likely cause | Fix |
| --- | --- | --- |
| unable to get metrics | metrics-server or adapter down | Fix APIService; check adapter pods and RBAC |
| current: <unknown> | Missing requests | Set resources.requests on containers |
| Desired equals current under load | Target too high or wrong metric | Lower target or fix metric name in adapter rules |
| Rapid scale up/down | No stabilization | Add spec.behavior windows |
| Replicas rise, pods Pending | No node capacity | Cluster Autoscaler / Karpenter; see below |

NetworkPolicies can block metrics paths while workloads still run — see Network policies and the HPA on k3s troubleshooting table.

kubectl get hpa -A
kubectl describe hpa <name> -n <namespace>

KEDA scales workloads from external event sources (SQS, Kafka, cron, Prometheus queries, etc.) via ScaledObject (or ScaledJob). For queue depth and schedules, KEDA is often simpler than wiring a custom metrics adapter yourself.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue
        awsRegion: us-east-1
        queueLength: "5"

For a ScaledObject, KEDA creates and manages an HPA on the target workload and handles activation between zero and one replica itself, so do not attach a separate HPA to the same Deployment. Full install walkthroughs are out of scope here; see KEDA documentation.
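Schedules follow the same ScaledObject shape. The sketch below uses KEDA's cron trigger to hold a higher replica count during business hours; the object name, timezone, schedule, and replica numbers are illustrative assumptions.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-business-hours   # hypothetical name
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 2            # replica count outside the scheduled window
  maxReplicaCount: 20
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York   # IANA time zone name
        start: 0 8 * * 1-5           # window opens 08:00 Monday to Friday
        end: 0 18 * * 1-5            # window closes 18:00 Monday to Friday
        desiredReplicas: "10"        # replicas requested while the window is open
```

A cron trigger can sit alongside a queue or Prometheus trigger in the same ScaledObject, in which case the trigger asking for the most replicas applies.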

VPA adjusts CPU/memory requests (and sometimes limits) per container — often by evicting and recreating pods in Auto mode.

| Mode | Behavior |
| --- | --- |
| Off | Recommendations only |
| Initial | Apply recommendations on pod creation |
| Auto | Evict/recreate pods to apply new resource values |

Do not run HPA and VPA on the same signal (for example both on CPU).

| Combination | Guidance |
| --- | --- |
| HPA on CPU + VPA on CPU | Avoid: the controllers fight each other |
| HPA on custom metric + VPA on CPU | Common pattern: VPA right-sizes; HPA scales count |
| HPA on CPU + VPA Off | Use VPA recommendations in CI or quarterly right-sizing |
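A recommendation-only VPA is a low-risk starting point for the last row. The sketch below assumes the VPA CRDs and recommender are already installed in the cluster; the object name and the minAllowed/maxAllowed bounds are hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa   # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"          # publish recommendations only; never evict pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"     # apply to every container in the pod
        controlledResources: ["cpu", "memory"]
        minAllowed:            # illustrative bounds on the recommendation
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

The recommendations then appear in the object's status, for example via kubectl describe vpa my-app-vpa, and can be copied into the Deployment's requests during a right-sizing review.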

VPA is not a managed EKS add-on in all setups; treat install and upgrades as platform ownership. No dedicated install page in this library yet.

Cluster Autoscaler (CAS) adjusts Auto Scaling group or managed node group sizes when pods cannot schedule or nodes are underutilized.

| | Cluster Autoscaler | Karpenter |
| --- | --- | --- |
| Provisioning unit | ASG / node group | Direct EC2 launches |
| Instance selection | Pre-defined by group | Per-pod fit |
| Scale-up speed | Usually slower | Usually faster |
| Best fit | Regulated, fixed instance families | Dynamic, cost-optimized pools |

Requirements: IAM permissions, correct ASG tags (k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name>), min/max sizes on the group, and pending pods that would fit on a new node from one of those groups.
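The ASG tags matter because Cluster Autoscaler is usually started with tag-based auto-discovery. The fragment below sketches the relevant container arguments in the cluster-autoscaler Deployment; my-cluster is a placeholder cluster name, and the expander and timing values are illustrative rather than recommended.

```yaml
# Fragment of the cluster-autoscaler Deployment pod spec
containers:
  - name: cluster-autoscaler
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Manage every ASG that carries both tags
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
      - --balance-similar-node-groups    # spread scale-ups across equivalent groups (for example per AZ)
      - --expander=least-waste           # prefer the group that leaves the least idle CPU/memory
      - --scale-down-unneeded-time=10m   # how long a node must be unneeded before removal
```

If an ASG is missing either tag, it is not discovered, and pods that need it stay Pending with no scale-up event.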

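Scale-down blockers: restrictive PDBs, the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation, pods using local storage, and pods that are not managed by a controller. DaemonSet pods do not block scale-down by default.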
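The safe-to-evict setting is a pod annotation, so it belongs on the pod template rather than on the Deployment's own metadata. A minimal fragment, assuming a Deployment-style workload:

```yaml
# Fragment of a Deployment spec
spec:
  template:
    metadata:
      annotations:
        # Cluster Autoscaler will not remove a node while a pod with this annotation runs on it
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

Use it sparingly: each annotated pod pins its node, which can keep otherwise empty capacity running.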

kubectl logs -n kube-system deployment/cluster-autoscaler --tail=200
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml

Karpenter provisions nodes directly against NodePools and EC2NodeClasses (CRDs), often reacting faster than CAS and choosing instance types per pending pod.

Mental model:

  • NodePool — constraints (arch, capacity type, limits).
  • EC2NodeClass — subnets, security groups, AMI, IAM instance profile.
  • NodeClaim — one provisioned node instance.
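The first two objects in the list above can be sketched as a pair of manifests. This assumes the Karpenter v1 APIs (karpenter.sh/v1 and karpenter.k8s.aws/v1); older releases use v1beta1 with somewhat different fields. The role name, discovery tag value, CPU limit, and consolidation settings are placeholders.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:                 # constraints on the nodes this pool may launch
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  limits:
    cpu: "200"                      # stop provisioning once the pool totals 200 vCPU
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: KarpenterNodeRole-my-cluster   # placeholder node IAM role; Karpenter derives the instance profile
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster   # placeholder discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```

NodeClaims are created by Karpenter itself in response to pending pods, so they are inspected rather than authored.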

If pods stay Pending and Karpenter is silent: check NodePool requirements, pod nodeSelector / affinity / taints, subnet IPs, EC2 quotas, and NodeClass IAM.

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=200
kubectl get nodepool,ec2nodeclass,nodeclaim

For infrastructure-as-code context, see Create an EKS cluster with Terraform — Karpenter may be added separately from that baseline.

AWS Fargate profiles run pods without EC2 worker nodes you manage. HPA still applies to Fargate-backed Deployments. Cluster Autoscaler and Karpenter do not add Fargate capacity — limits come from profile selectors, namespace configuration, and account/region quotas.

Plan Fargate capacity explicitly; pair HPA with realistic maxReplicas and quota headroom.

| Workload | Pod layer | Node layer |
| --- | --- | --- |
| Stateless HTTP API | HPA on CPU or RPS (Prometheus/ALB metric) | CAS or Karpenter |
| Queue consumer | KEDA on queue depth | Karpenter for burst shapes |
| Batch / cron | KEDA cron or fixed Job parallelism | Optional burst pool |
| Steady low traffic | minReplicas ≥ 2, conservative max | Smaller default node group |

Capacity planning still matters: load-test before launch, align with SLOs, and define who approves spend when maxReplicas or node pools grow. See Production patterns — Capacity planning.

PDBs protect availability during scale-down and node drains — define them before aggressive autoscaling. See Production patterns — PDB.
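A minimal PodDisruptionBudget for the my-app Deployment used in this guide might look like the sketch below. The app: my-app label and the object name are assumptions; the selector must match the labels on your pods.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb   # hypothetical name
spec:
  maxUnavailable: 1  # allow at most one voluntary eviction at a time
  selector:
    matchLabels:
      app: my-app    # assumed pod label
```

With minReplicas: 2, this keeps at least one pod serving while a node drains. Setting maxUnavailable: 0 would block drains entirely, which also blocks node scale-down.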

If HPA increases replicas but pods stay Pending, work through Production scenarios — Scenario 2.

| Check | Why |
| --- | --- |
| metrics-server (or metric adapters) healthy | HPA has signals |
| Container requests set | Resource metrics work |
| HPA minReplicas / maxReplicas reviewed | Cost and availability bounds |
| spec.behavior tuned for bursty apps | Reduces thrash |
| Node autoscaler installed and within ASG/node limits | Pending pods get capacity |
| PDBs and safe-to-evict annotations | Clean scale-down and drains |
| Alerts on pending pods, HPA at max, adapter errors | Early warning |
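The alerting row can be sketched as a Prometheus rule file. This assumes kube-state-metrics is installed and scraped, since the kube_* series below come from it; the alert names, durations, and severity labels are illustrative.

```yaml
groups:
  - name: autoscaling   # hypothetical group name
    rules:
      - alert: PodsPendingTooLong
        # At least one pod in the namespace has been Pending for 10 minutes
        expr: sum by (namespace) (kube_pod_status_phase{phase="Pending"}) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pods Pending in {{ $labels.namespace }}; check node autoscaler capacity"
      - alert: HPAAtMaxReplicas
        # The HPA has been pinned at maxReplicas, so it cannot absorb further load
        expr: >
          kube_horizontalpodautoscaler_status_current_replicas
          >= kube_horizontalpodautoscaler_spec_max_replicas
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.horizontalpodautoscaler }} in {{ $labels.namespace }} is at maxReplicas"
```

Adapter errors are harder to express generically; alerting on the availability of the relevant APIService is one option if your monitoring stack exposes it.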