Scheduling and Placement

First Published: Apr 24, 2026 | Last Updated: May 18, 2026 | By Atif Alam

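The kube-scheduler assigns each schedulable Pod to exactly one node by running a two-phase pipeline: filter (predicates: a node must pass all of them), then score (priorities: pick the best among the survivors). "Predicates" and "priorities" are the legacy names; the current scheduling framework implements them as Filter and Score plugins. The chosen node is recorded in spec.nodeName, and the kubelet on that node then starts the workload.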
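
The two phases can be sketched as a toy model. This is a minimal illustration, not the kube-scheduler's implementation: the `Node` and `Pod` classes, the single CPU dimension, and the `fits_cpu` and `most_headroom` functions are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    free_cpu_millis: int  # assumed: allocatable minus requests of bound pods


@dataclass
class Pod:
    name: str
    cpu_request_millis: int
    node_name: Optional[str] = None  # stands in for spec.nodeName


Filter = Callable[[Pod, Node], bool]
Scorer = Callable[[Pod, Node], int]


def schedule(pod: Pod, nodes: List[Node],
             filters: List[Filter], scorers: List[Scorer]) -> Optional[str]:
    # Phase 1 (filter): a node survives only if every filter accepts it.
    feasible = [n for n in nodes if all(f(pod, n) for f in filters)]
    if not feasible:
        return None  # nothing feasible: the Pod would stay Pending
    # Phase 2 (score): sum the scores and take the highest-scoring node.
    best = max(feasible, key=lambda n: sum(s(pod, n) for s in scorers))
    pod.node_name = best.name  # the "binding" step in this toy model
    return best.name


def fits_cpu(pod: Pod, node: Node) -> bool:
    return pod.cpu_request_millis <= node.free_cpu_millis


def most_headroom(pod: Pod, node: Node) -> int:
    return node.free_cpu_millis - pod.cpu_request_millis


nodes = [Node("a", 200), Node("b", 1000), Node("c", 4000)]
web = Pod("web", 500)
print(schedule(web, nodes, [fits_cpu], [most_headroom]))
```

Node `a` is dropped by the filter, and `c` beats `b` on headroom. Python's `max` keeps the first of several equal scores, whereas the real scheduler picks among tied nodes at random.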

This page is the vocabulary you need for SME-level scheduling reviews and production design discussions. For control-plane context, see Architecture. For Pending pods and capacity, see EKS troubleshooting cheat sheet — Symptom 1: Pod stuck in Pending and Workload types (DaemonSets and node placement).

Filtering (predicates)

Predicates answer: can this Pod legally run on this node? If any required predicate fails, the node is dropped. Common built-in themes (exact names vary by version and profile):

Theme	What it checks
Resource fit	Enough allocatable CPU/memory (and hugepages if requested) remains after subtracting the requests of pods already bound to the node.
Node selector / affinity	Hard (`requiredDuringSchedulingIgnoredDuringExecution`) node affinity and `nodeSelector` must match labels.
Taints and tolerations	Pod must tolerate every `NoSchedule` and `NoExecute` taint on the node. `PreferNoSchedule` taints are soft: they influence scoring but do not exclude the node.
Volume topology	For PVCs with topology or WaitForFirstConsumer, nodes must satisfy storage and zone constraints.
Pod affinity/anti-affinity	Hard inter-pod rules (e.g. “must sit in same zone as cache”) must be satisfiable.
Ports and hostNetwork	Host port conflicts and similar collisions.
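
As one concrete predicate, the taint check from the table can be sketched as follows. This is a simplified model of the documented matching rules rather than the scheduler's own code; the `Taint` and `Toleration` classes and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches any key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches any effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key == "":
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def passes_taint_filter(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    # Only hard effects exclude a node; PreferNoSchedule is left to scoring.
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


gpu_taint = Taint("dedicated", "NoSchedule", "gpu")
gpu_toleration = Toleration("dedicated", "Equal", "gpu", "NoSchedule")
print(passes_taint_filter([], [gpu_taint]))                # False
print(passes_taint_filter([gpu_toleration], [gpu_taint]))  # True
```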

When no node passes filtering, the Pod stays Pending; kubectl describe pod shows a concise message (for example 0/3 nodes are available: 3 Insufficient cpu).
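
To make the shape of that message concrete, the sketch below groups per-node rejection reasons into a similar summary. It is an approximation for reading Events, not the scheduler's formatter: the function name is hypothetical, and real messages vary by version and may carry extra detail such as preemption hints.

```python
from collections import Counter
from typing import Dict


def unschedulable_message(failures: Dict[str, str]) -> str:
    """Summarize one rejection reason per node, keyed by node name."""
    counts = Counter(failures.values())
    parts = [f"{n} {reason}" for reason, n in sorted(counts.items())]
    return f"0/{len(failures)} nodes are available: " + ", ".join(parts)


print(unschedulable_message({
    "node-a": "Insufficient cpu",
    "node-b": "Insufficient cpu",
    "node-c": "Insufficient cpu",
}))
# 0/3 nodes are available: 3 Insufficient cpu
```

The number before each reason counts the nodes rejected for that reason, so the counts tell you which predicate to investigate first.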

Scoring (priorities)

Priorities answer: among feasible nodes, which is best? The scheduler assigns a score per node; the highest wins (with ties broken pseudo-randomly for spread). Examples of what scoring tends to reward:

Spread — avoid piling too many pods of the same workload on one node.
Preferred (soft) affinity — “like” to be in the same rack or region, without excluding nodes if impossible.
Resource balance — prefer nodes with more headroom after the pod lands.
Image locality — slight preference if the image is already pulled on the node.

Scheduler profiles (and disabled/default priority configurations) can change which priority functions run; managed distributions may ship a tuned profile.
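
A rough sketch of how weighted scoring combines, assuming each scoring function returns a per-node value in the range 0 to 100 and a profile supplies one integer weight per function. The plugin names `spread` and `image_locality` are placeholders, not real plugin identifiers.

```python
import random
from typing import Dict, Tuple


def pick_node(scores_by_plugin: Dict[str, Dict[str, int]],
              weights: Dict[str, int],
              rng=random) -> Tuple[str, Dict[str, int]]:
    totals: Dict[str, int] = {}
    for plugin, per_node in scores_by_plugin.items():
        weight = weights.get(plugin, 0)  # weight 0 models a disabled function
        for node, score in per_node.items():
            totals[node] = totals.get(node, 0) + weight * score
    best = max(totals.values())
    # Several nodes may share the top total; choose one of them at random.
    winners = sorted(n for n, total in totals.items() if total == best)
    return rng.choice(winners), totals


scores = {
    "spread": {"a": 80, "b": 20},
    "image_locality": {"a": 0, "b": 100},
}
print(pick_node(scores, {"spread": 2, "image_locality": 1}))
print(pick_node(scores, {"spread": 1, "image_locality": 1}))
```

With the first set of weights node `a` wins (160 against 140); with equal weights node `b` wins (120 against 80). The same inputs produce a different placement once a profile changes the weights.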

PriorityClass vs scheduler “priorities”

These names collide in conversation:

Concept	What it is
Scheduler priority / scoring	Internal numeric ranking of nodes during normal scheduling.
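`PriorityClass`	A cluster-scoped object that a Pod references through `priorityClassName`; admission then sets the numeric `priority` in the Pod spec. The scheduler uses it for queue ordering and preemption: a higher-priority Pod that cannot be placed can evict lower-priority Pods to free room on a node. PDBs are respected on a best-effort basis during preemption.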

When debugging unexpected evictions, check PriorityClass, PDBs, and recent changes to priority values — not only resource requests.
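
The sketch below shows why a change in priority values alone can cause evictions. It is a deliberately simplified greedy model on a single node and a single resource; it ignores PDBs, graceful termination, and affinity, and it is not the algorithm the scheduler actually uses to select victims. All names are hypothetical.

```python
from typing import List, Optional, Tuple

RunningPod = Tuple[str, int, int]  # (name, priority, cpu_request_millis)


def pick_victims(incoming_priority: int, needed_cpu_millis: int,
                 free_cpu_millis: int,
                 running: List[RunningPod]) -> Optional[List[str]]:
    """Return pods to evict on this node, [] if none needed, None if impossible."""
    if needed_cpu_millis <= free_cpu_millis:
        return []
    # Only strictly lower-priority pods are candidates; lowest priority first.
    candidates = sorted((p for p in running if p[1] < incoming_priority),
                        key=lambda p: p[1])
    victims: List[str] = []
    freed = free_cpu_millis
    for name, _priority, cpu in candidates:
        victims.append(name)
        freed += cpu
        if freed >= needed_cpu_millis:
            return victims
    return None  # even evicting every candidate would not make room


running = [("batch", 0, 500), ("web", 500, 400), ("db", 2000, 2000)]
print(pick_victims(1000, 1000, 200, running))  # ['batch', 'web']
print(pick_victims(100, 1000, 200, running))   # None
```

Raising the incoming Pod's priority from 100 to 1000 turns `web` into an eviction candidate even though no resource request changed.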

Practical debugging checklist

kubectl describe pod <pod> -n <ns> — read Events at the bottom; scheduler messages are usually explicit.
Requests vs allocatable — kubectl describe node for Allocatable and running pods’ requests (not just limits).
Taints / tolerations / affinity — compare node labels, pod template, and any webhook-injected fields.
PVCs — kubectl get pvc, StorageClass volumeBindingMode, and topology; see Storage.
Cluster-wide events — kubectl get events -A --sort-by='.lastTimestamp' | tail -50 for quota, webhook, or autoscaler signals.
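
For the requests-versus-allocatable step, the arithmetic can be sketched as below. The parser handles only the common quantity forms (`500m` or plain cores for CPU; `Ki`/`Mi`/`Gi`/`Ti` and `k`/`M`/`G`/`T` for memory) and is not a full implementation of the Kubernetes quantity format. The helper names and sample values are made up for illustration.

```python
from decimal import Decimal
from typing import List

_MEM_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_millis(quantity: str) -> int:
    """Convert a CPU quantity such as '500m', '2', or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def mem_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes."""
    for suffix, multiplier in _MEM_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[:-len(suffix)]) * multiplier)
    return int(Decimal(quantity))


def cpu_headroom_millis(allocatable: str, requests: List[str]) -> int:
    """Allocatable CPU minus the summed requests of pods on the node."""
    return cpu_millis(allocatable) - sum(cpu_millis(r) for r in requests)


# Hypothetical node: 3920m allocatable, three pods already bound.
print(cpu_headroom_millis("3920m", ["500m", "1", "250m"]))  # 2170
print(mem_bytes("512Mi"))                                    # 536870912
```

A Pod requesting more than the printed headroom fails the resource-fit filter on that node, regardless of how much CPU the node is actually using at the moment.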

For node provisioning (Karpenter, Cluster Autoscaler) when the scheduler is “silent,” see Autoscaling on EKS and the Pending playbook linked above.

Related

Architecture — Where scheduling sits in the kubectl apply → running Pod path.
Production patterns — Requests, limits, QoS, and noisy-neighbor context.
Multi-tenancy and policy — Quotas and placement fairness across teams.
