Multi-Cluster Management
Multi-cluster is a deliberate choice about blast radius, tenancy, regionality, and upgrade independence. It is not “we failed to standardize one cluster” — regulated teams, edge footprints, and large enterprises often require multiple API servers.
This page names the main SME-level tools and patterns. For tenant isolation inside one cluster, see Multi-tenancy and policy. For delivery mechanics (repos, sync, secrets), see GitOps — especially ApplicationSet-style patterns that render the same chart into many clusters.
Problems multi-cluster solves
- Failure isolation: a bad CRD/webhook upgrade or etcd incident in cluster A does not take cluster B offline.
- Version skew experiments: canary a minor Kubernetes upgrade on a small cluster before the fleet.
- Data residency: separate clusters per jurisdiction with no shared control plane.
- Hard tenancy: some business units need cluster-admin-like freedom; separate clusters avoid impossible policy on a shared plane.
Costs include duplicated add-ons (CNI, DNS, ingress), identity integration per cluster, and observability correlation across boundaries.
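The canary idea in the list above can be sketched as a small planning step: upgrade the smallest clusters first, then the rest. This is only an illustration; `ClusterInfo`, `plan_upgrade_waves`, and the cluster names are hypothetical, and a real fleet would likely order by risk or environment rather than node count alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterInfo:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    node_count: int


def plan_upgrade_waves(clusters, canary_count=1):
    """Split the fleet into [canary wave, remaining wave], smallest clusters first."""
    # Sort by size, then name, so the plan is deterministic.
    ordered = sorted(clusters, key=lambda c: (c.node_count, c.name))
    canary = [c.name for c in ordered[:canary_count]]
    rest = [c.name for c in ordered[canary_count:]]
    return [canary, rest]


fleet = [
    ClusterInfo("prod-eu", 120),
    ClusterInfo("dev-sandbox", 6),
    ClusterInfo("prod-us", 200),
]
waves = plan_upgrade_waves(fleet)
print(waves)
```

The second wave should only start once the canary cluster has been observed for long enough to catch API deprecations and add-on incompatibilities.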
Cluster API (lifecycle, not traffic)
Cluster API (CAPI) is a Kubernetes-style API for creating, upgrading, and deleting clusters themselves, usually by driving cloud provider machine APIs from a management cluster.
Common framing: “We treat clusters like cattle — Terraform or CAPI provisions the management plane, then we bootstrap workloads with GitOps.” Pair CAPI with your image / SBOM pipeline and Cluster upgrades discipline.
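To make "clusters as API objects" concrete, here is a minimal sketch that builds a CAPI `Cluster` object as a Python dict, ready to be serialized and committed to Git. It assumes the `v1beta1` API versions, a kubeadm-based control plane, and an AWS infrastructure provider. The helper name, cluster name, namespace, and CIDR are hypothetical, and the infrastructure kind and version depend on the provider release you run.

```python
import json


def capi_cluster_manifest(name, namespace, pod_cidr, infra_kind, infra_api_version):
    """Build a Cluster API `Cluster` object as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",  # assumed API version
        "kind": "Cluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterNetwork": {"pods": {"cidrBlocks": [pod_cidr]}},
            # Reference to the object that describes the control plane.
            "controlPlaneRef": {
                "apiVersion": "controlplane.cluster.x-k8s.io/v1beta1",
                "kind": "KubeadmControlPlane",
                "name": f"{name}-control-plane",
            },
            # Reference to the provider-specific infrastructure object.
            "infrastructureRef": {
                "apiVersion": infra_api_version,
                "kind": infra_kind,
                "name": name,
            },
        },
    }


manifest = capi_cluster_manifest(
    name="workload-a",  # hypothetical cluster name
    namespace="fleet",  # hypothetical namespace in the management cluster
    pod_cidr="192.168.0.0/16",
    infra_kind="AWSCluster",  # assumption: AWS provider
    infra_api_version="infrastructure.cluster.x-k8s.io/v1beta2",  # varies by provider release
)
print(json.dumps(manifest, indent=2))
```

A complete cluster definition also needs the referenced control plane, infrastructure, and machine objects; this sketch shows only the top-level `Cluster` that ties them together.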
Karmada (placement across existing clusters)
Karmada (Kubernetes Armada) focuses on propagating workloads and policies to member clusters: scheduling and override semantics across a fleet from a host control plane.
Contrast with CAPI: CAPI stands up clusters; Karmada distributes work onto clusters that already exist. Teams sometimes combine both: CAPI for birth, Karmada (or another layer) for ongoing placement.
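As a rough illustration of placement onto existing clusters, the sketch below builds a Karmada `PropagationPolicy` that selects one Deployment and names the member clusters it should land on. The helper name, the Deployment, and the cluster names are hypothetical; the fields assume the `policy.karmada.io/v1alpha1` API and should be checked against the Karmada version in use.

```python
import json


def propagation_policy(name, namespace, deployment_name, member_clusters):
    """Build a Karmada PropagationPolicy as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "policy.karmada.io/v1alpha1",  # assumed API version
        "kind": "PropagationPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Which resource in the host control plane to propagate.
            "resourceSelectors": [
                {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment_name}
            ],
            # Where to place it: an explicit list of member clusters.
            "placement": {"clusterAffinity": {"clusterNames": list(member_clusters)}},
        },
    }


policy = propagation_policy(
    name="web-propagation",  # hypothetical policy name
    namespace="default",
    deployment_name="web",  # hypothetical Deployment
    member_clusters=["member-eu", "member-us"],  # hypothetical member clusters
)
print(json.dumps(policy, indent=2))
```

Note that nothing here creates a cluster: the policy only refers to members already registered with the Karmada control plane, which is the division of labor with CAPI described above.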
Many clusters + GitOps (fleet delivery)
The common production pattern is one Git repo (or monorepo path) per concern and a GitOps controller per cluster (Argo CD, Flux), or one Argo CD instance controlling many registered clusters, plus ApplicationSet generators to fan out Applications from cluster metadata, environments, or folder layout.
This is still multi-cluster even without Karmada: Git is the source of truth; each cluster reconciles its slice. See GitOps for the core workflow; treat secrets, drift, and sync waves as fleet-wide risks.
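The fan-out idea can be modeled in a few lines. The sketch below is a toy simulation, not Argo CD's implementation: it takes a list of registered clusters with labels, applies a label selector, and renders one Argo CD `Application` per matching cluster. The cluster records, repo URL, folder layout, project, and namespaces are all hypothetical.

```python
def matches(labels, selector):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def fan_out(clusters, selector, repo_url, path_template):
    """Render one Application dict per cluster whose labels match the selector."""
    apps = []
    for cluster in clusters:
        if not matches(cluster["labels"], selector):
            continue
        apps.append({
            "apiVersion": "argoproj.io/v1alpha1",
            "kind": "Application",
            "metadata": {"name": f"{cluster['name']}-platform", "namespace": "argocd"},
            "spec": {
                "project": "default",
                "source": {
                    "repoURL": repo_url,
                    "targetRevision": "HEAD",
                    # One folder per cluster: an assumed repo layout.
                    "path": path_template.format(name=cluster["name"]),
                },
                "destination": {"server": cluster["server"], "namespace": "platform"},
            },
        })
    return apps


registered = [  # hypothetical cluster inventory
    {"name": "prod-eu", "server": "https://prod-eu.example.com", "labels": {"env": "prod"}},
    {"name": "prod-us", "server": "https://prod-us.example.com", "labels": {"env": "prod"}},
    {"name": "dev", "server": "https://dev.example.com", "labels": {"env": "dev"}},
]
apps = fan_out(
    registered,
    selector={"env": "prod"},
    repo_url="https://git.example.com/platform.git",  # hypothetical repo
    path_template="clusters/{name}",
)
print([app["metadata"]["name"] for app in apps])
```

In a real setup the ApplicationSet controller performs this rendering continuously, so registering a new cluster with a matching label is enough to have it pick up its slice of the repo.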
Cloud “fleet” managers (light touch)
Vendors expose hosted fleet UIs/APIs (attach clusters, run policy packs, aggregate metrics). Mentally they sit above GitOps or beside it: useful for inventory and policy baselines, but your desired state should still live in versioned manifests unless you accept UI-only drift.
Related
- Multi-tenancy and policy: namespace vs virtual cluster vs separate cluster tradeoffs.
- EKS overview and Migrating workloads from EC2 to EKS: AWS-flavored migration and operations.
- Scheduling and placement: single-cluster scheduling; compare to Karmada’s cross-cluster placement story.