
Multi-Tenancy and Policy

By Atif Alam

Multi-tenancy on Kubernetes usually means many teams, one cluster (or a few large clusters). Isolation is defense in depth: namespaces, quotas, network policy, admission, and audit — not any single layer.

Namespaces vs virtual clusters vs separate clusters


Design and architecture discussions often contrast soft isolation (shared API server) with hard isolation (separate failure domains).

| Model | Isolation strength | Ops / cost | When it wins |
|---|---|---|---|
| Namespaces + policy | Shared etcd, scheduler, and control plane; boundaries are RBAC, NetworkPolicy, ResourceQuota, and admission | Lowest incremental cost | Most internal platforms with strong platform engineering and policy maturity |
| Virtual clusters (for example, vcluster: a nested API server that syncs objects to the host) | Stronger logical separation, and sometimes room to experiment with a different Kubernetes version per tenant; still shares nodes and underlying infrastructure | Medium: extra controllers and upgrade surfaces | Tenant-shaped “almost own cluster” without dedicated infrastructure per team |
| Separate clusters | Hardest isolation: distinct control planes, blast radius, and credentials | Highest: more GitOps, observability, and identity sprawl | Regulated tenants, multi-region independence, or teams that truly need cluster-admin-like freedom |

Separate clusters do not remove policy work — they move it to fleet concerns (see Multi-cluster management). Namespaces remain the default sweet spot when you can enforce defense in depth consistently.

  • One namespace per team or per environment slice is a common default.
  • Use RBAC so teams administer only their namespaces; platform roles own cluster-scoped objects.
  • Standardize labels (team, cost-center, tier) for chargeback and policy selectors.
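The namespace conventions above can be sketched as a small onboarding helper. It emits manifests as JSON, which `kubectl apply -f` accepts. This is a minimal sketch under the following assumptions:

- The label keys mirror the examples in the text.
- The tenant names (`payments-prod`, `payments-team`, `cc-1234`) are hypothetical.
- The binding reuses Kubernetes' built-in `admin` ClusterRole through a namespaced RoleBinding, so team rights stop at the namespace boundary.

```python
import json

# Label keys taken from the conventions above; adjust to your platform's schema.
REQUIRED_LABELS = ("team", "cost-center", "tier")


def tenant_namespace(name: str, labels: dict[str, str]) -> dict:
    """Build a Namespace manifest, rejecting any that lacks the standard labels."""
    missing = [key for key in REQUIRED_LABELS if key not in labels]
    if missing:
        raise ValueError(f"namespace {name!r} is missing labels: {missing}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": dict(labels)},
    }


def tenant_admin_binding(namespace: str, group: str) -> dict:
    """Grant a team group the built-in 'admin' ClusterRole inside ONE namespace.

    A RoleBinding (not a ClusterRoleBinding) limits the grant to `namespace`,
    so cluster-scoped objects stay with platform roles.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{group}-admin", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "admin",
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": group}
        ],
    }


if __name__ == "__main__":
    # Hypothetical tenant: namespace, labels, and group name are illustrative only.
    labels = {"team": "payments", "cost-center": "cc-1234", "tier": "prod"}
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            tenant_namespace("payments-prod", labels),
            tenant_admin_binding("payments-prod", "payments-team"),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

Piping the output to `kubectl apply -f -` creates both objects. Rejecting unlabeled namespaces at creation time keeps chargeback reports and policy selectors reliable.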

ResourceQuota caps aggregate consumption per namespace (CPU, memory, object counts). LimitRange sets defaults and ceilings per pod/container.
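A minimal sketch of a ResourceQuota paired with a LimitRange, again emitted as JSON. The object names (`tenant-quota`, `tenant-defaults`) and every size below are hypothetical placeholders. Real values should come from measured usage, as the tuning loop describes.

```python
import json


def resource_quota(namespace: str, requests_cpu: str, requests_memory: str,
                   limits_memory: str, pods: int) -> dict:
    """Aggregate ceilings summed across all pods in the namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": requests_cpu,
                "requests.memory": requests_memory,
                "limits.memory": limits_memory,
                "pods": str(pods),  # object-count quota
            }
        },
    }


def limit_range(namespace: str) -> dict:
    """Per-container defaults (applied when a pod omits them) and per-container maximums."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "tenant-defaults", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
                    "default": {"cpu": "500m", "memory": "512Mi"},  # default limits
                    "max": {"cpu": "2", "memory": "2Gi"},
                }
            ]
        },
    }


if __name__ == "__main__":
    ns = "payments-prod"  # hypothetical tenant namespace
    items = [resource_quota(ns, "8", "16Gi", "24Gi", 50), limit_range(ns)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The pairing matters. Once a quota tracks `requests.cpu` or `limits.memory`, the API server rejects pods that do not set those values. The LimitRange defaults fill them in, so tenants are not blocked by omissions.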

Tuning loop:

  1. Start from measured usage (Prometheus, kubectl top) plus headroom — not guesses.
  2. Set quotas that block runaway growth but allow legitimate spikes; document an exception path with time-bound overrides.
  3. Review monthly; tighten after incidents, loosen after repeated false blocks.
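Steps 1 and 2 of the loop can be sketched as a sizing helper. Several choices here are assumptions, not from the text:

- Usage arrives as a list of integer samples, for example daily peak CPU millicores for one namespace, exported from Prometheus.
- The p95 percentile, the 30% headroom, and rounding to 500m steps are illustrative defaults.

Integer math avoids float rounding surprises.

```python
def nearest_rank_percentile(samples: list[int], pct: int) -> int:
    """Nearest-rank percentile (pct in 1..100) using integer math only."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


def suggest_quota(samples: list[int], headroom_pct: int = 30,
                  pct: int = 95, step: int = 500) -> int:
    """Suggest a quota: percentile of observed usage plus headroom, rounded UP to `step`.

    Units follow the samples (for example, CPU millicores).
    """
    base = nearest_rank_percentile(samples, pct)
    padded = base * (100 + headroom_pct)      # usage scaled by 100 to stay in integers
    return -(-padded // (100 * step)) * step  # ceil(padded / (100 * step)) * step


if __name__ == "__main__":
    daily_peak_millicores = [1000, 1200, 1100, 3000, 1300]
    print(suggest_quota(daily_peak_millicores))  # 4000
```

Rounding up to a coarse step keeps small usage fluctuations from changing the quota at every monthly review.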

Combine NetworkPolicy with an explicit allowlist model: deny by default, then allow only the flows each tenant needs. See Network policies for default-deny examples and egress blocks for the cloud metadata endpoint.
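The allowlist model can be sketched as three per-namespace policies: deny everything, allow traffic from the same namespace, and reopen DNS egress. This is a minimal sketch that complements the Network policies page rather than replacing it. The policy names are hypothetical. The DNS selector assumes cluster DNS pods carry `k8s-app=kube-dns` in `kube-system`, which is common but not universal.

```python
import json

API = "networking.k8s.io/v1"


def default_deny(namespace: str) -> dict:
    """Empty podSelector plus both policyTypes: deny all ingress and egress for every pod."""
    return {
        "apiVersion": API,
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_same_namespace(namespace: str) -> dict:
    """Allow ingress only from pods in this namespace (a podSelector with no
    namespaceSelector matches pods in the policy's own namespace)."""
    return {
        "apiVersion": API,
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


def allow_dns_egress(namespace: str) -> dict:
    """Reopen DNS, which default-deny egress would otherwise break.
    Assumes DNS pods are labeled k8s-app=kube-dns in kube-system."""
    return {
        "apiVersion": API,
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [{
                    "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
                    "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                }],
                "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
            }],
        },
    }


if __name__ == "__main__":
    ns = "payments-prod"  # hypothetical tenant namespace
    items = [default_deny(ns), allow_same_namespace(ns), allow_dns_egress(ns)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

NetworkPolicies are additive, so each further allowance is another small policy rather than an edit to the default-deny one.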

For engines such as Kyverno or OPA Gatekeeper:

  1. Audit: report violations without blocking, and collect metrics on how noisy each policy is.
  2. Warn: return warnings to kubectl and CI while still admitting the resource, where the engine supports a warn mode.
  3. Enforce: block non-compliant resources, backed by a published exception process (ticket, expiry, owner).
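The exception process in step 3 can be sketched as a small registry of time-bound waivers. Every field name and policy name here is hypothetical, not a Kyverno or Gatekeeper API. The stage mapping uses OPA Gatekeeper's per-constraint `enforcementAction` values (`dryrun`, `warn`, `deny`).

```python
from dataclasses import dataclass
from datetime import date

# OPA Gatekeeper's per-constraint enforcementAction value for each rollout stage.
GATEKEEPER_ACTION = {"audit": "dryrun", "warn": "warn", "enforce": "deny"}


@dataclass(frozen=True)
class PolicyException:
    """One time-bound waiver (hypothetical schema)."""
    policy: str      # name of the policy or constraint being waived
    namespace: str   # tenant namespace the waiver applies to
    ticket: str      # tracking ticket that justifies the waiver
    owner: str       # accountable team or person
    expires: date    # last day the waiver is honored

    def __post_init__(self) -> None:
        if not self.ticket or not self.owner:
            raise ValueError("exceptions need a ticket and an owner")


def is_exempt(exceptions: list[PolicyException], policy: str,
              namespace: str, today: date) -> bool:
    """True only while a matching, unexpired waiver exists."""
    return any(
        e.policy == policy and e.namespace == namespace and today <= e.expires
        for e in exceptions
    )


def expired(exceptions: list[PolicyException], today: date) -> list[PolicyException]:
    """Waivers past their expiry: candidates for cleanup or owner follow-up."""
    return [e for e in exceptions if today > e.expires]


if __name__ == "__main__":
    waivers = [
        PolicyException("require-limits", "legacy-batch", "OPS-123", "team-batch", date(2030, 1, 31)),
        PolicyException("disallow-latest-tag", "payments-prod", "OPS-456", "team-payments", date(2020, 6, 30)),
    ]
    today = date(2025, 1, 1)
    print(is_exempt(waivers, "require-limits", "legacy-batch", today))        # True
    print(is_exempt(waivers, "disallow-latest-tag", "payments-prod", today))  # False (expired)
    print([e.ticket for e in expired(waivers, today)])                        # ['OPS-456']
```

Making expiry mandatory means a waiver lapses back into enforcement on its own, rather than relying on someone remembering to remove it.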

Pair this rollout with the Admission controllers guidance on webhook failure modes and timeouts.
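Webhook failure behavior is worth pinning down per rollout stage. This is a minimal sketch, assuming one hypothetical webhook service, `policy-webhook`, in a hypothetical `policy-system` namespace.

- The field names (`failurePolicy`, `timeoutSeconds`, `namespaceSelector`, `sideEffects`) are the real `admissionregistration.k8s.io/v1` ones.
- Mapping stages to failure policies is a design choice, not a rule from the text.

Engines such as Kyverno and Gatekeeper manage their own webhook configurations, so treat this as an illustration of which fields to check.

```python
import json

WEBHOOK_TIMEOUT_RANGE = range(1, 31)  # the API server accepts 1-30 seconds (default 10)


def validating_webhook(stage: str, timeout_seconds: int = 5) -> dict:
    """Build a ValidatingWebhookConfiguration whose failure mode follows the rollout stage.

    "audit"/"warn" -> failurePolicy Ignore (fail open: a webhook outage never blocks deploys)
    "enforce"      -> failurePolicy Fail   (fail closed: an outage cannot bypass policy)
    """
    if timeout_seconds not in WEBHOOK_TIMEOUT_RANGE:
        raise ValueError("timeoutSeconds must be between 1 and 30")
    failure_policy = "Fail" if stage == "enforce" else "Ignore"
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": "tenant-policy"},
        "webhooks": [{
            "name": "tenant-policy.example.com",  # hypothetical, must be fully qualified
            "clientConfig": {
                # caBundle omitted: supply it yourself or let a CA injector add it.
                "service": {"namespace": "policy-system", "name": "policy-webhook", "path": "/validate"},
            },
            "rules": [{
                "apiGroups": ["", "apps"],
                "apiVersions": ["v1"],
                "operations": ["CREATE", "UPDATE"],
                "resources": ["pods", "deployments"],
                "scope": "Namespaced",
            }],
            "failurePolicy": failure_policy,
            "timeoutSeconds": timeout_seconds,
            "sideEffects": "None",
            "admissionReviewVersions": ["v1"],
            # Never gate kube-system or the webhook's own namespace, so the
            # control plane and the webhook itself can always recover.
            "namespaceSelector": {"matchExpressions": [{
                "key": "kubernetes.io/metadata.name",
                "operator": "NotIn",
                "values": ["kube-system", "policy-system"],
            }]},
        }],
    }


if __name__ == "__main__":
    print(json.dumps(validating_webhook("audit"), indent=2))
```

A short timeout matters most while the webhook fails closed, because every slow response adds latency to each matching API request.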

When many tenants share one ingress controller:

  • Prefer a separate IngressClass (or Gateway) per tenant or tier to contain noisy neighbors.
  • Apply per-route rate limits at the edge where the controller supports it.
  • Watch controller CPU/memory and admission latency; split controllers when saturation is chronic.
  • Use RBAC to scope Ingress ownership to each tenant's namespace, which limits cross-tenant route hijacking.

For audit and detection:

  • Enable Kubernetes audit logging with a policy that captures Role, RoleBinding, ClusterRoleBinding, ValidatingWebhookConfiguration, and policy CRDs.
  • Ship logs to your SIEM; alert on cluster-admin bindings, anonymous access, and webhook config edits.
  • Retention and immutability follow your compliance baseline.
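The audit guidance can be sketched in two parts. The first is an audit policy, the file passed to the API server with `--audit-policy-file`. The second is a detection filter over the resulting JSON audit events. Several pieces are assumptions:

- The write-verb list is illustrative.
- The Kyverno and Gatekeeper API groups stand in for "policy CRDs".
- The alert strings are illustrative.

In practice, the detection logic would live in your SIEM's own rule language.

```python
import json

WRITE_VERBS = {"create", "update", "patch", "delete", "deletecollection"}
WEBHOOK_RESOURCES = {"validatingwebhookconfigurations", "mutatingwebhookconfigurations"}

# Audit policy (audit.k8s.io/v1); the first matching rule sets the level.
AUDIT_POLICY = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    "omitStages": ["RequestReceived"],  # skip the duplicate "request received" event
    "rules": [
        {
            # Full bodies for writes to RBAC, admission webhooks, and policy CRDs.
            "level": "RequestResponse",
            "verbs": sorted(WRITE_VERBS),
            "resources": [
                {"group": "rbac.authorization.k8s.io",
                 "resources": ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]},
                {"group": "admissionregistration.k8s.io", "resources": sorted(WEBHOOK_RESOURCES)},
                # A group entry without "resources" matches every resource in that group.
                {"group": "kyverno.io"},
                {"group": "templates.gatekeeper.sh"},
                {"group": "constraints.gatekeeper.sh"},
            ],
        },
        # Everything else: metadata only, so Secret and ConfigMap bodies stay out of logs.
        {"level": "Metadata"},
    ],
}


def alert_reasons(event: dict) -> list[str]:
    """Return alert reasons for one parsed audit.k8s.io/v1 Event; empty if none."""
    reasons = []
    verb = event.get("verb", "")
    username = event.get("user", {}).get("username", "")
    resource = (event.get("objectRef") or {}).get("resource", "")

    # Expect noise from unauthenticated health probes; filter by requestURI if needed.
    if username == "system:anonymous":
        reasons.append("anonymous access")
    if resource in WEBHOOK_RESOURCES and verb in WRITE_VERBS:
        reasons.append(f"webhook config {verb}")
    # roleRef is only visible because RBAC writes are logged at RequestResponse above.
    if resource in {"clusterrolebindings", "rolebindings"} and verb in {"create", "update"}:
        role_ref = (event.get("requestObject") or {}).get("roleRef") or {}
        if role_ref.get("name") == "cluster-admin":
            reasons.append("cluster-admin binding")
    return reasons


if __name__ == "__main__":
    print(json.dumps(AUDIT_POLICY, indent=2))
    sample = {
        "verb": "create",
        "user": {"username": "alice"},
        "objectRef": {"resource": "clusterrolebindings", "apiGroup": "rbac.authorization.k8s.io"},
        "requestObject": {"roleRef": {"kind": "ClusterRole", "name": "cluster-admin"}},
    }
    print(alert_reasons(sample))  # ['cluster-admin binding']
```

Logging full bodies only for a narrow set of sensitive writes keeps log volume manageable, while still recording exactly which role a new binding grants.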