Multi-Tenancy and Policy
Multi-tenancy on Kubernetes usually means many teams, one cluster (or a few large clusters). Isolation is defense in depth: namespaces, quotas, network policy, admission, and audit — not any single layer.
Namespaces vs virtual clusters vs separate clusters
Design and architecture discussions often contrast soft isolation (shared API server) with hard isolation (separate failure domains).
| Model | Isolation strength | Ops / cost | When it wins |
|---|---|---|---|
| Namespaces + policy | Shared etcd, scheduler, and control plane; boundaries are RBAC, NetworkPolicy, ResourceQuota, admission | Lowest incremental cost | Most internal platforms; strong platform engineering and policy maturity |
| Virtual clusters (for example vcluster: nested API server plus sync to host) | Stronger logical separation, and sometimes room to experiment with version skew; still shares nodes and underlying infrastructure | Medium: extra controllers and upgrade surfaces | Tenants that need an “almost own cluster” without dedicated infrastructure per team |
| Separate clusters | Strongest isolation: distinct control planes, blast radius, and credentials | Highest: more GitOps, observability, and identity sprawl | Regulated tenants, multi-region independence, or teams that truly need cluster-admin-like freedom |
Separate clusters do not remove policy work — they move it to fleet concerns (see Multi-cluster management). Namespaces remain the default sweet spot when you can enforce defense in depth consistently.
Namespace boundaries
- One namespace per team or per environment slice is a common default.
- Use RBAC so teams administer only their namespaces; platform roles own cluster-scoped objects.
- Standardize labels (`team`, `cost-center`, `tier`) for chargeback and policy selectors.
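As a rough illustration of the points above, the sketch below builds a labeled Namespace and a RoleBinding that grants a team group the built-in `admin` ClusterRole inside its own namespace only. The function name `tenant_namespace_manifests`, the `team-<name>` group naming scheme, and the example label values are assumptions made for illustration; the manifests are plain dictionaries that could be serialized to YAML or JSON.

```python
def tenant_namespace_manifests(team: str, cost_center: str, tier: str) -> list[dict]:
    """Return a Namespace and a namespace-scoped RoleBinding for one team.

    Hypothetical helper: names and label values are illustrative only.
    """
    labels = {"team": team, "cost-center": cost_center, "tier": tier}
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team, "labels": labels},
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",  # namespaced, so the grant stops at this namespace
        "metadata": {"name": f"{team}-admins", "namespace": team, "labels": labels},
        "subjects": [
            {
                "kind": "Group",
                "apiGroup": "rbac.authorization.k8s.io",
                "name": f"team-{team}",  # assumed group name from your identity provider
            }
        ],
        # Referencing a ClusterRole from a RoleBinding applies its rules
        # only inside the binding's namespace.
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "admin",
        },
    }
    return [namespace, role_binding]
```

Binding the built-in `admin` ClusterRole through a RoleBinding keeps the team's rights inside its namespace, and that role does not grant write access to ResourceQuota or to the namespace object itself, so those stay with the platform team.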
ResourceQuota and LimitRange
ResourceQuota caps aggregate consumption per namespace (CPU, memory, object counts). LimitRange sets defaults and ceilings per pod/container.
Tuning loop:
- Start from measured usage (Prometheus, `kubectl top`) plus headroom, not guesses.
- Set quotas that block runaway growth but allow legitimate spikes; document an exception path with time-bound overrides.
- Review monthly; tighten after incidents, loosen after repeated false blocks.
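The first step of the tuning loop can be sketched as a small calculation: take a measured peak, add headroom, and emit a ResourceQuota. The helper names (`with_headroom`, `quota_from_usage`), the 30% default headroom, the quota name `team-quota`, and the pod cap are assumptions for illustration, not recommended values.

```python
def with_headroom(value: int, headroom_percent: int) -> int:
    """Scale value by (100 + headroom_percent) percent, rounding up, in integer math."""
    if headroom_percent < 0:
        raise ValueError("headroom_percent must be non-negative")
    scaled = value * (100 + headroom_percent)
    return -(-scaled // 100)  # ceiling division without floats


def quota_from_usage(
    namespace: str,
    peak_cpu_millicores: int,
    peak_memory_mib: int,
    headroom_percent: int = 30,  # assumed default; tune per tenant
    max_pods: int = 50,          # assumed object-count cap
) -> dict:
    """Build a ResourceQuota manifest from measured peaks plus headroom."""
    cpu = with_headroom(peak_cpu_millicores, headroom_percent)
    memory = with_headroom(peak_memory_mib, headroom_percent)
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "team-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": f"{cpu}m",
                "requests.memory": f"{memory}Mi",
                "pods": str(max_pods),
            }
        },
    }
```

For example, a measured peak of 4000 millicores and 8192 MiB with the default headroom yields `requests.cpu: 5200m` and `requests.memory: 10650Mi`. Time-bound overrides are not modeled here; they would need an expiry tracked outside the manifest.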
Default-deny network posture
Combine NetworkPolicy with a clear allowlist model. See Network policies for default-deny examples and metadata egress blocks.
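As a minimal sketch of that posture, the functions below build a default-deny policy for a namespace and one allowlist entry that re-opens traffic between pods in the same namespace. The function and policy names are hypothetical; the Network policies page remains the reference for complete examples.

```python
def default_deny_policy(namespace: str) -> dict:
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches every pod in the namespace
            # Listing both types with no rules denies both directions.
            # Note: this also blocks DNS egress until an allow rule is added.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_same_namespace_ingress(namespace: str) -> dict:
    """Allowlist entry: pods may receive traffic from pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector in "from" selects pods in the policy's own namespace.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
```

NetworkPolicy rules are additive, so each allowlist policy widens what the default-deny baseline permits; enforcement still depends on a network plugin that implements NetworkPolicy.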
Admission policy rollout
For engines such as Kyverno or OPA Gatekeeper:
- Audit — report violations without blocking; collect noise metrics.
- Warn — surface messages to CI or namespaces; still allow deploys if policy supports it.
- Enforce — block non-compliant resources with a published exception process (ticket, expiry, owner).
Pair with Admission controllers for webhook failure modes and timeouts.
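The three rollout stages can be modeled as a small decision function. This is a sketch of the rollout logic, not how either engine is implemented: the stage values reuse Gatekeeper's `enforcementAction` settings (`dryrun`, `warn`, `deny`), while `PolicyException`, `Decision`, `decide`, and the ticket format are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Stage(Enum):
    AUDIT = "dryrun"   # report only
    WARN = "warn"      # allow, but surface messages
    ENFORCE = "deny"   # block unless an unexpired exception applies


@dataclass(frozen=True)
class PolicyException:
    namespace: str
    ticket: str
    owner: str
    expires: date


@dataclass(frozen=True)
class Decision:
    allowed: bool
    messages: tuple[str, ...] = ()  # shown to the requester
    report: tuple[str, ...] = ()    # recorded silently for noise metrics


def decide(
    stage: Stage,
    namespace: str,
    violations: list[str],
    exceptions: list[PolicyException],
    today: date,
) -> Decision:
    """Return the admission outcome for one request under the given stage."""
    if not violations:
        return Decision(allowed=True)
    if stage is Stage.AUDIT:
        return Decision(allowed=True, report=tuple(violations))
    if stage is Stage.WARN:
        return Decision(allowed=True, messages=tuple(violations))
    # ENFORCE: an exception only counts while it has not expired.
    for exc in exceptions:
        if exc.namespace == namespace and today <= exc.expires:
            notes = tuple(f"excepted via {exc.ticket}: {v}" for v in violations)
            return Decision(allowed=True, messages=notes)
    return Decision(allowed=False, messages=tuple(violations))
```

Keeping the expiry on the exception itself means an override lapses without anyone having to remember to remove it.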
Shared ingress fairness
When many tenants share one ingress controller:
- Prefer a separate IngressClass (or Gateway) per tenant or tier to contain noisy neighbors.
- Apply per-route rate limits at the edge where the controller supports it.
- Watch controller CPU/memory and admission latency; split controllers when saturation is chronic.
- Use RBAC to keep Ingress ownership namespace-scoped, so one tenant cannot edit another tenant's routes.
Auditing security-sensitive changes
- Enable Kubernetes audit logging with an audit policy that captures changes to `Role`, `RoleBinding`, `ClusterRoleBinding`, `ValidatingWebhookConfiguration`, and policy CRDs.
- Ship logs to your SIEM; alert on cluster-admin bindings, anonymous access, and webhook config edits.
- Retention and immutability follow your compliance baseline.
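The alert conditions above can be sketched as a filter over audit events. The field names (`verb`, `user.username`, `objectRef.resource`, `requestObject`) follow the `audit.k8s.io/v1` Event shape; the function name `alert_reasons` and the resource list are assumptions, and `requestObject` is only present when the audit policy logs at the Request level or higher.

```python
WRITE_VERBS = {"create", "update", "patch", "delete"}

# Plural resource names as they appear in objectRef.resource.
SENSITIVE_RESOURCES = {
    "roles",
    "rolebindings",
    "clusterrolebindings",
    "validatingwebhookconfigurations",
}


def alert_reasons(event: dict) -> list[str]:
    """Return the reasons, if any, that an audit event should raise an alert."""
    reasons = []
    username = (event.get("user") or {}).get("username", "")
    resource = (event.get("objectRef") or {}).get("resource", "")
    verb = event.get("verb", "")

    if username == "system:anonymous":
        reasons.append("anonymous access")

    is_write = verb in WRITE_VERBS
    if is_write and resource in SENSITIVE_RESOURCES:
        reasons.append(f"{verb} on {resource}")

    # roleRef is only visible if the request body was logged.
    role_ref = (event.get("requestObject") or {}).get("roleRef") or {}
    binds_role = resource in {"rolebindings", "clusterrolebindings"}
    if is_write and binds_role and role_ref.get("name") == "cluster-admin":
        reasons.append("cluster-admin binding")

    return reasons
```

Policy CRDs are left out of `SENSITIVE_RESOURCES` because their resource names depend on the engine in use; add them for your installation.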
Related
- Multi-cluster management: Fleet patterns when one cluster per tenant is not enough.
- Network policies: Default deny, allowlists, metadata egress.
- RBAC: Bindings and blast radius.
- Production platform checklist: Layered platform ownership.
- Architecture review answers: Prompts this page deepens.