Operators

By Atif Alam

An operator extends Kubernetes by encoding domain-specific operational knowledge — how to deploy, scale, back up, and upgrade a particular application — into a controller that runs inside the cluster.

Kubernetes already runs on controller loops: watch desired state, compare it to actual state, reconcile the difference. Operators apply the exact same pattern to your own custom resources, built from three pieces:

  1. Custom Resource Definition (CRD) — Extends the Kubernetes API with a new resource type (e.g. kind: PostgresCluster instead of just Deployment).
  2. Custom Resource (CR) — An instance of that CRD (e.g. “I want a 3-node Postgres cluster with 100Gi storage”).
  3. Controller — A program that watches CRs and reconciles reality to match the desired state.
```
User creates CR → Controller sees it → Creates Pods, Services, PVCs, etc.
        ↑                                                │
        └───── Updates CR status with current state ─────┘
```
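A CRD is itself just a Kubernetes API object. As a hedged sketch (the `example.com` group and `MyApp` kind here are the hypothetical names used later on this page, not a real published operator), a minimal hand-written CRD might look like:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.example.com        # must be <plural>.<group>
spec:
  group: example.com
  names:
    kind: MyApp
    plural: myapps
    singular: myapp
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:          # validation schema for the spec
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                image:
                  type: string
```

In practice frameworks like Kubebuilder generate this YAML for you from Go struct tags, so you rarely write it by hand.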

Without an operator, running something like PostgreSQL on Kubernetes means manually managing StatefulSets, PVCs, Services, ConfigMaps, backups, failover, replication, version upgrades, and monitoring.

An operator encodes all that knowledge so you just write:

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: my-db
spec:
  postgresVersion: 15
  instances:
    - replicas: 3
      dataVolumeClaimSpec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 100Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"
```

And the operator handles everything else — creating pods, setting up replication, running backups on schedule, and healing failures.

Some widely used operators:

| Operator | What It Manages |
| --- | --- |
| Prometheus Operator | Prometheus instances, alerting rules, ServiceMonitors |
| Cert-Manager | TLS certificates (auto-renewal from Let’s Encrypt, Vault, etc.) |
| External Secrets Operator | Syncs secrets from Vault / AWS SM / Azure KV into K8s Secrets |
| ArgoCD | GitOps-based continuous deployments |
| Strimzi | Apache Kafka clusters |
| CloudNativePG / Crunchy | PostgreSQL clusters |

Cert-Manager fits into the broader TLS/PKI picture for clusters and cloud load balancers — see TLS and Certificates for ACM-centric lifecycle and how teams often split trust between cloud edges and in-cluster issuers.

You can browse hundreds more at OperatorHub.io.

Every operator follows the same pattern, regardless of framework:

  1. Watch for changes (create / update / delete of your CR)
  2. Fetch the current CR
  3. Compare desired state (spec) vs actual state (what exists in the cluster)
  4. Take action to reconcile (create / update / delete child resources)
  5. Update the CR status
  6. Return (requeue if needed)

Key principles:

  • Idempotent — Running reconcile twice with the same input produces the same result.
  • Level-triggered, not edge-triggered — React to current state, not to “what just happened.”
  • Owns child resources — Set ownerReference so garbage collection cleans up when the CR is deleted.
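To see why idempotency matters, here is a toy, self-contained sketch of the create-or-update decision at the heart of a reconcile loop. An in-memory map stands in for the API server (this is not real client-go); the `reconcile` function and its return values are made up for illustration:

```go
package main

import "fmt"

// cluster is a toy stand-in for the API server:
// Deployment name -> current replica count.
type cluster map[string]int

// reconcile drives the cluster toward the desired state. It compares
// desired vs actual and acts only on the difference, so calling it
// repeatedly with the same input is a no-op after the first call.
func reconcile(c cluster, name string, desiredReplicas int) string {
	actual, exists := c[name]
	switch {
	case !exists:
		c[name] = desiredReplicas
		return "created"
	case actual != desiredReplicas:
		c[name] = desiredReplicas
		return "updated"
	default:
		return "unchanged" // idempotent: same input, no side effect
	}
}

func main() {
	c := cluster{}
	fmt.Println(reconcile(c, "web", 3)) // created
	fmt.Println(reconcile(c, "web", 3)) // unchanged: second run does nothing
	fmt.Println(reconcile(c, "web", 5)) // updated: only the diff is acted on
}
```

Note that the function never asks "what event happened?", only "what is true now?", which is exactly the level-triggered behavior described above.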
| Framework | Language | Best For |
| --- | --- | --- |
| Kubebuilder | Go | The standard — most production operators use this |
| Operator SDK | Go, Ansible, Helm | Red Hat’s toolkit; wraps Kubebuilder for Go, also supports Ansible/Helm-based operators |
| Kopf | Python | Python shops, simpler operators, rapid prototyping |
| Metacontroller | Any (via webhooks) | Lightweight — you write a webhook in any language, Metacontroller handles the watch/reconcile loop |

Scaffold a project, then add an API (CRD + controller):

```shell
kubebuilder init --domain example.com --repo github.com/myorg/my-operator
kubebuilder create api --group apps --version v1alpha1 --kind MyApp
```

This generates:

  • api/v1alpha1/myapp_types.go — Your CRD struct (spec and status fields)
  • internal/controllers/myapp_controller.go — The reconcile loop
api/v1alpha1/myapp_types.go

```go
type MyAppSpec struct {
	Replicas int32  `json:"replicas"`
	Image    string `json:"image"`
	Port     int32  `json:"port,omitempty"`
}

type MyAppStatus struct {
	AvailableReplicas int32  `json:"availableReplicas"`
	Phase             string `json:"phase"`
}
```
internal/controllers/myapp_controller.go

```go
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)

	// 1. Fetch the CR
	var myapp appsv1alpha1.MyApp
	if err := r.Get(ctx, req.NamespacedName, &myapp); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Build the desired Deployment
	desired := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      myapp.Name,
			Namespace: myapp.Namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &myapp.Spec.Replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": myapp.Name},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{"app": myapp.Name},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  myapp.Name,
						Image: myapp.Spec.Image,
						Ports: []corev1.ContainerPort{{
							ContainerPort: myapp.Spec.Port,
						}},
					}},
				},
			},
		},
	}

	// 3. Set owner reference (garbage collection)
	if err := ctrl.SetControllerReference(&myapp, desired, r.Scheme); err != nil {
		return ctrl.Result{}, err
	}

	// 4. Create or update the Deployment
	found := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: desired.Name, Namespace: desired.Namespace}, found)
	if err != nil && errors.IsNotFound(err) {
		log.Info("Creating Deployment", "name", desired.Name)
		err = r.Create(ctx, desired)
	} else if err == nil {
		log.Info("Updating Deployment", "name", desired.Name)
		found.Spec = desired.Spec
		err = r.Update(ctx, found)
	}
	if err != nil {
		return ctrl.Result{}, err
	}

	// 5. Update status
	myapp.Status.AvailableReplicas = found.Status.AvailableReplicas
	myapp.Status.Phase = "Running"
	if err := r.Status().Update(ctx, &myapp); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}
```
Generate manifests and iterate locally:

```shell
make manifests   # generates CRD YAML from Go struct tags
make install     # applies CRD to cluster (kubectl apply)
make run         # runs the controller locally (for development)
```

Then build the image and deploy the controller into the cluster:

```shell
make docker-build docker-push IMG=myorg/my-operator:v0.1.0
make deploy IMG=myorg/my-operator:v0.1.0
```

The operator now runs as a Deployment inside the cluster, watching for MyApp resources.


A lighter alternative for simpler operators or teams that prefer Python:

```python
import kopf
import kubernetes.client as k8s

@kopf.on.create('example.com', 'v1alpha1', 'myapps')
def on_create(spec, name, namespace, **kwargs):
    replicas = spec.get('replicas', 1)
    image = spec.get('image', 'nginx')
    api = k8s.AppsV1Api()
    deployment = k8s.V1Deployment(
        metadata=k8s.V1ObjectMeta(name=name),
        spec=k8s.V1DeploymentSpec(
            replicas=replicas,
            selector=k8s.V1LabelSelector(match_labels={'app': name}),
            template=k8s.V1PodTemplateSpec(
                metadata=k8s.V1ObjectMeta(labels={'app': name}),
                spec=k8s.V1PodSpec(containers=[
                    k8s.V1Container(name=name, image=image)
                ])
            )
        )
    )
    api.create_namespaced_deployment(namespace, deployment)

@kopf.on.update('example.com', 'v1alpha1', 'myapps')
def on_update(spec, name, namespace, **kwargs):
    api = k8s.AppsV1Api()
    patch = {'spec': {'replicas': spec.get('replicas', 1)}}
    api.patch_namespaced_deployment(name, namespace, patch)

@kopf.on.delete('example.com', 'v1alpha1', 'myapps')
def on_delete(name, namespace, **kwargs):
    api = k8s.AppsV1Api()
    api.delete_namespaced_deployment(name, namespace)
```

Run it with:

```shell
kopf run my_operator.py --verbose
```

Kopf handles the watch loop, retries, and leader election. You write the handlers.


Once the CRD is installed and the operator is running:

my-app.yaml

```yaml
apiVersion: apps.example.com/v1alpha1
kind: MyApp
metadata:
  name: web-frontend
  namespace: default
spec:
  replicas: 3
  image: my-frontend:v2.0.0
  port: 8080
```

```shell
kubectl apply -f my-app.yaml
kubectl get myapps                    # see your custom resources
kubectl describe myapp web-frontend   # see status and events
```
  • An operator = CRD (defines a new resource type) + controller (reconciles desired vs actual state).
  • The reconcile loop must be idempotent and level-triggered.
  • Use Kubebuilder (Go) for production operators; Kopf (Python) for simpler use cases.
  • Set owner references on child resources so deletion cascades automatically.
  • Start simple — an operator that creates a Deployment + Service — and layer on complexity (backups, upgrades, failover) as needed.
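As a closing illustration of the owner-reference point: after `ctrl.SetControllerReference` runs, the child Deployment carries metadata roughly like the sketch below (the UID is a placeholder; the API server fills in the CR's real `metadata.uid`):

```yaml
# On the child Deployment created for the MyApp CR above.
metadata:
  ownerReferences:
    - apiVersion: apps.example.com/v1alpha1
      kind: MyApp
      name: web-frontend
      uid: "<uid-of-the-myapp-cr>"   # placeholder; set from the CR's metadata.uid
      controller: true               # marks this as the managing controller
      blockOwnerDeletion: true
```

When the `MyApp` object is deleted, the garbage collector sees this reference and cascades the deletion to the Deployment automatically.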