Operators

By Atif Alam

An operator extends Kubernetes by encoding domain-specific operational knowledge — how to deploy, scale, back up, and upgrade a particular application — into a controller that runs inside the cluster.

Kubernetes already runs on controller loops: watch desired state, compare it to actual state, reconcile the difference. Operators apply the exact same pattern to your own custom resources, built from three pieces:

  1. Custom Resource Definition (CRD) — Extends the Kubernetes API with a new resource type (e.g. kind: PostgresCluster instead of just Deployment).
  2. Custom Resource (CR) — An instance of that CRD (e.g. “I want a 3-node Postgres cluster with 100Gi storage”).
  3. Controller — A program that watches CRs and reconciles reality to match the desired state.
```
User creates CR → Controller sees it → Creates Pods, Services, PVCs, etc.
        ↑                                                │
        └───── Updates CR status with current state ─────┘
```
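A CRD is itself just a Kubernetes API object. As a hedged sketch (the `example.com` group and `MyApp` kind here are the hypothetical names used later on this page, not a real published operator), a minimal hand-written CRD might look like:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.example.com        # must be <plural>.<group>
spec:
  group: example.com
  names:
    kind: MyApp
    plural: myapps
    singular: myapp
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:          # validation schema for the spec
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                image:
                  type: string
```

In practice frameworks like Kubebuilder generate this YAML for you from Go struct tags, so you rarely write it by hand.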

Without an operator, running something like PostgreSQL on Kubernetes means manually managing StatefulSets, PVCs, Services, ConfigMaps, backups, failover, replication, version upgrades, and monitoring.

An operator encodes all that knowledge so you just write:

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: my-db
spec:
  postgresVersion: 15
  instances:
    - replicas: 3
      dataVolumeClaimSpec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 100Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"
```

And the operator handles everything else — creating pods, setting up replication, running backups on schedule, and healing failures.

Some widely used operators:

| Operator | What It Manages |
| --- | --- |
| Prometheus Operator | Prometheus instances, alerting rules, ServiceMonitors |
| Cert-Manager | TLS certificates (auto-renewal from Let’s Encrypt, Vault, etc.) |
| External Secrets Operator | Syncs secrets from Vault / AWS SM / Azure KV into K8s Secrets |
| ArgoCD | GitOps-based continuous deployments |
| Strimzi | Apache Kafka clusters |
| CloudNativePG / Crunchy | PostgreSQL clusters |

Cert-Manager fits into the broader TLS/PKI picture for clusters and cloud load balancers — see TLS and Certificates for ACM-centric lifecycle and how teams often split trust between cloud edges and in-cluster issuers.

You can browse hundreds more at OperatorHub.io.

Every operator follows the same pattern, regardless of framework:

  1. Watch for changes (create / update / delete of your CR)
  2. Fetch the current CR
  3. Compare desired state (spec) vs actual state (what exists in the cluster)
  4. Take action to reconcile (create / update / delete child resources)
  5. Update the CR status
  6. Return (requeue if needed)

Key principles:

  • Idempotent — Running reconcile twice with the same input produces the same result.
  • Level-triggered, not edge-triggered — React to current state, not to “what just happened.”
  • Owns child resources — Set ownerReference so garbage collection cleans up when the CR is deleted.
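To see why idempotency matters, here is a toy, self-contained sketch of the create-or-update decision at the heart of a reconcile loop. An in-memory map stands in for the API server (this is not real client-go); the `reconcile` function and its return values are made up for illustration:

```go
package main

import "fmt"

// cluster is a toy stand-in for the API server:
// Deployment name -> current replica count.
type cluster map[string]int

// reconcile drives the cluster toward the desired state. It compares
// desired vs actual and acts only on the difference, so calling it
// repeatedly with the same input is a no-op after the first call.
func reconcile(c cluster, name string, desiredReplicas int) string {
	actual, exists := c[name]
	switch {
	case !exists:
		c[name] = desiredReplicas
		return "created"
	case actual != desiredReplicas:
		c[name] = desiredReplicas
		return "updated"
	default:
		return "unchanged" // idempotent: same input, no side effect
	}
}

func main() {
	c := cluster{}
	fmt.Println(reconcile(c, "web", 3)) // created
	fmt.Println(reconcile(c, "web", 3)) // unchanged: second run does nothing
	fmt.Println(reconcile(c, "web", 5)) // updated: only the diff is acted on
}
```

Note that the function never asks "what event happened?", only "what is true now?", which is exactly the level-triggered behavior described above.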
| Framework | Language | Best For |
| --- | --- | --- |
| Kubebuilder | Go | The standard — most production operators use this |
| Operator SDK | Go, Ansible, Helm | Red Hat’s toolkit; wraps Kubebuilder for Go, also supports Ansible/Helm-based operators |
| Kopf | Python | Python shops, simpler operators, rapid prototyping |
| Metacontroller | Any (via webhooks) | Lightweight — you write a webhook in any language, Metacontroller handles the watch/reconcile loop |

Scaffold a project, then add an API (CRD + controller):

```shell
kubebuilder init --domain example.com --repo github.com/myorg/my-operator
kubebuilder create api --group apps --version v1alpha1 --kind MyApp
```

This generates:

  • api/v1alpha1/myapp_types.go — Your CRD struct (spec and status fields)
  • internal/controllers/myapp_controller.go — The reconcile loop
api/v1alpha1/myapp_types.go

```go
type MyAppSpec struct {
	Replicas int32  `json:"replicas"`
	Image    string `json:"image"`
	Port     int32  `json:"port,omitempty"`
}

type MyAppStatus struct {
	AvailableReplicas int32  `json:"availableReplicas"`
	Phase             string `json:"phase"`
}
```
internal/controllers/myapp_controller.go

```go
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := log.FromContext(ctx)

	// 1. Fetch the CR
	var myapp appsv1alpha1.MyApp
	if err := r.Get(ctx, req.NamespacedName, &myapp); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Build the desired Deployment
	desired := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      myapp.Name,
			Namespace: myapp.Namespace,
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &myapp.Spec.Replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": myapp.Name},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{"app": myapp.Name},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  myapp.Name,
						Image: myapp.Spec.Image,
						Ports: []corev1.ContainerPort{{
							ContainerPort: myapp.Spec.Port,
						}},
					}},
				},
			},
		},
	}

	// 3. Set owner reference (garbage collection)
	if err := ctrl.SetControllerReference(&myapp, desired, r.Scheme); err != nil {
		return ctrl.Result{}, err
	}

	// 4. Create or update the Deployment
	found := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: desired.Name, Namespace: desired.Namespace}, found)
	if err != nil && errors.IsNotFound(err) {
		log.Info("Creating Deployment", "name", desired.Name)
		err = r.Create(ctx, desired)
	} else if err == nil {
		log.Info("Updating Deployment", "name", desired.Name)
		found.Spec = desired.Spec
		err = r.Update(ctx, found)
	}
	if err != nil {
		return ctrl.Result{}, err
	}

	// 5. Update status
	myapp.Status.AvailableReplicas = found.Status.AvailableReplicas
	myapp.Status.Phase = "Running"
	if err := r.Status().Update(ctx, &myapp); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}
```
Generate manifests and iterate locally:

```shell
make manifests   # generates CRD YAML from Go struct tags
make install     # applies CRD to cluster (kubectl apply)
make run         # runs the controller locally (for development)
```

Then build the image and deploy the controller into the cluster:

```shell
make docker-build docker-push IMG=myorg/my-operator:v0.1.0
make deploy IMG=myorg/my-operator:v0.1.0
```

The operator now runs as a Deployment inside the cluster, watching for MyApp resources.


A lighter alternative for simpler operators or teams that prefer Python:

```python
import kopf
import kubernetes.client as k8s

@kopf.on.create('example.com', 'v1alpha1', 'myapps')
def on_create(spec, name, namespace, **kwargs):
    replicas = spec.get('replicas', 1)
    image = spec.get('image', 'nginx')
    api = k8s.AppsV1Api()
    deployment = k8s.V1Deployment(
        metadata=k8s.V1ObjectMeta(name=name),
        spec=k8s.V1DeploymentSpec(
            replicas=replicas,
            selector=k8s.V1LabelSelector(match_labels={'app': name}),
            template=k8s.V1PodTemplateSpec(
                metadata=k8s.V1ObjectMeta(labels={'app': name}),
                spec=k8s.V1PodSpec(containers=[
                    k8s.V1Container(name=name, image=image)
                ])
            )
        )
    )
    api.create_namespaced_deployment(namespace, deployment)

@kopf.on.update('example.com', 'v1alpha1', 'myapps')
def on_update(spec, name, namespace, **kwargs):
    api = k8s.AppsV1Api()
    patch = {'spec': {'replicas': spec.get('replicas', 1)}}
    api.patch_namespaced_deployment(name, namespace, patch)

@kopf.on.delete('example.com', 'v1alpha1', 'myapps')
def on_delete(name, namespace, **kwargs):
    api = k8s.AppsV1Api()
    api.delete_namespaced_deployment(name, namespace)
```

Run it with:

```shell
kopf run my_operator.py --verbose
```

Kopf handles the watch loop, retries, and leader election. You write the handlers.


Once the CRD is installed and the operator is running:

my-app.yaml

```yaml
apiVersion: apps.example.com/v1alpha1
kind: MyApp
metadata:
  name: web-frontend
  namespace: default
spec:
  replicas: 3
  image: my-frontend:v2.0.0
  port: 8080
```

```shell
kubectl apply -f my-app.yaml
kubectl get myapps                    # see your custom resources
kubectl describe myapp web-frontend   # see status and events
```
  • An operator = CRD (defines a new resource type) + controller (reconciles desired vs actual state).
  • The reconcile loop must be idempotent and level-triggered.
  • Use Kubebuilder (Go) for production operators; Kopf (Python) for simpler use cases.
  • Set owner references on child resources so deletion cascades automatically.
  • Start simple — an operator that creates a Deployment + Service — and layer on complexity (backups, upgrades, failover) as needed.
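As a closing illustration of the owner-reference point: after `ctrl.SetControllerReference` runs, the child Deployment carries metadata roughly like the sketch below (the UID is a placeholder; the API server fills in the CR's real `metadata.uid`):

```yaml
# On the child Deployment created for the MyApp CR above.
metadata:
  ownerReferences:
    - apiVersion: apps.example.com/v1alpha1
      kind: MyApp
      name: web-frontend
      uid: "<uid-of-the-myapp-cr>"   # placeholder; set from the CR's metadata.uid
      controller: true               # marks this as the managing controller
      blockOwnerDeletion: true
```

When the `MyApp` object is deleted, the garbage collector sees this reference and cascades the deletion to the Deployment automatically.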