Observability: Prometheus + Grafana + Alertmanager + Loki Setup
This page walks through setting up the full Prometheus + Grafana + Alertmanager + Loki stack, first with Docker Compose (for local dev or small deployments) and then with Kubernetes.
Docker Compose Setup
Directory Structure

```
monitoring/
  docker-compose.yml
  prometheus/
    prometheus.yml
    rules/
      alerts.yml
  alertmanager/
    alertmanager.yml
  grafana/
    provisioning/
      datasources/
        datasources.yml
      dashboards/
        dashboards.yml
    dashboards/
      node-exporter.json
  loki/
    loki-config.yml
  promtail/
    promtail-config.yml
```

docker-compose.yml
```yaml
version: "3.8"

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus/rules:/etc/prometheus/rules
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=15d"

  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - ./grafana/dashboards:/var/lib/grafana/dashboards
      - grafana-data:/var/lib/grafana

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - ./loki/loki-config.yml:/etc/loki/local-config.yaml
      - loki-data:/loki

  promtail:
    image: grafana/promtail:latest
    volumes:
      - ./promtail/promtail-config.yml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
    command: -config.file=/etc/promtail/config.yml

  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    pid: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--path.rootfs=/rootfs"

volumes:
  prometheus-data:
  grafana-data:
  loki-data:
```

prometheus.yml
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "loki"
    static_configs:
      - targets: ["loki:3100"]
```

Alert Rules
```yaml
groups:
  - name: node_alerts
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 90
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Disk space above 90% on {{ $labels.instance }}"

      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

Alertmanager Config
```yaml
global:
  resolve_timeout: 5m

route:
  receiver: default
  group_by: [alertname]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: default
    webhook_configs:
      - url: "http://localhost:5001/webhook" # replace with Slack/PagerDuty
```

Grafana Data Source Provisioning
```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true

  - name: Loki
    type: loki
    url: http://loki:3100

  - name: Alertmanager
    type: alertmanager
    url: http://alertmanager:9093
```

Grafana Dashboard Provisioning
```yaml
apiVersion: 1
providers:
  - name: default
    folder: ""
    type: file
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true
```

Place exported dashboard JSON files in grafana/dashboards/. Popular community dashboards:
| Dashboard | ID | Metrics From |
|---|---|---|
| Node Exporter Full | 1860 | Node exporter |
| Docker Monitoring | 893 | cAdvisor |
| Prometheus Stats | 2 | Prometheus itself |
| Loki Logs | 13639 | Loki |
Import by ID: Grafana → Dashboards → Import → Enter ID.
Loki Config
```yaml
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
```

Promtail Config
```yaml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log
```

Starting Everything
```bash
cd monitoring
docker compose up -d
```

Then open:
- Grafana: http://localhost:3000 (admin / admin)
- Prometheus: http://localhost:9090
- Alertmanager: http://localhost:9093
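Before opening each UI by hand, it can help to confirm that every service answers on its health endpoint. The following is a minimal sketch, assuming the ports published in the Compose file above; the helper names `http_status` and `check_stack` are illustrative, not part of any of these tools.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Health/readiness endpoints for the ports published in docker-compose.yml.
HEALTH_URLS: Dict[str, str] = {
    "prometheus": "http://localhost:9090/-/healthy",
    "alertmanager": "http://localhost:9093/-/healthy",
    "grafana": "http://localhost:3000/api/health",
    "loki": "http://localhost:3100/ready",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503 while starting).
        return exc.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def check_stack(
    urls: Dict[str, str] = HEALTH_URLS,
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True when its endpoint answers with 200."""
    return {name: fetch(url) == 200 for name, url in urls.items()}
```

Calling `check_stack()` shortly after `docker compose up -d` may report Loki as not ready for a little while, since its `/ready` endpoint returns 503 until startup completes. The `fetch` parameter exists only so the logic can be exercised without a running stack.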
Kubernetes Setup
For Kubernetes, the easiest path is the kube-prometheus-stack Helm chart, which installs Prometheus, Grafana, Alertmanager, Node exporter, kube-state-metrics, and pre-built dashboards.
Install With Helm
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=admin
```

This deploys:
- Prometheus Operator: manages Prometheus instances via CRDs.
- Prometheus: configured to scrape Kubernetes pods, nodes, and services.
- Grafana: pre-loaded with Kubernetes dashboards.
- Alertmanager: routes the alerts fired by the chart's default alert rules.
- Node exporter: runs as a DaemonSet on every node.
- kube-state-metrics: exposes Kubernetes object metrics.
Access Grafana
```bash
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
```

Open http://localhost:3000 (admin / admin).
Add Loki
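The loki-stack chart lives in the Grafana Helm repository, so add that repository first:

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set promtail.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=10Gi
```

Then add Loki as a data source in Grafana with the URL http://loki:3100.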
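To confirm that Loki accepts writes before wiring up dashboards, one option is to push a test line through its HTTP push endpoint, the same `/loki/api/v1/push` path that the Promtail config targets. This is a minimal sketch, assuming Loki is reachable from where the script runs (for example through a `kubectl port-forward` to the `loki` service); the `smoke-test` label and the function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Dict, List, Optional


def build_push_payload(
    labels: Dict[str, str],
    lines: List[str],
    ts_ns: Optional[int] = None,
) -> dict:
    """Build a JSON body for Loki's push endpoint (/loki/api/v1/push)."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Each entry is [timestamp in nanoseconds as a string, log line].
    # Timestamps are offset by 1 ns per line so entries keep their order.
    values = [[str(ts_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


def push(base_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload to Loki and return the HTTP status (204 on success)."""
    req = urllib.request.Request(
        f"{base_url}/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `push("http://localhost:3100", build_push_payload({"job": "smoke-test"}, ["hello from the setup guide"]))` should make the line searchable in Grafana's Explore view with the query `{job="smoke-test"}`.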
Custom Scrape Targets (ServiceMonitor)
The Prometheus Operator uses CRDs to configure scraping. To scrape your app:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: monitoring # must match the Helm release label
spec:
  namespaceSelector:
    matchNames: [default]
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 15s
```

Your app’s Service must have a port named metrics and the label app: my-app.
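The ServiceMonitor only helps if the app actually serves metrics on that port. As an illustration of what Prometheus expects to find there, here is a minimal sketch of a `/metrics` endpoint built with only the Python standard library. The counter name `http_requests_total` mirrors the one used in the alert examples on this page; the `REQUESTS` dictionary, port 8000, and the handler are assumptions for demonstration. A real service would normally use an official Prometheus client library instead of formatting the text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Status code -> request count; a stand-in for real instrumentation.
REQUESTS: Dict[str, int] = {"200": 0, "500": 0}


def render_metrics(counts: Dict[str, int]) -> str:
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for status, value in sorted(counts.items()):
        lines.append(f'http_requests_total{{status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/metrics":
            body = render_metrics(REQUESTS).encode("utf-8")
            content_type = "text/plain; version=0.0.4"
        else:
            # Any other path counts as one successfully handled request.
            REQUESTS["200"] += 1
            body = b"ok\n"
            content_type = "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the app and its /metrics endpoint until interrupted."""
    HTTPServer(("", port), Handler).serve_forever()
```

With `serve(8000)` running in the container, the Service's port named `metrics` would point at container port 8000.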
Custom Alert Rules (PrometheusRule)
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
    - name: my-app
      rules:
        - alert: MyAppHighErrorRate
          expr: |
            sum(rate(http_requests_total{app="my-app", status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{app="my-app"}[5m]))
            > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate on my-app"
```

Custom Values File
```yaml
grafana:
  adminPassword: "secure-password"
  persistence:
    enabled: true
    size: 5Gi

prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 50Gi

alertmanager:
  config:
    route:
      receiver: slack
    receivers:
      - name: slack
        slack_configs:
          - api_url: "https://hooks.slack.com/services/XXX"
            channel: "#alerts"
```

Save these values as monitoring-values.yaml, then apply them to the existing release:

```bash
helm upgrade monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f monitoring-values.yaml
```

From Zero to Dashboard Checklist
- Deploy the stack (Docker Compose or Helm).
- Verify targets — Prometheus → Status → Targets (all should be “UP”).
- Import Node Exporter dashboard (ID 1860) in Grafana.
- Add your app as a scrape target (static config or ServiceMonitor).
- Instrument your app with client libraries (counters, histograms).
- Create alert rules for error rate, latency, and resource usage.
- Configure Alertmanager receivers (Slack, PagerDuty, email).
- Add Loki and configure Promtail for log collection.
- Build custom dashboards combining metrics (Prometheus) and logs (Loki).
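The "verify targets" step in the checklist can also be scripted against the Prometheus HTTP API, which is convenient in CI or after a redeploy. The sketch below assumes Prometheus is reachable at `http://localhost:9090` (as in the Compose setup, or through a port-forward on Kubernetes) and reads the `/api/v1/targets` response; the function names are illustrative.

```python
import json
import urllib.request
from typing import List


def unhealthy_targets(payload: dict) -> List[str]:
    """Return one 'job instance: error' string per target that is not 'up'."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            labels = target.get("labels", {})
            error = target.get("lastError") or "no error reported"
            problems.append(
                f"{labels.get('job', '?')} {labels.get('instance', '?')}: {error}"
            )
    return problems


def fetch_targets(base_url: str = "http://localhost:9090", timeout: float = 5.0) -> dict:
    """Fetch the current scrape targets from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)
```

An empty list from `unhealthy_targets(fetch_targets())` corresponds to every target showing "UP" on the Status → Targets page.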
Key Takeaways
- Docker Compose is the fastest way to get the full stack running locally.
- The kube-prometheus-stack Helm chart deploys everything for Kubernetes with pre-built dashboards and auto-discovery.
- Use ServiceMonitor and PrometheusRule CRDs to add scrape targets and alerts in Kubernetes.
- Provision Grafana data sources and dashboards from files — treat monitoring config as code.
- Start with Node Exporter + community dashboards, then add application metrics and custom dashboards.