
Grafana

First published by Atif Alam

Grafana is an open-source visualization and analytics platform. It connects to data sources (Prometheus, Loki, Elasticsearch, CloudWatch, etc.) and turns queries into dashboards, charts, and alerts.

Grafana doesn’t store data — it queries external sources. Add a data source in Configuration → Data Sources:

| Data Source | Used For |
|---|---|
| Prometheus | Metrics (PromQL queries) |
| Loki | Logs (LogQL queries) |
| Elasticsearch | Logs, metrics, search |
| CloudWatch | AWS metrics and logs |
| InfluxDB | Time-series metrics |
| PostgreSQL / MySQL | Business data, custom queries |
| Tempo / Jaeger | Distributed traces |

You can have multiple data sources of the same type (e.g. one Prometheus for production, another for staging).
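As a sketch, a prod/staging pair of Prometheus data sources could be provisioned like this (the names and URLs below are placeholders, not defaults):

```yaml
# provisioning/datasources/prometheus.yml -- hypothetical prod/staging pair
apiVersion: 1
datasources:
  - name: Prometheus-Prod
    type: prometheus
    url: http://prometheus.prod.svc:9090     # assumed cluster-internal URL
    isDefault: true                          # used when a panel has no explicit source
  - name: Prometheus-Staging
    type: prometheus
    url: http://prometheus.staging.svc:9090
```

Panels then select a source by its name, so the same dashboard can be pointed at either environment.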

A dashboard is a collection of panels (charts, tables, stats) arranged in rows.

  1. Click + → New Dashboard.
  2. Add a panel.
  3. Choose a data source and write a query.
  4. Select a visualization type.
  5. Configure panel options (title, legend, thresholds).
  6. Save the dashboard.
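A heavily trimmed sketch of what step 6 saves (field names follow Grafana's dashboard schema; the panel title and query here are illustrative, and a real export contains many more fields such as `gridPos` and `datasource`):

```json
{
  "title": "My Service",
  "panels": [
    {
      "type": "timeseries",
      "title": "Request rate",
      "targets": [
        { "expr": "rate(http_requests_total[5m])" }
      ]
    }
  ]
}
```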

Dashboards are stored as JSON. You can:

  • Export a dashboard as JSON for version control.
  • Import a JSON file or paste a dashboard ID from grafana.com/dashboards.
  • Provision dashboards from files on disk (for GitOps / config-as-code).

Place YAML configs and JSON dashboards in Grafana’s provisioning directory:

provisioning/dashboards/dashboards.yml:

```yaml
apiVersion: 1
providers:
  - name: default
    folder: ""
    type: file
    options:
      path: /var/lib/grafana/dashboards
```

provisioning/datasources/prometheus.yml:

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
```

This lets you deploy Grafana with dashboards and data sources pre-configured — no manual setup.

Time series: line, area, or bar chart over time. The most common panel:

rate(http_requests_total[5m])

Options: line width, fill opacity, gradient, stacking, point size, thresholds.

Stat: a single large number with an optional sparkline. Good for KPIs:

sum(rate(http_requests_total[5m]))

Shows: “2,345 req/s”

Gauge: a circular gauge showing a value against a range:

node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100

Shows: 72% with color thresholds (green/yellow/red).

Bar chart: compare values across categories:

sum by (method) (rate(http_requests_total[5m]))

Table: tabular data with sortable columns:

topk(10, sum by (instance) (rate(http_requests_total[5m])))

Heatmap: visualize distributions over time (e.g. latency buckets):

sum by (le) (rate(http_request_duration_seconds_bucket[5m]))

Logs: display log lines from Loki or Elasticsearch:

{app="my-app"} |= "error"

Other useful panel types:

  • Pie chart — Proportions
  • State timeline — Status over time (up/down/degraded)
  • Alert list — Current firing alerts
  • Text — Markdown or HTML for notes and documentation

Variables make dashboards dynamic — users can switch between environments, hosts, or services without editing queries.

Dashboard Settings → Variables → Add variable:

Name: instance
Type: Query
Data source: Prometheus
Query: label_values(up, instance)

This populates a dropdown with all instance label values.

rate(http_requests_total{instance="$instance"}[5m])

The $instance is replaced with the selected value from the dropdown.

| Variable | Query | Purpose |
|---|---|---|
| job | label_values(up, job) | Select by job |
| instance | label_values(up{job="$job"}, instance) | Chain: instances for selected job |
| namespace | label_values(kube_pod_info, namespace) | Kubernetes namespace |
| interval | Custom: 1m, 5m, 15m, 1h | Adjustable time range |
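The interval variable from the table plugs directly into range selectors, and a multi-value variable needs a regex matcher (`=~`) because Grafana interpolates multiple selections as a pipe-separated pattern:

```promql
# $interval expands to e.g. 5m; $instance may expand to "host1|host2"
# when multi-value selection is enabled, so use =~ instead of =
rate(http_requests_total{instance=~"$instance"}[$interval])
```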

When one variable depends on another (e.g. namespace → pod):

  1. Create namespace variable: label_values(kube_pod_info, namespace)
  2. Create pod variable: label_values(kube_pod_info{namespace="$namespace"}, pod)

Selecting a namespace automatically filters the pod list.

Repeat a panel for each value of a variable:

  1. Set the variable to allow multi-value selection.
  2. In the panel, enable Repeat → Variable: instance.

Grafana creates one panel per selected instance — useful for “per-host” views.

Mark events on time-series panels (deploys, incidents, config changes):

# Query annotation source
ALERTS{alertname="HighErrorRate"}

Or add manual annotations by clicking on the graph and writing a note.

  • Share link — Direct URL with current time range and variables.
  • Snapshot — Static copy of the dashboard (no live data).
  • Export JSON — Full dashboard definition for version control.
  • Embed panel — iframe embed for external pages.
  • PDF/PNG — Via Grafana Image Renderer plugin.

Well-designed dashboards answer questions quickly. Poorly designed ones become “wall of graphs” that nobody reads. These patterns help you build dashboards that are actually useful.

The USE method: for every resource (CPU, memory, disk, network), show three things:

| Signal | Meaning | Example Panel |
|---|---|---|
| Utilization | How busy is it? (%) | node_cpu_seconds_total → CPU usage % |
| Saturation | How overloaded is it? (queue depth) | node_load1 → load average |
| Errors | Is it failing? | node_disk_io_time_weighted_seconds_total |

Layout: One row per resource, three panels per row.
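For the CPU row, the three panels might use queries like these (standard node_exporter metrics; exact label sets can vary by exporter version):

```promql
# Utilization: % of CPU time not spent idle, per instance
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

# Saturation: 1-minute load average relative to core count
# (counting idle-mode series per instance gives the number of CPUs)
node_load1 / count by (instance) (node_cpu_seconds_total{mode="idle"})

# Errors: CPU exposes no direct error metric; for disks, weighted I/O time
# is a common proxy signal
rate(node_disk_io_time_weighted_seconds_total[5m])
```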

The RED method: for every service (API, microservice), show three things:

| Signal | Meaning | Example Panel |
|---|---|---|
| Rate | Requests per second | rate(http_requests_total[5m]) |
| Errors | Error rate (% or count) | rate(http_requests_total{status=~"5.."}[5m]) |
| Duration | Latency (p50, p95, p99) | histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) |

Layout: One row per service, three panels per row. This is the most common pattern for microservice dashboards.
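The error panel is usually shown as a percentage, which combines the two rate queries from the table:

```promql
# Error rate as a percentage of all requests
100 * sum(rate(http_requests_total{status=~"5.."}[5m]))
    / sum(rate(http_requests_total[5m]))
```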

The Four Golden Signals: Google’s SRE book recommends monitoring these for every user-facing system:

| Signal | What to Measure |
|---|---|
| Latency | Time to serve a request (separate success vs error latency) |
| Traffic | Requests per second |
| Errors | Rate of failed requests |
| Saturation | How “full” the service is (CPU, memory, queue depth) |

RED covers the first three; add a saturation panel (CPU/memory of the service pods) for the fourth.
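A saturation panel for the service’s pods could use queries like these (metric and label names here assume the cAdvisor and kube-state-metrics series shipped by default with kube-prometheus-stack; adjust to your setup):

```promql
# CPU saturation: usage relative to the container CPU limit, per pod
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="$namespace"}[5m]))
  / sum by (pod) (kube_pod_container_resource_limits{resource="cpu", namespace="$namespace"})

# Memory saturation: working set relative to the memory limit, per pod
sum by (pod) (container_memory_working_set_bytes{namespace="$namespace"})
  / sum by (pod) (kube_pod_container_resource_limits{resource="memory", namespace="$namespace"})
```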

Overview → Detail (Drill-Down):

┌─────────────────────────────────────────────────┐
│ Row 1: Key stats (stat panels) │
│ [Total RPS] [Error %] [p99 Latency] [Pods] │
├─────────────────────────────────────────────────┤
│ Row 2: Time series (trends) │
│ [Request rate over time] [Error rate] │
├─────────────────────────────────────────────────┤
│ Row 3: Per-instance breakdown │
│ [Latency by pod] [CPU by pod] │
├─────────────────────────────────────────────────┤
│ Row 4: Logs (Loki panel) │
│ [Recent errors from Loki] │
└─────────────────────────────────────────────────┘

This pattern gives you the summary at the top and lets you scroll down for detail.

Service Map (Multi-Service):

Create a dashboard per service (using the RED method), then link them:

  • A “Platform Overview” dashboard shows all services as stat panels.
  • Clicking a service stat links to that service’s detailed dashboard.
  • Use Grafana’s Dashboard Links and pass variables.
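Grafana passes variables between dashboards as var-<name> URL parameters, so a drill-down link might look like this (host and dashboard UID are placeholders):

```
https://grafana.example.com/d/svc-detail/service-detail?var-service=checkout&from=now-1h&to=now
```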

| Tip | Why |
|---|---|
| Put stat panels at the top | Instant overview of current state |
| Use thresholds and colors | Green/yellow/red makes problems visible without reading numbers |
| Label axes | “Requests per second”, not just “rate” |
| Set meaningful Y-axis limits | Don’t auto-scale from 0.001 to 0.002 — it looks like a crisis |
| Use the right unit | Grafana supports reqps, bytes, percent, seconds, etc. |
| Add descriptions to panels | Hover-text explaining what the panel shows and what “bad” looks like |
| Collapse rows | Group related panels; default-collapse less important sections |
| Limit to 10–15 panels | More than that = information overload |

| Anti-Pattern | Problem | Fix |
|---|---|---|
| Wall of graphs | 30+ panels, no hierarchy | Use rows, collapse, and a summary row at top |
| No variables | Separate dashboard per environment | Add $environment, $namespace, $service variables |
| Raw metric names as titles | “node_cpu_seconds_total” means nothing to on-call | Use human-readable titles: “CPU Usage (%)” |
| Default time range too wide | 7-day view hides the last-10-minute spike | Set default to “Last 1 hour” for operational dashboards |
| No alerting link | Dashboard shows a problem but no way to see related alerts | Add an Alert List panel or link to alert rules |
| Mixing audiences | Dev metrics + business metrics on one dashboard | Separate: “Service Health” (ops) vs “Business KPIs” (product) |

Store dashboards in Git and provision them automatically:

  1. Export dashboard JSON from Grafana UI.
  2. Parameterize data source names using ${DS_PROMETHEUS} variables.
  3. Commit to a dashboards/ directory in your repo.
  4. Use Grafana provisioning or a Kubernetes ConfigMap to load on startup.
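After step 2, the top of the exported JSON declares the data source as an input; Grafana’s “Export for sharing externally” option typically generates this for you. A sketch trimmed to the relevant fields:

```json
{
  "__inputs": [
    {
      "name": "DS_PROMETHEUS",
      "label": "Prometheus",
      "type": "datasource",
      "pluginId": "prometheus"
    }
  ],
  "panels": [
    { "datasource": "${DS_PROMETHEUS}" }
  ]
}
```

On import or provisioning, the placeholder is resolved to a concrete data source, so the same file works across environments.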
```yaml
# Kubernetes ConfigMap for dashboard provisioning
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards
  labels:
    grafana_dashboard: "1"  # Grafana sidecar picks this up
data:
  service-health.json: |
    { ... exported dashboard JSON ... }
```

The kube-prometheus-stack Helm chart’s Grafana sidecar auto-discovers ConfigMaps with the grafana_dashboard label and loads them.

  • Grafana connects to data sources — it doesn’t store data itself.
  • Use variables to make dashboards dynamic (environment, host, namespace dropdowns).
  • Provision data sources and dashboards from files for config-as-code deployments.
  • Choose the right panel type: time series for trends, stat for KPIs, heatmap for distributions, table for top-N lists.
  • Export dashboards as JSON and commit to Git — treat dashboards as code.
  • Use the RED method (Rate, Errors, Duration) for service dashboards and the USE method (Utilization, Saturation, Errors) for infrastructure dashboards.
  • Design dashboards with a summary row at top, details below, and 10–15 panels max to avoid information overload.