# Distributed Tracing
Distributed tracing follows a single request as it travels through multiple services, showing where time is spent and which service is the bottleneck. It completes the “three pillars” of observability alongside metrics and logs.
## Why Tracing?

Metrics tell you something is slow. Logs tell you what happened in one service. Traces tell you the full story of a request across every service it touched.
```
User ──► API Gateway ──► Auth Service ──► User DB
              └──► Product Service ──► Cache
                          └──► Product DB
```

Without tracing, debugging “why is this API call slow?” means correlating timestamps across logs from 5+ services. With tracing, you get one view that shows the exact latency of every hop.
## Core Concepts

A span represents a single unit of work — an HTTP handler, a database query, a gRPC call. Each span records:
| Field | What It Stores |
|---|---|
| Trace ID | Unique ID shared by all spans in the same request |
| Span ID | Unique ID for this specific span |
| Parent Span ID | The span that triggered this one (builds the tree) |
| Operation name | What the span represents (e.g. GET /api/users) |
| Start / end time | When the operation started and finished |
| Attributes | Key-value pairs (http.status_code=200, db.system=postgresql) |
| Status | OK, Error, or Unset |
| Events | Timestamped annotations (e.g. “cache miss at 14ms”) |
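The fields in the table map naturally onto a plain record type. A minimal illustrative model in Python — not a real tracing SDK, just the fields above as a dataclass:

```python
# Illustrative span record matching the fields above — not a real tracing SDK.
from dataclasses import dataclass, field
from typing import Optional
import secrets
import time

@dataclass
class Span:
    trace_id: str                    # shared by all spans in the same request
    span_id: str                     # unique to this span
    parent_span_id: Optional[str]    # None for the root span
    name: str                        # operation name, e.g. "GET /api/users"
    start: float
    end: Optional[float] = None
    attributes: dict = field(default_factory=dict)
    status: str = "Unset"            # OK, Error, or Unset
    events: list = field(default_factory=list)

root = Span(
    trace_id=secrets.token_hex(16),  # 128-bit trace ID, hex-encoded
    span_id=secrets.token_hex(8),    # 64-bit span ID
    parent_span_id=None,             # no parent → this is the root span
    name="GET /api/users",
    start=time.monotonic(),
)
root.attributes["http.status_code"] = 200
root.events.append((time.monotonic(), "cache miss"))
root.status = "OK"
root.end = time.monotonic()
```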
### Traces

A trace is a tree of spans — the root span is the entry point (e.g. the API gateway), and child spans are downstream calls:
```
Trace ID: abc123

[API Gateway]──────────────────────────────── 250ms
 ├─[Auth Service]──────── 40ms
 │   └─[User DB query]── 15ms
 └─[Product Service]──────────────── 180ms
     ├─[Cache lookup]── 2ms (cache miss)
     └─[Product DB query]────── 160ms  ← bottleneck
```

This immediately shows the Product DB query is the bottleneck.
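Because every span records its parent's ID, a flat list of span records is enough to rebuild this tree — and finding the bottleneck is mechanical: pick the slowest leaf span. A small sketch with made-up span IDs and the durations from the example trace above:

```python
# Flat span records: (span_id, parent_span_id, operation, duration_ms).
# The IDs are made up; durations come from the example trace above.
spans = [
    ("s1", None, "API Gateway", 250),
    ("s2", "s1", "Auth Service", 40),
    ("s3", "s2", "User DB query", 15),
    ("s4", "s1", "Product Service", 180),
    ("s5", "s4", "Cache lookup", 2),
    ("s6", "s4", "Product DB query", 160),
]

# Leaf spans are those that never appear as another span's parent.
parent_ids = {parent for _, parent, _, _ in spans if parent is not None}
leaves = [s for s in spans if s[0] not in parent_ids]

# The bottleneck is the slowest leaf.
bottleneck = max(leaves, key=lambda s: s[3])
print(bottleneck[2])  # → Product DB query
```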
## Context Propagation

For tracing to work across services, the trace ID must be passed between them. This is called context propagation.
How it works:
- Service A creates a span and generates a trace ID.
- Service A adds the trace ID to the outgoing HTTP headers.
- Service B reads the trace ID from the incoming headers.
- Service B creates a child span using the same trace ID.
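The four steps above can be sketched with plain dicts standing in for HTTP headers. Real services would use an OpenTelemetry propagator, but the mechanics are just string formatting — a stdlib-only sketch using the W3C `traceparent` layout:

```python
# W3C trace context propagation sketch — stdlib only; dicts stand in for headers.
import secrets

def inject(headers, trace_id, span_id, sampled=True):
    """Service A: write the trace context into outgoing HTTP headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"

def extract(headers):
    """Service B: read the trace context from incoming headers."""
    _version, trace_id, parent_span_id, flags = headers["traceparent"].split("-")
    return trace_id, parent_span_id, flags == "01"

# Steps 1–2: Service A creates a span and injects its context.
trace_id = secrets.token_hex(16)   # 128-bit trace ID
span_a = secrets.token_hex(8)      # Service A's span ID
outgoing = {}
inject(outgoing, trace_id, span_a)

# Steps 3–4: Service B extracts the context and starts a child span
# that shares the trace ID and records span_a as its parent.
got_trace_id, parent_id, sampled = extract(outgoing)
span_b = secrets.token_hex(8)      # new span ID, same trace
assert got_trace_id == trace_id and parent_id == span_a and sampled
```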
Common propagation formats:
| Format | Header | Used By |
|---|---|---|
| W3C Trace Context (standard) | traceparent, tracestate | OpenTelemetry, most modern tools |
| B3 | X-B3-TraceId, X-B3-SpanId | Zipkin, older Jaeger |
| Jaeger | uber-trace-id | Jaeger native |
W3C Trace Context is the recommended standard. OpenTelemetry uses it by default.
Example `traceparent` header:

```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             │  │                                │                │
             │  │                                │                └─ sampled (01=yes)
             │  │                                └─ parent span ID
             │  └─ trace ID
             └─ version
```

## Sampling
In production, tracing every request generates enormous volumes. Sampling reduces this while keeping useful data.
### Sampling Strategies

| Strategy | How It Works | When to Use |
|---|---|---|
| Head-based | Decide at the start of the request whether to trace | Simple; low overhead |
| Tail-based | Collect all spans, then decide after the request completes | Keep errors and slow requests; discard healthy ones |
| Rate limiting | Trace N requests per second | Predictable volume |
| Probabilistic | Trace X% of requests | Simple; statistically representative |
Head-based is simpler but might miss interesting requests. Tail-based captures all errors and slow requests but requires a collector to buffer spans before deciding.
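Head-based sampling only works across services if every service makes the same decision for a given request; making the decision a pure function of the trace ID achieves that. A sketch in the spirit of OpenTelemetry's `TraceIdRatioBased` sampler (simplified, not its exact algorithm):

```python
# Deterministic head-based sampler: the decision depends only on the trace ID,
# so every service in the call chain samples (or drops) the same traces.
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    # Treat the low 64 bits of the trace ID as a uniform random value
    # and sample when it falls below ratio * 2^64.
    low64 = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low64 < int(ratio * (1 << 64))

print(should_sample("0" * 32, 0.10))   # → True  (low IDs fall under the bound)
print(should_sample("f" * 32, 0.10))   # → False (high IDs are dropped)
```

Because trace IDs are generated uniformly at random, roughly `ratio` of all traces pass the check.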
```yaml
# OTel Collector tail-based sampling
processors:
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-requests
        type: latency
        latency: {threshold_ms: 1000}
      - name: sample-rest
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
```

## Jaeger
Jaeger is an open-source, end-to-end distributed tracing system, originally built by Uber.
### Architecture

```
┌─────────┐     ┌──────────────┐     ┌──────────────┐     ┌────────────┐
│  Apps   │────►│    Jaeger    │────►│    Jaeger    │────►│  Storage   │
│ (OTel)  │     │  Collector   │     │   Ingester   │     │ (ES/       │
└─────────┘     └──────────────┘     │  (optional)  │     │  Cassandra)│
                                     └──────────────┘     └────────────┘
                ┌──────────────┐                                │
                │  Jaeger UI   │◄───────────────────────────────┘
                │   (query)    │
                └──────────────┘
```

### Quick Start (All-in-One)
```sh
# all-in-one image includes collector, query, and UI with in-memory storage
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
# ports: 16686 = Jaeger UI, 4317 = OTLP gRPC, 4318 = OTLP HTTP
```

Open http://localhost:16686 to access the Jaeger UI.
### Searching Traces

The Jaeger UI lets you:
- Search by service — Select a service and time range to see recent traces.
- Search by trace ID — Paste a trace ID directly (useful when you find it in a log line).
- Filter by tags — Find traces where `http.status_code=500` or `error=true`.
- Compare traces — Side-by-side comparison of two traces to spot differences.
## Grafana Tempo

Grafana Tempo is a high-scale, cost-effective trace backend that integrates natively with Grafana. Unlike Jaeger, Tempo stores traces in object storage (S3, GCS, Azure Blob) — no Elasticsearch or Cassandra needed.
### Why Tempo?

| Feature | Jaeger | Tempo |
|---|---|---|
| Storage | Elasticsearch, Cassandra | Object storage (S3, GCS) |
| Cost at scale | Higher (indexed storage) | Lower (no indexing, cheap storage) |
| Search | Full search by tags | Trace ID lookup + TraceQL |
| Grafana integration | Plugin | Native |
| Index | Full index of tags | Minimal index (by trace ID) |
Tempo trades full tag-based search for much lower storage costs. It compensates with:
- Trace ID lookup — Find a trace if you have its ID (from logs or metrics).
- TraceQL — A query language for searching traces by structure and attributes.
### TraceQL

TraceQL lets you search traces by span attributes, duration, and structure:
```
# Find traces where an HTTP span returned 500 and took > 1s
{ span.http.status_code = 500 && duration > 1s }

# Find traces that touched the "payments" service
{ resource.service.name = "payments" }

# Find traces where a database query was slow
{ span.db.system = "postgresql" && duration > 500ms }
```

### Tempo Architecture
```
┌──────────┐     ┌──────────────┐     ┌──────────────┐
│   Apps   │────►│    Tempo     │────►│    Object    │
│  (OTel)  │     │ Distributor  │     │   Storage    │
└──────────┘     └──────────────┘     │   (S3/GCS)   │
                                      └──────┬───────┘
                 ┌──────────────┐            │
                 │    Tempo     │◄───────────┘
                 │   Querier    │
                 └──────────────┘
                        ▲
                        │
                 ┌──────┴───────┐
                 │   Grafana    │
                 └──────────────┘
```

### Deploying Tempo
```yaml
# docker-compose snippet
tempo:
  image: grafana/tempo:latest
  command: ["-config.file=/etc/tempo.yaml"]
  volumes:
    - ./tempo.yaml:/etc/tempo.yaml
  ports:
    - "4317:4317"   # OTLP gRPC
    - "3200:3200"   # Tempo query API
```
```yaml
# tempo.yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317

storage:
  trace:
    backend: local   # use s3/gcs in production
    local:
      path: /tmp/tempo/blocks
    wal:
      path: /tmp/tempo/wal

metrics_generator:
  processor:
    service_graphs:
      enabled: true
    span_metrics:
      enabled: true
  storage:
    path: /tmp/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
```

The metrics_generator in Tempo can automatically create RED metrics (Rate, Errors, Duration) from traces and push them to Prometheus — bridging traces and metrics.
## Connecting Traces to Logs and Metrics

The real power of distributed tracing comes from correlating all three signals.
### Trace → Logs

Embed the trace ID in your log lines:
```python
import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def handle_request():
    span = trace.get_current_span()
    trace_id = format(span.get_span_context().trace_id, '032x')
    logger.info("Processing request", extra={"trace_id": trace_id})
```

In Grafana, configure a derived field in Loki to make trace IDs clickable — clicking a trace ID in a log line jumps directly to the trace in Tempo.
```yaml
# Grafana Loki data source config — derived fields
derivedFields:
  - name: TraceID
    matcherRegex: "trace_id=(\\w+)"
    url: "${__data.fields.traceID}"
    datasourceUid: tempo
    urlDisplayLabel: "View Trace"
```

### Trace → Metrics (Exemplars)
Exemplars attach a trace ID to a specific metric data point. When you see a spike in latency on a Grafana dashboard, click the exemplar dot to jump to the exact trace that caused it.

```
                ● ← exemplar (trace_id=abc123)
   ┌────────────┤ p99
   │            │
   │   ────────┤ p50
   │            │
   └────────────┴──────────── time
```

In Prometheus, exemplars are stored alongside histogram buckets. Grafana displays them as dots on graphs.
### The Full Loop

```
Dashboard spike → click exemplar → open trace → see slow span →
click trace_id in logs → see error message → fix bug
```

This is the core value of distributed tracing in an observable system.
## Key Takeaways

- Distributed tracing follows requests across services — showing latency, dependencies, and bottlenecks as a span tree.
- Context propagation (W3C Trace Context headers) carries trace IDs between services.
- Sampling (head-based or tail-based) controls trace volume in production.
- Jaeger is a mature, full-featured tracing backend with tag search and a rich UI.
- Grafana Tempo uses cheap object storage and TraceQL for cost-effective tracing at scale.
- Correlating traces with logs and metrics (via trace IDs and exemplars) is what makes distributed tracing truly powerful.