Read Scaling

First published May 5, 2026, by Atif Alam

Read scaling is about how the system serves many concurrent readers without overloading the primary database. Which patterns apply depends on how cacheable the data is, how much staleness the product tolerates, and where the users are.

Cache Before Replicas

An in-memory cache (often Redis or a host-local layer) used in cache-aside form usually cuts load on the hottest keys more quickly, and with less infrastructure, than adding read replicas alone. Replicas spread read QPS across more machines, but every read still executes on a database engine; a cache removes the repeated work entirely when access is skewed toward a small set of keys.

Pair the cache with a TTL policy and explicit staleness semantics; see Caching patterns.
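A minimal sketch of the cache-aside read path with a TTL may make the staleness bound concrete. It uses an in-process dictionary as a stand-in for Redis or a host-local layer; the class name `CacheAside`, the `load_from_db` callback, and the injectable `clock` are illustrative assumptions, not part of any particular library.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside reads with a per-entry TTL (in-process stand-in for Redis)."""

    def __init__(
        self,
        load_from_db: Callable[[str], str],  # hypothetical database loader
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: serve from memory, the database is not touched.
            self.hits += 1
            return entry[1]
        # Missing or expired: read from the database, then populate the cache.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call on the write path so readers do not wait out the TTL.
        self._entries.pop(key, None)
```

In this sketch the TTL is the worst-case staleness for a key that is never explicitly invalidated, which is the number a reviewer would want stated next to the cache.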

CDN and Edge

A CDN offloads static assets and cacheable, edge-friendly API responses. It is most useful when users are globally distributed and latency to origin dominates response time. It is not a substitute for correct cache-control headers and security on dynamic data.
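One way to keep cache-control decisions explicit is to classify each response and derive the header from the class. The sketch below uses standard HTTP Cache-Control directives; the three response classes, the function name, and the specific lifetimes (one year, 60 seconds) are assumptions chosen for illustration.

```python
from enum import Enum


class ResponseKind(Enum):
    FINGERPRINTED_ASSET = "fingerprinted_asset"  # e.g. app.3f9a1c.js, content never changes
    SHARED_API = "shared_api"                    # same body for every caller
    PER_USER = "per_user"                        # personalized or sensitive


def cache_control_for(kind: ResponseKind) -> str:
    """Return a Cache-Control header value for the given response class."""
    if kind is ResponseKind.FINGERPRINTED_ASSET:
        # Safe to cache for a long time because the URL changes with the content.
        return "public, max-age=31536000, immutable"
    if kind is ResponseKind.SHARED_API:
        # Shared caches (the CDN) may hold it for 60 s; browsers revalidate each time.
        return "public, max-age=0, s-maxage=60"
    # Dynamic per-user data must not be stored by the CDN or any other cache.
    return "private, no-store"
```

Defaulting the fall-through case to `private, no-store` means a response that nobody classified is never cached at the edge by accident.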

Fan-Out on Write

For read-heavy timelines (feeds, dashboards), fan-out on write precomputes a read model per consumer or bucket on the write path, so each read is a single lookup in a precomputed store rather than a query-time merge. The cost is write amplification (one write becomes one update per consumer) plus reconciliation logic; document how partial failure is handled.
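The write-path and read-path split can be sketched with an in-memory feed. The class `FanOutFeed` and its methods are hypothetical names, and the sketch deliberately leaves out durability and partial-failure handling, which a real design would have to document.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set


class FanOutFeed:
    """Fan-out on write: each post is copied into every follower's timeline."""

    def __init__(self, max_items: int = 100) -> None:
        self._max_items = max_items
        self._followers: Dict[str, Set[str]] = defaultdict(set)
        self._timelines: Dict[str, Deque[str]] = {}

    def follow(self, follower: str, author: str) -> None:
        self._followers[author].add(follower)

    def publish(self, author: str, post_id: str) -> int:
        """Write path: push the post to each follower; return the number of writes."""
        writes = 0
        for follower in self._followers[author]:
            timeline = self._timelines.setdefault(
                follower, deque(maxlen=self._max_items)
            )
            timeline.appendleft(post_id)  # newest first; oldest falls off the end
            writes += 1
        return writes  # this is the write amplification for one post

    def timeline(self, user: str) -> List[str]:
        """Read path: one lookup, no joins or merges at query time."""
        return list(self._timelines.get(user, ()))
```

The return value of `publish` makes the amplification visible: an author with a million followers turns one write into a million, which is why some designs fall back to merging at read time for such accounts.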

Materialized Views

Materialized views (or nightly rollups, or streaming aggregates) sit in front of expensive aggregates so online queries hit precomputed data. They trade freshness for lower read cost; the refresh strategy belongs in the review.

Replica Lag and Staleness

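Read replicas that replicate asynchronously serve eventually consistent reads: a replica may not yet reflect a write the primary has already acknowledged. If the product implies “read your writes,” you need routing to primary, session stickiness, or version tokens rather than blind replica reads.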
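The version-token option can be sketched as a routing decision: the client remembers the version of its last write, and a read goes to a replica only if that replica has applied at least that version. The `Node` type, the `applied_version` field, and `choose_read_node` are hypothetical; real systems would derive the token from something like a log sequence number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    applied_version: int  # highest write version this node has applied


def choose_read_node(primary: Node, replicas: List[Node], min_version: int) -> Node:
    """Pick a replica that is caught up to min_version, else fall back to primary.

    min_version is the token returned to the client by its most recent write;
    pass 0 for clients that have not written in this session.
    """
    for replica in replicas:
        if replica.applied_version >= min_version:
            return replica
    # No replica has caught up yet: only the primary can honor read-your-writes.
    return primary
```

Under heavy replication lag this sketch sends more reads to the primary, so the fallback rate is worth monitoring alongside lag itself.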

Related: RDBMS reliability and on-call (replication lag), Caching patterns.

Cache Hit Rate Heuristic

If sustained hit rate stays far below ~80% for a cache that was supposed to protect the database, the key space, TTL, or eviction policy is often wrong — or the workload is not cacheable. Investigate before adding more hardware.

State the expected hit rate assumption in the design doc; it is reviewable and testable.
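To make the assumption testable, the hit rate and the read load that still reaches the database can be computed from cache counters. The 0.80 default mirrors the rough threshold above; the function names and the 0.10 margin used to approximate "far below" are assumptions for this sketch.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of cache lookups served without touching the database."""
    total = hits + misses
    if total == 0:
        raise ValueError("no cache lookups recorded")
    return hits / total


def origin_read_qps(read_qps: float, rate: float) -> float:
    """Read QPS that still reaches the database at a given hit rate."""
    return read_qps * (1.0 - rate)


def needs_investigation(
    hits: int, misses: int, expected: float = 0.80, margin: float = 0.10
) -> bool:
    """True when the observed hit rate is below the stated assumption by more than margin."""
    return hit_rate(hits, misses) < expected - margin
```

As a worked example, at 10,000 read QPS an 80% hit rate leaves about 2,000 reads per second for the database, while 50% leaves about 5,000. That gap is why a missed hit rate assumption shows up as a capacity problem rather than a cache problem.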

Related: Capacity estimation, CloudFront (AWS-flavored CDN).