Caching Patterns

First published May 5, 2026, by Atif Alam

Caching is first a tool for cutting read latency and protecting the database. This page focuses on patterns. Read scaling covers where caching sits relative to replicas and CDNs; see that page for overlap and sequencing.

Cache Reads Before Writes

Operational default: expose stable read paths through the cache rather than assuming that writes “magically” update the cache everywhere. Write-heavy workloads still benefit from selective caching, but caching mutable aggregates without an invalidation strategy invites stale reads and coherence bugs.

Cache-Aside

In cache-aside, application code reads the cache first, loads from the backing store on a miss, then populates the cache. Pros: straightforward, and tolerant of transient cache outages (fall back to the DB). Cons: a burst of concurrent misses when a hot key expires, unless mitigated (see Cache Stampede and Thundering Herd).
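The read path above can be sketched as follows. This is a minimal illustration, not a production client: `TTLCache` is a hypothetical in-process stand-in for a shared cache, `get_with_cache_aside` and `load_from_db` are illustrative names, and treating `ConnectionError` as the signal for a cache outage is an assumption.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for a shared cache."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired entries count as misses.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)


def get_with_cache_aside(
    cache,
    key: str,
    load_from_db: Callable[[str], str],
    ttl_seconds: float = 60.0,
) -> str:
    """Read the cache first, load from the store on a miss, then populate."""
    try:
        cached = cache.get(key)
    except ConnectionError:
        # Assumed outage signal: skip the cache and serve from the store.
        return load_from_db(key)
    if cached is not None:
        return cached

    value = load_from_db(key)
    try:
        cache.set(key, value, ttl_seconds)
    except ConnectionError:
        # Failing to populate the cache should not fail the read.
        pass
    return value
```

Because the application owns both steps, a cache failure degrades to a slower read rather than an error, which is the outage tolerance noted above.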

Write-Through

Write-through updates the cache and the backing store together on every write (the ordering varies by implementation). Reads immediately after a write see the new value, at the cost of higher write latency and write amplification. Use it when stale reads after writes are unacceptable and the working set is small enough to keep in cache affordably.
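A minimal sketch of the write path, using plain dictionaries as stand-ins for the store and the cache. The class name `WriteThroughStore` is hypothetical, and writing the store first is one possible ordering chosen for illustration, not a rule.

```python
from typing import Dict, Optional


class WriteThroughStore:
    """Writes go to the backing store and the cache in the same call."""

    def __init__(self, db: Dict[str, str], cache: Dict[str, str]) -> None:
        self.db = db
        self.cache = cache

    def put(self, key: str, value: str) -> None:
        # Store first (an assumption of this sketch): if the store write
        # raises, the cache is never updated with a value that was not saved.
        self.db[key] = value
        self.cache[key] = value

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        # Keys written before the cache existed are filled on first read.
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

Every `put` costs two writes, which is the write amplification mentioned above; in exchange, a `get` right after a `put` returns the new value from the cache.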

Eviction Policies

LRU (least recently used) is a sensible general-purpose default. Some workloads do better with TTL-only expiry plus a size cap, or with LFU (least frequently used) when the hot set is stable. Tune based on measured miss rates and eviction churn.
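To make the policy concrete, here is a small LRU sketch built on `collections.OrderedDict`. The class `LRUCache` and its `hits`, `misses`, and `evictions` counters are illustrative; the counters are included only to show the kind of signals one might tune against.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self._items:
            self.misses += 1
            return None
        # A read marks the key as most recently used.
        self._items.move_to_end(key)
        self.hits += 1
        return self._items[key]

    def put(self, key: Hashable, value: object) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # The front of the OrderedDict is the least recently used key.
            self._items.popitem(last=False)
            self.evictions += 1
```

With capacity 2, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, because `a` was touched more recently.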

Cache Stampede

On expiry or invalidation, many concurrent misses hit the origin at once. Mitigations: probabilistic early refresh, a mutex per key (single-flight), pre-warming critical keys, adding jitter to TTLs, or layering (local L1 plus shared L2).
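Two of these mitigations, a per-key mutex and TTL jitter, can be sketched in a few lines. This is an in-process illustration under stated assumptions: `SingleFlightCache` and `jittered_ttl` are hypothetical names, expiry is omitted to keep the locking logic visible, and `None` is treated as "not cached".

```python
import random
import threading
from typing import Callable, Dict


class SingleFlightCache:
    """Lets only one caller per key run the loader; the rest wait for it."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._values: Dict[str, str] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self._values.get(key)
        if value is not None:
            return value
        with self._lock_for(key):
            # Re-check: another thread may have filled the entry while
            # this one was waiting for the per-key lock.
            value = self._values.get(key)
            if value is None:
                value = self._loader(key)
                self._values[key] = value
            return value


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    """Randomize a TTL by +/- spread so keys do not all expire together."""
    return base_seconds * random.uniform(1.0 - spread, 1.0 + spread)
```

With eight threads requesting the same cold key, the loader runs once and the other seven reuse its result. Across multiple processes the same idea would need a distributed lock, which this sketch does not attempt.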

Thundering Herd

Distinct but related: many clients invalidate or cold-start together. Warm caches before a TTL cliff, stagger TTLs, and consider graceful staleness (serve a slightly old value while rebuilding in the background).
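Graceful staleness could look roughly like the sketch below: a stale entry is returned immediately while a single background thread rebuilds it. The class `StaleWhileRevalidate`, its `fresh_seconds` parameter, and the injectable `clock` are assumptions made for illustration.

```python
import threading
import time
from typing import Callable, Dict, Set, Tuple


class StaleWhileRevalidate:
    """Serves a stale value at once and refreshes it in the background."""

    def __init__(
        self,
        loader: Callable[[str], str],
        fresh_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._fresh = fresh_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()

    def get(self, key: str) -> str:
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                stored_at, value = entry
                is_stale = self._clock() - stored_at >= self._fresh
                if is_stale and key not in self._refreshing:
                    # Start at most one rebuild per key.
                    self._refreshing.add(key)
                    threading.Thread(
                        target=self._refresh, args=(key,), daemon=True
                    ).start()
                return value  # possibly slightly old, never blocked
        # Cold miss: nothing to serve, so load synchronously.
        return self._store(key)

    def _store(self, key: str) -> str:
        value = self._loader(key)
        with self._lock:
            self._entries[key] = (self._clock(), value)
        return value

    def _refresh(self, key: str) -> None:
        try:
            self._store(key)
        finally:
            with self._lock:
                self._refreshing.discard(key)
```

The trade-off is explicit: readers never wait on a rebuild for a key that has been loaded once, but they may see a value older than `fresh_seconds` until the refresh lands. A production version would also cap how stale a value may get.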

Horizontal Cache Scale

Redis Cluster (or a similarly sharded cache) shards the key space across nodes for throughput and memory headroom. Client routing must follow cluster topology changes, and resilience planning matters when a node disappears.
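As an illustration of how keys map to shards, Redis Cluster assigns each key to one of 16384 hash slots using CRC16 of the key modulo 16384, hashing only the text inside the first non-empty `{...}` hash tag when one is present. The sketch below reproduces that calculation with the standard library. The function name `hash_slot` is illustrative, and real clients also track which node owns each slot, which is not shown here.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in Redis Cluster


def hash_slot(key: str) -> int:
    """Compute a Redis Cluster style hash slot for a key."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}'
        # replaces the full key as the hashed portion.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the CRC16 variant
    # (polynomial 0x1021, XMODEM) used for key hashing.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS
```

Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot and therefore on the same node, which is what makes multi-key operations on them possible. When slots move between nodes, a client has to refresh its slot-to-node map, which is the topology tracking mentioned above.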

Overlap with the read scaling pages is intentional: keep the read-path narratives in sync when changing assumptions.

Related: Write scaling, Consistency and transactions, glossary (LRU, TTL, Redis).