Skip to content

Write Scaling

First PublishedByAtif Alam

Write scaling protects primary writers, transaction throughput, and downstream dependencies when ingestion outgrows a single node.

Message queues absorb spikes and decouple producers from consumers. They turn synchronous pressure into bounded lag the system can drain when load falls. Define ordering needs (per-partition FIFO vs at-least-once), retries, and dead-letter handling up front.

Where semantics allow, batch inserts or updates to cut round trips and amortize commit overhead. Balance latency (batches wait briefly to fill) versus throughput.

Move non-critical work (notifications, analytics enrichment, search index updates) off the critical path asynchronously. The user-facing operation returns after durable core state is written; secondary systems catch up via jobs or streams.

Avoid synchronous dual-writes where two systems must commit in one request without a durable outbox or saga — partial failure becomes data skew.

See Consistency and transactions.

Connection pools cap concurrent DB sessions and reuse TCP + auth setup. Pool exhaustion looks like outages; size pools with database limits and realistic concurrency. This applies to app servers, serverless with proxy pools, and workers alike.

When horizontal write scale is real, shard by a key with locality (user_id, tenant_id) so most transactions stay single-shard. Avoid “random” sharding that scatters every correlated write.

Measure primary writer CPU, disk IOPS, lock wait, and queue depth before committing to sharding. Sharding adds cross-shard pain; sometimes the fix is better indexes, partitioning, or archiving cold data.

Related: Data modeling and storage, Capacity estimation, Databases.