Data Modeling and Storage

First PublishedMay 5, 2026ByAtif Alam

Data choices outlive almost every service binary. Modeling discussions should anchor on access patterns, consistency, and operational burden — not trending databases.

SQL Versus Document and Wide-Column Stores

SQL (relational) remains the default when you need joins, transactions across rows, and strong invariants (uniqueness, foreign keys). NoSQL (document, key-value, wide-column) often fits high write rates, flexible schema evolution, or horizontal partition friendliness — at the cost of moving some invariants into application code.

“NoSQL for scale” without access-pattern analysis often recreates a fragile relational model in the app layer.

Normalize First, Denormalize for Reads

Normalization reduces update anomalies and duplication. Denormalize selectively when profiling shows predictable read hotspots — for example embedding read models, summary tables, or materialized aggregates. Document the write amplification tradeoff whenever you duplicate data.

See read scaling for read-model patterns.

Indexing Strategy

Place indexes on foreign keys and real predicate columns (WHERE, ORDER BY, JOIN). Unused indexes hurt write throughput and vacuum or maintenance overhead. Tie index design to measured or strongly hypothesized queries, not guesses.

Partitioning

Partition by time for append-heavy logs or events where queries are windowed. Partition by hash (or shard key — see below) when you need horizontal split of a large homogeneous table without natural time locality.

Soft Deletes

Soft deletes preserve rows with a “deleted” flag or timestamp — useful for audit, recovery, and referenced history. Pair with indexing and periodic archival so inactive rows do not bloat hot paths indefinitely.

Sharding Keys

Choose a shard key that aligns with queries that must stay local. Keys that scatter every request across all shards multiply latency and coordination. Minimize cross-shard joins or multi-key transactions; when they are unavoidable, model that cost explicitly.