Skip to content

Data Modeling and Storage

First PublishedByAtif Alam

Data choices outlive almost every service binary. Modeling discussions should anchor on access patterns, consistency, and operational burden — not trending databases.

SQL Versus Document and Wide-Column Stores

Section titled “SQL Versus Document and Wide-Column Stores”

SQL (relational) remains the default when you need joins, transactions across rows, and strong invariants (uniqueness, foreign keys). NoSQL (document, key-value, wide-column) often fits high write rates, flexible schema evolution, or horizontal partition friendliness — at the cost of moving some invariants into application code.

“NoSQL for scale” without access-pattern analysis often recreates a fragile relational model in the app layer.

Normalization reduces update anomalies and duplication. Denormalize selectively when profiling shows predictable read hotspots — for example embedding read models, summary tables, or materialized aggregates. Document the write amplification tradeoff whenever you duplicate data.

See read scaling for read-model patterns.

Place indexes on foreign keys and real predicate columns (WHERE, ORDER BY, JOIN). Unused indexes hurt write throughput and vacuum or maintenance overhead. Tie index design to measured or strongly hypothesized queries, not guesses.

Partition by time for append-heavy logs or events where queries are windowed. Partition by hash (or shard key — see below) when you need horizontal split of a large homogeneous table without natural time locality.

Soft deletes preserve rows with a “deleted” flag or timestamp — useful for audit, recovery, and referenced history. Pair with indexing and periodic archival so inactive rows do not bloat hot paths indefinitely.

Choose a shard key that aligns with queries that must stay local. Keys that scatter every request across all shards multiply latency and coordination. Minimize cross-shard joins or multi-key transactions; when they are unavoidable, model that cost explicitly.

Related: Databases overview, RDBMS reliability and on-call, Write scaling, Consistency and transactions.