Data Modeling and Storage
Data choices outlive almost every service binary. Modeling discussions should anchor on access patterns, consistency, and operational burden — not trending databases.
SQL Versus Document and Wide-Column Stores
Section titled “SQL Versus Document and Wide-Column Stores”SQL (relational) remains the default when you need joins, transactions across rows, and strong invariants (uniqueness, foreign keys). NoSQL (document, key-value, wide-column) often fits high write rates, flexible schema evolution, or horizontal partition friendliness — at the cost of moving some invariants into application code.
“NoSQL for scale” without access-pattern analysis often recreates a fragile relational model in the app layer.
Normalize First, Denormalize for Reads
Section titled “Normalize First, Denormalize for Reads”Normalization reduces update anomalies and duplication. Denormalize selectively when profiling shows predictable read hotspots — for example embedding read models, summary tables, or materialized aggregates. Document the write amplification tradeoff whenever you duplicate data.
See read scaling for read-model patterns.
Indexing Strategy
Section titled “Indexing Strategy”Place indexes on foreign keys and real predicate columns (WHERE, ORDER BY, JOIN). Unused indexes hurt write throughput and vacuum or maintenance overhead. Tie index design to measured or strongly hypothesized queries, not guesses.
Partitioning
Section titled “Partitioning”Partition by time for append-heavy logs or events where queries are windowed. Partition by hash (or shard key — see below) when you need horizontal split of a large homogeneous table without natural time locality.
Soft Deletes
Section titled “Soft Deletes”Soft deletes preserve rows with a “deleted” flag or timestamp — useful for audit, recovery, and referenced history. Pair with indexing and periodic archival so inactive rows do not bloat hot paths indefinitely.
Sharding Keys
Section titled “Sharding Keys”Choose a shard key that aligns with queries that must stay local. Keys that scatter every request across all shards multiply latency and coordination. Minimize cross-shard joins or multi-key transactions; when they are unavoidable, model that cost explicitly.
Related: Databases overview, RDBMS reliability and on-call, Write scaling, Consistency and transactions.