system-design/nosql-deep-dive.md

7. NoSQL Deep Dive: KV, Document, Wide-Column, Graph

"NoSQL" is a marketing umbrella for "not relational." The differences within NoSQL are larger than the differences from SQL. You must reason about each model on its own terms.

~6 min read·updated 5/29/2026

7. NoSQL Deep Dive: KV, Document, Wide-Column, Graph

"NoSQL" is a marketing umbrella for "not relational." The differences within NoSQL are larger than the differences from SQL. You must reason about each model on its own terms.

7.1 Why NoSQL emerged

By 2008-2010, three pressures hit:

  1. Scale: Google, Amazon, Facebook had data sets and write rates no single relational box could handle.
  2. Schema flexibility: rapid product iteration, semi-structured data (logs, events, user-generated content).
  3. Operational fit: many web workloads only need key-based access; the cost of full SQL features wasn't paying for itself.

Influential papers:

  • Bigtable (Google, 2006) → wide-column.
  • Dynamo (Amazon, 2007) → leaderless replication, eventual consistency, consistent hashing.
  • MapReduce + GFS (Google) → batch + distributed FS.

Today the lines blur: distributed SQL (Spanner, CockroachDB, Vitess) and document Postgres extensions narrow the gap. But the four NoSQL families remain conceptually distinct.

7.2 Key-Value stores

Simplest interface: get(k), put(k, v), sometimes delete, cas (compare-and-set).

Use cases

  • Cache (Redis, Memcached)
  • Session store (Redis)
  • Lookup (DynamoDB by partition key)
  • Counters / rate-limiting (Redis INCR, sorted sets)
  • Pub/sub (Redis, NATS)

Redis specifics

Not just KV. Has rich data types:

  • Strings (with INCR/DECR)
  • Lists (push/pop both ends; queues)
  • Sets (membership, intersection)
  • Hashes (small object store; Redis stores small hashes specially as a list to save memory)
  • Sorted sets (ZSET): score-ordered, supports range, used for leaderboards, top-N
  • Streams (Redis 5+, append-only log; consumer groups like Kafka-lite)
  • Bitmaps and HyperLogLog (probabilistic, see chapter 17)
  • Geospatial (geohash-backed sorted sets)

Persistence options:

  • RDB: periodic snapshot (fork → write whole memory). Crash loses recent writes.
  • AOF: append-only file of every command. Replayed on startup. appendfsync everysec (default) or always (per-write fsync, slower).

Threading: single-threaded for command execution → simple semantics, no locks. Backed by event loop. Redis 6+ optionally multi-threads I/O.

Cluster: hash slots (16384), partitioned across nodes. Client routes; nodes redirect on slot moves (MOVED/ASK).

DynamoDB specifics

  • Item model: PK = partition key (+ optional sort key) → row.
  • Provisioned or on-demand throughput.
  • Single-digit ms reads (consistent or eventually consistent).
  • Global Secondary Indexes (GSI) and Local Secondary Indexes (LSI). GSIs are eventually consistent.
  • Streams (CDC): table → Lambda or Kinesis.
  • DAX: managed write-through cache.

Pricing model forces the design: hot partitions get throttled. Distribute writes across many partition keys.

7.3 Document stores

Each item is a structured document (JSON). Indexes on fields. Often schema-on-read.

MongoDB specifics

  • BSON storage; documents up to 16 MB.
  • Replica sets (1 primary + N secondaries) with automatic failover.
  • Sharding via mongos router; chunk-based, with balancer.
  • Multi-document transactions since 4.0 (replica set) and 4.2 (sharded). Costly; designed for the rare case, not the default.
  • Aggregation pipeline for ETL-like queries.
  • Has had a checkered history with default consistency settings (w:1 write concerns by default in old versions made it lose data on failover). Modern defaults are safer.

When document is right

  • Variable schema across items
  • Self-contained "object per row" — order with line items, blog post with comments
  • No (or rare) cross-document joins required

When document is wrong

  • Many-to-many relationships (you'll either denormalize and over-fetch or app-side join)
  • Transactional consistency across documents (works but slow)
  • Analytics workloads (use a column store)

7.4 Wide-column stores (column families)

Mental model: a sparse, distributed, multi-dimensional sorted map.

(row_key, column_family:column, timestamp) → value

Rows are sorted by row_key. Designed for writes at scale (LSM internals) and range scans on row keys.

Bigtable

Google's original (2006). Powers Search, Maps, Gmail, Analytics, etc.

  • Row key is the only "index". You design the row key to encode access patterns.
  • Single-row transactions only.
  • Tablet servers manage ranges of rows; Master coordinates assignment; Chubby provides locking.
  • Underlying storage: GFS / Colossus.
  • SSTable + memtable + WAL (the LSM pattern).

Cassandra

Open-source descendant blending Bigtable's data model with Dynamo's leaderless replication.

  • Tunable consistency: per-query R, W, N (replicas needed for read, write, total). For strong: R + W > N.
  • No master: gossip protocol; any node can serve any request, coordinates writes via consistent hashing.
  • Partition key + clustering key: data within a partition is sorted by clustering key.
  • CQL: SQL-like query language. Don't be fooled — no joins, no subqueries, no GROUP BY across partitions.
  • Anti-entropy: read repair, hinted handoff, Merkle-tree based repair process.

Modeling rule: one table per query. Denormalize aggressively.

HBase

Open-source Bigtable clone, on top of HDFS. Strong row consistency. Region servers; ZooKeeper for coordination.

When wide-column shines

  • Time-series / event data (key = (device_id, ts), range scan on time)
  • Hot writes (LSM throughput)
  • Sparse columns (millions of possible columns per row, but each row has few)
  • Multi-PB scale

When it hurts

  • Ad-hoc queries (you'd need a new table)
  • Transactions across rows
  • Joins of any kind

7.5 Graph databases

Native graph storage and traversal.

Property graphs (Neo4j, Neptune, JanusGraph)

  • Nodes and edges, both with properties.
  • Cypher query language: MATCH (a:User)-[:FRIEND]->(b:User) RETURN a, b
  • Optimized index-free adjacency: each node has direct pointers to its edges → constant-time hop, regardless of graph size.

Triple stores (RDF: subject, predicate, object)

  • SPARQL query language.
  • Used in semantic web, knowledge graphs.

When a graph DB beats a relational DB

  • Variable-depth traversals: shortest path, k-hop neighbors
  • Frequent recursive queries (Postgres CTEs work but get slow)
  • Pattern matching on relationships

When NOT to use one

  • Most "social" features (timeline, follow counts) are bounded-depth and fit relational fine.
  • The data model is more important than the storage. Unless you genuinely live in graphland, the operational cost of another system isn't worth it.

7.6 The Dynamo paper (and its descendants)

Amazon Dynamo (2007 SOSP paper) introduced ideas now ubiquitous:

  1. Consistent hashing for partitioning (chapter 17).
  2. Sloppy quorum + hinted handoff for write availability when replicas are down.
  3. Vector clocks for conflict detection (chapter 14).
  4. Read repair + anti-entropy (Merkle trees) for eventual consistency.
  5. No master: any node can coordinate.

Direct descendants: Cassandra, Riak, Voldemort, ScyllaDB. Modern DynamoDB (the AWS service) is not the same; it uses single-leader replication per partition with multi-AZ failover.

7.7 BASE vs ACID

The marketing counterpart to ACID:

  • Basically Available
  • Soft state
  • Eventual consistency

In practice, "BASE" is just shorthand for "we picked AP over CP, and we'll converge eventually." Don't quote it as a principle; reason about your actual consistency needs and the actual guarantees your store provides.

7.8 Choosing a NoSQL store

WorkloadPick
Cache, session, ephemeral countersRedis
Massive scale (PB), wide-column, time-seriesBigtable / Cassandra
Document model, simple horizontal scaleMongoDB / Firestore
Single-digit ms KV at AWSDynamoDB
Graph, deep traversalsNeo4j
Search-as-a-featureElasticsearch

Default to Postgres; reach for these when the workload truly needs them.

7.9 What Google interviewers love to hear

  • You know Bigtable's data model, can explain why row key design matters, and how it relates to Cassandra/HBase.
  • You know Spanner's contribution: it brought relational schemas + ACID transactions + horizontal scale together via TrueTime + Paxos. Before Spanner, "NoSQL or scale" was a forced choice.
  • You can articulate trade-offs: why use a column-family store over Postgres for clickstream events, but Postgres for orders.

Key takeaways

  • "NoSQL" is four families: KV, document, wide-column, graph. Each solves a different problem.
  • Cassandra / Bigtable / HBase = wide-column for write-heavy, scaled-out, denormalized workloads.
  • Redis is a Swiss army knife: KV, queue, pub/sub, geo, leaderboard.
  • Distributed SQL (Spanner) erodes the "scale → must give up SQL" narrative.

// 0 views

main
UTF-8·typescript