System Design
DDIA chapters, common designs, capacity planning, communication patterns.
fundamentals.md ~5m 1. Fundamentals: Reliability, Scalability, Maintainability
Three properties define a "good" production system. Every architectural choice is a trade-off between them.
numbers-estimation.md ~5m 2. Numbers & Capacity Estimation
You will estimate in every interview. The math is trivial; what matters is *speed and confidence*. Memorize the numbers below cold.
networking.md ~7m 3. Networking: TCP, HTTP, DNS, WebSockets
Distributed systems are conversations across networks. You must know what each layer guarantees and what it doesn't.
data-models.md ~5m 4. Data Models: Relational, Document, Graph
The data model is the *single most important architectural decision*. It shapes how you think, what queries are easy, what queries are impossible, and what your scale ceiling looks like.
storage-engines.md ~6m 5. Storage Engines: B-Trees vs LSM Trees
A *storage engine* sits below your query layer and decides how data lives on disk. The choice — B-tree or LSM-tree — drives performance characteristics that ripple through every higher layer.
sql-deep-dive.md ~6m 6. SQL Deep Dive: ACID, Isolation, MVCC
SQL is decades old and still the right default. To design with it well you must understand what transactions actually guarantee — and where they leak.
nosql-deep-dive.md ~6m 7. NoSQL Deep Dive: KV, Document, Wide-Column, Graph
"NoSQL" is a marketing umbrella for "not relational." The differences within NoSQL are larger than the differences from SQL. You must reason about each model on its own terms.
encoding-evolution.md ~5m 8. Encoding & Schema Evolution: JSON, Protobuf, Avro
Programs run on objects in memory. Networks and disks need bytes. Encoding is the bridge — and it's where forward/backward compatibility lives or dies.
caching.md ~6m 9. Caching: Strategies, Eviction, Redis vs Memcached
Caching is the highest-leverage optimization in system design. Memory is ~100,000× faster than disk seek; a cache that absorbs even 80% of reads is the difference between a melted database and a cool one.
replication.md ~6m 10. Replication: Single-Leader, Multi-Leader, Leaderless
Replication = keeping the same data on multiple machines. Why? Availability (survive node loss), read scaling (more replicas = more read capacity), latency (replica close to user), disaster recovery (cross-region copi…
partitioning.md ~6m 11. Partitioning & Sharding
When one machine can't hold all your data, you split it across many. Partitioning (= sharding) is how. The decision is irreversible-ish — pick wrong and you'll spend years regretting it.
consistency-cap.md ~6m 12. Consistency Models, CAP, PACELC
In a distributed system, "consistency" isn't binary. It's a spectrum. Knowing where your system sits — and where you *want* it to sit — is core to every design.
distributed-fundamentals.md ~6m 13. Distributed Systems Fundamentals: Faults & Networks
A distributed system is one in which "the failure of a computer you didn't even know existed can render your own computer unusable" (Lamport). Understanding *why* distributed systems are hard is the first step to designing them well.
clocks-time.md ~5m 14. Time, Clocks, and Ordering (Lamport, Vector, TrueTime)
In distributed systems, "what happened first?" is a much harder question than it sounds. Wall clocks lie. The real foundation is causality, captured by logical clocks.
consensus.md ~6m 15. Consensus: Paxos, Raft, ZAB
Consensus = a group of nodes agreeing on a single value (or a sequence of values), even when some nodes fail. It's the kernel of every reliable distributed system.
transactions.md ~6m 16. Transactions: Local, Distributed, 2PC, Saga, TCC
A transaction is a unit of work that should appear atomic. Local transactions are well-understood. Across services or shards, it gets dramatically harder.
distributed-data-structures.md ~6m 17. Probabilistic Structures: Bloom, Consistent Hashing, Merkle, HLL, Count-Min
A handful of clever data structures show up everywhere in distributed systems. Knowing them is interview table stakes.
load-balancing.md ~6m 18. Load Balancing: L4 vs L7, Algorithms
A load balancer (LB) distributes incoming requests across many servers. It's the first line of defense against an unhealthy node and the first lever for horizontal scale.
message-queues.md ~6m 19. Message Queues & Event Streaming
Async messaging decouples producers from consumers. Done well, it absorbs spikes, enables independent scaling, and provides reliability through retry. Done poorly, it creates inscrutable distributed bugs.
microservices.md ~6m 20. Microservices, Monoliths, Service Mesh
Microservices vs monolith is a religious war that should be a pragmatic choice. Both have their place. The right answer depends on team size, change rate, and operational maturity.
api-design.md ~6m 21. API Design: REST, GraphQL, gRPC
APIs are the contracts between systems. Designed well, they outlive their authors. Designed poorly, they outlive them too — painfully.
async-patterns.md ~6m 22. Async Patterns: Pub/Sub, CQRS, Event Sourcing, Saga
The patterns in this chapter all flow from one insight: the act of changing state and the consequences of that change can be decoupled in time. Used wisely, this unlocks scale, integration, and auditability.
rate-limiting.md ~5m 23. Rate Limiting & Throttling
Rate limiting protects your service from abuse, accidental hammering, and runaway clients. It's also a productizable feature (per-tier quotas). The algorithms are simple; the operational details are where it gets inte…
cdn-edge.md ~6m 24. CDN & Edge Computing
A CDN (Content Delivery Network) is a geographically-distributed cache + acceleration tier between your origin and your users. Done well, it serves 90%+ of bytes from the edge — cutting latency, bandwidth costs, and o…
search-systems.md ~6m 25. Search Systems: Inverted Index, Elasticsearch
"Find documents matching this query, ranked by relevance" is a deceptively complex problem. The inverted index is the foundational data structure; modern search systems layer ranking, filtering, and faceting on top.
observability.md ~7m 26. Observability: Logs, Metrics, Traces, SLOs
You can't operate what you can't see. Observability = the ability to ask arbitrary questions about your system's behavior in production. The "three pillars" are logs, metrics, traces — but the goal is unified investig…
security.md ~7m 27. Security: TLS, OAuth, JWT, OWASP
Security in a system design interview = a tax that pays itself back manyfold. Senior engineers think about it from the start; juniors bolt it on.
containers-orchestration.md ~6m 28. Containers, Kubernetes, Borg
Containers won the deployment war; Kubernetes won the orchestration war. Knowing how they work is now table stakes for any system design discussion.
deployment-patterns.md ~6m 29. Deployment Patterns: Blue/Green, Canary, Feature Flags
Shipping code is a system design problem. Bad deployment practices are how systems with great architecture still take outages.
batch-processing.md ~6m 30. Batch Processing: MapReduce & Spark
Batch processing handles large bounded data sets — terabytes to petabytes — producing reports, models, and derived data. The patterns shaped modern data engineering and big-data system design.
stream-processing.md ~6m 31. Stream Processing: Kafka Streams, Flink, Exactly-Once
Stream processing operates on unbounded data — events arriving continuously. Done right, you replace nightly batch jobs with seconds-fresh insights and react to events as they happen.
design-url-shortener.md ~4m 32. Design a URL Shortener (TinyURL / bit.ly)
The classic warm-up. Easy to start, deceptively rich. Gets at: ID generation, KV stores, caching, redirect mechanics, analytics fan-out.
design-twitter.md ~4m 33. Design Twitter / News Feed
The canonical fan-out problem. Tests: timeline construction, push vs pull, hot keys (celebrities), denormalization, caching at scale.
design-youtube.md ~5m 34. Design YouTube / Video Streaming
Tests: object storage, video encoding pipeline, CDN, adaptive bitrate streaming, recommendation, comments, view counting at scale.
design-uber.md ~4m 35. Design Uber / Ride Hailing
Tests: real-time location, geo-indexing (S2 / H3), matching, dispatch, payments, surge pricing, websockets at scale.
design-whatsapp.md ~4m 36. Design WhatsApp / Chat at Scale
Tests: long-lived connections, fan-out for messages, message ordering, presence, end-to-end encryption, group chat.
design-dropbox.md ~4m 37. Design Dropbox / Google Drive
Tests: file storage at scale, chunking, deduplication, sync engine, conflict resolution, sharing & permissions.
design-web-crawler.md ~4m 38. Design a Web Crawler
Tests: BFS at planet scale, politeness, deduplication, fault tolerance, distributed coordination.
design-typeahead.md ~4m 39. Design Typeahead / Autocomplete
Tests: low-latency suggestions, prefix data structures (trie / FST), ranking, real-time updates, caching.
design-yelp.md ~2m 40. Design Yelp / Geo-Search
Tests: spatial indexing, faceted search, ranking, photos, reviews, abuse handling.
design-distributed-cache.md ~4m 41. Design a Distributed Cache
Tests: consistent hashing, replication, eviction, hot keys, cluster membership, failure handling.
design-rate-limiter.md ~4m 42. Design a Rate Limiter Service
Tests: distributed counter consistency, token bucket math, hot key strategies, latency budget.
design-notification-system.md ~4m 43. Design a Notification System
Tests: fan-out, delivery via multiple channels (push/email/SMS), retries, idempotency, rate limiting, user preferences.
design-payment-system.md ~4m 44. Design a Payment System
Tests: idempotency, distributed transactions, integration with PSPs, ledger, fraud, regulatory compliance.
design-ad-system.md ~4m 45. Design an Ad Click / Counting System
Tests: extreme write throughput, deduplication, fraud detection, real-time aggregation, billing accuracy.
interview-framework.md ~6m 46. The Interview Framework (RESHADED)
System design interviews aren't about getting the "right" answer; they're about demonstrating structured thinking under uncertainty. The interviewer is hiring for whether you can lead a design discussion in a real mee…
common-tradeoffs.md ~6m 47. Cheat Sheet: Common Trade-offs & Patterns
Quick reference. The interviewer rewards reasoning about trade-offs; this distills the recurring ones.
google-systems.md ~5m 48. Google Papers: GFS, Bigtable, Spanner, MapReduce, Borg
Google publishes its internal systems via papers. The ideas in those papers seeded most modern distributed-systems infrastructure. For a Google interview, knowing them — at least at high level — signals fluency with t…
readme.md ~2m System Design — End-to-End Curriculum
A comprehensive system design reference built for Google L3/L4 prep. Written from first principles, with trade-offs explicitly called out, and Google-relevant systems (GFS, Bigtable, Spanner, Borg) folded in.
system-design-syllabus.md ~3m System Design Syllabus — 24 Weeks
Phase 2. Run in parallel with NeetCode (Phase 1) at 30 min/day, then ramp to 1 h/day from August.