3. Networking: TCP, HTTP, DNS, WebSockets
Distributed systems are conversations across networks. You must know what each layer guarantees and what it doesn't.
~7 min read·updated 5/29/2026
3.1 The OSI / TCP-IP stack
| Layer | Examples | What it does |
|---|---|---|
| L7 Application | HTTP, gRPC, DNS, SMTP | App-level protocols |
| L6 Presentation | TLS, JSON, Protobuf | Encoding, encryption |
| L5 Session | (rarely a separate thing in practice) | Conversation state |
| L4 Transport | TCP, UDP, QUIC | End-to-end delivery semantics |
| L3 Network | IP, ICMP | Routing across networks |
| L2 Data Link | Ethernet, Wi-Fi | Local network frames |
| L1 Physical | Copper, fiber, radio | Bits on wire |
In design interviews you mostly live at L4 (TCP vs UDP) and L7 (HTTP, gRPC, WebSockets). Load balancers are likewise classified as L4 (transport-aware) or L7 (HTTP-aware).
3.2 TCP
Transmission Control Protocol. Reliable, ordered, connection-oriented byte stream. The bedrock of HTTP, SMTP, SSH, MySQL wire protocol, etc.
Guarantees:
- Bytes delivered in order
- Bytes delivered exactly once (no duplicates surfaced to app)
- Lost bytes are retransmitted
- Flow control: receiver can slow down sender
- Congestion control: TCP backs off when network is loaded
The 3-way handshake
Client Server
| -------- SYN ------> |
| <----- SYN-ACK ----- |
| -------- ACK ------> |
(connection established)
The handshake costs 1 RTT before any data can flow. On a cross-continent link (~150ms RTT) that is a significant delay. TCP Fast Open and connection reuse (HTTP keep-alive, HTTP/2) mitigate it.
TCP gotchas
- Head-of-line blocking: TCP delivers in order. If packet 5 is lost, packets 6-10 wait in the kernel until 5 is retransmitted, even if your app doesn't care about ordering. HTTP/2 multiplexes streams over one TCP connection, so a single lost packet stalls all streams. HTTP/3 (over QUIC/UDP) fixes this.
- Connection state lives in the kernel. Each connection costs memory. The C10K problem (10K concurrent connections) is solved; C10M (10M) typically needs kernel-bypass networking (DPDK) or a low-overhead I/O interface such as io_uring.
- TIME_WAIT. The side that closes a connection first keeps it in TIME_WAIT for ~60s, holding the port number. High-churn outbound clients can exhaust ephemeral ports.
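Head-of-line blocking is easier to see with a toy model of the receive buffer. The sketch below is an illustration only, not how any real kernel is implemented; the function name `deliver_in_order` and the use of small integers as sequence numbers are assumptions made for the example.

```python
def deliver_in_order(arrival_order):
    """Simulate an in-order receive buffer.

    arrival_order: segment numbers in the order they reach the host.
    Returns a list of (arrived_segment, released_to_app) pairs, where
    released_to_app lists the segments handed to the application at
    that moment.
    """
    next_expected = min(arrival_order)  # first segment the app is waiting for
    buffered = set()                    # out-of-order segments held back
    timeline = []
    for seq in arrival_order:
        if seq >= next_expected:        # ignore duplicates of delivered data
            buffered.add(seq)
        released = []
        # Release the longest contiguous run starting at next_expected.
        while next_expected in buffered:
            buffered.remove(next_expected)
            released.append(next_expected)
            next_expected += 1
        timeline.append((seq, released))
    return timeline
```

With arrivals `[4, 6, 7, 8, 5]` (segment 5 lost and retransmitted last), segments 6 to 8 release nothing when they arrive, and the late 5 releases `[5, 6, 7, 8]` in one burst. That stall is what every HTTP/2 stream on the connection experiences.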
3.3 UDP
User Datagram Protocol. Unreliable, unordered, connectionless datagram. Send a packet, hope it arrives.
Why use it?
- Lower latency (no handshake, no ACKs blocking on retransmission)
- Acceptable for: DNS queries (retry at app layer), VoIP (drop a sample, keep going), video conferencing, real-time games, QUIC (TCP-replacement)
UDP is a building block. Reliability is added at the app layer if needed.
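As a rough sketch of "reliability at the app layer", the client below resends a datagram when no reply arrives within a timeout, which is roughly how a DNS stub resolver behaves. The helper names (`send_with_retry`, `start_lossy_echo_server`), the retry count, and the timeout are assumptions for illustration; the lossy server exists only to exercise the retry path on loopback.

```python
import socket
import threading


def send_with_retry(server_addr, payload, retries=3, timeout=0.2):
    """Send one datagram and wait for a reply, resending on timeout.

    Returns (reply_bytes, attempt_number).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for attempt in range(1, retries + 1):
            sock.sendto(payload, server_addr)
            try:
                reply, _ = sock.recvfrom(2048)
                return reply, attempt
            except socket.timeout:
                continue  # no reply in time: assume loss and resend
    raise TimeoutError(f"no reply after {retries} attempts")


def start_lossy_echo_server(drop_first=1):
    """Hypothetical test server: ignores the first `drop_first` datagrams, then echoes."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port

    def serve():
        seen = 0
        while True:
            try:
                data, peer = server.recvfrom(2048)
            except OSError:
                return  # socket was closed
            seen += 1
            if seen > drop_first:
                server.sendto(data, peer)

    threading.Thread(target=serve, daemon=True).start()
    return server
```

Note that a blind resend is only safe when the request is harmless to repeat; a lost reply looks the same to the client as a lost request.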
3.4 HTTP
Application protocol on top of TCP (HTTP/1.1, HTTP/2) or QUIC (HTTP/3).
HTTP/1.0 → HTTP/1.1
- HTTP/1.0: one request per TCP connection. Slow.
- HTTP/1.1: persistent connections (keep-alive), pipelining (rarely used because of head-of-line blocking). Browsers typically open up to six parallel connections per origin.
HTTP/2
- Binary framing (no more parsing text headers)
- Multiplexing: many parallel streams over one TCP connection. Reduces TCP overhead, but TCP head-of-line blocking still bites when a packet is lost.
- Header compression (HPACK): repeated headers cost ~0 bytes after the first request.
- Server push (mostly removed; browsers deprecated it).
HTTP/3 (QUIC)
- Built on UDP, not TCP.
- No TCP head-of-line blocking: each stream is independent.
- 0-RTT resumption: returning clients can send data in the very first packet.
- Built-in TLS 1.3: handshake combined with transport.
- Adopted by Google, Meta, Cloudflare; ~30%+ of web traffic by 2024.
HTTP methods
| Method | Idempotent | Safe | Body |
|---|---|---|---|
| GET | Yes | Yes | No |
| HEAD | Yes | Yes | No |
| POST | No | No | Yes |
| PUT | Yes | No | Yes |
| DELETE | Yes | No | No |
| PATCH | No | No | Yes |
| OPTIONS | Yes | Yes | No |
Idempotent = same request multiple times has the same effect as once. Critical for retries: only retry idempotent methods automatically. POST is not idempotent unless you add an idempotency key (e.g., Stripe's Idempotency-Key header).
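A minimal sketch of server-side idempotency keys, assuming an in-memory dictionary as the key store. `ChargeService` and its fields are hypothetical names for this example and do not describe Stripe's implementation.

```python
class ChargeService:
    """Hypothetical handler that deduplicates POSTs by idempotency key."""

    def __init__(self):
        self._results = {}     # idempotency key -> stored response
        self.charges_made = 0  # counts real side effects, for demonstration

    def post_charge(self, idempotency_key, amount_cents):
        # A replayed key returns the stored response without repeating
        # the side effect, so a client retry cannot double-charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        response = {
            "charge_id": f"ch_{self.charges_made}",
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = response
        return response
```

The client generates the key once per logical operation (a UUID is a common choice) and reuses it on every retry. A production version would need an atomic check-and-store and an expiry policy for old keys, both omitted here.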
Status codes you must know
- 2xx success: 200 OK, 201 Created, 202 Accepted (async), 204 No Content
- 3xx redirect: 301 Moved Permanently, 302 Found, 304 Not Modified (caching)
- 4xx client error: 400 Bad Request, 401 Unauthorized (no auth), 403 Forbidden (auth but no permission), 404 Not Found, 409 Conflict, 410 Gone, 422 Unprocessable Entity, 429 Too Many Requests
- 5xx server error: 500 Internal, 502 Bad Gateway, 503 Service Unavailable (load shedding), 504 Gateway Timeout
Use 429 for rate limiting and 503 for overload, each with a Retry-After header: it tells well-behaved clients when to back off.
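On the client side, honoring Retry-After might look like the sketch below. The function name `retry_delay`, the base delay, and the cap are assumptions; only the delta-seconds form of the header is handled, and the fallback is exponential backoff with full jitter.

```python
import random

RETRYABLE_STATUS = {429, 503}


def retry_delay(status, headers, attempt, base=0.5, cap=30.0, rng=random.random):
    """Return seconds to wait before the next attempt, or None if not retryable.

    `attempt` starts at 0. Retry-After may also carry an HTTP date,
    which this sketch ignores.
    """
    if status not in RETRYABLE_STATUS:
        return None
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        return float(retry_after)  # the server said how long to wait
    # No usable header: exponential backoff with full jitter, capped.
    return rng() * min(cap, base * (2 ** attempt))
```

Jitter matters because clients that all retry after exactly the same delay arrive together and recreate the overload.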
Caching headers
- `Cache-Control: max-age=3600, public`: cacheable for 1 hour
- `Cache-Control: no-store`: don't cache anywhere
- `Cache-Control: private`: browser only, not CDN
- `ETag: "abc123"` + `If-None-Match: "abc123"` → 304 if unchanged. Saves bandwidth.
- `Last-Modified` + `If-Modified-Since` → similar, time-based.
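The ETag round trip can be sketched as a pure function. This is an assumption-laden illustration: `make_etag` and `conditional_get` are hypothetical names, the validator is derived from a content hash, and only a single value in If-None-Match is handled (the header may also carry a list or `*`).

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Validator derived from the content; the header value is quoted.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, request_headers: dict):
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=3600, public"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""  # client copy is current: send no body
    return 200, headers, body
```

The first request returns 200 with an ETag; the client echoes that value in If-None-Match on revalidation and receives an empty 304 as long as the content has not changed.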
3.5 HTTPS / TLS
TLS = Transport Layer Security. The thing the S adds in HTTPS. Provides:
- Confidentiality: encrypted in transit.
- Integrity: tamper-detected.
- Authentication: you're talking to the real server (via X.509 certificates).
TLS handshake (TLS 1.2 vs 1.3)
- TLS 1.2: 2 RTTs.
- TLS 1.3: 1 RTT (or 0 RTT for resumed sessions). All weak ciphers removed. Mandatory forward secrecy.
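To make the round-trip counts concrete, the arithmetic below adds up handshake and request RTTs using the 150ms cross-continent figure from section 3.2. It is a back-of-the-envelope sketch: server processing time is ignored, the function and scenario names are made up for the example, and a fresh QUIC connection is taken as 1 RTT because its transport and TLS 1.3 handshakes are combined.

```python
def time_to_first_response_ms(rtt_ms, transport_rtts, tls_rtts, request_rtts=1):
    """Network time before the first response arrives, ignoring server time."""
    return rtt_ms * (transport_rtts + tls_rtts + request_rtts)


RTT_MS = 150  # cross-continent round trip

SCENARIOS = {
    "TCP + TLS 1.2, new connection": time_to_first_response_ms(RTT_MS, 1, 2),
    "TCP + TLS 1.3, new connection": time_to_first_response_ms(RTT_MS, 1, 1),
    "QUIC (HTTP/3), new connection": time_to_first_response_ms(RTT_MS, 0, 1),
    "QUIC (HTTP/3), 0-RTT resumption": time_to_first_response_ms(RTT_MS, 0, 0),
    "Any protocol, reused connection": time_to_first_response_ms(RTT_MS, 0, 0),
}

if __name__ == "__main__":
    for name, ms in SCENARIOS.items():
        print(f"{name}: {ms} ms")
```

The spread runs from 600 ms for a cold TCP + TLS 1.2 connection down to 150 ms when no handshake is needed, which is why connection reuse and TLS 1.3 show up in almost every latency discussion.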
Certificates
A TLS cert binds a domain to a public key, signed by a CA (Certificate Authority). Browsers ship a list of trusted CAs. Let's Encrypt made free certs ubiquitous.
For internal services: mTLS (mutual TLS). Both client and server present certs. SPIFFE/SPIRE provides workload identity in service meshes.
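On the server side, mTLS mostly comes down to requiring and verifying a client certificate. The sketch below uses Python's standard `ssl` module; the function name and the file-path parameters are hypothetical, and in a real deployment the certificate, key, and internal CA bundle would come from your PKI or mesh tooling.

```python
import ssl


def make_mtls_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Build a server-side TLS context that demands a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)  # server role
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a valid client cert
    if cert_file and key_file:
        # The server's own identity (paths are placeholders).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # Only clients signed by this internal CA are accepted.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

The client mirrors this: it loads its own certificate chain and trusts the internal CA that signed the server's certificate.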
3.6 DNS
Domain Name System. Translates google.com → 142.250.190.46.
Hierarchy
1. Root servers (`.`)
2. TLD servers (`.com`, `.org`, country codes)
3. Authoritative servers for the zone (`google.com.`)
4. Recursive resolvers (your ISP, 8.8.8.8, 1.1.1.1): they query 1, 2, and 3 on your behalf and cache the answers.
Record types
- A: IPv4 address
- AAAA: IPv6 address
- CNAME: alias (`www` → `example.com`)
- MX: mail server
- TXT: arbitrary text (used for SPF, DKIM, domain verification)
- NS: nameserver delegation
- SOA: start of authority (zone metadata)
TTL trade-off
- Long TTL → more cache hits, faster lookups, slower failover (clients use stale records).
- Short TTL → fast failover, more DNS load.
For DR and rapid failover, set TTL to 60s for the records that need to fail over. Pay the lookup cost.
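The failover cost of a long TTL shows up clearly in a toy cache. The class below is a sketch, not a real resolver: `TtlDnsCache` is a hypothetical name, `lookup` stands in for an upstream query, and the clock is injectable so the behavior can be checked without waiting.

```python
import time


class TtlDnsCache:
    """Toy resolver cache: answers are reused until their TTL expires."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup  # function: name -> (ip, ttl_seconds)
        self._clock = clock
        self._entries = {}     # name -> (ip, expires_at)
        self.upstream_queries = 0

    def resolve(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached and cached[1] > now:
            return cached[0]   # cache hit, possibly stale after a failover
        ip, ttl = self._lookup(name)
        self.upstream_queries += 1
        self._entries[name] = (ip, now + ttl)
        return ip
```

If the record changes at t=0 with a 60s TTL, a client that cached it just before keeps using the old address until t=60. That window is the failover delay you accept in exchange for fewer lookups.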
DNS-based load balancing
Return multiple A records (round-robin) or use geo-DNS (return the IP closest to the client). Its precision is limited because clients and resolvers cache answers. Used in CDN front doors (Akamai, CloudFront).
3.7 Real-time / push: WebSockets, SSE, polling
How does the server push to the client?
| Technique | Direction | Complexity | When to use |
|---|---|---|---|
| Polling | client pulls every X sec | trivial | infrequent updates, prototypes |
| Long polling | client opens, server holds, replies on event | moderate | moderate update rate, no WS infra |
| Server-Sent Events (SSE) | server → client only, over HTTP | low | dashboards, notifications, one-way streams |
| WebSockets | full-duplex | moderate | chat, collab editing, gaming |
| gRPC streaming | uni or bi-directional | moderate (HTTP/2) | service-to-service |
| WebRTC | peer-to-peer over UDP | high | video/voice, low latency |
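Part of why SSE rates as low complexity is that its wire format is plain text over a normal HTTP response (served with `Content-Type: text/event-stream`). The encoder below is a small sketch; `format_sse` is a hypothetical helper name.

```python
def format_sse(data, event=None, event_id=None):
    """Encode one Server-Sent Events message as text."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines is sent as one data: line per payload line.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the message
```

The server writes these messages to the open response as events occur, and the browser's `EventSource` reconnects on its own if the connection drops.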
WebSocket essentials
- Starts as HTTP, upgrades via `Upgrade: websocket`.
- Persistent TCP connection, full-duplex.
- Each connection costs memory on the server (tens of KB). Holding 1M concurrent WS connections on one node needs careful tuning (kernel params, file descriptor limits). Discord reportedly serves millions of concurrent connections.
- Stateful: load balancer must route the same client to the same node, or have a shared pub/sub layer for fan-out.
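The upgrade step can be sketched from the server's point of view. Per RFC 6455, the server proves it understood the request by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID. The function names here are hypothetical, and a real server would also validate the `Upgrade`, `Connection`, and version headers.

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed value from RFC 6455


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(request_headers: dict) -> str:
    """Build the 101 response that switches the connection to WebSocket."""
    key = request_headers["Sec-WebSocket-Key"]
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(key)}\r\n"
        "\r\n"
    )
```

After the 101 response the same TCP connection carries WebSocket frames in both directions instead of HTTP messages.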
Push at massive scale (chat/feeds)
- Persistent gateway tier (WS endpoints).
- A pub/sub backbone (Redis pub/sub, Kafka, Pulsar) so any backend can push to any user without knowing which gateway holds the connection.
- "Connection routing" map: user_id → gateway_id, kept in a fast KV store.
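The three pieces above fit together roughly as in this in-memory sketch. `Gateway` and `PushFabric` are hypothetical names; plain dictionaries stand in for the KV store and the pub/sub backbone, and a "connection" is just a list that collects delivered messages.

```python
class Gateway:
    """Holds live connections for a subset of users."""

    def __init__(self, gateway_id):
        self.gateway_id = gateway_id
        self.connections = {}  # user_id -> list acting as the client's inbox

    def deliver(self, user_id, message):
        if user_id in self.connections:
            self.connections[user_id].append(message)


class PushFabric:
    """Stand-in for the routing map (KV store) plus the pub/sub backbone."""

    def __init__(self):
        self.routes = {}    # user_id -> gateway_id
        self.gateways = {}  # gateway_id -> Gateway (one topic per gateway)

    def connect(self, gateway, user_id):
        self.gateways[gateway.gateway_id] = gateway
        gateway.connections[user_id] = []
        self.routes[user_id] = gateway.gateway_id

    def disconnect(self, user_id):
        gateway_id = self.routes.pop(user_id, None)
        if gateway_id is not None:
            self.gateways[gateway_id].connections.pop(user_id, None)

    def push(self, user_id, message):
        """Called by any backend; returns False if the user is offline."""
        gateway_id = self.routes.get(user_id)
        if gateway_id is None:
            return False
        self.gateways[gateway_id].deliver(user_id, message)
        return True
```

The backend calling `push` never needs to know which gateway holds the socket; it only needs the user ID. In a real system the route entry also needs a TTL or heartbeat so that a crashed gateway does not leave stale mappings behind.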
3.8 Network failure modes you must design for
- Packets lost or delayed indefinitely. Always set timeouts.
- Asymmetric partitions. A → B works; B → A doesn't.
- Slow but not dead. "Gray failures." Health checks pass, real requests time out. Hardest to detect. Tail latency monitoring helps.
- Clock skew. Don't trust client time. Even servers drift; NTP typically keeps them within milliseconds of each other. (See chapter 14.)
- DNS poisoning / hijacking. Use DNSSEC where it matters; use TLS to authenticate the destination.
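Gray failures are a good argument for watching tail latency rather than averages or medians. The sketch below uses a nearest-rank percentile; the function names and the latency budget are assumptions for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def looks_gray(latencies_ms, p99_budget_ms):
    """Flag a backend whose p99 latency exceeds the budget."""
    return percentile(latencies_ms, 99) > p99_budget_ms
```

For 97 requests at 20 ms and 3 that hit a 2000 ms timeout, the median is still 20 ms while the p99 is 2000 ms. A median-based check stays green even though 3% of users are timing out.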
Key takeaways
- TCP is reliable and in-order, but pays an RTT to handshake. UDP is fire-and-forget; HTTP/3 builds reliable streams over it without TCP's head-of-line blocking.
- HTTP methods have semantics — only retry idempotent ones. Use idempotency keys for POST.
- TLS 1.3 is 1 RTT (or 0); always use it.
- DNS TTL is a knob: short for failover, long for performance.
- Real-time push: pick polling → SSE → WebSocket as needs grow.