22. Async Patterns: Pub/Sub, CQRS, Event Sourcing, Saga

The patterns in this chapter all flow from one insight: the act of changing state and the consequences of that change can be decoupled in time. Used wisely, this unlocks scale, integration, and auditability.

22.1 Pub/Sub vs queue (recap)

Queue: each message consumed by exactly one worker.
Pub/Sub: each message delivered to all subscribers.

Most modern brokers support both via consumer groups (Kafka): one consumer group per logical subscriber.

22.2 Event-driven architecture (EDA)

Services emit events ("something happened") rather than commanding others ("do this"). Other services react.

Example flow:

User places order → OrderService writes order, emits OrderPlaced event.
InventoryService subscribes → reserves stock.
PaymentService subscribes → charges card.
NotificationService subscribes → emails confirmation.
AnalyticsService subscribes → updates dashboards.

Adding a new behavior (e.g., loyalty points) = new subscriber, no change to OrderService.

Event vs command vs message

Command: imperative ("ChargeCard"). Targeted at a known recipient.
Event: declarative ("OrderPlaced"). Broadcast.
Message: any data passed between services.

Naming convention: events use past-tense verbs.

Event design

Include enough data so consumers don't need to call back ("event-carried state transfer").
But not so much that events become bloated and brittle.
Include event_id, event_type, event_version, occurred_at, aggregate_id, payload.
Schema-evolve carefully (chapter 8).

Trade-offs

+ Decoupled producers/consumers; new features without breaking changes.
+ Audit trail; replayable history.
− Hard to trace a flow; requires distributed tracing.
− Eventual consistency everywhere.
− Order matters but is harder to guarantee.

22.3 Event Sourcing

Don't store current state. Store the sequence of events that led to it. Derive state by replaying.

Example: bank account.

Traditional: a row {user, balance: 1500}.
Event sourcing: a stream Deposited(1000), Deposited(500), Withdrew(0). Current balance computed by sum.

Pros

Complete audit trail: every change is recorded with intent.
Time travel: state at any historical moment.
Replay for new projections: build a new view by re-running events.
Natural fit for CQRS: events are the input to read models.

Cons

Huge cognitive shift: most engineers have never modeled this way.
Event schemas evolve: events live forever; you must handle old schemas.
Snapshots needed: replaying millions of events is slow; periodically snapshot state.
Querying current state is non-trivial: you build read models for queries.
Overkill for most domains.

When it shines

Audit-heavy domains: finance, healthcare, legal.
Complex domain logic where causality matters.
Domains where new questions arise (replay events to answer).

When it hurts

CRUD apps where the only question is "what's the current state?"
Teams without strong DDD background.

Tools: EventStoreDB, Kafka (used as event store), Axon, custom on Postgres.

22.4 CQRS (Command Query Responsibility Segregation)

Separate the write model (commands) from the read model (queries).

Writes go through a command handler, validate, mutate.
Reads use a separate, optimized read model.
Read model is a projection of the write model (often eventually consistent).

Why

Read and write loads scale differently.
Reads need different shapes than writes.
One write event can update many read models (denormalized for fast queries).

Example

Write: PlaceOrder(items, address, payment) → write to orders table, emit OrderPlaced.
Read 1: customer order history view.
Read 2: warehouse pick list view.
Read 3: revenue dashboard.

Each read model is a separate denormalized projection updated from events.

CQRS without event sourcing

You can do CQRS without event sourcing — just write to the canonical store, then async-update read views. Common in any system with cache.

CQRS with event sourcing is the "full" pattern.

Cost

Eventual consistency between write and reads.
More moving parts.
"Where do I look up the current state?" answer becomes "depends on the question."

22.5 Saga (recap from chapter 16)

Multi-step distributed workflow with compensations.

Choreography: each service reacts to events; cycles emerge.
Orchestration: central coordinator (Temporal, Step Functions) drives.

Use when business operations span services and you need eventual completion.

22.6 The Outbox Pattern (deep dive)

The cornerstone of correct event-driven systems.

Problem

You write to your DB. Then you publish an event. If publish fails, DB has the change but no event. Inconsistency.

Solution

In one local transaction: write the change AND insert an outbox_events row with the event payload.
A separate process reads outbox rows → publishes to broker → marks outbox row sent.

BEGIN;
UPDATE orders SET status='paid' WHERE id=42;
INSERT INTO outbox(id, type, payload, created_at)
  VALUES (gen_uuid(), 'OrderPaid', '{"order_id":42}', NOW());
COMMIT;

Then a publisher polls (or streams via CDC) the outbox table and emits to Kafka. Once acked, marks published_at. Cleans up after retention.

CDC (Change Data Capture) variant

A CDC tool (Debezium) reads the DB's WAL/binlog and streams changes to Kafka. The outbox table is captured automatically — no polling, low latency.

This is the production way to do event-driven systems as of 2026.

22.7 Inbox pattern (consumer side)

Mirror of outbox.

Consumer receives event → checks inbox_events for event_id.
If present → already processed, skip.
Else → process, then write (event_id, processed_at) to inbox in the same transaction as the side effect.

Combined with outbox + at-least-once delivery → effectively-once processing.

22.8 Eventual consistency UX

Async patterns force the UI to deal with state in flight. Common patterns:

Optimistic UI: render assumed result immediately; reconcile later (Linear, Notion).
Status indicator: "Order placed (processing)" → "Order confirmed."
Polling: fetch status; show progress.
WebSocket update: server pushes when ready.

Hide eventual consistency from users who don't care; surface it when they do.

22.9 Idempotency, retries, dedup (recap)

Every async system needs:

Idempotency keys on commands.
Retries with backoff and jitter.
Dedup on the consumer (inbox).
DLQ for poison messages.

If you skip these, you'll have surprising bugs at 1 AM.

22.10 Choreography vs orchestration

Choreography:

Pro: decoupled, scales, no central failure point.
Con: hard to see the whole flow; cycles emerge; refactoring is painful.

Orchestration:

Pro: explicit workflow; observable; easier to debug; natural for human approvals.
Con: orchestrator is critical infra; another DB to operate.

Practical advice: start with choreography for simple flows; promote to orchestration when the flow has > 4-5 steps or human approvals.

Tools for orchestration:

Temporal: durable workflows; code-as-workflow; great DX. Best-in-class.
AWS Step Functions: serverless; JSON-defined state machines.
Apache Airflow: batch / data pipelines, less for online.
Cadence (Uber): predecessor to Temporal.

22.11 Event versioning

Events are forever. Schema evolution rules (chapter 8) apply hard.

Add fields with defaults.
Never remove fields.
New event types over modifying old ones.
Consumers must handle old + new versions during transition.

22.12 When NOT to go async

Synchronous, low-latency APIs (search results, login).
Tight transactional consistency required (move money between two of your accounts).
Tiny scale where the operational overhead isn't worth it.
Strong ordering with low throughput (queue paradigm fights you here).

A simple sync REST + DB is often the right answer. Async is a tool, not a religion.

22.13 Practical reference architecture

For a typical e-commerce or SaaS platform:

[ Client ]
    │
[ API Gateway ] ─── auth, rate limit, routing
    │
[ Service A ] ──┐
    │           │
[ Postgres A ]  │  outbox CDC → [ Kafka ]
                │
[ Service B ] ←─┘  reacts; updates own DB & emits
    │
[ Postgres B ]
    │
   ... and so on

Within Kafka:

Many topics by domain event.
Many consumer groups (one per consumer service).
Schema registry for compatibility.
Connectors to S3 (archive), Snowflake (analytics), Elasticsearch (search).

Key takeaways

Pub/sub + outbox is the modern reliable async pattern.
Event sourcing is powerful but heavy; reserve for audit-critical domains.
CQRS separates write & read models; pairs well with events.
Choreography for simple flows; orchestration (Temporal) when complexity grows.
Always: idempotency keys, backoff, dedup, DLQ.