By Vamsi Karuturi · Senior Backend Engineer at Salesforce

Event Sourcing

Real Incident: LMAX Exchange — 6 Million Transactions/Second on a Single Thread

In 2010, LMAX Exchange needed to process financial trades with microsecond latency. A traditional CRUD design that round-trips to a database on every operation tops out at roughly 10,000 ops/s. Their solution: event sourcing with an in-memory model that is rebuilt by replaying events at startup. The result was a single-threaded business logic processor handling 6 million transactions per second, with deterministic replay for recovery. When a node crashes, it replays the event log and is back in service in seconds. Event sourcing didn't just solve their consistency problem; measured against that 10,000 ops/s baseline, it gave them about 600x the throughput.

Why This Comes Up in Interviews

Event sourcing appears in designs for financial systems, audit-heavy domains, and any system needing "time travel." Interviewers want to hear:

Why append-only is fundamentally different from update-in-place
How you handle event replay and system recovery
When CQRS is needed alongside event sourcing
Concrete trade-offs: storage cost, complexity, query patterns
How eventual consistency manifests and how you manage it

Event Sourcing vs Traditional CRUD

```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph TB
    subgraph Traditional CRUD
        CMD1[Transfer $100] --> DB[(Database)]
        DB -->|UPDATE accounts SET balance = balance - 100| STATE1[Current State Only]
    end

    subgraph Event Sourcing
        CMD2[Transfer $100] --> STORE[(Event Store)]
        STORE --> E1[MoneyDeposited $500]
        STORE --> E2[MoneyWithdrawn $200]
        STORE --> E3[MoneyTransferred $100]
        E1 --> PROJECTION[Current State = Replay All Events]
        E2 --> PROJECTION
        E3 --> PROJECTION
    end

    style DB fill:#e74c3c,color:#fff
    style STORE fill:#27ae60,color:#fff
    style PROJECTION fill:#3498db,color:#fff
```

Aspect	Traditional CRUD	Event Sourcing
Storage	Current state only	Full history of changes
What's saved	`balance = 200`	`[deposited 500, withdrew 200, transferred 100]`
Audit trail	Must be bolted on (often incomplete)	Built-in, immutable, complete
Recovery	Restore from backup (point-in-time)	Replay events to any point in time
Schema changes	Migrate rows in-place	Old events preserved, new projections adapt
Debugging	"Why is balance 200?" is hard to answer without a separate audit log	Replay shows exactly what happened and when
Storage cost	Lower (only current state)	Higher (every event stored forever)
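
To make the "replay" column concrete, here is a minimal sketch that derives the balance by folding over the three events from the diagram. The `Event` type, the `SIGN` table, and the `replay` function are hypothetical names chosen for this illustration, and treating a transfer as an outgoing debit is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # e.g. "MoneyDeposited"
    amount: int  # whole dollars, to keep the arithmetic exact


# Assumed for illustration: how each event kind moves the balance.
SIGN = {"MoneyDeposited": +1, "MoneyWithdrawn": -1, "MoneyTransferred": -1}


def replay(events, upto=None):
    """Derive the balance by folding over the first `upto` events (all if None)."""
    balance = 0
    for event in events[:upto]:
        balance += SIGN[event.kind] * event.amount
    return balance


log = [
    Event("MoneyDeposited", 500),
    Event("MoneyWithdrawn", 200),
    Event("MoneyTransferred", 100),
]

print(replay(log))          # current state: 200
print(replay(log, upto=2))  # state after the first two events: 300
```

The second call is the "time travel" the table refers to: the same log answers questions about any earlier point without a separate backup.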

Core Components

Component	Role	Example
Event	Immutable fact that happened	`OrderPlaced { orderId: 123, items: [...], timestamp: ... }`
Event Store	Append-only log of events	Kafka topic, EventStoreDB, DynamoDB table
Aggregate	Domain entity rebuilt from events	`Order` aggregate replayed from its event stream
Projection	Read-optimized view built from events	"All orders by customer" table
Snapshot	Cached state at a point to avoid full replay	`Order#123 state at event #5000`
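
As a rough sketch of how the Event and Aggregate rows fit together, the example below rebuilds an `Order` from its stream. The event classes other than `OrderPlaced`, the status strings, and the `from_events` helper are assumptions made for this illustration, not a prescribed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    order_id: int
    items: tuple


@dataclass(frozen=True)
class OrderPaid:
    order_id: int


@dataclass(frozen=True)
class OrderShipped:
    order_id: int


@dataclass
class Order:
    """Aggregate whose state is only ever changed by applying events."""
    order_id: int = 0
    items: tuple = ()
    status: str = "new"
    version: int = 0  # number of events applied so far

    def apply(self, event):
        if isinstance(event, OrderPlaced):
            self.order_id, self.items, self.status = event.order_id, event.items, "placed"
        elif isinstance(event, OrderPaid):
            self.status = "paid"
        elif isinstance(event, OrderShipped):
            self.status = "shipped"
        else:
            raise ValueError(f"unknown event: {event!r}")
        self.version += 1
        return self

    @classmethod
    def from_events(cls, events):
        # Start from an empty aggregate and replay the stream in order.
        order = cls()
        for event in events:
            order.apply(event)
        return order


stream = [OrderPlaced(123, ("book", "pen")), OrderPaid(123), OrderShipped(123)]
order = Order.from_events(stream)
print(order.status, order.version)  # shipped 3
```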

Event Store Design

Property	Requirement	Why
Append-only	Events are never updated or deleted	Immutability guarantees audit + replay correctness
Ordered	Events within a stream have strict ordering	Replay must be deterministic
Durable	Events survive crashes	They are the source of truth
Partitioned	Events grouped by aggregate ID	Efficient replay of single entity
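
The properties above can be sketched with a small in-memory store. This is an illustration only: it is not durable, and the `expected_version` check (optimistic concurrency) is one common way to enforce strict per-stream ordering that this sketch assumes rather than something the table prescribes. `InMemoryEventStore` and `ConcurrencyError` are hypothetical names.

```python
from collections import defaultdict


class ConcurrencyError(Exception):
    """Raised when a writer appends against a stale stream version."""


class InMemoryEventStore:
    """Append-only and ordered per stream. Not durable: a real store persists to disk."""

    def __init__(self):
        self._streams = defaultdict(list)  # aggregate ID -> ordered events

    def append(self, stream_id, event, expected_version):
        stream = self._streams[stream_id]
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"{stream_id}: expected version {expected_version}, found {len(stream)}"
            )
        stream.append(event)  # the only mutation: no update, no delete
        return len(stream)    # new stream version

    def read(self, stream_id, from_version=0):
        # Slicing returns a new list, so callers cannot reorder or truncate the stream.
        return self._streams.get(stream_id, [])[from_version:]


store = InMemoryEventStore()
store.append("order-123", {"type": "OrderPlaced"}, expected_version=0)
store.append("order-123", {"type": "OrderPaid"}, expected_version=1)

rejected = False
try:
    # A second writer that still believes the stream is at version 1.
    store.append("order-123", {"type": "OrderShipped"}, expected_version=1)
except ConcurrencyError as exc:
    rejected = True
    print("rejected:", exc)

print([e["type"] for e in store.read("order-123")])  # ['OrderPlaced', 'OrderPaid']
```

Keying streams by aggregate ID is what makes replaying a single entity cheap: `read("order-123")` never touches events belonging to other orders.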

Back-of-envelope: Storage sizing

Metric	Value
Events per order	~10 (placed, confirmed, paid, picked, shipped, delivered...)
Orders per day	1 million
Events per day	10 million
Average event size	500 bytes
Daily storage	5 GB
Annual storage	1.8 TB
With 3x replication	5.4 TB/year

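At ~$0.02/GB/month for cold storage, holding one year of events (5.4 TB with replication) costs about $108/month, or **~$1,300/year**. The bill grows by roughly that amount for each additional year retained, which is still cheap for a complete audit trail.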
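
The sizing above is plain arithmetic, reproduced here so the inputs can be changed. The constants come from the table; using decimal units (1 GB = 10^9 bytes) and a 365-day year are assumptions of this sketch, which is why the results land slightly above the rounded figures in the text.

```python
EVENTS_PER_ORDER = 10
ORDERS_PER_DAY = 1_000_000
EVENT_SIZE_BYTES = 500
REPLICATION = 3
PRICE_PER_GB_MONTH = 0.02  # USD, the figure quoted in the text

BYTES_PER_GB = 10**9  # decimal gigabytes (assumption)

events_per_day = EVENTS_PER_ORDER * ORDERS_PER_DAY
daily_gb = events_per_day * EVENT_SIZE_BYTES / BYTES_PER_GB
annual_tb = daily_gb * 365 / 1000
replicated_tb = annual_tb * REPLICATION

# Cost of keeping one full year of replicated events for twelve months.
monthly_cost = replicated_tb * 1000 * PRICE_PER_GB_MONTH
yearly_cost = monthly_cost * 12

print(f"events/day:   {events_per_day:,}")
print(f"daily:        {daily_gb:.1f} GB")
print(f"annual:       {annual_tb:.3f} TB")
print(f"replicated:   {replicated_tb:.3f} TB")
print(f"yearly cost:  ${yearly_cost:,.0f}")
```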

Snapshots — Avoiding Full Replay

The problem: An aggregate with 100,000 events takes seconds to replay on every request.

The solution: Periodically save a snapshot of the current state.

Strategy	How	Why choose it
Every N events	Snapshot after every 100 events	Simple, predictable
Time-based	Snapshot daily	Good for batch systems
On demand	Snapshot when replay > threshold	Adaptive

Replay with snapshot:

Load latest snapshot (state at event #9500)
Replay only events #9501 through #9612 (112 events instead of 9612)
Latency: roughly 2ms instead of 200ms, assuming replay time scales with the number of events
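
The replay steps above can be sketched as follows, using the same numbers (snapshot at event #9500, stream of 9,612 events). The `Snapshot` type, the `load` function, and modelling each event as a signed integer delta are assumptions made to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int  # number of events already folded into `state`
    state: int    # here: an account balance


def apply(state, event):
    # Assumed event shape for this sketch: a signed integer delta.
    return state + event


def load(events, snapshot=None):
    """Return (state, events_replayed), starting from the snapshot if one exists."""
    state, start = (snapshot.state, snapshot.version) if snapshot else (0, 0)
    tail = events[start:]
    for event in tail:
        state = apply(state, event)
    return state, len(tail)


events = [1] * 9612  # 9,612 events, each adding 1
snapshot = Snapshot(version=9500, state=sum(events[:9500]))

full_state, full_count = load(events)
fast_state, fast_count = load(events, snapshot)

print(full_count, fast_count)    # 9612 112
print(full_state == fast_state)  # True: the snapshot is only an optimization
```

Because the snapshot is derived from the events, it can be discarded and rebuilt at any time; the event log stays the source of truth.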

CQRS Integration

```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph LR
    subgraph Write Side
        CMD[Command] --> HANDLER[Command Handler]
        HANDLER --> VALIDATE{Validate}
        VALIDATE -->|OK| STORE[(Event Store)]
    end

    subgraph Read Side
        STORE -->|Subscribe| PROJECTOR[Event Projector]
        PROJECTOR --> READDB[(Read DB - Optimized)]
        QUERY[Query] --> READDB
    end

    style STORE fill:#27ae60,color:#fff
    style READDB fill:#3498db,color:#fff
    style HANDLER fill:#f39c12,color:#fff
```

Aspect	Write Model	Read Model
Optimized for	Consistency, business rules	Query speed, denormalization
Storage	Event store (append-only)	SQL/NoSQL/Elasticsearch (per query need)
Schema	Events (immutable facts)	Projections (derived, rebuildable)
Scaling	Fewer writes than reads	Scale independently, multiple projections
Consistency	Strongly consistent (within each aggregate's stream)	Eventually consistent (lag = projection delay)

Why CQRS + Event Sourcing pair naturally: Events are great for writes (append-only, fast) but terrible for reads ("show all orders over $100 this month" requires replaying millions of events). CQRS lets you build query-optimized read models from the same event stream.
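
A minimal sketch of the read side: a projector consumes events once and maintains a table that answers the "orders over $100" query without replay. The `OrdersByCustomer` class, the dictionary event shape, and the field names are hypothetical, and the plain `for` loop stands in for a real subscription to the event store.

```python
from collections import defaultdict


class OrdersByCustomer:
    """Read model: customer ID -> list of (order ID, total)."""

    def __init__(self):
        self.rows = defaultdict(list)
        self.position = 0  # how many events this projection has processed

    def handle(self, event):
        if event["type"] == "OrderPlaced":
            self.rows[event["customer"]].append((event["order_id"], event["total"]))
        # Other event types do not change this view but still advance the position.
        self.position += 1

    def orders_over(self, customer, minimum):
        return [oid for oid, total in self.rows.get(customer, []) if total > minimum]


event_log = [
    {"type": "OrderPlaced", "order_id": 1, "customer": "alice", "total": 250},
    {"type": "OrderPlaced", "order_id": 2, "customer": "bob", "total": 40},
    {"type": "OrderPaid", "order_id": 1},
    {"type": "OrderPlaced", "order_id": 3, "customer": "alice", "total": 80},
]

view = OrdersByCustomer()
for event in event_log:  # stand-in for a subscription to the event store
    view.handle(event)

print(view.orders_over("alice", 100))  # [1]
print(view.position)                   # 4
```

Since the view is derived, a new query shape only needs a new projection class replayed over the same log; the write side is untouched.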

Event Versioning — Schema Evolution

Events are stored forever, but your domain evolves. How do you handle old events?

Strategy	How	Pros	Cons
Upcasting	Transform old events to new schema on read	Old events untouched, transparent	Upcast logic grows over time
Versioned events	`OrderPlacedV1`, `OrderPlacedV2`	Explicit, clear	Multiple handler versions
Weak schema	Store events as flexible JSON	Forward compatible	No compile-time safety
Copy-transform	Migrate event store to new schema	Clean slate	Expensive, risky for large stores

Best practice: Use upcasting for minor changes, versioned events for breaking changes. Never modify stored events.
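
Upcasting can be sketched as a chain of small functions applied when an event is read. The schema change shown (v2 adds a `currency` field defaulting to "USD"), the stored-record layout, and the names `UPCASTERS` and `upcast` are all assumptions made for illustration.

```python
def upcast_v1_to_v2(payload):
    # Hypothetical minor change: v2 added a currency field; old events imply USD.
    return {**payload, "currency": "USD"}


# Upcasters keyed by the version they upgrade FROM.
UPCASTERS = {1: upcast_v1_to_v2}
CURRENT_VERSION = 2


def upcast(stored):
    """Return the event at CURRENT_VERSION without modifying the stored record."""
    version = stored["version"]
    payload = dict(stored["payload"])  # work on a copy
    while version < CURRENT_VERSION:
        payload = UPCASTERS[version](payload)
        version += 1
    return {"type": stored["type"], "version": version, "payload": payload}


old = {"type": "OrderPlaced", "version": 1, "payload": {"order_id": 123, "total": 250}}
new = upcast(old)

print(new["payload"])  # {'order_id': 123, 'total': 250, 'currency': 'USD'}
print(old["payload"])  # {'order_id': 123, 'total': 250}
```

Handlers then only need to understand the current version, at the cost of one more upcaster per schema change.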

Eventual Consistency — The Trade-off

Scenario	Consistency Lag	Acceptable?
User places order, sees it in "My Orders"	50-200ms projection lag	Usually yes (show optimistic UI)
Payment processed, inventory updated	100-500ms	Yes (async is fine)
Admin views real-time dashboard	1-5s	Yes (dashboards tolerate lag)
User withdraws money, checks balance	Must be immediate	No — use read-your-own-writes pattern

Read-your-own-writes pattern: After a command succeeds, the response includes the new version number. The client passes this version to read queries. The read side either waits for the projection to catch up or reads directly from the event store.
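
A rough sketch of the pattern, using the fallback variant (read from the event store when the projection is behind). The `ReadModel` class, the `read_balance` function, and the integer-delta events are hypothetical names and shapes chosen for this example.

```python
class ReadModel:
    def __init__(self):
        self.balance = 0
        self.version = 0  # last event version applied to this projection

    def project(self, event, version):
        self.balance += event
        self.version = version


def read_balance(read_model, event_log, min_version):
    """Serve from the projection if it has caught up, else fall back to the event log."""
    if read_model.version >= min_version:
        return read_model.balance, "projection"
    # Fallback chosen for this sketch: replay the log up to the required version.
    return sum(event_log[:min_version]), "event store"


event_log = [500]  # a deposit that has already been projected
read_model = ReadModel()
read_model.project(500, version=1)

event_log.append(-200)            # the withdrawal command succeeds...
written_version = len(event_log)  # ...and its response carries version 2

before = read_balance(read_model, event_log, written_version)
read_model.project(-200, version=2)  # the projector catches up
after = read_balance(read_model, event_log, written_version)

print(before)  # (300, 'event store')
print(after)   # (300, 'projection')
```

Either path returns the post-withdrawal balance; the version number is what lets the read side know whether its own view is fresh enough to answer.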

When to Use Event Sourcing

Good Fit	Poor Fit
Financial systems (audit trail required)	Simple CRUD apps (blog, todo list)
Systems needing "time travel" / replay	Systems where history has no value
Complex domain with many state transitions	High-write, low-read systems with no audit needs
Regulatory compliance (GDPR right to explanation)	Teams unfamiliar with eventual consistency
Event-driven microservices	Tight latency requirements on reads (without CQRS)

Real-World Implementations

System	How They Use Event Sourcing
LMAX	In-memory event sourcing, single-threaded replay, 6M tx/s
EventStoreDB (Greg Young)	Purpose-built event store database with projections
Kafka	Append-only log as event store (with compaction trade-offs)
Axon Framework	Java framework for event sourcing + CQRS
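DynamoDB Streams	Change stream from a DynamoDB table (CDC pattern); records expire after 24 hours, so it feeds projections rather than acting as the event store itself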

Interview Framework

When event sourcing is relevant in system design:

Step 1: "For this domain (e.g., banking, e-commerce orders), I'd use event sourcing because we need a complete audit trail and the ability to reconstruct state at any point in time."

Step 2: "Every state change is stored as an immutable event in an append-only log. The current state is derived by replaying events. For performance, I'd snapshot every 100 events to avoid full replay."

Step 3: "For reads, I'd use CQRS — project events into query-optimized read models. This gives us fast reads without compromising the write model's simplicity."

Step 4: "The trade-off is eventual consistency between write and read sides (~100-200ms lag) and increased storage. But we gain: complete audit, easy debugging, ability to rebuild read models, and replay for recovery."

Quick Recall

Question	Answer
What is event sourcing?	Store state changes as immutable events; derive current state by replay
Why pair with CQRS?	Events are write-optimized; CQRS adds read-optimized projections
Snapshot purpose?	Avoid replaying entire event history; checkpoint state periodically
Biggest trade-off?	Eventual consistency on reads + storage growth + complexity
Event versioning?	Upcasting (transform old to new on read) or versioned event types
When NOT to use?	Simple CRUD, no audit needs, team inexperienced with async patterns
LMAX throughput?	6 million tx/s single-threaded via in-memory event sourcing
Storage cost estimate?	~1.8 TB/year for 1M orders/day at 10 events/order and 500 bytes/event (5.4 TB with 3x replication)