By Vamsi Karuturi · Senior Backend Engineer at Salesforce
Multi-Region Architecture
Real Incident: AWS US-EAST-1 Outage (December 2021)
A networking issue in AWS's US-EAST-1 region disrupted thousands of services for 7+ hours, including Ring doorbells, Disney+, Tinder, and parts of AWS's own console. Teams with tested multi-region failover were far better placed to keep serving traffic. A single region is a single point of failure: surviving a region-wide outage requires running in more than one region.
Why This Comes Up in Interviews
Any system design that targets 99.99%+ availability or serves global users needs a multi-region discussion. Interviewers want to hear:
Active-active vs active-passive trade-offs
Data replication and consistency challenges across regions
How you handle failover (DNS, traffic manager, health checks)
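The health-check side of failover can be sketched in a few lines. This is a minimal illustration, not any particular traffic manager's behavior: the `HealthTracker` class, the three-failure threshold, and the region names are all hypothetical. A region is taken out of rotation only after several consecutive failed probes, so one dropped probe does not trigger a failover.

```python
from dataclasses import dataclass, field


@dataclass
class HealthTracker:
    """Tracks consecutive failed health probes per region (hypothetical helper)."""
    failure_threshold: int = 3  # assumed value; tune against probe interval
    failures: dict[str, int] = field(default_factory=dict)

    def record(self, region: str, probe_ok: bool) -> None:
        # A successful probe resets the counter; a failure increments it.
        self.failures[region] = 0 if probe_ok else self.failures.get(region, 0) + 1

    def is_healthy(self, region: str) -> bool:
        return self.failures.get(region, 0) < self.failure_threshold


def pick_region(preference: list[str], tracker: HealthTracker) -> str:
    """Return the first healthy region in the caller's preference order."""
    for region in preference:
        if tracker.is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")


tracker = HealthTracker()
preference = ["us", "eu", "apac"]  # e.g. ordered by latency for a US user

for _ in range(3):
    tracker.record("us", probe_ok=False)  # three consecutive failed probes
print(pick_region(preference, tracker))   # traffic shifts to "eu"

tracker.record("us", probe_ok=True)       # region recovers
print(pick_region(preference, tracker))   # back to "us"
```

With a probe every 5 seconds (an assumed interval), three consecutive failures means detection takes about 15 seconds. DNS TTLs and connection draining add to the total failover time.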
Active-active: both regions serve traffic simultaneously, with data replicated bidirectionally.
Failover: Automatic. The healthy region absorbs the failed region's traffic.
Data conflicts: Possible. Needs conflict resolution (LWW, CRDTs, merge logic).
Cost: ~2x+ (both regions fully provisioned).
Best for: Global low latency and 99.99% availability requirements.
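Last-write-wins (LWW), the simplest of the conflict-resolution options listed, can be sketched as follows. The `Versioned` record and its field names are hypothetical. The sketch assumes each write carries a wall-clock timestamp from reasonably synchronized clocks, and it uses the region name as a deterministic tie-breaker so both regions converge on the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp_ms: int  # writer's wall-clock time (assumes roughly synced clocks)
    region: str        # tie-breaker when two timestamps are equal


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the larger (timestamp, region) pair."""
    return max(a, b, key=lambda r: (r.timestamp_ms, r.region))


# The same user profile was updated concurrently in two regions.
us = Versioned("alice@new.example", 1_700_000_000_500, "us")
eu = Versioned("alice@old.example", 1_700_000_000_200, "eu")

# Merge order does not matter, so both regions converge on the same record.
assert lww_merge(us, eu) == lww_merge(eu, us) == us
```

The trade-off is that the losing write is silently discarded, which is why LWW suits simple, overwrite-style data rather than counters or business-critical records.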
3. Follow-the-Sun (Regional Primary)
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
subgraph "Each user's writes go to their home region"
US[US Region<br/>Primary for US users]
EU[EU Region<br/>Primary for EU users]
AP[APAC Region<br/>Primary for APAC users]
end
US <-->|Async replication| EU
EU <-->|Async replication| AP
AP <-->|Async replication| US
style US fill:#3b82f6,color:#fff
style EU fill:#22c55e,color:#fff
style AP fill:#f59e0b,color:#fff
Each user has a "home region" based on geography. Writes go to the home region; reads can be served from any region and are eventually consistent.
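A minimal sketch of that routing rule is shown below. The country-to-region table, the default region, and the function names are illustrative assumptions. A real system would more likely store the home region on the user record than derive it from a country code on every request.

```python
# Hypothetical mapping from a user's country to their home region.
HOME_REGION_BY_COUNTRY = {
    "US": "us", "CA": "us",
    "DE": "eu", "FR": "eu",
    "JP": "apac", "IN": "apac",
}
DEFAULT_REGION = "us"  # assumed fallback for unmapped countries


def home_region(country_code: str) -> str:
    return HOME_REGION_BY_COUNTRY.get(country_code.upper(), DEFAULT_REGION)


def route(operation: str, user_country: str, nearest_region: str) -> str:
    """Writes go to the user's home region; reads go to the nearest region."""
    if operation == "write":
        return home_region(user_country)
    if operation == "read":
        return nearest_region  # may return slightly stale data
    raise ValueError(f"unknown operation: {operation!r}")


# A German user travelling in Japan:
print(route("write", "DE", nearest_region="apac"))  # eu
print(route("read", "DE", nearest_region="apac"))   # apac
```

Because every write for a given user lands in one region, two regions never accept conflicting writes for that user under normal operation.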
"Multi-region active-active. GeoDNS routes users to nearest region. If a region fails, traffic shifts to remaining regions in <30s. Data replicated asynchronously with conflict resolution."
"Active-passive vs active-active?"
"Active-passive: simpler, cheaper, but 30-120s failover + possible data loss. Active-active: instant failover, better latency, but need conflict resolution for concurrent writes."
"How to handle data consistency?"
"Route each user's writes to their home region (avoid conflicts). Cross-region reads are eventually consistent. For critical data (payments): synchronous replication to at least one other region."
"What about cost?"
"Active-active is ~2x cost. Justified when downtime costs exceed infrastructure costs. For e-commerce doing $1M/hour, 1 hour downtime > yearly multi-region premium."
"Conflict resolution?"
"Avoid conflicts by design: user writes go to home region. If unavoidable: LWW for simple data, CRDTs for counters/sets, application merge for complex business logic."