Design a Payment System
Handle millions of transactions with exactly-once processing, multi-currency support, and zero tolerance for data loss — the most critical system in any e-commerce platform.
Why This Is Asked
Payment systems test your understanding of distributed transactions, idempotency, eventual consistency, security, and fault tolerance in a single question, which is why the problem is a staple of fintech, e-commerce, and marketplace interviews.
Requirements
Functional
- Process payments (credit card, debit card, digital wallets, bank transfer)
- Support multiple currencies with real-time exchange rates
- Handle refunds (full and partial)
- Payment status tracking (created → processing → completed/failed)
- Retry failed payments with exponential backoff
- Reconciliation with payment providers (Stripe, PayPal)
Non-Functional
- Exactly-once charging: the customer is charged exactly once, even when requests are retried (never double-charge)
- High availability: 99.99% uptime (about 53 minutes of downtime per year)
- Low latency: < 500 ms for payment initiation
- Auditability: full audit trail of every state change
- Security: PCI DSS compliance, tokenization, encryption at rest
Scale
- 1 million transactions/day (~12 TPS average, 100+ TPS peak)
- $10 billion annual volume
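The scale figures are easier to sanity-check as back-of-envelope arithmetic. The peak-to-average ratio and the average transaction size below are derived from the two numbers above, not stated requirements; the last line converts the 99.99% availability target into a yearly downtime budget, assuming a 365-day year.

$$
\begin{aligned}
\text{average load} &= \frac{10^{6}\ \text{transactions/day}}{86\,400\ \text{s/day}} \approx 11.6\ \text{TPS} \\
\text{peak-to-average ratio} &\approx \frac{100\ \text{TPS}}{11.6\ \text{TPS}} \approx 8.6 \\
\text{average transaction} &= \frac{10^{10}\ \text{USD/year}}{365 \times 10^{6}\ \text{transactions/year}} \approx 27.40\ \text{USD} \\
\text{downtime budget at } 99.99\% &= (1 - 0.9999) \times 525\,600\ \text{min/year} \approx 52.6\ \text{min/year}
\end{aligned}
$$

At this load (about 12 TPS on average, 100+ at peak), raw write throughput is typically modest for a single well-provisioned PostgreSQL primary; the hard problems in this design are correctness and availability rather than volume.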
High-Level Architecture
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart TD
Client(("Client")) --> API{{"Payment API<br/>(idempotent)"}}
API --> PS["Payment Service<br/>(orchestrator)"]
PS --> PSP["Payment Service Provider<br/>(Stripe, Adyen, PayPal)"]
PS --> Ledger["Ledger Service<br/>(double-entry bookkeeping)"]
PS --> Queue["Message Queue<br/>(Kafka)"]
Queue --> Reconciliation["Reconciliation Service<br/>(daily batch)"]
Queue --> Notification["Notification Service"]
PS --> DB[("Payment DB<br/>(PostgreSQL)")]
Ledger --> LDB[("Ledger DB")]
style PS fill:#E3F2FD,stroke:#1565C0,color:#000
style Ledger fill:#E8F5E9,stroke:#2E7D32,color:#000
style PSP fill:#FEF3C7,stroke:#D97706,color:#000
Key Design Decisions
1. Idempotency — The Most Critical Requirement
The Double-Charge Problem
User clicks "Pay" → request times out → user retries → server processes BOTH requests → customer is charged twice. This is one of the most common and most damaging bugs in payment systems.
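// Client sends an idempotency key with every request
POST /payments
Idempotency-Key: pay_7f3a8b2c-unique-uuid

// Server implementation (Spring Data JPA)
import java.util.Optional;
import org.springframework.dao.DataIntegrityViolationException;

public PaymentResponse processPayment(String idempotencyKey, PaymentRequest request) {
    // Fast path: we've seen this key before, so return the stored result instead of re-processing
    Optional<Payment> existing = paymentRepository.findByIdempotencyKey(idempotencyKey);
    if (existing.isPresent()) {
        return existing.get().toResponse();
    }
    // First time seeing this key: claim it BEFORE touching the PSP.
    // saveAndFlush issues the INSERT immediately, so a concurrent duplicate
    // hits the UNIQUE constraint here instead of also charging the card.
    Payment payment = createPayment(request);
    payment.setIdempotencyKey(idempotencyKey);
    try {
        paymentRepository.saveAndFlush(payment);
    } catch (DataIntegrityViolationException duplicate) {
        // Lost the race: another request with this key was saved first
        return paymentRepository.findByIdempotencyKey(idempotencyKey).orElseThrow().toResponse();
    }
    // Call the PSP outside any DB transaction (no connection held during a slow network call),
    // forwarding the same key so the PSP also deduplicates our retries
    return chargePaymentProvider(payment);
}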
-- Idempotency key is UNIQUE: the database rejects a second payment with the same key
CREATE TABLE payments (
    id UUID PRIMARY KEY,
    idempotency_key VARCHAR(255) UNIQUE NOT NULL,
    psp_charge_id VARCHAR(255) UNIQUE, -- NULL until the PSP responds; used to match webhooks
    status VARCHAR(50) NOT NULL,
    amount DECIMAL(19,4) NOT NULL,
    currency VARCHAR(3) NOT NULL, -- ISO 4217 code, e.g. 'USD'
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL
);
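To see why the key must be claimed atomically, here is a self-contained, in-memory sketch. It is hypothetical: `IdempotencyDemo` and its simulated PSP counter are illustrative names, and `ConcurrentHashMap.computeIfAbsent` stands in for the database's UNIQUE constraint, because it applies the mapping function at most once per key even under concurrent calls.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class IdempotencyDemo {
    // Stand-in for the payments table: the map key plays the role of the UNIQUE idempotency_key column
    private final ConcurrentHashMap<String, String> paymentIdByKey = new ConcurrentHashMap<>();
    // How many times the (simulated) PSP was actually charged
    private final AtomicInteger pspCharges = new AtomicInteger();

    /** Returns the payment ID for this key; only the first call for a key "charges" the PSP. */
    public String processPayment(String idempotencyKey) {
        // computeIfAbsent applies the function at most once per key, atomically,
        // so concurrent duplicates wait briefly and then receive the stored result
        return paymentIdByKey.computeIfAbsent(idempotencyKey, key -> {
            pspCharges.incrementAndGet(); // simulated PSP charge
            return "pay_" + key;
        });
    }

    public int chargeCount() {
        return pspCharges.get();
    }

    public static void main(String[] args) throws Exception {
        IdempotencyDemo service = new IdempotencyDemo();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<String>> futures = new ArrayList<>();
        // Simulate 20 concurrent retries of the same "Pay" click
        for (int i = 0; i < 20; i++) {
            futures.add(pool.submit(() -> service.processPayment("order-42")));
        }
        Set<String> distinctIds = new HashSet<>();
        for (Future<String> f : futures) {
            distinctIds.add(f.get());
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("distinct payment IDs: " + distinctIds.size()); // 1
        System.out.println("PSP charges: " + service.chargeCount());      // 1
    }
}
```

The map is only a teaching device: the `ConcurrentHashMap` Javadoc asks for short, simple mapping functions, so a real service lets the UNIQUE column arbitrate and calls the PSP outside any lock or transaction.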
2. Payment State Machine
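One way to stop code from skipping the transitions in the diagram below is to encode them as a lookup table and reject everything else. A minimal in-memory sketch (Java 9+ for `Map.of`); `PaymentStateMachine` and `transitionTo` are hypothetical names, the retry counter implements the diagram's "max 3", and `REVERSED` comes from the late-webhook section later in this article.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class PaymentStateMachine {
    public enum PaymentStatus {
        CREATED, PROCESSING, COMPLETED, FAILED, TIMEOUT, REFUND_PENDING, REFUNDED, REVERSED
    }

    public static final int MAX_RETRIES = 3; // "Retry (max 3)" from the diagram

    // Allowed transitions; any pair not listed here is rejected
    private static final Map<PaymentStatus, Set<PaymentStatus>> ALLOWED = Map.of(
        PaymentStatus.CREATED, EnumSet.of(PaymentStatus.PROCESSING),
        PaymentStatus.PROCESSING, EnumSet.of(PaymentStatus.COMPLETED, PaymentStatus.FAILED, PaymentStatus.TIMEOUT),
        PaymentStatus.TIMEOUT, EnumSet.of(PaymentStatus.PROCESSING, PaymentStatus.FAILED),
        PaymentStatus.FAILED, EnumSet.of(PaymentStatus.CREATED),
        PaymentStatus.COMPLETED, EnumSet.of(PaymentStatus.REFUND_PENDING, PaymentStatus.REVERSED),
        PaymentStatus.REFUND_PENDING, EnumSet.of(PaymentStatus.REFUNDED));

    private PaymentStatus status = PaymentStatus.CREATED;
    private int retries = 0;

    /** Moves to {@code next}, or throws if the transition is not allowed. */
    public void transitionTo(PaymentStatus next) {
        if (!ALLOWED.getOrDefault(status, Set.of()).contains(next)) {
            throw new IllegalStateException(status + " -> " + next + " is not allowed");
        }
        if (status == PaymentStatus.TIMEOUT && next == PaymentStatus.PROCESSING) {
            if (retries >= MAX_RETRIES) {
                throw new IllegalStateException("max retries exceeded; move to FAILED instead");
            }
            retries++;
        }
        if (next == PaymentStatus.CREATED) {
            retries = 0; // a user-initiated retry starts with a fresh retry budget
        }
        status = next;
    }

    public PaymentStatus status() {
        return status;
    }

    public static void main(String[] args) {
        PaymentStateMachine p = new PaymentStateMachine();
        p.transitionTo(PaymentStatus.PROCESSING);
        p.transitionTo(PaymentStatus.COMPLETED);
        try {
            p.transitionTo(PaymentStatus.FAILED); // a completed payment cannot silently become FAILED
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: COMPLETED -> FAILED is not allowed
        }
    }
}
```

In the database, the same guard is typically a conditional update such as `UPDATE payments SET status = 'COMPLETED' WHERE id = ? AND status = 'PROCESSING'`, so two workers racing on one payment cannot both apply a transition.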
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
stateDiagram-v2
[*] --> CREATED: Payment initiated
CREATED --> PROCESSING: Sent to PSP
PROCESSING --> COMPLETED: PSP confirms success
PROCESSING --> FAILED: PSP rejects
PROCESSING --> TIMEOUT: No response
TIMEOUT --> PROCESSING: Retry (max 3)
TIMEOUT --> FAILED: Max retries exceeded
FAILED --> CREATED: User retries
COMPLETED --> REFUND_PENDING: Refund requested
REFUND_PENDING --> REFUNDED: Refund confirmed
COMPLETED --> REVERSED: Late failure or dispute (webhook)
3. Double-Entry Ledger
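Every payment creates at least two ledger entries: a debit and a matching credit (a platform fee, for example, adds another leg). For each payment and currency, total debits must equal total credits; counting debits as positive and credits as negative, the entries sum to zero.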
-- Ledger entries are immutable: append-only, never UPDATE or DELETE
CREATE TABLE ledger_entries (
    id BIGSERIAL PRIMARY KEY,
    payment_id UUID NOT NULL,
    account_id VARCHAR(255) NOT NULL, -- "customer:123" or "merchant:456"
    entry_type VARCHAR(10) NOT NULL CHECK (entry_type IN ('DEBIT', 'CREDIT')),
    amount DECIMAL(19,4) NOT NULL CHECK (amount > 0), -- direction comes from entry_type
    currency VARCHAR(3) NOT NULL,
    created_at TIMESTAMP NOT NULL
);
-- For a $100 payment from customer to merchant:
-- Entry 1: DEBIT customer_account $100 (money leaves customer)
-- Entry 2: CREDIT merchant_account $100 (money enters merchant)
-- SUM(DEBIT amounts) - SUM(CREDIT amounts) = $0 (balanced!)
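The balancing rule can be checked mechanically. A minimal sketch (Java 16+ for records) that nets each currency by treating debits as positive and credits as negative; `LedgerEntry` mirrors a subset of the `ledger_entries` columns, and the 2.90 fee leg and the `fees:platform` account are made-up illustrations of a payment with more than two entries.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LedgerCheck {
    public enum EntryType { DEBIT, CREDIT }

    // Mirrors the ledger_entries columns the check needs
    public record LedgerEntry(String paymentId, String accountId, EntryType type,
                              BigDecimal amount, String currency) {}

    /** True if, for every currency, total debits equal total credits. */
    public static boolean isBalanced(List<LedgerEntry> entries) {
        Map<String, BigDecimal> netByCurrency = new HashMap<>();
        for (LedgerEntry e : entries) {
            // Debits count as positive, credits as negative; a balanced set nets to zero
            BigDecimal signed = e.type() == EntryType.DEBIT ? e.amount() : e.amount().negate();
            netByCurrency.merge(e.currency(), signed, BigDecimal::add);
        }
        // compareTo, not equals: BigDecimal.equals treats 0.00 and 0 as different
        return netByCurrency.values().stream().allMatch(v -> v.compareTo(BigDecimal.ZERO) == 0);
    }

    public static void main(String[] args) {
        List<LedgerEntry> withFee = List.of(
            new LedgerEntry("pay_1", "customer:123", EntryType.DEBIT, new BigDecimal("100.00"), "USD"),
            new LedgerEntry("pay_1", "merchant:456", EntryType.CREDIT, new BigDecimal("97.10"), "USD"),
            new LedgerEntry("pay_1", "fees:platform", EntryType.CREDIT, new BigDecimal("2.90"), "USD"));
        System.out.println(isBalanced(withFee));              // true
        System.out.println(isBalanced(withFee.subList(0, 2))); // false: the fee leg is missing
    }
}
```

The same check can run as a nightly SQL job over `ledger_entries`; any payment that does not net to zero points at a bug, not at a customer.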
4. Handling PSP Failures
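In the timeout branch of the flow below, a retry worker waits before calling the PSP again. A minimal sketch of that wait, assuming "full jitter" exponential backoff (a random delay between zero and a ceiling that doubles each attempt), a 1-second base, a 60-second cap, and the 3-retry limit from the state machine; `RetryPolicy` and these numbers are illustrative. Every retry must reuse the payment's original idempotency key when calling the PSP; otherwise a request that actually succeeded before timing out would be charged again.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.Random;

public class RetryPolicy {
    public static final int MAX_RETRIES = 3;       // "Retry (max 3)" from the state machine
    public static final long BASE_MILLIS = 1_000;  // assumed first-retry ceiling
    public static final long CAP_MILLIS = 60_000;  // assumed upper bound on any single wait

    private final Random random;

    public RetryPolicy(Random random) {
        this.random = random; // injectable so tests can use a fixed seed
    }

    /**
     * Wait before retry number {@code attempt} (1-based), or empty when retries are
     * exhausted and the payment should move from TIMEOUT to FAILED.
     */
    public Optional<Duration> nextDelay(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt is 1-based");
        }
        if (attempt > MAX_RETRIES) {
            return Optional.empty();
        }
        // Exponential ceiling: base * 2^(attempt - 1), clamped to the cap
        long ceiling = Math.min(CAP_MILLIS, BASE_MILLIS << (attempt - 1));
        // Full jitter: uniform in [0, ceiling] so many timed-out payments don't retry in lockstep
        long delayMillis = (long) (random.nextDouble() * (ceiling + 1));
        return Optional.of(Duration.ofMillis(delayMillis));
    }

    public static void main(String[] args) {
        RetryPolicy policy = new RetryPolicy(new Random());
        for (int attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
            String plan = policy.nextDelay(attempt)
                .map(d -> "wait " + d.toMillis() + " ms, then retry with the SAME idempotency key")
                .orElse("give up: TIMEOUT -> FAILED");
            System.out.println("attempt " + attempt + ": " + plan);
        }
    }
}
```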
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
sequenceDiagram
participant C as Client
participant PS as Payment Service
participant PSP as Stripe/Adyen
participant Q as Kafka
C->>PS: POST /payments (idempotency_key)
PS->>PS: Save payment (status=CREATED)
PS->>PS: Update status=PROCESSING
PS->>PSP: Charge $50.00 (with PSP idempotency key)
alt Success
PSP->>PS: 200 OK (charge_id)
PS->>PS: Update status=COMPLETED
PS->>Q: Publish PaymentCompleted event
PS->>C: 200 OK
else Timeout / 5xx
PSP--xPS: No response / 500
PS->>PS: Update status=TIMEOUT
PS->>Q: Publish PaymentRetry event
Note over PS: Retry worker picks up after backoff, reusing the same PSP idempotency key
PS->>C: 202 Accepted (async processing)
else Declined
PSP->>PS: 402 Declined
PS->>PS: Update status=FAILED
PS->>C: 400 Payment Declined
end
Reconciliation
Daily batch job that compares your records with the PSP's records to catch discrepancies:
| Your System Says | PSP Says | Action |
|---|---|---|
| COMPLETED | COMPLETED | ✅ Match — no action |
| COMPLETED | NOT FOUND | ⚠️ Phantom payment (recorded as paid, but no money moved): investigate before fulfilling |
| TIMEOUT | COMPLETED | ⚠️ Update to COMPLETED, notify customer |
| TIMEOUT | NOT FOUND | ✅ Never charged — safe to retry or close |
| FAILED | COMPLETED | 🚨 Customer was charged but told it failed! Fix immediately |
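Encoding the table as code keeps the rules explicit and testable. A sketch (Java 14+ for switch expressions) with hypothetical enum and action names; the FAILED / NOT FOUND combination is not in the table, and the sketch treats it as a match because neither side recorded a charge.

```java
public class ReconciliationRules {
    public enum OurStatus { COMPLETED, TIMEOUT, FAILED }
    public enum PspStatus { COMPLETED, NOT_FOUND }
    public enum Action { NONE, INVESTIGATE, MARK_COMPLETED_AND_NOTIFY, RETRY_OR_CLOSE, FIX_IMMEDIATELY }

    /** Maps one (our status, PSP status) pair to the action from the reconciliation table. */
    public static Action classify(OurStatus ours, PspStatus psp) {
        boolean pspCharged = psp == PspStatus.COMPLETED;
        return switch (ours) {
            case COMPLETED -> pspCharged ? Action.NONE : Action.INVESTIGATE;
            case TIMEOUT -> pspCharged ? Action.MARK_COMPLETED_AND_NOTIFY : Action.RETRY_OR_CLOSE;
            // FAILED + NOT_FOUND is not in the table: both sides agree nothing was charged
            case FAILED -> pspCharged ? Action.FIX_IMMEDIATELY : Action.NONE;
        };
    }

    public static void main(String[] args) {
        System.out.println(classify(OurStatus.FAILED, PspStatus.COMPLETED)); // FIX_IMMEDIATELY
    }
}
```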
T+1 Batch Reconciliation Process
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
sequenceDiagram
participant R as Reconciliation Service
participant DB as Payment DB
participant PSP as Stripe Settlement API
participant Alert as Alert System
Note over R: Runs daily at 2 AM (T+1)
R->>PSP: GET /settlements?date=yesterday
PSP->>R: Settlement file (all charges + refunds)
R->>DB: SELECT * FROM payments WHERE created_at falls in yesterday (UTC)
R->>R: Compare line-by-line (amount, status, currency)
alt Mismatch found
R->>DB: INSERT INTO reconciliation_exceptions
R->>Alert: Page on-call if amount > $1000
end
Note over R: Generate daily report for finance team
The settlement file from PSPs arrives T+1 (next business day). You cannot reconcile settlements in real time because PSPs batch them. Your reconciliation service must handle:
- Currency rounding — PSP may round differently than you (match within ±$0.01)
- Timezone mismatches — your "yesterday" might span two days in the PSP's timezone
- Partial settlements — large merchants get multiple settlement files per day
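The rounding and timezone rules above can be sketched as two small helpers. This assumes both sides' amounts are already in the same currency, uses the ±$0.01 tolerance from the list, and assumes reconciliation defines a settlement date as a UTC calendar day; whether a given PSP reports in UTC should be checked against its settlement documentation. `SettlementMatcher`, `Window`, and `utcDay` are hypothetical names (Java 16+ for records).

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class SettlementMatcher {
    // ±$0.01 tolerance for rounding differences between us and the PSP
    static final BigDecimal TOLERANCE = new BigDecimal("0.01");

    /** Half-open time window [start, end). */
    public record Window(Instant start, Instant end) {
        public boolean contains(Instant t) {
            return !t.isBefore(start) && t.isBefore(end);
        }
    }

    /** Same-currency amounts match if they differ by at most one cent. */
    public static boolean amountsMatch(BigDecimal ours, BigDecimal theirs) {
        return ours.subtract(theirs).abs().compareTo(TOLERANCE) <= 0;
    }

    /** "Yesterday" as a UTC calendar day, independent of the server's default timezone. */
    public static Window utcDay(LocalDate date) {
        return new Window(
            date.atStartOfDay(ZoneOffset.UTC).toInstant(),
            date.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant());
    }

    public static void main(String[] args) {
        System.out.println(amountsMatch(new BigDecimal("19.99"), new BigDecimal("20.00"))); // true
        // 23:30 on March 3, 2024 in New York (UTC-5 that day) is 04:30 on March 4 in UTC
        Instant nyLateEvening = ZonedDateTime.of(2024, 3, 3, 23, 30, 0, 0, ZoneId.of("America/New_York")).toInstant();
        System.out.println(utcDay(LocalDate.of(2024, 3, 3)).contains(nyLateEvening)); // false
        System.out.println(utcDay(LocalDate.of(2024, 3, 4)).contains(nyLateEvening)); // true
    }
}
```

Storing timestamps as instants and converting only at the edges keeps both sides of the comparison on the same clock.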
The Late-Arriving Webhook Problem
The Scariest Edge Case
Your system says COMPLETED → you show the user a success page → later, the PSP sends a webhook reporting that the money did not stick. This can arrive seconds to days after the success response: asynchronous payment methods such as bank debits can fail after being accepted, a fraud review can reverse the charge, and the cardholder can open a dispute (chargeback).
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
sequenceDiagram
participant C as Client
participant PS as Payment Service
participant PSP as Stripe
participant Bank as Issuing Bank
C->>PS: Pay $200
PS->>PSP: Charge request
PSP->>PS: 200 OK (status: succeeded)
PS->>C: ✅ "Payment successful!"
Note over C: User sees success, leaves page
PSP->>Bank: Authorization (async for some issuers)
Bank->>PSP: DECLINED (fraud rule triggered)
PSP->>PS: Webhook: charge.failed / charge.dispute.created
Note over PS: 😱 We already told the user it succeeded!
How to handle it:
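import java.util.Optional;

@PostMapping("/webhooks/stripe")
@Transactional
public ResponseEntity<Void> handleWebhook(@RequestBody StripeEvent event) {
    // StripeEvent is a simplified DTO. In production, read the raw body and verify the
    // Stripe-Signature header (e.g. with Stripe's Webhook.constructEvent) before trusting it.

    // Webhooks are delivered at-least-once: skip event IDs we've already handled
    // (markProcessed: hypothetical insert into a table with a UNIQUE event_id)
    if (!webhookEventRepo.markProcessed(event.getId())) {
        return ResponseEntity.ok().build();
    }
    String type = event.getType();
    if (type.equals("charge.failed") || type.equals("charge.dispute.created")) {
        // Unknown charge IDs are left for reconciliation to flag
        Optional<Payment> found = paymentRepo.findByPspChargeId(event.getChargeId());
        if (found.isPresent() && found.get().getStatus() == PaymentStatus.COMPLETED) {
            Payment payment = found.get();
            // Late reversal: we already told the user it worked
            payment.setStatus(PaymentStatus.REVERSED);
            paymentRepo.save(payment);
            // Append compensating ledger entries; the originals are never updated
            ledgerService.reverseEntry(payment.getId());
            // Notify the user immediately
            notificationService.sendPaymentReversed(payment.getUserId(), payment);
            // If the order was already fulfilled, open a recovery case
            if (orderService.isShipped(payment.getOrderId())) {
                compensationService.createRecoveryCase(payment);
            }
        }
    }
    // Acknowledge with 2xx so the PSP stops redelivering this event
    return ResponseEntity.ok().build();
}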
Design principles for late webhooks:
| Principle | Implementation |
|---|---|
| Never treat PSP 200 OK as final | Mark as COMPLETED_PENDING_SETTLEMENT internally |
| Webhook idempotency | PSPs deliver webhooks at-least-once; deduplicate by event ID so each event is applied once |
| Grace period before fulfillment | Wait 60s after payment before shipping/delivering |
| Compensation over prevention | You can't prevent all late reversals — design for recovery |
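The first two principles combine naturally: deduplicate by event ID, then guard on the payment's current state so that two different events for the same charge cannot reverse it twice. A single-threaded, in-memory sketch (Java 16+ for records); `WebhookDeduper` and its string statuses are hypothetical, and in production the processed-ID set would be a table with a UNIQUE event ID written in the same transaction as the status change.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class WebhookDeduper {
    public record WebhookEvent(String eventId, String type, String chargeId) {}

    private final Set<String> processedEventIds = new HashSet<>(); // UNIQUE event_id column in production
    private final Map<String, String> statusByCharge = new HashMap<>();
    private int ledgerReversals = 0;

    public WebhookDeduper(Map<String, String> initialStatuses) {
        statusByCharge.putAll(initialStatuses);
    }

    /** Applies each event ID at most once, however many times the PSP delivers it. */
    public void handle(WebhookEvent event) {
        if (!processedEventIds.add(event.eventId())) {
            return; // redelivery of an event we've already handled
        }
        boolean isReversal = event.type().equals("charge.failed")
            || event.type().equals("charge.dispute.created");
        // State guard: only a COMPLETED payment is reversed, so a second, different
        // reversal event for the same charge cannot reverse the ledger twice
        if (isReversal && "COMPLETED".equals(statusByCharge.get(event.chargeId()))) {
            statusByCharge.put(event.chargeId(), "REVERSED");
            ledgerReversals++;
        }
    }

    public int ledgerReversals() {
        return ledgerReversals;
    }

    public String status(String chargeId) {
        return statusByCharge.get(chargeId);
    }

    public static void main(String[] args) {
        WebhookDeduper deduper = new WebhookDeduper(Map.of("ch_1", "COMPLETED"));
        WebhookEvent failed = new WebhookEvent("evt_1", "charge.failed", "ch_1");
        deduper.handle(failed);
        deduper.handle(failed); // the PSP retried delivery
        deduper.handle(new WebhookEvent("evt_2", "charge.dispute.created", "ch_1"));
        System.out.println(deduper.status("ch_1") + ", reversals=" + deduper.ledgerReversals()); // REVERSED, reversals=1
    }
}
```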
Security Considerations
| Concern | Solution |
|---|---|
| Card number storage | Never store — use PSP tokenization |
| PCI DSS compliance | Keep card data off your servers with PSP-hosted payment pages or fields (Stripe Checkout, Stripe Elements) |
| Man-in-the-middle | TLS 1.3 everywhere |
| SQL injection | Parameterized queries, ORM |
| Internal fraud | Audit logs, 4-eyes principle for refunds |
| Amount tampering | Validate amount server-side, never trust client |
Interview Tips
How to approach this in a 45-min interview
- Clarify requirements (2 min) — Scale? Multi-currency? Which payment methods?
- High-level design (10 min) — Draw the architecture diagram above
- Deep dive: Idempotency (10 min) — This is the most important part. Explain the double-charge problem and solution.
- Deep dive: Reliability (10 min) — State machine, retry, reconciliation
- Discuss trade-offs (8 min) — Sync vs async processing, consistency vs availability
- Bonus: Ledger (5 min) — Double-entry bookkeeping shows you understand financial systems