1 min readBy Vamsi Karuturi · Senior Backend Engineer at Salesforce
Welcome to VamsiLabs
Sign in to unlock 100+ deep-dive notes, track your progress, and prep for FAANG interviews.
or use email
Enter your email and we'll send a reset link.
Database Indexing Deep Dive
Real Incident: Shopify Black Friday 2020
A missing index on the orders table's created_at column caused a dashboard query to do a full table scan across 500M rows. Query time: 45 seconds. During Black Friday traffic surge, this query ran every 30 seconds, locking the database. Adding a single index dropped it to 3ms — a 15,000x improvement from one line of DDL.
Why This Comes Up in Interviews
Every system design involves databases, and indexes are the #1 performance lever. Interviewers want to hear:
How indexes work internally (B-Tree vs Hash vs LSM)
When to add an index vs when NOT to (write amplification trade-off)
Composite index design and column ordering
How to identify missing indexes from query patterns
How Indexes Work — The Mental Model
Without index: Database reads every row to find matches (full table scan) With index: Database jumps directly to matching rows (like a book's index)
EXPLAINANALYZESELECT*FROMordersWHEREcustomer_id=123;-- Bad: "Seq Scan on orders" (full scan)-- Good: "Index Scan using idx_orders_customer" -- Key metrics:-- Actual rows: how many rows examined-- Planning time vs Execution time-- "Rows Removed by Filter" = wasted work
Index Anti-Patterns
Anti-Pattern
Problem
Fix
Index on every column
Write performance destroyed
Index only queried columns
Index on low-cardinality (boolean)
B-Tree scan is barely better than full scan
Use partial index instead
Unused indexes
Waste write performance + disk
Monitor pg_stat_user_indexes
Function on indexed column
WHERE UPPER(email) = 'X' bypasses index
Expression index or generated column
Leading wildcard
WHERE name LIKE '%smith' can't use B-Tree
Full-text search (GIN) or trigram index
Interview Cheat Sheet
Question
Answer
"How do indexes speed up queries?"
"B-Tree: O(log N) lookup instead of O(N) scan. 10M rows → 23 comparisons instead of 10M. Tree depth rarely exceeds 4 levels."
"When NOT to index?"
"Write-heavy tables with few reads. Low-cardinality columns (boolean). Tiny tables (<1000 rows). Columns rarely in WHERE/JOIN."
"Composite index column order?"
"Equality columns first, range columns last. Matches left-to-right only — skipping the first column means the index isn't used."
"Covering index?"
"Index contains all columns the query needs (via INCLUDE). Database returns data from index without touching the table — index-only scan."
"How many indexes per table?"
"Typically 3-7 for OLTP. Each adds ~10-20% write cost. Monitor unused indexes and drop them."
Back-of-Envelope: Index Size
Table: 100M rows, indexed column is BIGINT (8 bytes)