Design a Search Autocomplete System
Return relevant suggestions in <100ms as the user types — the system behind every search bar at Google, Amazon, and YouTube.
Why This Is Asked
Tests your knowledge of: trie data structures, caching strategies, ranking algorithms, read-heavy optimization, and real-time data pipelines. Plus it has measurable latency constraints.
Requirements
Functional
- Return top 5-10 suggestions as the user types each character
- Suggestions ranked by popularity (search frequency)
- Support personalization (recent searches, user preferences)
- Handle typos/fuzzy matching
- Filter inappropriate content
- Multi-language support
Non-Functional
- Latency: < 100ms p99 (users expect instant feedback)
- Availability: 99.99% (search is the primary entry point)
- Scale: 5 billion searches/day, 100K QPS for autocomplete
- Freshness: Trending topics appear within minutes
High-Level Architecture
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart TD
User(("User types 'pyth'")) --> LB{{"Load Balancer"}}
LB --> API["Autocomplete Service<br/>(stateless)"]
API --> Cache[("Redis Cluster<br/>(prefix → suggestions)")]
Cache -->|"miss"| Trie["Trie Service<br/>(in-memory trie)"]
subgraph DataPipeline["Offline Data Pipeline"]
Logs["Search Logs<br/>(Kafka)"] --> Agg["Aggregation Service<br/>(hourly/daily)"]
Agg --> Builder["Trie Builder"]
Builder --> Trie
Builder --> Cache
end
style Cache fill:#FFCDD2,stroke:#C62828,color:#000
style Trie fill:#E8F5E9,stroke:#2E7D32,color:#000
style DataPipeline fill:#F5F5F5,stroke:#616161,color:#000 Core Data Structure: Trie
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart TD
Root["(root)"] --> P["p"]
Root --> J["j"]
P --> PY["py"]
PY --> PYT["pyt"]
PYT --> PYTH["pyth<br/>⭐ python (freq: 50K)<br/>⭐ pythagoras (freq: 8K)"]
PYTH --> PYTHO["pytho"]
PYTHO --> PYTHON["python<br/>⭐ python tutorial (freq: 45K)<br/>⭐ python download (freq: 30K)"]
J --> JA["ja"]
JA --> JAV["jav"]
JAV --> JAVA["java<br/>⭐ java tutorial (freq: 40K)<br/>⭐ javascript (freq: 55K)"]
style PYTH fill:#E8F5E9,stroke:#2E7D32,color:#000
style PYTHON fill:#E3F2FD,stroke:#1565C0,color:#000
style JAVA fill:#FEF3C7,stroke:#D97706,color:#000 Trie Node Design
Java
public class TrieNode {
Map<Character, TrieNode> children = new HashMap<>();
// Top-K suggestions cached at each node (avoids traversal at query time)
List<ScoredSuggestion> topSuggestions = new ArrayList<>(10);
boolean isEndOfWord;
}
public record ScoredSuggestion(String text, long frequency)
implements Comparable<ScoredSuggestion> {
@Override
public int compareTo(ScoredSuggestion other) {
return Long.compare(other.frequency, this.frequency); // descending
}
}
Query Flow
Text Only
User types: "pyth"
1. Traverse trie: root → p → py → pyt → pyth
2. At node "pyth", return pre-computed topSuggestions:
["python" (50K), "python tutorial" (45K), "python download" (30K), ...]
3. Return in < 1ms (O(length of prefix), no tree traversal needed)
Caching Strategy
Two-Layer Cache
Text Only
Layer 1: Browser cache (client-side)
- Cache "a", "ap", "app" responses for 5 minutes
- Eliminates repeated requests for same prefix
Layer 2: Redis cluster (server-side)
- Key: "prefix:pyth" → Value: ["python", "python tutorial", ...]
- TTL: 1 hour (refreshed by offline pipeline)
- 95%+ hit rate (most queries are popular prefixes)
YAML
# Redis schema
prefix:p → ["python", "pizza near me", "paypal login", ...]
prefix:py → ["python", "python tutorial", "python download", ...]
prefix:pyt → ["python", "python tutorial", "python download", ...]
prefix:pyth → ["python", "python tutorial", "pythagoras", ...]
Optimization: Only Cache Short Prefixes
Cache prefixes of length 1-4 (covers 95% of queries). Longer prefixes are less frequent and can hit the trie directly. This keeps Redis memory manageable: - 26 one-char prefixes × 10 suggestions = 260 entries - 26² two-char = 6,760 entries - 26³ three-char = 175,760 entries - Total: ~200K entries → easily fits in memory
Data Collection & Ranking Pipeline
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
subgraph Realtime["Real-time (minutes)"]
S["User searches"] --> K["Kafka<br/>(search events)"]
K --> SC["Stream Counter<br/>(Flink/Spark)"]
SC --> TH["Trending<br/>Hashtable"]
end
subgraph Batch["Batch (hourly)"]
K --> DW["Data Warehouse<br/>(search logs)"]
DW --> AGG["Aggregate<br/>frequencies"]
AGG --> TB["Rebuild Trie<br/>+ update Redis"]
end
style Realtime fill:#E8F5E9,stroke:#2E7D32,color:#000
style Batch fill:#E3F2FD,stroke:#1565C0,color:#000 Ranking Formula
Text Only
score = (search_frequency × recency_weight) + personalization_boost
where:
recency_weight = decay_factor ^ hours_since_trending
personalization_boost = user_history_match × 1.5
| Factor | Weight | Example |
|---|---|---|
| Global frequency | 1.0x | "python" has 50K/day |
| Recency (trending) | 2.0x for last hour | Breaking news terms |
| Personalization | 1.5x | User searched "python" before |
| Geography | 1.2x | "pizza near me" in user's city |
Handling Edge Cases
Typo Tolerance / Fuzzy Matching
Text Only
User types: "pythn" (missing 'o')
Solution: Edit distance matching (Levenshtein ≤ 2)
- "pythn" → "python" (distance 1, insert 'o')
Implementation:
1. Try exact prefix match first (fast path)
2. If no results, try BK-tree or SymSpell for fuzzy matches
3. Return fuzzy matches with lower ranking score
Filtering Inappropriate Content
Java
// Blocklist filter applied before returning suggestions
public List<String> filterSuggestions(List<String> suggestions) {
return suggestions.stream()
.filter(s -> !blocklist.contains(s.toLowerCase()))
.filter(s -> !toxicityClassifier.isInappropriate(s))
.limit(10)
.toList();
}
Scaling
| Component | Strategy |
|---|---|
| Autocomplete API | Stateless, horizontal scaling behind LB |
| Redis cache | Redis Cluster (sharded by prefix hash) |
| Trie service | Replicated in-memory (each pod has full trie) |
| Data pipeline | Kafka + Flink for real-time, Spark for batch |
| Trie rebuild | Blue-green: build new trie, swap atomically |
Memory Estimation
Text Only
- Unique search terms: 100 million
- Average term length: 20 characters
- Trie overhead (pointers, metadata): ~5x raw text
- Total trie size: 100M × 20B × 5 = ~10GB per replica
- Fits in memory of a large instance (feasible)
Interview Framework
45-minute approach
- Requirements (3 min): Confirm scale, latency SLA, personalization scope
- High-level design (8 min): Client → API → Cache → Trie, plus data pipeline
- Trie deep dive (10 min): Node structure, pre-computed top-K, query complexity
- Caching (8 min): Two-layer (browser + Redis), key design, TTL, hit rates
- Data pipeline (8 min): Real-time trending + batch frequency aggregation
- Scaling (5 min): Sharding, replication, blue-green trie rebuild
- Trade-offs (3 min): Memory vs freshness, personalization vs privacy