Rate Limiting in 5 Minutes
Real Incident: GitHub API Abuse (2023)
A misconfigured CI pipeline sent 50,000 API requests/minute to GitHub from a single token. Without rate limiting, it would have degraded the API for all users. GitHub's rate limiter returned 429 Too Many Requests after 5,000/hour, protecting the platform. Rate limiting is the immune system of your API.
The One-Liner
Rate limiting controls how many requests a client can make in a given time window, protecting your service from abuse, bugs, and overload.
How It Works
%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
flowchart LR
C[Client] --> RL{Rate Limiter}
RL -->|Under limit| App[Application]
RL -->|Over limit| R[429 Too Many Requests]
style RL fill:#ef4444,color:#fff
style R fill:#fca5a5,color:#7f1d1d - Client sends request → rate limiter checks counter for that client/IP/API key
- Under limit: forward request to application, increment counter
- Over limit: reject immediately with
429+Retry-Afterheader - Counters stored in fast in-memory store (Redis) for distributed systems
Algorithms
| Algorithm | How | Pros | Cons | Best For |
|---|---|---|---|---|
| Fixed Window | Count per time window (e.g., per minute) | Simple | Burst at window edges (2x spike) | Simple APIs |
| Sliding Window Log | Store timestamp of each request | Accurate | Memory-heavy (stores all timestamps) | Low-volume, precise |
| Sliding Window Counter | Weighted average of current + previous window | Good accuracy, low memory | Slight approximation | Most APIs (best default) |
| Token Bucket | Tokens refill at steady rate, request costs 1 token | Allows controlled bursts | Slightly complex | AWS, Stripe, most cloud APIs |
| Leaky Bucket | Requests queue and drain at fixed rate | Smooth output rate | No bursts allowed | Steady-rate processing |
Key Trade-offs
| Strict Limiting | Lenient Limiting |
|---|---|
| Protects backend aggressively | Better user experience |
| May reject legitimate spikes | Risk of overload during abuse |
| Simple to reason about | Needs burst allowance logic |
429 frustrates good clients | Bad actors get more runway |
Interview Cheat Sheet
- "Token bucket for most APIs — allows short bursts while enforcing average rate"
- "Sliding window counter for the best accuracy-to-memory trade-off"
- "Distributed rate limiting: Redis + Lua script for atomic increment-and-check"
- "Rate limit by: API key (per-customer), IP (anonymous), user_id (authenticated)"
- "Return headers:
X-RateLimit-Remaining,X-RateLimit-Reset,Retry-After"
When to Use / When NOT to Use
| Use When | Don't Use When |
|---|---|
| Public API (protect from abuse) | Internal service-to-service (use circuit breakers) |
| Shared resource (DB, external API) | Single-user local application |
| Need fair multi-tenant access | Latency-critical path (adds ~1ms) |
| DDoS/bot protection | Already behind a WAF/CDN with rate limiting |