3 min read By Vamsi Karuturi · Senior Backend Engineer at Salesforce

Geohashing & Spatial Indexing

Real Incident: Uber Driver Matching at Scale

At peak hours, Uber processes 1 million active drivers and 500,000 ride requests simultaneously. The core challenge: "Find the 10 closest available drivers to this rider within 3 seconds." A naive approach — computing distance to all 1M drivers per request — requires 1M distance calculations x 500K requests = 500 billion operations per second. Uber solved this with H3 hexagonal geohashing: partition the world into cells, index drivers by cell, and only search adjacent cells. The query goes from O(N) to O(k) where k is drivers in nearby cells (~50-200). Geospatial indexing turned an impossible problem into a 10ms lookup.

Why This Comes Up in Interviews

Any system involving location — ride-sharing, food delivery, dating apps, store locators, real-time tracking — requires spatial indexing. Interviewers want to hear:

Why latitude/longitude queries are inherently expensive without spatial indexing
How geohashing converts 2D coordinates into 1D sortable strings
Trade-offs between geohashing, quadtrees, and R-trees
How precision levels map to real-world distances
How systems like Uber/Lyft handle the "nearby" problem at scale

The Problem: Why Naive Approaches Fail

Query: "Find all restaurants within 1km of me" (lat: 37.7749, lng: -122.4194)

Approach	How	Why It Fails
Full scan	Check distance to every restaurant	O(N) per query — 10M restaurants = 10M calculations
Bounding box + B-tree	`WHERE lat BETWEEN x1 AND x2 AND lng BETWEEN y1 AND y2`	Two separate indexes don't compose efficiently
Geospatial index	Convert 2D → 1D prefix, use single index lookup	O(1) cell lookup + O(k) neighbors — fast

Geohashing Algorithm

Core idea: Recursively divide the world into grid cells. Encode each cell as a string. Nearby locations share common prefixes.

Encoding process for (37.7749, -122.4194):

Step	Action	Result
1	Divide world: lat [-90, 90], lng [-180, 180]	Full map
2	Is lng (-122.4) in left [-180,0] or right [0,180]?	Left → bit 0
3	Is lat (37.7) in bottom [-90,0] or top [0,90]?	Top → bit 1
4	Continue subdividing, interleaving lng/lat bits	Binary string
5	Encode binary as base32	`9q8yyk`

Result: San Francisco = 9q8yyk... — any location in the same neighborhood shares the prefix 9q8yy.

Precision Levels

Geohash Length	Cell Width	Cell Height	Use Case
1	5,009 km	4,992 km	Continent-level
2	1,252 km	624 km	Large country region
3	156 km	156 km	State/province
4	39.1 km	19.5 km	City
5	4.9 km	4.9 km	Neighborhood
6	1.2 km	0.6 km	Street-level
7	153 m	153 m	Building block
8	38 m	19 m	Building
9	4.8 m	4.8 m	Parking spot

Interview rule of thumb: For "find nearby" features, precision 5-6 (1-5km) is typical. Adjust based on expected search radius.

Proximity Search Algorithm

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph TD
    QUERY["Query: Find drivers near (37.77, -122.41)"] --> ENCODE["Encode to geohash: 9q8yyk"]
    ENCODE --> SELF["Search cell 9q8yyk"]
    ENCODE --> NEIGHBORS["Search 8 neighboring cells"]
    NEIGHBORS --> N1["9q8yym"]
    NEIGHBORS --> N2["9q8yyj"]
    NEIGHBORS --> N3["9q8yys"]
    NEIGHBORS --> N4["9q8yy7"]
    NEIGHBORS --> N5["... (4 more)"]
    SELF --> MERGE["Merge results from all 9 cells"]
    N1 --> MERGE
    N2 --> MERGE
    N3 --> MERGE
    N4 --> MERGE
    N5 --> MERGE
    MERGE --> RANK["Rank by actual distance, return top K"]

    style QUERY fill:#3498db,color:#fff
    style MERGE fill:#27ae60,color:#fff
    style RANK fill:#f39c12,color:#fff

Why search neighbors? A location near a cell boundary might have closest results in an adjacent cell. Searching the center cell alone misses edge cases (literally).

Algorithm:

Compute geohash of query point at desired precision
Find 8 neighboring cell hashes (well-defined algorithm)
Query all 9 cells from database (single prefix query per cell)
Compute actual Haversine distance for results
Return top-K by real distance

Geohashing Edge Cases

Problem	Cause	Solution
Boundary problem	Two nearby points in different cells	Always search 8 neighbors
Non-uniform distribution	Manhattan has 10,000 drivers per cell; rural has 2	Adaptive precision or quadtrees
Z-curve discontinuities	Geohash uses Z-order curve; some adjacent cells have very different prefixes	Always compute actual neighbors, don't rely on prefix similarity alone
Poles and antimeridian	Wrapping at 180/-180 longitude	Special handling at boundaries

Quadtrees — Adaptive Spatial Indexing

%%{init: {'theme': 'base', 'themeVariables': {'fontSize': '13px', 'fontFamily': 'Inter, -apple-system, sans-serif'}, 'flowchart': {'nodeSpacing': 30, 'rankSpacing': 50, 'padding': 12, 'curve': 'basis'}, 'sequence': {'actorMargin': 60, 'messageMargin': 40}, 'class': {'padding': 12}}}%%
graph TD
    ROOT["World (root cell)"] --> NW["NW Quadrant"]
    ROOT --> NE["NE Quadrant"]
    ROOT --> SW["SW Quadrant"]
    ROOT --> SE["SE Quadrant"]
    NE --> NE_NW["NE-NW (sparse: leaf)"]
    NE --> NE_NE["NE-NE (sparse: leaf)"]
    NE --> NE_SW["NE-SW (dense: subdivide)"]
    NE --> NE_SE["NE-SE (sparse: leaf)"]
    NE_SW --> DEEP1["... (keep subdividing)"]

    style ROOT fill:#e74c3c,color:#fff
    style NE_SW fill:#f39c12,color:#fff
    style DEEP1 fill:#27ae60,color:#fff

Property	Geohash	Quadtree	R-tree
Structure	Fixed-precision grid	Adaptive tree (subdivide dense areas)	Bounding-box tree
Precision	Uniform worldwide	Adapts to data density	Adapts to object shapes
Best for	Point data, proximity search	Non-uniform point distributions	Rectangles, polygons, ranges
Storage	Simple (string per point)	Tree in memory	Complex disk-based tree
Update cost	O(1) per point	O(log N) rebalancing	O(log N) with splits
Used by	Redis GEO, Elasticsearch	In-memory spatial services	PostGIS, MongoDB 2dsphere

When to use which:

Geohash: Distributed systems (easy to shard by prefix), Redis/DynamoDB, simple proximity
Quadtree: In-memory indexes, non-uniform data (cities vs rural), game engines
R-tree: Complex shapes (polygons), range queries, traditional databases

Real-World Implementations

System	Technology	How
Uber H3	Hexagonal hierarchical grid	Hexagons (equal-area, 6 equidistant neighbors). Avoids square grid distortion.
Google S2	Sphere-projected Hilbert curve cells	Maps sphere to cube faces, uses Hilbert curve. Cells have ~equal area at any level.
Redis GEO	Geohash in sorted set	`GEOADD`, `GEORADIUS` — stores geohash as score in sorted set
Elasticsearch	Geohash + BKD tree	`geo_distance` query uses geohash prefix filtering + exact distance
PostGIS	R-tree (GiST index)	`ST_DWithin` for proximity, handles complex polygons

Back-of-Envelope: Uber Driver Matching

Parameter	Value
Active drivers (peak)	1,000,000
Ride requests/second	10,000
Search radius	3 km
Geohash precision	6 (1.2km cells)
Cells to search	9 (center + 8 neighbors)
Avg drivers per cell (urban)	50-200
Candidates per query	~500
Distance calculations per query	500 (not 1,000,000)
Query latency	<10ms
Driver location update frequency	Every 4 seconds
Location updates/second	250,000

Sharding strategy: Partition by geohash prefix (first 3 chars = city-level). Each shard handles one metro area. Hot cities (NYC, SF) get dedicated shards.

Interview Framework

When designing location-based systems:

Step 1: "For the proximity search requirement, I'd use geohashing to convert 2D coordinates into 1D sortable strings. This lets us use standard database indexes for spatial queries."

Step 2: "At precision level 6 (~1.2km cells), I'd search the target cell plus its 8 neighbors — 9 cells total. This guarantees we catch all points within our search radius."

Step 3: "For non-uniform density (Manhattan vs suburbs), I'd consider Uber's H3 hexagonal grid — hexagons have equidistant neighbors and equal area, avoiding the square grid's corner-distance problem."

Step 4: "For sharding, geohash prefixes are natural partition keys. Prefix 9q8 covers San Francisco — all SF drivers on the same shard for fast local queries."

Step 5: "Driver locations update every 4 seconds. I'd use Redis GEO (geohash in sorted set) for real-time positions — O(log N) updates, O(log N + K) radius queries."

Quick Recall

Question	Answer
What does geohashing solve?	Converts 2D spatial problem into 1D prefix-searchable string
Why search 8 neighbors?	Points near cell boundaries may be closer than points in same cell
Precision 6 cell size?	~1.2km x 0.6km (street/neighborhood level)
Geohash vs quadtree?	Geohash: uniform grid, easy sharding. Quadtree: adaptive density, in-memory.
What is Uber H3?	Hexagonal hierarchical grid — equal area, 6 equidistant neighbors
What is Google S2?	Sphere-to-cube projection with Hilbert curve — near-equal area cells on a sphere
Redis GEO internally?	Stores geohash as sorted set score; radius query = range scan + distance filter
Sharding strategy?	Geohash prefix = partition key (city-level shards)