Design a Proximity Server (Yelp / Nearby Friends)

Interview Prompt

Design Proximity Server (Yelp / Nearby Friends).

Clarifying Questions (ask before designing)

Question	Why it matters
Which of these is highest priority: Geohash vs QuadTree vs S2, Spatial indexing in Redis, Radius search?	Forces scope negotiation — senior candidates trim before drawing boxes.
What scale should we design for — DAU, QPS, data volume?	Drives every capacity decision; shows structured thinking.
What are the read vs write patterns on the critical path?	Determines caching, DB choice, and replication topology.
What consistency and durability guarantees are required?	Separates strong-consistency paths from eventual ones — a senior differentiator.

Scope

In scope

Geohash vs QuadTree vs S2
Spatial indexing in Redis
Radius search
Ranking by distance + relevance
Geo-sharding
Capacity estimation with shown math

Out of scope (state explicitly)

Full payment processing (#24)
Turn-by-turn map rendering (#54)
Driver/rider identity verification and background checks

Assumptions

Single metro / region unless interviewer asks for multi-city
Mobile clients with intermittent connectivity — server is source of truth
Managed geo + messaging infra (Kafka, Redis, RDS) is acceptable

Search nearby places: Given user's location and a radius, return nearby businesses/places (restaurants, gas stations, etc.)
Filter by category: Filter results by type (restaurant, hospital, ATM, etc.)
Sort by: Distance, rating, popularity, relevance
Business details: Name, address, phone, hours, photos, reviews
Add/update business: Business owners can add and update their listings
Reviews and ratings: Users can write reviews and rate businesses
Nearby Friends (variant): Show friends who are nearby in real-time

Metric	Calculation	Value
Total businesses	Given	200M
Business record size	Given	1 KB
Total business data	Given	200 GB
Search QPS	Given	100K
Business updates / day	Given	1M (low frequency)
Nearby Friends: Active users streaming location	Given	10M
Location updates / sec	10M users ÷ 4s	2.5M (10M users × 1 update per 4s)

Loading...

Core Algorithm: Geospatial Indexing

Option 1: GeoHash

Divide the world into 2 halves alternately (longitude, latitude), encode as Base32. Nearby locations share a common GeoHash prefix.

GeoHash Precision:
  Length 1: ~5,000 km cell
  Length 4: ~39 km cell
  Length 5: ~4.9 km cell  ← Good for nearby search
  Length 6: ~1.2 km cell  ← Good for walking distance
  Length 7: ~150 m cell
  Length 8: ~38 m cell

Key property: Nearby locations share a common GeoHash prefix. "9q8yy" and "9q8yz" are adjacent cells.

Search algorithm:
1. Compute user's GeoHash at precision 5 (4.9 km cells)
2. Find all businesses in user's cell: WHERE geohash LIKE '9q8yy%'
3. Also search the 8 neighboring cells (edge cases: user near cell boundary)
4. Filter by actual distance (GeoHash cells are rectangular approximations)
5. Sort by distance

Option 2: QuadTree ⭐ (recommended for in-memory)

Start with the entire world as the root node. Recursively divide each node into 4 quadrants (NW, NE, SW, SE) until a node has ≤ K businesses (e.g., K=100). Leaf nodes contain the actual business locations.

Root (World)
├── NW (Americas)
│   ├── NW (Canada)
│   ├── NE (NE US) → [biz1, biz2, ..., biz87]  (leaf: 87 businesses)
│   ├── SW (SW US)
│   │   ├── NW (Bay Area) → [biz1, ..., biz95]  (leaf)
│   │   ├── NE (Central CA)
│   │   └── ...
│   └── SE (SE US)
├── NE (Europe/Asia)
└── ...

Search algorithm:
1. Start at root, traverse to leaf containing user's location
2. Collect all businesses in that leaf
3. If not enough results (radius extends beyond leaf) → check parent's sibling leaves
4. Continue expanding until enough results found or radius exceeded
5. Filter by actual distance

Building QuadTree: O(N log N), where N = number of businesses
Search: O(log N + M), where M = businesses in result
Memory: 200M businesses × 32 bytes = ~6.4 GB (fits in memory on a single server)
Rebuild: Every night. Incremental updates during the day

Option 3: Google S2 Geometry

Uses a space-filling curve (Hilbert curve) to map 2D space to 1D
Each cell has a 64-bit Cell ID
Minimal cell covering set logic for arbitrary shapes and polygons
More uniform cell sizes than GeoHash
Used by: Google Maps, Foursquare

Option 4: Uber H3 Index

Hexagonal grid system rather than square/rectangular
Hexagons have uniform distance to all 6 immediate neighbors (unlike squares, where diagonal distance is different)
Resolution 9 (~174m edge) for precise tracking; Resolution 7 (~1.2km edge) for surge pricing zones
Finding neighbors is a fast O(1) memory lookup

Proximity Search Service

Receive query: {lat, lng, radius, category, sort_by}
Query geospatial index (QuadTree) for businesses in radius
Filter by category
Rank by sort criteria (distance, rating, relevance)
Enrich with business details from Redis cache (or fallback to MySQL)
Return paginated results

Business Service

CRUD operations for business listings
On create/update → update MySQL → async update geospatial index
Hot business profiles cached in Redis to minimize database loads

Kafka CDC: Incremental QuadTree Updates

Full QuadTree rebuild from MySQL runs nightly (~5 minutes for 200M businesses)
Debezium CDC on MySQL captures row-level changes → publishes to Kafka topic business-changes
QuadTree server consumes changes:
- Insert: Add business to leaf node; if leaf exceeds K=100, split into 4 quadrants
- Update location: Remove from old leaf, insert into new leaf
- Update metadata: Update in-place
- Delete: Remove from leaf node
CDC stream also updates Redis business cache and Redis GeoIndex
Lag: business changes reflected in search within ~2 seconds

Nearby Friends: Real-Time Architecture

Different from business search (businesses are static; friends move)
How it works:
- Each user streams location every 4 seconds via WebSocket
- Location stored in Redis Geo Set (user_locations)
- When user opens "Nearby Friends": query Redis for each friend's distance
- Optimization: Use GeoHash to pre-filter and only check friends in same/adjacent GeoHash cells

Event Bus Design (Kafka)

Topic: proximity_server-events
  Partitions: 64 (scale consumers horizontally)
  Partition key: entity_id (user_id / order_id — preserves per-entity ordering)
  Retention: 7 days (compliance) or 24h (high-volume telemetry)
  Replication factor: 3, min.insync.replicas: 2

Producer: idempotent producer enabled (enable.idempotence=true)
Consumer: consumer group "proximity_server-processors"
  - At-least-once delivery + idempotent handlers (dedup by event_id)
  - DLQ topic: proximity_server-events-dlq (poison messages after 3 retries)
  - Lag alert: consumer lag > 60s → scale workers

Design a Proximity Server (Yelp / Nearby Friends): async side effects MUST NOT block the synchronous API response.
  Sync path: validate → persist source of truth → publish event → return 201
  Async path: consumers update caches, indexes, notifications, aggregates

Search Nearby

HTTP

GET /api/v1/search/nearby?lat=37.7749&lng=-122.4194&radius=5000&category=restaurant&sort=distance&limit=20
Response: 200 OK
{
  "businesses": [
    {
      "business_id": "biz-uuid",
      "name": "Joe's Pizza",
      "category": "restaurant",
      "address": "123 Main St, San Francisco, CA",
      "lat": 37.7751,
      "lng": -122.4190,
      "distance_meters": 45,
      "rating": 4.5,
      "review_count": 234,
      "price_level": "$$",
      "is_open": true,
      "photo_url": "..."
    }
  ],
  "total": 85
}

Get Business Details

HTTP

GET /api/v1/businesses/{business_id}

Add/Update Business

HTTP

POST /api/v1/businesses
{ "name": "Joe's Pizza", "lat": 37.7751, "lng": -122.4190, ... }

Nearby Friends (Real-Time Variant)

WebSocket /ws/v1/nearby-friends
{
  "type": "location_update",
  "lat": 37.7749,
  "lng": -122.4194
}

Server Push:
{
  "type": "nearby_friends",
  "friends": [
    {"user_id": "...", "name": "Alice", "distance_meters": 120, "last_updated": "10s ago"}
  ]
}

Common Error Responses

400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
440 Login Timeout: WebSocket session expired; reconnect required

MySQL: Business Listings

SQL

CREATE TABLE businesses (
    business_id     BIGINT PRIMARY KEY,
    name            VARCHAR(256),
    category        VARCHAR(64),
    address         TEXT,
    city            VARCHAR(128),
    state           VARCHAR(64),
    country         VARCHAR(64),
    lat             DECIMAL(10,7),
    lng             DECIMAL(10,7),
    geohash         VARCHAR(12),
    phone           VARCHAR(20),
    website         TEXT,
    price_level     TINYINT,
    rating          DECIMAL(2,1),
    review_count    INT DEFAULT 0,
    hours           JSON,
    photos          JSON,
    is_active       BOOLEAN DEFAULT TRUE,
    created_at      TIMESTAMP,
    updated_at      TIMESTAMP,
    INDEX idx_geohash (geohash),
    INDEX idx_category_geohash (category, geohash),
    INDEX idx_city (city)
);

In-Memory QuadTree Node

JAVA

class QuadTreeNode {
    double x1, y1, x2, y2;    // bounding box
    List<Business> businesses; // non-null only for leaf nodes (max 100)
    QuadTreeNode NW, NE, SW, SE; // children (null for leaves)
}

class Business {
    long businessId;
    double lat, lng;
    String category;
    float rating;
}

Redis: Business Cache & GeoIndex

# Business details cache
Key:    biz:{business_id}
Value:  Hash { name, category, lat, lng, rating, ... }
TTL:    3600

# Redis GeoIndex (alternative to in-memory QuadTree)
GEOADD businesses:restaurant {lng} {lat} {business_id}
GEORADIUS businesses:restaurant {lng} {lat} 5 km WITHCOORD WITHDIST COUNT 20 ASC

Redis: Nearby Friends (Real-Time)

# User's current location
GEOADD user_locations {lng} {lat} {user_id}

# Query friends nearby
for friend_id in user.friends:
    GEODIST user_locations {user_id} {friend_id} m
    if distance < 5000:
        add to result

Concern	Solution
QuadTree server failure	Multiple replicas; rebuild from MySQL in < 5 minutes
MySQL read failure	Read replicas in multiple AZs
Stale geospatial index	Incremental updates during day; full rebuild nightly
Cache miss storm	Request coalescing + stale-while-revalidate

Specific: GeoHash / Cell Boundary Problem

A user at the edge of a cell might miss businesses just across the border
Solution: Always search the 8 neighboring cells in addition to the user's cell
For QuadTree: expand search to sibling nodes when results near the boundary

GeoHash vs QuadTree vs Redis GeoSet vs S2 vs H3

Approach           Query Type    Scale      Pros                                  Cons                                        Best For
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
GeoHash (DB index) Radius        200M+      Simple, SQL compatible, B-tree index  Cell boundary edge cases, need 8-neighbor   Small datasets, MySQL/Postgres geo
                                                                                  search
QuadTree (in-mem)  Radius + arb  200M+      Fast, <6.4 GB for 200M biz, flexible  Requires dedicated servers, rebuild         Yelp-scale static data
                                            query                                 overhead
Redis GeoSet       Radius only   100M       Zero infrastructure, built-in         No polygon queries, precision degrades      Simple proximity (Tinder, Uber)
                                            GEORADIUS                             near poles
Google S2          Polygon/rad   Unlimited  Most accurate, hierarchical,          Application-level library, no built-in DB   Google Maps, complex geo queries
                                            Hilbert curve
PostGIS            Any           100M       Full spatial SQL, arbitrary shapes    Harder to scale horizontally                Complex shapes, GIS apps

Rule of thumb: Static data (businesses, POIs) → QuadTree or GeoHash. Moving data (users, vehicles, drivers) → Redis GeoSet.

Grid Search vs Pure Radius Trade-off

Two common implementations for "find businesses within 5 km":

Approach 1: Cell-based (GeoHash / QuadTree)
  Find the cell(s) covering the search area → return all businesses in those cells
  
  Pros: Fast! Single index lookup
  Cons: Returns businesses in rectangular area, not a circle
    → Must post-filter: calculate actual distance for each result
    → More results than needed (corner of rectangle extends beyond radius)
    
  Optimization: use tighter cells (resolution 6: ~1.2 km)
    More cells to query (25 cells for 5 km radius at res 6 vs 9 at res 5)
    But fewer false positives → less post-filtering

Approach 2: Sorted distance (brute force in memory)
  Load all nearby businesses, compute exact Haversine distance, sort
  Only works if "nearby" is well-bounded (< 100K candidates)
  
  For dense cities: 100K businesses in 5 km → too slow
  For sparse areas: 50 businesses in 50 km → trivial
  
  Best practice:
    1. Cell query to get candidate set (~1K businesses)
    2. Exact Haversine distance calculation to filter to actual radius
    3. Sort by distance, paginate (no need to return all 1K)

Flat-earth approximation (fast, no trig, accurate to 0.1% for < 50 km):
  d = sqrt((Δlat)² + (Δlng × cos(lat))²) × 111 km/degree

Nearby Friends: Location Privacy and Precision Levels

Privacy concern: sharing exact GPS coordinates exposes home/work locations.

Solution: Use precision-reduced location sharing

Level 1 — Exact (high privacy risk):
  Share full GPS coordinates: lat=37.774929, lng=-122.419418
  Distance accurate to meters → reveals exact building
  AVOID for social features

Level 2 — Fuzzy location (recommended for Nearby Friends) ⭐:
  Truncate coordinates to 3 decimal places: lat=37.774, lng=-122.419
  Precision: ~100 meter radius
  "Alice is about 500m away" — can't pinpoint which house
  Still accurate enough for "Alice is in this neighborhood"

Level 3 — GeoHash cell (for "in the area"):
  Share GeoHash of precision 5 (4.9 km × 4.9 km cells)
  "Alice is in this part of downtown" — approximate zone, not location
  Works for: "Friends in the same neighborhood" features

Implementation:
  User shares precise GPS with backend (needed for computation)
  Backend stores precise location in Redis (TTL: 5 min)
  API response: returns ONLY distance in meters, never raw coordinates
  Frontend: shows "Alice is 800m away" → cannot triangulate exact position
  
  Privacy toggle: user can set "share location with friends" ON/OFF
  When OFF: friends see "Alice is in San Francisco" (city-level only)

SLOs & Error Budgets

Metric	Target	Rationale
Core user-facing availability	99.95%	Budget for planned maintenance + unplanned failures without user-visible outage.
p99 latency (critical path)	Problem-specific — state target early and tie to capacity math	Interview credibility comes from connecting SLO to architecture choices.
Error rate (5xx)	< 0.1%	Distinguishes transient blips from systemic failure requiring rollback.
Data durability	99.999999999% (11 nines) for committed writes	Define which operations require fsync/quorum vs async replication.

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Primary database unavailable	Health check failures, connection pool exhaustion alerts, elevated 5xx	Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists
Traffic spike (10× normal)	RPS anomaly alert, autoscaling lag, latency SLO burn rate	Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations
Bad deploy causing elevated errors	Canary metric regression, error budget burn, deployment correlation	Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility

Cost Drivers (Staff lens)

Egress bandwidth and CDN (often dominates media/data-heavy systems)
Database storage + IOPS at scale (plan compaction, TTL, tiering)
Compute for async pipelines (right-size workers, spot instances for batch)
Managed service premiums vs operational headcount trade-off

Multi-Region & DR

Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.

Interview Prompt

Clarifying Questions (ask before designing)

Scope

In scope

Out of scope (state explicitly)

Assumptions

Core Algorithm: Geospatial Indexing

Proximity Search Service

Business Service

Kafka CDC: Incremental QuadTree Updates

Nearby Friends: Real-Time Architecture

Event Bus Design (Kafka)

Search Nearby

Get Business Details

Add/Update Business

Nearby Friends (Real-Time Variant)

Common Error Responses

MySQL: Business Listings

In-Memory QuadTree Node

Redis: Business Cache & GeoIndex

Redis: Nearby Friends (Real-Time)

Specific: GeoHash / Cell Boundary Problem

Scaling the QuadTree

Density-Based Search Radius

Business Hours and Open/Closed Status

Interview Walkthrough

GeoHash vs QuadTree vs Redis GeoSet vs S2 vs H3

Grid Search vs Pure Radius Trade-off

Nearby Friends: Location Privacy and Precision Levels

Phase 1: MVP (0 to 100K users)

Phase 2: Growth (100K to 10M users)

Phase 3: Scale (10M+ users)

SLOs & Error Budgets

Incident Scenarios (2am reality)

Cost Drivers (Staff lens)

Multi-Region & DR