Design Foursquare (Check-ins and Recommendations)

This problem appears in multiple sheets. Depth expectations increase as you progress:

Track	What to demonstrate
Arch 25	Postgres for check-ins, ES for geo search, Kafka for async side effects. Draw check-in write path and mayorship Redis ZINCRBY. Anti-fraud: 200m radius + velocity.
Arch 50	Kafka consumer idempotency, mayorship tie-breaking, ES vs PostGIS trade-off, CDC sync lag. Capacity: 115 writes/sec is trivial — focus on read path (10K search QPS).
Arch 75	Staff: venue dedup ML pipeline, privacy-preserving location history, recommendation embedding freshness, and multi-city trending hot keys.

Interview Prompt

Design a location-based check-in system like Foursquare. Users check in at venues, discover nearby places, get recommendations, earn mayorships, and see friends' activity.

Clarifying Questions (ask before designing)

Question	Why it matters
How strict is location verification?	GPS spoofing vs user experience — 200m radius is industry standard; tighter causes false rejections indoors.
Real-time or async for mayorship and feed updates?	Sync path must be < 200ms; gamification can lag seconds via Kafka consumers.
How do recommendations balance exploration vs exploitation?	Pure CF creates filter bubbles; geo + context signals need ε-greedy exploration.

Scope

In scope

Check-in with location validation
Nearby venue search (< 200ms)
Personalized recommendations
Mayorship gamification
Friend activity feed
Tips and venue pages

Out of scope (state explicitly)

Full mapping/navigation (see problem #54)
Payment at venue
Venue owner dashboard

Assumptions

100M venues, 15M DAU, 10M check-ins/day (~115/sec)
10K venue search QPS peak
Check-in sync path < 200ms; async side effects < 5s

Check-in: Users check in at venues with optional text/photo
Venue discovery: Search nearby venues by category, distance, rating
Recommendations: Personalized venue suggestions
Venue pages: Rich profiles with photos, tips, hours, menu, ratings
Tips: Users leave short tips/reviews
Lists: Curated venue lists
Friends: Social layer: see where friends checked in
Mayorships: Most frequent check-in at a venue earns "Mayor" status
Trending: Show what's trending nearby right now
Explore feed: Discovery feed of nearby popular/new venues

Metric	Calculation	Value
Total venues	Given (assumption documented in value)	100M
DAU	Given (15M daily active users)	15M
Check-ins / day	Given (10M check-ins/day)	10M
Check-ins / sec	Given	~115 (peak 500)
Venue search queries / sec	Given (10K QPS peak)	10K
Recommendation queries / sec	Given (5K QPS peak)	5K
Tips / day	Given (2M tips/day)	2M
Total venue data	100M × 2 KB	200 GB

Loading...

Venue Discovery: Nearby Search

Approach 1: PostGIS spatial query
  Accurate, supports complex filters. DB query per request → may not handle 10K QPS.

Approach 2: Elasticsearch geo query ⭐
  Full-text + geo + filtering in one query. Handles 10K+ QPS.

Recommended: Elasticsearch for user-facing search; PostGIS for admin queries

Check-in Flow and Kafka Event Pipeline

1. POST /api/v1/checkins — synchronous path (< 200ms):
   a. Validate venue exists (Redis cache → Postgres fallback)
   b. Anti-fraud: Haversine distance < 200m + velocity check (< 500 km/h)
   c. INSERT checkin into PostgreSQL (source of truth)
   d. Return 201 immediately with checkin_id

2. Async via Kafka topic "checkin-events" (partition key: user_id):
   Consumer: Feed Service     → LPUSH friend_feed:{friend_id}
   Consumer: Mayorship        → ZINCRBY mayor:{venue_id} (atomic)
   Consumer: Venue Stats        → INCR venue checkin_count
   Consumer: Trending           → ZINCRBY trending:{city} (1-hour window)
   Consumer: Recommendations    → update user preference vector
   Consumer: Notifications      → "Alice checked in at Tartine"

3. Idempotency: Idempotency-Key header → dedup table prevents double check-in
   on client retry. At-least-once Kafka + idempotent consumers = no duplicate side effects.

Recommendation Service: Personalized Venue Discovery

Recommendation signals:
  1. User Profile Vector (cuisine, price, time preferences)
  2. Collaborative Filtering
  3. Context-Aware: time of day, day of week, weather
  4. Venue Quality: rating, check-in count, tip sentiment

ML Architecture:
  Offline: Spark computes user/venue embeddings (128-dim) nightly
  Online: candidates from ES geo query → score = dot_product × recency × distance × context
  Cache in Redis (TTL: 15 min). Latency budget: ~100 ms total.

Mayorship: Race Conditions and Tie-Breaking

"Mayor" = user with most check-ins at a venue in the last 60 days

Redis: mayor:{venue_id} → Sorted Set { user_id: checkin_count_60d }

On check-in (async consumer):
  ZINCRBY mayor:{venue_id} 1 {user_id}
  current_mayor = ZREVRANGE mayor:{venue_id} 0 0
  if user_id == current_mayor → notify "You're the new mayor!"

Race: two users tie at same count → tie-break by earliest check-in timestamp
  (store score as count * 1e10 - timestamp for lexicographic ordering)

Daily cron: recompute from PostgreSQL for venues with disputed mayorships
  (handles Redis eviction or consumer lag > 24h)

Stale mayor window: if no check-in for 60 days, mayorship expires

Check In

HTTP

POST /api/v1/checkins
{
  "venue_id": "v-uuid",
  "lat": 37.7749,
  "lng": -122.4194,
  "shout": "Best croissants ever!",
  "photo_url": "https://cdn.example.com/photos/abc.jpg",
  "visibility": "friends"
}
Response: 201 Created
{
  "checkin_id": "ci-uuid",
  "venue": {"name": "Tartine Bakery", "category": "Bakery"},
  "points_earned": 5,
  "is_mayor": false
}

Common Error Responses

400 Bad Request    — missing venue_id or invalid coordinates
401 Unauthorized   — invalid token
403 Forbidden      — user banned or venue check-in disabled
404 Not Found      — venue_id does not exist
409 Conflict       — duplicate check-in within 4 hours (idempotent retry returns 200)
422 Unprocessable  — user > 200m from venue (location fraud)
429 Too Many Requests — > 20 check-ins/hour (anti-spam)

Search Nearby Venues

HTTP

GET /api/v1/venues/search?q=pizza&lat=37.7749&lng=-122.4194&radius=2000&category=restaurant&min_rating=4.0&limit=20

Get Explore Recommendations

HTTP

GET /api/v1/explore?lat=37.7749&lng=-122.4194&limit=20&time=2025-03-14T19:00:00Z

Add Tip

HTTP

POST /api/v1/venues/{venue_id}/tips
{
  "text": "Try the almond croissant — it's life-changing!",
  "photo_url": "https://cdn.example.com/photos/tip.jpg"
}

Get Friends' Recent Check-ins

HTTP

GET /api/v1/feed/friends?limit=20&cursor={last}

PostgreSQL + PostGIS

SQL

CREATE TABLE venues (
    venue_id        UUID PRIMARY KEY,
    name            VARCHAR(255) NOT NULL,
    category_id     INT NOT NULL,
    location        GEOMETRY(Point, 4326) NOT NULL,
    lat             DECIMAL(10,7), lng DECIMAL(10,7),
    address         TEXT, city VARCHAR(100), country_code CHAR(2),
    phone           VARCHAR(20), website TEXT,
    price_tier      SMALLINT, rating DECIMAL(2,1),
    checkin_count   INT DEFAULT 0, tip_count INT DEFAULT 0,
    hours           JSONB, verified BOOLEAN DEFAULT FALSE,
    created_at      TIMESTAMP, updated_at TIMESTAMP
);
CREATE INDEX idx_location ON venues USING GIST(location);

CREATE TABLE checkins (
    checkin_id      UUID PRIMARY KEY,
    user_id         UUID NOT NULL, venue_id UUID NOT NULL,
    lat DECIMAL(10,7), lng DECIMAL(10,7),
    shout TEXT, photo_url TEXT,
    visibility ENUM('public','friends','private') DEFAULT 'friends',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_user (user_id, created_at DESC),
    INDEX idx_venue (venue_id, created_at DESC)
);

CREATE TABLE tips (
    tip_id  UUID PRIMARY KEY,
    venue_id UUID NOT NULL, user_id UUID NOT NULL,
    text TEXT NOT NULL, photo_url TEXT,
    upvote_count INT DEFAULT 0, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Redis: Caches and Real-time

mayor:{venue_id} → Sorted Set { user_id: count }
venue:{venue_id} → Hash { name, category, rating, lat, lng }
recs:{user_id}:{geohash} → List of venue_ids (TTL: 900)
trending:{city} → Sorted Set { venue_id: check-in count in last hour }
user_emb:{user_id} → Binary (128-dim float32 vector)
friend_feed:{user_id} → List of recent friend check-in IDs

Concern	Solution
Check-in location fraud	Haversine distance + velocity check + rate limiting
Elasticsearch down	Fallback to PostGIS for search (degraded but functional)
Recommendation service down	Serve cached recs; fallback to popularity-based
Mayorship race condition	Redis ZINCRBY is atomic
Venue data inconsistency	ES synced from PostgreSQL via CDC (Debezium)

SLOs & Error Budgets

Metric	Target	Rationale
Check-in API p99	< 200ms	Sync path only — user waits for response
Venue search p99	< 200ms	ES geo query on critical discovery path
Mayorship update lag	< 5s	Async via Kafka — gamification can lag slightly

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
ES cluster yellow — search degraded	ES health API yellow; search p99 > 1s; error rate spike	Fallback to PostGIS with rate limit; scale ES data nodes; check Debezium CDC lag
Trending venue hot key in Redis	Single Redis shard CPU > 90%; trending:{city} key latency spike	Shard by geohash prefix; local cache at API layer; sliding window decay

Cost Drivers (Staff lens)

ES: 100M venues × 2KB doc = 200GB index + replicas
Postgres: 10M check-ins/day × 365 × 500B = ~1.8 TB/year
ML embedding compute: nightly Spark on 15M users × 128-dim

Multi-Region & DR

Venues are global; check-ins stored in user's home region. Search uses geo-routed ES cluster per continent. Trending is per-city (no cross-region consistency needed).

Interview Prompt

Clarifying Questions (ask before designing)

Scope

In scope

Out of scope (state explicitly)

Assumptions

Venue Discovery: Nearby Search

Check-in Flow and Kafka Event Pipeline

Recommendation Service: Personalized Venue Discovery

Mayorship: Race Conditions and Tie-Breaking

Check In

Common Error Responses

Search Nearby Venues

Get Explore Recommendations

Add Tip

Get Friends' Recent Check-ins

PostgreSQL + PostGIS

Redis: Caches and Real-time

Venue Deduplication

Privacy-Preserving Location History

Interview Walkthrough

PostgreSQL vs DynamoDB vs Cassandra for Check-ins

Elasticsearch vs PostGIS for Venue Search

Phase 1 — Monolith + PostGIS

Phase 2 — ES + Kafka async

Phase 3 — ML recommendations

SLOs & Error Budgets

Incident Scenarios (2am reality)

Cost Drivers (Staff lens)

Multi-Region & DR