This problem appears in multiple sheets. Depth expectations increase as you progress:
| Track | What to demonstrate |
|---|---|
| Arch 25 | Postgres for check-ins, ES for geo search, Kafka for async side effects. Draw check-in write path and mayorship Redis ZINCRBY. Anti-fraud: 200m radius + velocity. |
| Arch 50 | Kafka consumer idempotency, mayorship tie-breaking, ES vs PostGIS trade-off, CDC sync lag. Capacity: 115 writes/sec is trivial — focus on read path (10K search QPS). |
| Arch 75 | Staff: venue dedup ML pipeline, privacy-preserving location history, recommendation embedding freshness, and multi-city trending hot keys. |
Interview Prompt
Design a location-based check-in system like Foursquare. Users check in at venues, discover nearby places, get recommendations, earn mayorships, and see friends' activity.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| How strict is location verification? | GPS spoofing vs user experience — 200m radius is industry standard; tighter causes false rejections indoors. |
| Real-time or async for mayorship and feed updates? | Sync path must be < 200ms; gamification can lag seconds via Kafka consumers. |
| How do recommendations balance exploration vs exploitation? | Pure CF creates filter bubbles; geo + context signals need ε-greedy exploration. |
Scope
In scope
- Check-in with location validation
- Nearby venue search (< 200ms)
- Personalized recommendations
- Mayorship gamification
- Friend activity feed
- Tips and venue pages
Out of scope (state explicitly)
- Full mapping/navigation (see problem #54)
- Payment at venue
- Venue owner dashboard
Assumptions
- 100M venues, 15M DAU, 10M check-ins/day (~115/sec)
- 10K venue search QPS peak
- Check-in sync path < 200ms; async side effects < 5s
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- Check-in: Users check in at venues with optional text/photo
- Venue discovery: Search nearby venues by category, distance, rating
- Recommendations: Personalized venue suggestions
- Venue pages: Rich profiles with photos, tips, hours, menu, ratings
- Tips: Users leave short tips/reviews
- Lists: Curated venue lists
- Friends: Social layer: see where friends checked in
- Mayorships: Most frequent check-in at a venue earns "Mayor" status
- Trending: Show what's trending nearby right now
- Explore feed: Discovery feed of nearby popular/new venues
- Low Latency: Nearby venue search in < 200 ms; recommendations in < 500 ms
- Location Accuracy: Relevant to user's exact location
- Scalability: 100M+ venues, 50M+ MAU, 10M+ check-ins/day
- Freshness: New venues appear in minutes; counts update in real-time
- Personalization: Recommendations improve with history
- Availability: 99.99%
- Privacy: Users control check-in visibility
| Metric | Calculation | Value |
|---|---|---|
| Total venues | Given (assumption documented in value) | 100M |
| DAU | Given (15M daily active users) | 15M |
| Check-ins / day | Given (10M check-ins/day) | 10M |
| Check-ins / sec | Given | ~115 (peak 500) |
| Venue search queries / sec | Given (10K QPS peak) | 10K |
| Recommendation queries / sec | Given (5K QPS peak) | 5K |
| Tips / day | Given (2M tips/day) | 2M |
| Total venue data | 100M × 2 KB | 200 GB |
Venue Discovery: Nearby Search
Approach 1: PostGIS spatial query Accurate, supports complex filters. DB query per request → may not handle 10K QPS. Approach 2: Elasticsearch geo query ⭐ Full-text + geo + filtering in one query. Handles 10K+ QPS. Recommended: Elasticsearch for user-facing search; PostGIS for admin queries
Check-in Flow and Kafka Event Pipeline
1. POST /api/v1/checkins — synchronous path (< 200ms):
a. Validate venue exists (Redis cache → Postgres fallback)
b. Anti-fraud: Haversine distance < 200m + velocity check (< 500 km/h)
c. INSERT checkin into PostgreSQL (source of truth)
d. Return 201 immediately with checkin_id
2. Async via Kafka topic "checkin-events" (partition key: user_id):
Consumer: Feed Service → LPUSH friend_feed:{friend_id}
Consumer: Mayorship → ZINCRBY mayor:{venue_id} (atomic)
Consumer: Venue Stats → INCR venue checkin_count
Consumer: Trending → ZINCRBY trending:{city} (1-hour window)
Consumer: Recommendations → update user preference vector
Consumer: Notifications → "Alice checked in at Tartine"
3. Idempotency: Idempotency-Key header → dedup table prevents double check-in
on client retry. At-least-once Kafka + idempotent consumers = no duplicate side effects.Recommendation Service: Personalized Venue Discovery
Recommendation signals: 1. User Profile Vector (cuisine, price, time preferences) 2. Collaborative Filtering 3. Context-Aware: time of day, day of week, weather 4. Venue Quality: rating, check-in count, tip sentiment ML Architecture: Offline: Spark computes user/venue embeddings (128-dim) nightly Online: candidates from ES geo query → score = dot_product × recency × distance × context Cache in Redis (TTL: 15 min). Latency budget: ~100 ms total.
Mayorship: Race Conditions and Tie-Breaking
"Mayor" = user with most check-ins at a venue in the last 60 days
Redis: mayor:{venue_id} → Sorted Set { user_id: checkin_count_60d }
On check-in (async consumer):
ZINCRBY mayor:{venue_id} 1 {user_id}
current_mayor = ZREVRANGE mayor:{venue_id} 0 0
if user_id == current_mayor → notify "You're the new mayor!"
Race: two users tie at same count → tie-break by earliest check-in timestamp
(store score as count * 1e10 - timestamp for lexicographic ordering)
Daily cron: recompute from PostgreSQL for venues with disputed mayorships
(handles Redis eviction or consumer lag > 24h)
Stale mayor window: if no check-in for 60 days, mayorship expiresCheck In
POST /api/v1/checkins
{
"venue_id": "v-uuid",
"lat": 37.7749,
"lng": -122.4194,
"shout": "Best croissants ever!",
"photo_url": "https://cdn.example.com/photos/abc.jpg",
"visibility": "friends"
}
Response: 201 Created
{
"checkin_id": "ci-uuid",
"venue": {"name": "Tartine Bakery", "category": "Bakery"},
"points_earned": 5,
"is_mayor": false
}Common Error Responses
400 Bad Request — missing venue_id or invalid coordinates 401 Unauthorized — invalid token 403 Forbidden — user banned or venue check-in disabled 404 Not Found — venue_id does not exist 409 Conflict — duplicate check-in within 4 hours (idempotent retry returns 200) 422 Unprocessable — user > 200m from venue (location fraud) 429 Too Many Requests — > 20 check-ins/hour (anti-spam)
Search Nearby Venues
GET /api/v1/venues/search?q=pizza&lat=37.7749&lng=-122.4194&radius=2000&category=restaurant&min_rating=4.0&limit=20Get Explore Recommendations
GET /api/v1/explore?lat=37.7749&lng=-122.4194&limit=20&time=2025-03-14T19:00:00ZAdd Tip
POST /api/v1/venues/{venue_id}/tips
{
"text": "Try the almond croissant — it's life-changing!",
"photo_url": "https://cdn.example.com/photos/tip.jpg"
}Get Friends' Recent Check-ins
GET /api/v1/feed/friends?limit=20&cursor={last}PostgreSQL + PostGIS
CREATE TABLE venues (
venue_id UUID PRIMARY KEY,
name VARCHAR(255) NOT NULL,
category_id INT NOT NULL,
location GEOMETRY(Point, 4326) NOT NULL,
lat DECIMAL(10,7), lng DECIMAL(10,7),
address TEXT, city VARCHAR(100), country_code CHAR(2),
phone VARCHAR(20), website TEXT,
price_tier SMALLINT, rating DECIMAL(2,1),
checkin_count INT DEFAULT 0, tip_count INT DEFAULT 0,
hours JSONB, verified BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP, updated_at TIMESTAMP
);
CREATE INDEX idx_location ON venues USING GIST(location);
CREATE TABLE checkins (
checkin_id UUID PRIMARY KEY,
user_id UUID NOT NULL, venue_id UUID NOT NULL,
lat DECIMAL(10,7), lng DECIMAL(10,7),
shout TEXT, photo_url TEXT,
visibility ENUM('public','friends','private') DEFAULT 'friends',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_user (user_id, created_at DESC),
INDEX idx_venue (venue_id, created_at DESC)
);
CREATE TABLE tips (
tip_id UUID PRIMARY KEY,
venue_id UUID NOT NULL, user_id UUID NOT NULL,
text TEXT NOT NULL, photo_url TEXT,
upvote_count INT DEFAULT 0, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);Redis: Caches and Real-time
mayor:{venue_id} → Sorted Set { user_id: count }
venue:{venue_id} → Hash { name, category, rating, lat, lng }
recs:{user_id}:{geohash} → List of venue_ids (TTL: 900)
trending:{city} → Sorted Set { venue_id: check-in count in last hour }
user_emb:{user_id} → Binary (128-dim float32 vector)
friend_feed:{user_id} → List of recent friend check-in IDs| Concern | Solution |
|---|---|
| Check-in location fraud | Haversine distance + velocity check + rate limiting |
| Elasticsearch down | Fallback to PostGIS for search (degraded but functional) |
| Recommendation service down | Serve cached recs; fallback to popularity-based |
| Mayorship race condition | Redis ZINCRBY is atomic |
| Venue data inconsistency | ES synced from PostgreSQL via CDC (Debezium) |
Venue Deduplication
Dedup pipeline: 1. Normalize name (lowercase, punctuation, abbreviations) 2. Geo-cluster within 50m 3. Name similarity (Jaccard/edit distance) 4. Category match 5. ML model: combine features → P(same_venue) 6. P > 0.85 → auto-merge; 0.5-0.85 → human review
Privacy-Preserving Location History
1. Store venue_id, not raw GPS (venue location is public) 2. Check-in aging: after 2 years, remove specific timestamps 3. Embeddings are irreversible 4. Friend visibility: server-side filtering 5. GDPR export/delete cascades to all data
Interview Walkthrough
- Explain check-in flow: verify user near venue (geospatial radius) → write check-in → update feeds/leaderboards.
- Cover venue search via geospatial index + text search hybrid (Elasticsearch geo_query).
- Discuss duplicate check-in prevention (one per venue per day) with idempotent keys.
- Mention friend activity feed as fan-out-on-write for normal users, read for high-follower accounts.
- Cover mayor/badges as async aggregation jobs, not inline on check-in path.
- Common pitfall: accepting check-ins without GPS proximity validation — spoofed locations break venue analytics.
PostgreSQL vs DynamoDB vs Cassandra for Check-ins
PostgreSQL ✓: ACID for check-in + tip in one txn, PostGIS geo queries, rich analytics JOINs, ~115 writes/sec (trivial at this scale) DynamoDB: 25-item transaction limit, no JOINs, cost adds up for geo queries Cassandra: Overkill for 115 writes/sec, no multi-row transactions At Foursquare scale (~10M check-ins/day): PostgreSQL is correct. Scale path: read replicas for analytics, ES for search, Redis for hot counts.
Elasticsearch vs PostGIS for Venue Search
PostGIS: accurate spatial queries, complex polygon filters, but ~100 QPS per shard without tuning. Good for admin/backfill.
Elasticsearch geo_query: inverted index + geo hash grid → 10K+ QPS with filters (category, rating, price). Slight staleness (CDC lag ~1s) acceptable for search. Recommendation: ES for user-facing; PostGIS as source of truth synced via Debezium CDC.
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1 — Monolith + PostGIS
Single Postgres with PostGIS, sync mayorship update, no Kafka.
Key components: PostgreSQL · PostGIS · Sync side effects
Move to next phase when: Search latency > 500ms or check-in response blocked by side effects
Phase 2 — ES + Kafka async
Postgres source of truth, ES for search via CDC, Kafka consumers for feed/mayor/trending.
Key components: PostgreSQL · Elasticsearch · Debezium CDC · Kafka · Redis
Move to next phase when: 10K search QPS; mayorship races under concurrent check-ins
Phase 3 — ML recommendations
128-dim embeddings, Redis rec cache, venue dedup ML, multi-city trending sharding.
Key components: Spark embeddings · Redis rec cache · Dedup ML pipeline · Trending shards
Move to next phase when: Recommendation p99 > 500ms; filter bubble complaints
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Check-in API p99 | < 200ms | Sync path only — user waits for response |
| Venue search p99 | < 200ms | ES geo query on critical discovery path |
| Mayorship update lag | < 5s | Async via Kafka — gamification can lag slightly |
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| ES cluster yellow — search degraded | ES health API yellow; search p99 > 1s; error rate spike | Fallback to PostGIS with rate limit; scale ES data nodes; check Debezium CDC lag |
| Trending venue hot key in Redis | Single Redis shard CPU > 90%; trending:{city} key latency spike | Shard by geohash prefix; local cache at API layer; sliding window decay |
Cost Drivers (Staff lens)
- ES: 100M venues × 2KB doc = 200GB index + replicas
- Postgres: 10M check-ins/day × 365 × 500B = ~1.8 TB/year
- ML embedding compute: nightly Spark on 15M users × 128-dim
Multi-Region & DR
Venues are global; check-ins stored in user's home region. Search uses geo-routed ES cluster per continent. Trending is per-city (no cross-region consistency needed).