Interview Prompt
Design Review and Rating System.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Which of these is highest priority: Aggregated rating computation, Spam detection, Verified purchase? | Forces scope negotiation — senior candidates trim before drawing boxes. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- Aggregated rating computation
- Spam detection
- Verified purchase
- Helpful vote ranking
- Capacity estimation with shown math
Out of scope (state explicitly)
- Full catalog/search infrastructure (#12)
- Payment checkout flow (#24)
- Fraud and abuse ML pipelines
Assumptions
- Clarify scale (DAU, QPS, data volume) for review rating system in the first 5 minutes
- Standard reliability target 99.9%–99.99% unless problem implies higher (payments, booking)
- Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- Write reviews: Text reviews with 1-5 star rating for products/services
- Photo/video reviews: Attach media to reviews
- Helpful votes: Other users vote reviews as "helpful" or "not helpful"
- Verified purchase badge: Mark reviews from actual buyers
- Aggregate ratings: Average rating, rating distribution histogram, total count per product
- Sort/filter reviews: By recency, helpfulness, rating, verified purchase
- Seller responses: Sellers can respond publicly to reviews
- Edit/delete own reviews: Users manage their own reviews
- Anti-abuse: Detect and filter fake/spam reviews via ML + rules
- Review summary: AI-generated summary of common themes across reviews
- Low Latency: Aggregate ratings in < 50 ms (shown on every product page)
- Eventual Consistency: New review reflected in aggregates within 5 minutes
- Scale: 500M+ products, 5B+ reviews, 200K new reviews/day
- Availability: 99.99% read; 99.9% write
- Fraud Resistant: Detect and suppress fake reviews within hours
| Metric | Calculation | Value |
|---|---|---|
| Total products | Given | 500M |
| Total reviews | Given | 5B |
| New reviews / day | Given (assumption documented in value) | 200K |
| Review reads / sec | Derived from daily volume ÷ 86400 (+ peak factor) | 100K |
| Aggregate rating reads / sec | Derived from daily volume ÷ 86400 (+ peak factor) | 500K |
| Total review storage | Given | 5 TB text + 100 TB media |
Aggregate Rating — Pre-Computed
Event Bus Design (Kafka)
Topic: review_rating_system-events Partitions: 64 (scale consumers horizontally) Partition key: entity_id (user_id / order_id — preserves per-entity ordering) Retention: 7 days (compliance) or 24h (high-volume telemetry) Replication factor: 3, min.insync.replicas: 2 Producer: idempotent producer enabled (enable.idempotence=true) Consumer: consumer group "review_rating_system-processors" - At-least-once delivery + idempotent handlers (dedup by event_id) - DLQ topic: review_rating_system-events-dlq (poison messages after 3 retries) - Lag alert: consumer lag > 60s → scale workers Design a Review and Rating System: async side effects MUST NOT block the synchronous API response. Sync path: validate → persist source of truth → publish event → return 201 Async path: consumers update caches, indexes, notifications, aggregates
Bayesian Average
Product A: 1 review, 5.0 avg. Product B: 10K reviews, 4.5 avg. Simple average ranks A higher. That's statistically wrong. Bayesian Average (what Amazon/IMDB actually use): bayesian_avg = (C * m + sum_of_ratings) / (C + total_reviews) C = 25 (minimum reviews to trust) m = 3.7 (global average rating) Product A: (25*3.7 + 5) / (25+1) = 3.75 Product B: (25*3.7 + 45000) / (25+10000) = 4.498 Product B correctly ranks higher. As review count grows, bayesian_avg converges to true average.
Fake Review Detection
Signals for ML fraud detection: Behavioral: - Account age < 7 days + review posted -> suspicious - 50 reviews in 1 day -> bot - All 5-star or all 1-star -> biased - Copy-pasted text across products Purchase verification: - No purchase record -> unverified (lower weight in aggregates) - Purchased and returned same day -> suspicious Text analysis (NLP): - Generic text: "Great product! Highly recommended!" (no specifics) - Sentiment mismatch: positive text + 1 star - Language similarity between reviews from same IP Network analysis: - Multiple reviews from same IP/device for same product - Review rings: group of accounts all reviewing same products Timing: - Product gets 500 5-star reviews in 1 hour (normally 5/day) -> burst ML Model (XGBoost): Features: all signals above. Output: P(fake) score. P(fake) > 0.8: auto-suppress + queue for human review P(fake) 0.5-0.8: flag, reduced weight in aggregate P(fake) < 0.5: show normally Suppressed reviews excluded from aggregate rating. Nightly recomputation corrects for retroactively suppressed reviews.
Write Review
POST /api/v1/products/{product_id}/reviews
{
"rating": 4,
"title": "Great quality, slow shipping",
"body": "The product quality is excellent but shipping took 2 weeks...",
"photos": ["https://cdn.../photo1.jpg"]
}
Response: 201 Created
{ "review_id": "rev-uuid", "status": "published" }Get Reviews for Product
GET /api/v1/products/{product_id}/reviews?sort=helpful&rating=4&verified=true&page=1&limit=10
Response: 200 OK
{
"aggregate": {
"avg_rating": 4.3, "total": 50000,
"distribution": {"5": 30000, "4": 10000, "3": 5000, "2": 3000, "1": 2000}
},
"reviews": [
{ "review_id": "rev-uuid", "user": {"name": "John D.", "verified": true},
"rating": 4, "title": "Great quality", "body": "...",
"helpful_count": 234, "photos": [...], "created_at": "2026-03-10",
"seller_response": {"body": "Thank you for your feedback!", "responded_at": "2026-03-11"} }
]
}Vote Helpful
POST /api/v1/reviews/{review_id}/helpful
{ "helpful": true }
Response: 200 OK
{ "helpful_count": 235 }Common Error Responses
400 Bad Request: invalid input, missing fields, or malformed JSON 401 Unauthorized: missing or invalid auth token or API key 403 Forbidden: authenticated but insufficient permissions 404 Not Found: resource ID does not exist 409 Conflict: duplicate write or version conflict; retry with idempotency key 422 Unprocessable Entity: valid syntax but invalid business logic 429 Too Many Requests: rate limit exceeded; honor Retry-After header 500 Internal Error: unexpected server fault; retry with idempotency key 503 Service Unavailable: dependency down or overloaded; use exponential backoff
PostgreSQL: Reviews
CREATE TABLE reviews (
review_id UUID PRIMARY KEY,
product_id VARCHAR(50) NOT NULL,
user_id UUID NOT NULL,
order_id UUID,
rating SMALLINT NOT NULL CHECK (rating BETWEEN 1 AND 5),
title VARCHAR(200),
body TEXT,
photos JSONB,
verified_purchase BOOLEAN DEFAULT FALSE,
helpful_count INT DEFAULT 0,
unhelpful_count INT DEFAULT 0,
status ENUM('published','suppressed','pending_review','deleted') DEFAULT 'published',
fraud_score DECIMAL(4,3),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ,
UNIQUE (product_id, user_id), -- one review per user per product
INDEX idx_product (product_id, status, created_at DESC),
INDEX idx_product_helpful (product_id, status, helpful_count DESC),
INDEX idx_user (user_id, created_at DESC)
);
CREATE TABLE product_ratings (
product_id VARCHAR(50) PRIMARY KEY,
avg_rating DECIMAL(3,2),
total_reviews INT DEFAULT 0,
star_1 INT DEFAULT 0, star_2 INT DEFAULT 0, star_3 INT DEFAULT 0,
star_4 INT DEFAULT 0, star_5 INT DEFAULT 0,
updated_at TIMESTAMPTZ
);
CREATE TABLE review_votes (
user_id UUID NOT NULL,
review_id UUID NOT NULL,
helpful BOOLEAN NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (user_id, review_id)
);
CREATE TABLE seller_responses (
response_id UUID PRIMARY KEY,
review_id UUID NOT NULL UNIQUE,
seller_id UUID NOT NULL,
body TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);Redis
# Aggregate ratings cache
rating:{product_id} --> Hash { avg, total, s1, s2, s3, s4, s5 }
TTL: 300
# Helpful vote dedup
voted:{user_id}:{review_id} --> "1"
TTL: 86400
# Review count per user per day (rate limit)
review_rate:{user_id}:{date} --> INT
TTL: 86400Elasticsearch: Review Search
{
"review_id": "rev-uuid", "product_id": "prod-123",
"rating": 4, "title": "Great quality",
"body": "The product quality is excellent but shipping was slow...",
"verified": true, "helpful_count": 234, "created_at": "2026-03-10"
}
// Queries: "battery life" within reviews of product X
// Filter by: rating >= 4, verified only, sort by helpful| Concern | Solution |
|---|---|
| Review written but aggregate not updated | Kafka at-least-once delivery to aggregation worker; idempotent UPDATE |
| Duplicate review | UNIQUE(product_id, user_id) constraint prevents duplicates |
| Rating cache stale | TTL 5 min; invalidated on write; worst case 5-min lag |
| Fake review flood | Rate limit per user (max 5 reviews/day); async ML detection |
| Vote spam | One vote per user per review (DB primary key constraint + Redis dedup) |
| Aggregate drift | Nightly batch: recompute all aggregates from reviews table; overwrite running values |
Interview Walkthrough
- Start with trust: verified-purchase badges and one-review-per-order prevent fake reviews from dominating the signal.
- Explain write path to PostgreSQL (source of truth) and async indexing to Elasticsearch for full-text search and faceted filters.
- Cover aggregate rating updates: recompute product average on each new review, cache in Redis with invalidation on write.
- Walk through helpful-vote ranking with Wilson score — balances ratio and sample size so 10/10 beats 100/200.
- Mention moderation pipeline: ML toxicity score → auto-reject, borderline → human queue, approved → publish event to ES.
- Discuss photo reviews stored in S3 with CDN delivery, metadata indexed in ES alongside text for "reviews with photos" filters.
- Common pitfall: sorting reviews by raw helpful_count — old reviews accumulate votes forever and bury fresh, relevant feedback.
Review Ordering: Wilson Score Interval
Problem: Sort by helpful_count is biased toward old reviews (more time to accumulate votes). Wilson Score balances helpfulness RATIO with confidence interval: wilson = (p + z^2/(2n) - z*sqrt((p(1-p) + z^2/(4n))/n)) / (1 + z^2/n) p = helpful_votes / total_votes, n = total_votes, z = 1.96 (95% confidence) Review A: 10/10 helpful -> wilson = 0.72 Review B: 100/200 helpful -> wilson = 0.46 10/10 correctly ranks higher despite fewer total votes. This is what Reddit uses for comment ranking. Amazon uses a similar approach.
AI Review Summaries
Instead of reading 50K reviews, show: "Customers frequently mention excellent battery life (45%), lightweight design (38%). Common complaints: slow charging (12%), screen glare (8%)." Implementation: 1. Weekly batch per product (products with > 50 reviews) 2. Collect all review texts -> topic extraction (BERT NER / LDA) 3. Identify top 5 positive and top 3 negative themes 4. Generate summary using LLM (GPT-4 / Claude) 5. Cache in product_review_summary table + Redis Cost: ~$0.01 per summary. Products with > 50 reviews: ~10M. Total: $100K one-time. Refresh weekly for active products (~$20K/week).
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services proving core review rating system flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.95% | Budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Problem-specific — state target early and tie to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.