Interview Prompt
Design a content moderation system.
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Which of these is highest priority: Multi-modal detection (text, image, video), ML classifier + human review queue, Appeal flow? | Forces scope negotiation — senior candidates trim before drawing boxes. |
| What scale should we design for — DAU, QPS, data volume? | Drives every capacity decision; shows structured thinking. |
| What are the read vs write patterns on the critical path? | Determines caching, DB choice, and replication topology. |
| What consistency and durability guarantees are required? | Separates strong-consistency paths from eventual ones — a senior differentiator. |
Scope
In scope
- Multi-modal detection (text, image, video)
- ML classifier + human review queue
- Appeal flow
- False positive handling
- Latency vs accuracy trade-off
- Capacity estimation with shown math
Out of scope (state explicitly)
- Detailed frontend/UI pixel implementation
- Org structure, staffing, and hiring plan
Assumptions
- Clarify scale (DAU, QPS, data volume) for the content moderation system in the first 5 minutes
- Standard reliability target 99.9%–99.99% unless problem implies higher (payments, booking)
- Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- Multi-modal moderation: Moderate text, images, video, and audio content
- Real-time scoring: Score content for violations before or immediately after publishing
- Policy engine: Configurable rules per content type, region, and community standards
- Violation categories: Hate speech, nudity/NSFW, violence/gore, spam, misinformation, harassment, copyright, CSAM
- Action framework: Auto-remove (high confidence), auto-flag for review (medium), allow (low risk)
- Human review queue: Prioritized queue for flagged content with analyst tooling
- Appeals: Users can appeal moderation decisions
- User reporting: Users report content; reports feed into moderation pipeline
- Audit trail: Every moderation decision logged with reason, model version, reviewer
- Feedback loop: Reviewer decisions retrain ML models
- Low Latency: Pre-publish moderation in < 500 ms (text); < 5 seconds (image); < 30 seconds (video)
- High Recall: Catch > 99.5% of CSAM, > 95% of hate speech
- Acceptable Precision: False positive rate < 5%
- Scale: 500M+ posts/day, 100M+ images/day, 10M+ videos/day
- Availability: 99.99%
- Regional Compliance: Different rules per country
- Reviewer Wellbeing: Limit harmful content exposure; rotate, counsel
| Metric | Calculation | Value |
|---|---|---|
| Text posts / day | Given | 500M |
| Images / day | Given | 100M |
| Videos / day | Given | 10M |
| Text moderation / sec | 500M ÷ 86,400 s (daily average; provision extra for peak) | ~6K |
| Image moderation / sec | 100M ÷ 86,400 s (daily average; provision extra for peak) | ~1.2K |
| Video moderation / sec | 10M ÷ 86,400 s (daily average; provision extra for peak) | ~120 |
| Human review cases / day | ~0.8% of 610M total items | ~5M |
| Human reviewers | Given | ~15K |
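The arithmetic behind the table can be checked with a short script. This is a minimal sketch using only the figures given above; the variable names are illustrative, and the per-reviewer load is derived here rather than stated in the table.

```python
SECONDS_PER_DAY = 86_400

# Daily volumes taken from the capacity table.
daily_volume = {
    "text": 500_000_000,
    "image": 100_000_000,
    "video": 10_000_000,
}


def average_qps(items_per_day: int) -> float:
    """Average items per second over a day (no peak factor applied)."""
    return items_per_day / SECONDS_PER_DAY


avg_qps = {kind: average_qps(n) for kind, n in daily_volume.items()}

total_per_day = sum(daily_volume.values())   # 610M items/day
review_cases_per_day = 5_000_000             # from the table
reviewers = 15_000                           # from the table

review_fraction = review_cases_per_day / total_per_day
cases_per_reviewer = review_cases_per_day / reviewers

for kind, qps in avg_qps.items():
    print(f"{kind}: {qps:,.0f}/s average")
print(f"review fraction: {review_fraction:.2%}")
print(f"cases per reviewer per day: {cases_per_reviewer:.0f}")
```

The derived load of roughly 333 cases per reviewer per day is a useful sanity check on the reviewer headcount.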
Text Moderation Pipeline
- Layer 1: Keyword/Regex Filter (< 1 ms)
  - Bloom filter + Aho-Corasick for known bad terms
  - Catches: obvious slurs, known spam phrases
- Layer 2: ML Text Classifier (< 50 ms)
  - Fine-tuned BERT/DistilBERT
  - Multi-label: P(hate_speech), P(harassment), P(spam), P(violence)
  - Batch of 32 texts on GPU in ~50 ms = ~1.5 ms per text
- Layer 3: LLM-based Analysis (borderline cases only, < 2 s)
  - Only if Layer 2 score is 0.3-0.7
  - Expensive: ~$0.01 per text -> only 10% of traffic
- Layer 4: Context Enrichment
  - User's history, conversation context, community norms
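The escalation logic between the first three layers can be sketched as follows. This is an illustrative sketch, not a reference implementation: the `BLOCKLIST` set is a stand-in for the Bloom filter and Aho-Corasick matcher, `classifier` and `llm` are hypothetical callables that return a violation probability, and the 0.90 / 0.30 decision cut-offs are borrowed from the default thresholds in the `moderation_policies` schema.

```python
from typing import Callable, Optional, Tuple

# Hypothetical placeholder terms; a real system would use a Bloom filter
# plus Aho-Corasick over a maintained term list.
BLOCKLIST = {"badword", "spamlink"}

Scorer = Callable[[str], float]  # returns a violation probability in [0, 1]


def keyword_hit(text: str) -> bool:
    """Layer 1: cheap exact-token match against known bad terms."""
    return any(token in BLOCKLIST for token in text.lower().split())


def moderate_text(
    text: str, classifier: Scorer, llm: Optional[Scorer] = None
) -> Tuple[str, str]:
    """Return (decision, layer_that_decided)."""
    if keyword_hit(text):
        return "remove", "layer1_keyword"

    score = classifier(text)          # Layer 2: ML classifier
    layer = "layer2_classifier"

    # Layer 3: only borderline scores pay for the expensive LLM call.
    if llm is not None and 0.3 <= score <= 0.7:
        score = llm(text)
        layer = "layer3_llm"

    if score >= 0.90:                 # assumed auto-remove threshold
        return "remove", layer
    if score >= 0.30:                 # assumed review threshold
        return "review", layer
    return "allow", layer


if __name__ == "__main__":
    print(moderate_text("a perfectly fine post", lambda t: 0.05))
    print(moderate_text("ambiguous post", lambda t: 0.5, llm=lambda t: 0.95))
```

Because only scores in the 0.3-0.7 band reach the LLM, cost grows with the share of uncertain content rather than with total volume.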
Image Moderation Pipeline
- Step 1: Hash matching with PhotoDNA / pHash (< 10 ms)
  - Compare against known illegal content database (CSAM, terrorism)
  - Match found -> IMMEDIATE removal + report to NCMEC
- Step 2: ML Classification (< 200 ms)
  - EfficientNet/ResNet-50 fine-tuned on moderation data
  - P(nudity), P(violence), P(hate_symbol), P(drugs), P(gore)
  - If P(minor) > 0.5 AND P(nudity) > 0.5 -> escalate to CSAM team
- Step 3: OCR + Text Moderation (for text in images)
- Step 4: Object Detection (YOLO/Faster-RCNN)
  - Weapons, flags/symbols associated with extremism
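Steps 1 and 2 can be illustrated with a small sketch. It assumes a generic 64-bit perceptual hash compared by Hamming distance; PhotoDNA is proprietary and matches differently, so this shows only the general idea. The `max_distance` of 5 bits and the function names are assumptions made for illustration.

```python
from typing import Dict, Iterable


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def matches_known_hash(
    candidate: int, known_hashes: Iterable[int], max_distance: int = 5
) -> bool:
    """True if the candidate is within max_distance bits of any known hash.

    A linear scan is used for clarity; at scale an index over the hash
    space would be needed. max_distance is an assumed tolerance.
    """
    return any(hamming_distance(candidate, h) <= max_distance for h in known_hashes)


def needs_csam_escalation(scores: Dict[str, float]) -> bool:
    """Escalation rule from Step 2: P(minor) > 0.5 AND P(nudity) > 0.5."""
    return scores.get("minor", 0.0) > 0.5 and scores.get("nudity", 0.0) > 0.5


if __name__ == "__main__":
    known = [0xFF00FF00FF00FF00]
    reencoded = 0xFF00FF00FF00FF01  # one bit differs, e.g. after re-encoding
    print(matches_known_hash(reencoded, known))
    print(needs_csam_escalation({"minor": 0.6, "nudity": 0.7}))
```

Tolerating a few differing bits is what lets a perceptual hash survive re-encoding or light edits, at the price of a small false-match risk that a stricter tolerance would reduce.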
Video Moderation Pipeline
Approach: Sample + Classify (not every frame)
- Step 1: Extract key frames (every 2 seconds) + audio track
  - 10-min video -> 300 frames + audio
- Step 2: Run image moderation on each key frame (parallel)
- Step 3: Audio moderation
  - Speech-to-text -> text pipeline; gunshots/screams detection; copyright matching

Optimization: Prioritized Scanning
- First pass: sample 10 frames -> quick assessment
- If clean -> low-priority full scan later
- If flagged -> immediately scan all 300 frames
- Result: 90% of videos need only 10 frames analyzed
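The two-pass scan can be sketched as below. This is a simplified illustration: `is_flagged` is a hypothetical callable standing in for the image moderation pipeline, and evenly spaced sampling is an assumption, since the text does not say how the 10 first-pass frames are chosen.

```python
from typing import Callable, List


def key_frame_count(duration_s: int, interval_s: int = 2) -> int:
    """Key frames extracted at one frame every interval_s seconds."""
    return duration_s // interval_s


def first_pass_indices(total_frames: int, sample_size: int = 10) -> List[int]:
    """Evenly spaced frame indices for the quick first pass (assumed strategy)."""
    if total_frames <= sample_size:
        return list(range(total_frames))
    step = total_frames / sample_size
    return [int(i * step) for i in range(sample_size)]


def immediate_scan_plan(
    total_frames: int, is_flagged: Callable[[int], bool], sample_size: int = 10
) -> List[int]:
    """Frames to scan right now.

    If any sampled frame is flagged, escalate to every key frame;
    otherwise only the sample is scanned now and the full scan is
    deferred to a low-priority job (not modeled here).
    """
    sample = first_pass_indices(total_frames, sample_size)
    if any(is_flagged(i) for i in sample):
        return list(range(total_frames))
    return sample


if __name__ == "__main__":
    frames = key_frame_count(600)  # 10-minute video
    print(frames, len(immediate_scan_plan(frames, lambda i: False)))
    print(len(immediate_scan_plan(frames, lambda i: i == 60)))
```

A violation that falls between sampled frames is missed by the first pass, which is why the deferred full scan still matters for clean-looking videos.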
Policy Engine: Regional Compliance
Same content may be legal in one country and illegal in another.
Implementation:
- Policy rules configurable per: country, content type, user age, community type
- On moderation: apply model scores against regional thresholds
- Geo-restricted removal: content removed in Germany but visible in US
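Applying one model score against different regional thresholds might look like the sketch below. The `Policy` defaults of 0.90 and 0.30 mirror the defaults in the `moderation_policies` schema; the stricter thresholds for `DE`, the in-memory `POLICIES` dict, and the choice to allow content when no policy exists are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Policy:
    auto_remove_threshold: float = 0.90
    review_threshold: float = 0.30
    enabled: bool = True


PolicyKey = Tuple[str, str, str]  # (region, content_type, violation_category)

# Hypothetical policy table; the DE values are invented for this example.
POLICIES: Dict[PolicyKey, Policy] = {
    ("DE", "text", "hate_speech"): Policy(auto_remove_threshold=0.80, review_threshold=0.20),
    ("US", "text", "hate_speech"): Policy(),
}


def decide(
    region: str,
    content_type: str,
    category: str,
    score: float,
    policies: Dict[PolicyKey, Policy] = POLICIES,
) -> str:
    """Map a model score to allow / review / remove for one region."""
    policy = policies.get((region, content_type, category))
    if policy is None or not policy.enabled:
        return "allow"  # assumption: no active policy means no restriction
    if score >= policy.auto_remove_threshold:
        return "remove"
    if score >= policy.review_threshold:
        return "review"
    return "allow"


if __name__ == "__main__":
    for region in ("DE", "US"):
        print(region, decide(region, "text", "hate_speech", 0.85))
```

The same score of 0.85 yields removal in one region and human review in another, which is the mechanism behind geo-restricted removal.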
Human Review Queue: Prioritization
5M flagged items/day. 15K reviewers. Not all items equal priority.
Priority scoring:
priority = severity_weight x reach x time_sensitivity
severity: CSAM=1000, Violence=100, Hate=50, Nudity=30, Spam=10
reach: Viral 10x, Regular 1x, Private 0.5x
time: Going viral 10x, Steady 1x, Old 0.5x
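The scoring formula above can be sketched as follows, using the listed weights. The in-memory heap is only a stand-in for whatever priority queue backs the review queue, and the `ReviewQueue` class and dictionary key names are hypothetical.

```python
import heapq
from typing import List, Tuple

# Weights copied from the priority scoring description.
SEVERITY = {"csam": 1000, "violence": 100, "hate": 50, "nudity": 30, "spam": 10}
REACH = {"viral": 10.0, "regular": 1.0, "private": 0.5}
TIME_SENSITIVITY = {"going_viral": 10.0, "steady": 1.0, "old": 0.5}


def priority(category: str, reach: str, time: str) -> float:
    """priority = severity_weight x reach x time_sensitivity."""
    return SEVERITY[category] * REACH[reach] * TIME_SENSITIVITY[time]


class ReviewQueue:
    """Max-priority queue; heapq is a min-heap, so scores are negated."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []

    def add(self, content_id: str, score: float) -> None:
        heapq.heappush(self._heap, (-score, content_id))

    def pop_max(self) -> Tuple[str, float]:
        neg_score, content_id = heapq.heappop(self._heap)
        return content_id, -neg_score


if __name__ == "__main__":
    queue = ReviewQueue()
    queue.add("private-old-csam", priority("csam", "private", "old"))
    queue.add("viral-spam", priority("spam", "viral", "going_viral"))
    queue.add("steady-hate", priority("hate", "regular", "steady"))
    print(queue.pop_max())
```

One consequence worth raising in an interview: with these weights, spam that is going viral (10 x 10 x 10 = 1000) outranks an old, private CSAM item (1000 x 0.5 x 0.5 = 250), so a real system would probably pin the most severe categories to the top regardless of reach.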
Queue: Redis sorted set ZADD review_queue {priority} {content_id}
Reviewer dequeues: ZPOPMAX review_queue
Score Content (Pre-Publish)
POST /api/v1/moderation/score
{
"content_id": "post-uuid",
"content_type": "text",
"text": "This is the post content...",
"user_id": "user-uuid",
"region": "US",
"community_id": "comm-uuid"
}
Response:
{
"decision": "allow",
"scores": { "hate_speech": 0.05, "spam": 0.02, "violence": 0.01 },
"flags": [], "review_required": false
}
Report Content
POST /api/v1/moderation/report
{ "content_id": "post-uuid", "reporter_id": "user-uuid",
"reason": "hate_speech", "details": "Contains racial slurs" }
-> { "report_id": "rpt-uuid", "status": "received" }
Review Decision & Appeal
POST /api/v1/moderation/review/{content_id}/decide
{ "reviewer_id": "...", "decision": "remove",
"violation_category": "hate_speech", "notes": "..." }
-> { "decision_id": "...", "action_taken": "removed" }
POST /api/v1/moderation/appeal
{ "content_id": "post-uuid", "user_id": "user-uuid",
"reason": "This is a news quote, not hate speech" }
-> { "appeal_id": "app-uuid", "status": "under_review" }
Common Error Responses
400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
202 Accepted: job queued; poll GET /jobs/{id} for status
408 Request Timeout: job still processing; continue polling
PostgreSQL: Moderation Decisions & Policies
CREATE TABLE moderation_decisions (
decision_id UUID PRIMARY KEY, content_id UUID NOT NULL,
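-- PostgreSQL has no inline ENUM(...) column syntax; CHECK constraints are used here
-- (a named type via CREATE TYPE ... AS ENUM would also work).
content_type VARCHAR(10) NOT NULL CHECK (content_type IN ('text','image','video','audio')),
user_id UUID NOT NULL, auto_scores JSONB,
auto_decision VARCHAR(10) CHECK (auto_decision IN ('allow','review','remove')),
final_decision VARCHAR(10) CHECK (final_decision IN ('allow','remove','restrict','warning')),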
violation_category VARCHAR(50), reviewer_id UUID,
policy_version VARCHAR(20), model_version VARCHAR(20),
region VARCHAR(10), appealed BOOLEAN DEFAULT FALSE,
created_at TIMESTAMPTZ DEFAULT NOW(), decided_at TIMESTAMPTZ
);
CREATE TABLE moderation_policies (
policy_id UUID PRIMARY KEY, region VARCHAR(10) NOT NULL,
content_type VARCHAR(20) NOT NULL,
violation_category VARCHAR(50) NOT NULL,
auto_remove_threshold DECIMAL(4,3) DEFAULT 0.90,
review_threshold DECIMAL(4,3) DEFAULT 0.30,
enabled BOOLEAN DEFAULT TRUE, effective_from TIMESTAMPTZ,
UNIQUE (region, content_type, violation_category, effective_from)
);
Redis: Real-Time State
review_queue:{category} -> Sorted Set { content_id: priority_score }
user_trust:{user_id} -> FLOAT (0.0 = untrusted, 1.0 = highly trusted)
strikes:{user_id} -> INT
mod_score:{content_hash} -> JSON (scores) TTL: 3600
| Concern | Solution |
|---|---|
| Moderation service down | Fail-close for new accounts; fail-open for trusted users |
| ML model degradation | Monitor precision/recall daily; auto-rollback if drops > 5% |
| Review queue backlog | Auto-adjust thresholds; hire surge reviewers |
| False positive spike | Circuit breaker: switch all decisions to 'review' |
| Hash database unavailable | Continue with ML-only; maintain local cache + replicas |
| Regional policy update | Version policies with effective dates; hot-reload every 60s |
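The false-positive circuit breaker from the table can be sketched as a sliding-window counter. This is one possible interpretation, not the document's specified design: it downgrades auto-removals to human review while the breaker is open and leaves other decisions untouched. The 5% trip level borrows the false positive target from the requirements; the window size, `min_samples`, and class name are assumptions.

```python
from collections import deque
from typing import Deque


class FalsePositiveBreaker:
    """Opens when the recent false-positive rate exceeds max_fp_rate."""

    def __init__(
        self, window: int = 1000, max_fp_rate: float = 0.05, min_samples: int = 100
    ) -> None:
        self._outcomes: Deque[bool] = deque(maxlen=window)  # True = false positive
        self.max_fp_rate = max_fp_rate
        self.min_samples = min_samples

    def record(self, was_false_positive: bool) -> None:
        """Feed in reviewed outcomes, e.g. overturned auto-removals."""
        self._outcomes.append(was_false_positive)

    @property
    def is_open(self) -> bool:
        n = len(self._outcomes)
        if n < self.min_samples:
            return False  # not enough evidence to trip
        return sum(self._outcomes) / n > self.max_fp_rate

    def apply(self, decision: str) -> str:
        """Downgrade auto-removals to review while the breaker is open."""
        if self.is_open and decision == "remove":
            return "review"
        return decision


if __name__ == "__main__":
    breaker = FalsePositiveBreaker(min_samples=10)
    for i in range(10):
        breaker.record(i == 0)  # 1 false positive in 10 -> 10% rate
    print(breaker.is_open, breaker.apply("remove"))
```

Routing removals to review while open trades reviewer load for fewer wrongful takedowns, so the breaker should be paired with queue backlog monitoring.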
Handling Viral Harmful Content
Scenario: Terrorist attack live-streamed. Video going viral in real-time.
Response playbook:
1. CSAM/terrorism hash match -> immediate auto-removal
2. Hash all known copies (perceptual hash survives re-encoding)
3. Block ALL uploads that match hash within 10 seconds
4. ML model: flag visually similar content
5. Keyword filters: block titles/descriptions referencing the event
6. Rate limit new account uploads
7. Human review: "war room" mode

Prevention: shared industry hash databases (GIFCT, PhotoDNA/NCMEC)
Interview Walkthrough
- Frame as a multi-modal pipeline with different latency budgets: text (< 500 ms), image (< 5 s), video (sampled key frames, not every frame).
- Describe tiered filtering: Bloom filter + regex → ML classifier (BERT) → LLM for borderline cases only — cost scales with uncertainty, not volume.
- Image path: PhotoDNA hash match for CSAM first (immediate removal + NCMEC report), then CV classification for nudity/violence/gore.
- Three-way action framework: auto-remove (high confidence), auto-flag for human review (medium), allow (low risk) — thresholds differ per violation category.
- Prioritize the human review queue with severity × reach × time_sensitivity in a Redis sorted set; 5M flagged items/day cannot be FIFO.
- Hybrid pre-publish vs post-publish routing: strict for new/low-trust users, async for established accounts with high trust scores.
- Common pitfall: using the same ML threshold for CSAM and spam — CSAM demands maximum recall at any precision cost; spam tolerates false positives.
Pre-Publish vs Post-Publish Moderation
- Pre-publish: moderate BEFORE content is visible
  - ✓ Harmful content never seen
  - ✗ Adds latency (500 ms text, 5 s image, 30 s video)
  - Use for: high-risk content types
- Post-publish: publish immediately, moderate async
  - ✓ Zero latency
  - ✗ Harmful content visible for seconds-to-minutes
  - Use for: low-risk users with trust score > 0.8
- Hybrid (recommended):
  - New users/low trust: pre-publish (stricter)
  - Established users/high trust: post-publish (faster)
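The hybrid routing rule can be sketched in a few lines. The 0.8 trust cut-off and the latency budgets come from the text above; the function names and the `high_risk_content` flag are hypothetical, and how a user counts as "new" is left to the caller.

```python
# Pre-publish latency budgets in seconds, from the requirements.
PRE_PUBLISH_BUDGET_S = {"text": 0.5, "image": 5.0, "video": 30.0}


def choose_mode(
    trust_score: float,
    is_new_user: bool,
    high_risk_content: bool = False,
    trust_cutoff: float = 0.8,
) -> str:
    """Pick pre-publish for risky cases, post-publish for trusted users."""
    if is_new_user or high_risk_content or trust_score <= trust_cutoff:
        return "pre_publish"
    return "post_publish"


def added_latency_s(mode: str, content_type: str) -> float:
    """Worst-case latency the user sees before the content is visible."""
    return PRE_PUBLISH_BUDGET_S[content_type] if mode == "pre_publish" else 0.0


if __name__ == "__main__":
    mode = choose_mode(trust_score=0.9, is_new_user=False)
    print(mode, added_latency_s(mode, "video"))
    mode = choose_mode(trust_score=0.9, is_new_user=True)
    print(mode, added_latency_s(mode, "video"))
```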
Model Accuracy vs Coverage (Precision-Recall)
- Threshold = 0.95: Precision 99%, Recall 70%
- Threshold = 0.70: Precision 85%, Recall 95%

Different thresholds per category:
- CSAM: threshold 0.50 (maximize recall at all costs)
- Spam: threshold 0.80 (false positives acceptable)
- Hate speech: threshold 0.90 (context matters)
- Nudity: threshold 0.85

Principle: the higher the harm of a false negative, the lower the threshold.
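The threshold trade-off can be reproduced on a toy labeled set. The eight `(score, is_violation)` pairs below are invented purely for illustration and do not reproduce the percentages quoted above; the per-category thresholds are the ones listed in the text.

```python
from typing import List, Tuple

# Per-category thresholds from the text.
CATEGORY_THRESHOLDS = {"csam": 0.50, "spam": 0.80, "hate_speech": 0.90, "nudity": 0.85}

# Invented toy data: (model score, ground-truth violation?).
SAMPLE: List[Tuple[float, bool]] = [
    (0.98, True), (0.96, True), (0.90, True), (0.80, False),
    (0.75, True), (0.60, False), (0.40, True), (0.10, False),
]


def precision_recall(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Precision and recall when everything scoring >= threshold is flagged."""
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


if __name__ == "__main__":
    for threshold in (0.95, 0.70, CATEGORY_THRESHOLDS["csam"]):
        p, r = precision_recall(SAMPLE, threshold)
        print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, lowering the threshold from 0.95 to 0.70 raises recall while precision falls, which is the same direction of trade-off the quoted figures describe.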
Cost of Moderation at Scale
ML inference costs (per day):
- Text: 500M x $0.0001 = $50K
- Images: 100M x $0.001 = $100K
- Videos: 10M x $0.01 = $100K
- LLM (borderline): 50M x $0.01 = $500K
- Total ML: ~$750K/day = ~$274M/year

Human review: 15K reviewers x $15/hr x 8 hr = $1.8M/day = ~$657M/year
Optimized cost: ~$150-200M/year (vs $900M+ without optimization)
Levers: trust-based routing, efficient models, tiered approach, hash matching
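The cost figures can be reproduced with a short calculation. This sketch uses only the unit costs and volumes quoted above and assumes 365 operating days per year; the variable names are illustrative.

```python
DAYS_PER_YEAR = 365  # assumption: costs accrue every day of the year

# (items per day, cost per item in USD), as quoted in the text.
ML_WORKLOADS = {
    "text": (500_000_000, 0.0001),
    "image": (100_000_000, 0.001),
    "video": (10_000_000, 0.01),
    "llm_borderline": (50_000_000, 0.01),
}

ml_per_day = {name: volume * unit for name, (volume, unit) in ML_WORKLOADS.items()}
total_ml_per_day = sum(ml_per_day.values())
ml_per_year = total_ml_per_day * DAYS_PER_YEAR

# Human review: reviewers x hourly rate x hours per day.
human_per_day = 15_000 * 15 * 8
human_per_year = human_per_day * DAYS_PER_YEAR

unoptimized_per_year = ml_per_year + human_per_year

print(f"ML: ${total_ml_per_day:,.0f}/day, ${ml_per_year / 1e6:,.0f}M/year")
print(f"Human review: ${human_per_day:,.0f}/day, ${human_per_year / 1e6:,.0f}M/year")
print(f"Unoptimized total: ${unoptimized_per_year / 1e6:,.0f}M/year")
```

Human review is more than twice the ML spend, and the borderline LLM calls make up two thirds of the ML spend, so those are the two places where optimization pays off most.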
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: MVP (0 to 100K users)
Monolith or minimal services that prove the core content moderation flows. Optimize for shipping speed and correctness over scale.
Key components: Single region · Primary DB + Redis cache · Synchronous core path · Basic monitoring
Move to next phase when: p99 latency exceeds SLO or DB CPU sustained above 70%
Phase 2: Growth (100K to 10M users)
Split read/write paths, introduce async processing for non-critical work, add caching layers and horizontal scaling.
Key components: Read replicas or CQRS · Message queue for async work · CDN / edge caching · Service-level SLOs
Move to next phase when: Hot keys, fan-out bottlenecks, or ops toil from manual scaling
Phase 3: Scale (10M+ users)
Shard data plane, multi-region active-active or active-passive, formal DR runbooks, cost optimization.
Key components: Database sharding / partitioning · Multi-region replication · Auto-scaling + chaos testing · Dedicated platform/SRE ownership
Move to next phase when: Regional failure domain risk, compliance data residency, or linear cost growth unsustainable
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Core user-facing availability | 99.95% | Budget for planned maintenance + unplanned failures without user-visible outage. |
| p99 latency (critical path) | Problem-specific — state target early and tie to capacity math | Interview credibility comes from connecting SLO to architecture choices. |
| Error rate (5xx) | < 0.1% | Distinguishes transient blips from systemic failure requiring rollback. |
| Data durability | 99.999999999% (11 nines) for committed writes | Define which operations require fsync/quorum vs async replication. |
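An availability target translates directly into a downtime budget, which makes the SLO concrete. The sketch below assumes a 30-day window; `burn_rate` follows the common definition of observed error rate divided by the rate the SLO allows, and both function names are illustrative.

```python
def downtime_budget_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO."""
    window_minutes = window_days * 24 * 60
    return (1 - availability_pct / 100) * window_minutes


def burn_rate(observed_error_rate: float, availability_pct: float) -> float:
    """How fast the error budget is being consumed (1.0 = exactly on budget)."""
    allowed_error_rate = 1 - availability_pct / 100
    return observed_error_rate / allowed_error_rate


if __name__ == "__main__":
    for slo in (99.9, 99.95, 99.99):
        print(f"{slo}% -> {downtime_budget_minutes(slo):.2f} min per 30 days")
    print(f"burn rate at 0.2% errors vs 99.9% SLO: {burn_rate(0.002, 99.9):.1f}x")
```

At 99.95% the budget is about 21.6 minutes per 30 days, which is tight enough that rollback needs to be automated rather than manual.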
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Primary database unavailable | Health check failures, connection pool exhaustion alerts, elevated 5xx | Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists |
| Traffic spike (10× normal) | RPS anomaly alert, autoscaling lag, latency SLO burn rate | Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations |
| Bad deploy causing elevated errors | Canary metric regression, error budget burn, deployment correlation | Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility |
Cost Drivers (Staff lens)
- Egress bandwidth and CDN (often dominates media/data-heavy systems)
- Database storage + IOPS at scale (plan compaction, TTL, tiering)
- Compute for async pipelines (right-size workers, spot instances for batch)
- Managed service premiums vs operational headcount trade-off
Multi-Region & DR
Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.