Design a Content Moderation System

This problem appears in multiple sheets. Depth expectations increase as you progress:

Track	What to demonstrate
Arch 50	Show domain depth beyond the baseline: async pipelines, consistency semantics, and operational SLOs.
Arch 75	Staff angles: partition behavior, cost drivers, and MVP → production evolution with clear triggers.

Interview Prompt

Design a content moderation system.

Clarifying Questions (ask before designing)

Question	Why it matters
Which of these is highest priority: Multi-modal detection (text, image, video), ML classifier + human review queue, Appeal flow?	Forces scope negotiation — senior candidates trim before drawing boxes.
What scale should we design for — DAU, QPS, data volume?	Drives every capacity decision; shows structured thinking.
What are the read vs write patterns on the critical path?	Determines caching, DB choice, and replication topology.
What consistency and durability guarantees are required?	Separates strong-consistency paths from eventual ones — a senior differentiator.

Scope

In scope

Multi-modal detection (text, image, video)
ML classifier + human review queue
Appeal flow
False positive handling
Latency vs accuracy trade-off
Capacity estimation with shown math

Out of scope (state explicitly)

Detailed frontend/UI pixel implementation
Org structure, staffing, and hiring plan

Assumptions

Clarify scale (DAU, QPS, data volume) for the content moderation system in the first 5 minutes
Standard reliability target 99.9%–99.99% unless problem implies higher (payments, booking)
Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks

Multi-modal moderation: Moderate text, images, video, and audio content
Real-time scoring: Score content for violations before or immediately after publishing
Policy engine: Configurable rules per content type, region, and community standards
Violation categories: Hate speech, nudity/NSFW, violence/gore, spam, misinformation, harassment, copyright, CSAM
Action framework: Auto-remove (high confidence), auto-flag for review (medium), allow (low risk)
Human review queue: Prioritized queue for flagged content with analyst tooling
Appeals: Users can appeal moderation decisions
User reporting: Users report content; reports feed into moderation pipeline
Audit trail: Every moderation decision logged with reason, model version, reviewer
Feedback loop: Reviewer decisions retrain ML models

Metric	Calculation	Value
Text posts / day	Given	500M
Images / day	Given	100M
Videos / day	Given	10M
Text moderation / sec	500M ÷ 86,400 s (average; multiply by a peak factor for provisioning)	~5.8K
Image moderation / sec	100M ÷ 86,400 s (average; multiply by a peak factor for provisioning)	~1.2K
Video moderation / sec	10M ÷ 86,400 s (average; multiply by a peak factor for provisioning)	~120
Human review cases / day	~0.8% of 610M total items	~5M
Human reviewers	Given	~15K
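
The capacity figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the daily volumes and reviewer count come from the table, while the `peak_factor` of 3 is an assumption chosen only for illustration and should be replaced with whatever the interviewer agrees to.

```python
SECONDS_PER_DAY = 86_400

# Daily volumes taken from the table above.
daily_volume = {"text": 500_000_000, "image": 100_000_000, "video": 10_000_000}


def average_rate(items_per_day: int) -> float:
    """Average items per second over a 24-hour day."""
    return items_per_day / SECONDS_PER_DAY


def peak_rate(items_per_day: int, peak_factor: float = 3.0) -> float:
    """Peak items per second; peak_factor is an assumed multiplier, not a given."""
    return average_rate(items_per_day) * peak_factor


for kind, volume in daily_volume.items():
    print(f"{kind}: avg {average_rate(volume):,.0f}/s, peak {peak_rate(volume):,.0f}/s")

total_items = sum(daily_volume.values())        # 610M items per day
review_share = 5_000_000 / total_items          # fraction routed to human review
reviews_per_reviewer = 5_000_000 / 15_000       # daily load per reviewer
print(f"review share {review_share:.2%}, {reviews_per_reviewer:.0f} cases per reviewer per day")
```

The per-reviewer load is the number worth saying out loud, because it determines whether the 15K reviewer figure is plausible for the expected handling time per case.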

Text Moderation Pipeline

Layer 1: Keyword/Regex Filter (< 1ms)
  Bloom filter + Aho-Corasick for known bad terms
  Catches: obvious slurs, known spam phrases

Layer 2: ML Text Classifier (< 50ms)
  Fine-tuned BERT/DistilBERT
  Multi-label: P(hate_speech), P(harassment), P(spam), P(violence)
  Batch of 32 texts on GPU in ~50ms = ~1.6ms per text (amortized)

Layer 3: LLM-based Analysis (borderline cases only, < 2s)
  Runs only when the Layer 2 score falls in the ambiguous 0.3-0.7 band
  Expensive (~$0.01 per text), so route only ~10% of traffic here

Layer 4: Context Enrichment
  User's history, conversation context, community norms
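
The layering above can be sketched as a small router that escalates only when the cheaper layer is unsure. This is an illustrative sketch, not a production design: `BLOCKLIST`, `score_text`, and the `classifier` and `llm` callables are hypothetical names, and plain substring matching stands in for the Bloom filter + Aho-Corasick stage.

```python
from typing import Callable, Dict, Tuple

# Placeholder terms for illustration only; a real list is much larger.
BLOCKLIST = {"badword", "buy followers now"}
LOW, HIGH = 0.3, 0.7  # ambiguous band that triggers Layer 3


def keyword_hit(text: str) -> bool:
    """Layer 1 stand-in: substring match against known bad terms."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def score_text(
    text: str,
    classifier: Callable[[str], Dict[str, float]],
    llm: Callable[[str], float],
) -> Tuple[float, str]:
    """Return (violation score, name of the layer that produced it)."""
    if keyword_hit(text):
        return 1.0, "keyword"
    scores = classifier(text)          # Layer 2: multi-label probabilities
    top = max(scores.values())
    if LOW <= top <= HIGH:
        return llm(text), "llm"        # Layer 3: borderline cases only
    return top, "classifier"


# Usage with stub models standing in for the real classifier and LLM.
borderline = lambda _text: {"hate_speech": 0.05, "spam": 0.5}
print(score_text("limited offer!!!", borderline, lambda _text: 0.8))
```

On cost: at 500M texts per day, sending 10% to a layer that costs ~$0.01 per text comes to roughly $500K per day, which is why the width of the borderline band is a budget decision as much as an accuracy one.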

Image Moderation Pipeline

Step 1: Hash matching - PhotoDNA / pHash (< 10ms)
  Compare against known illegal content database (CSAM, terrorism)
  Match found -> IMMEDIATE removal + report to NCMEC

Step 2: ML Classification (< 200ms)
  EfficientNet/ResNet-50 fine-tuned on moderation data
  P(nudity), P(violence), P(hate_symbol), P(drugs), P(gore), P(minor)
  If P(minor) > 0.5 AND P(nudity) > 0.5 -> escalate to CSAM team

Step 3: OCR + Text Moderation (for text in images)

Step 4: Object Detection (YOLO/Faster-RCNN)
  Weapons, flags/symbols associated with extremism
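
Step 1 relies on comparing a perceptual hash of the upload against a list of known hashes. A minimal sketch of the pHash-style comparison follows, assuming 64-bit integer hashes and a Hamming-distance tolerance of 8 bits; both are assumptions for illustration. PhotoDNA is a proprietary algorithm with its own matching procedure, so this shows only the general idea of near-duplicate matching.

```python
from typing import Iterable


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known_hash(candidate: int, known_hashes: Iterable[int], max_distance: int = 8) -> bool:
    """True if the candidate is within max_distance bits of any known hash.

    A small tolerance lets re-encoded or lightly edited copies still match.
    """
    return any(hamming_distance(candidate, known) <= max_distance for known in known_hashes)


# Usage: one bit flipped by re-encoding still matches the known hash.
known = [0xFF00FF00FF00FF00]
print(matches_known_hash(0xFF00FF00FF00FF01, known))
```

A linear scan is fine for illustration; at the volumes in this design a real system would need an index built for nearest-neighbor lookup over hashes.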

Video Moderation Pipeline

Approach: Sample + Classify (not every frame)

Step 1: Extract key frames (every 2 seconds) + audio track
  10-min video -> 300 frames + audio

Step 2: Run image moderation on each key frame (parallel)

Step 3: Audio moderation
  Speech-to-text -> text pipeline; gunshots/screams detection; copyright matching

Optimization - Prioritized Scanning:
  First pass: sample 10 frames -> quick assessment
  If clean -> low-priority full scan later
  If flagged -> immediately scan all 300 frames
  Result: if ~90% of videos pass the first pass, only 10 frames per video are analyzed on the latency-critical path; the full scan is deferred
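
The prioritized scan can be sketched as a two-pass function: sample a few evenly spaced frames, and scan everything immediately only if the sample is flagged. `sample_indices`, `scan_video`, and the `is_violating` callable are hypothetical names; the classifier itself is passed in as a stub.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_indices(total_frames: int, sample_size: int) -> List[int]:
    """Evenly spaced frame indices across the whole video."""
    if total_frames <= sample_size:
        return list(range(total_frames))
    step = total_frames / sample_size
    return [int(i * step) for i in range(sample_size)]


def scan_video(
    frames: Sequence[Frame],
    is_violating: Callable[[Frame], bool],
    sample_size: int = 10,
) -> Tuple[str, List[int]]:
    """First pass on a sample; full scan right away only if the sample is flagged."""
    sampled = sample_indices(len(frames), sample_size)
    if not any(is_violating(frames[i]) for i in sampled):
        # Looks clean: the full scan is queued at low priority instead.
        return "deferred_full_scan", []
    flagged = [i for i, frame in enumerate(frames) if is_violating(frame)]
    return "flagged", flagged


# Usage: a 10-minute video sampled every 2 seconds yields 300 key frames.
frames = list(range(300))
print(scan_video(frames, lambda f: f == 30))   # frame 30 is in the sample
print(scan_video(frames, lambda f: f == 31))   # frame 31 is missed by the sample
```

The second call shows why the deferred full scan still matters: sparse sampling can miss a short violating segment, so the first pass reduces latency and cost but is not a substitute for full coverage.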

Policy Engine: Regional Compliance

Same content may be legal in one country and illegal in another.

Implementation:
  Policy rules configurable per: country, content type, user age, community type
  On moderation: apply model scores against regional thresholds
  Geo-restricted removal: content removed in Germany but visible in US
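
Applying model scores against regional thresholds might look like the sketch below. The default thresholds (0.90 for auto-remove, 0.30 for review) match the policy table later in this document; the stricter values for `DE` and the `POLICIES` lookup itself are hypothetical and exist only to show how the same score can produce different decisions by region.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Policy:
    auto_remove_threshold: float = 0.90
    review_threshold: float = 0.30
    enabled: bool = True


# Hypothetical per-region overrides keyed by (region, violation_category).
POLICIES: Dict[Tuple[str, str], Policy] = {
    ("DE", "hate_symbol"): Policy(auto_remove_threshold=0.70, review_threshold=0.20),
}


def decide(region: str, scores: Dict[str, float]) -> str:
    """Return 'remove', 'review', or 'allow' for one piece of content in one region."""
    decision = "allow"
    for category, score in scores.items():
        policy = POLICIES.get((region, category), Policy())
        if not policy.enabled:
            continue
        if score >= policy.auto_remove_threshold:
            return "remove"            # any single category can force removal
        if score >= policy.review_threshold:
            decision = "review"
    return decision


# Usage: the same score is removed in one region and only reviewed in another.
scores = {"hate_symbol": 0.75}
print(decide("DE", scores), decide("US", scores))
```

Evaluating per region at decision time, rather than storing one global verdict, is what makes geo-restricted removal possible.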

Human Review Queue: Prioritization

5M flagged items/day across 15K reviewers (~330 items per reviewer per day). Not all items are equal priority.

Priority scoring:
  priority = severity_weight x reach x time_sensitivity

  severity: CSAM=1000, Violence=100, Hate=50, Nudity=30, Spam=10
  reach: Viral 10x, Regular 1x, Private 0.5x
  time: Going viral 10x, Steady 1x, Old 0.5x

Queue: Redis sorted set, one per category: ZADD review_queue:{category} {priority} {content_id}
Reviewer dequeues: ZPOPMAX review_queue:{category}
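
The priority formula can be checked with a small in-memory sketch. The weights are the ones listed above; the `ReviewQueue` class is a hypothetical stand-in that uses `heapq` to mimic highest-score-first dequeueing, so the example runs without a Redis server.

```python
import heapq
from typing import List, Optional, Tuple

SEVERITY = {"csam": 1000, "violence": 100, "hate": 50, "nudity": 30, "spam": 10}
REACH = {"viral": 10.0, "regular": 1.0, "private": 0.5}
TIME = {"going_viral": 10.0, "steady": 1.0, "old": 0.5}


def priority(category: str, reach: str, time_sensitivity: str) -> float:
    """priority = severity_weight x reach x time_sensitivity."""
    return SEVERITY[category] * REACH[reach] * TIME[time_sensitivity]


class ReviewQueue:
    """In-memory stand-in for a sorted set with pop-highest semantics."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []

    def add(self, content_id: str, score: float) -> None:
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, content_id))

    def pop_max(self) -> Optional[Tuple[str, float]]:
        if not self._heap:
            return None
        neg_score, content_id = heapq.heappop(self._heap)
        return content_id, -neg_score


queue = ReviewQueue()
queue.add("post-1", priority("spam", "viral", "going_viral"))
queue.add("post-2", priority("csam", "private", "old"))
print(queue.pop_max())
```

Running this surfaces a property of the multiplicative formula: viral spam scores 1000 while private, old CSAM scores 250, so a single shared queue would serve the spam first. Keeping a separate queue per category, or giving the most severe categories a floor, avoids that inversion.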

Score Content (Pre-Publish)

HTTP

POST /api/v1/moderation/score
{
  "content_id": "post-uuid",
  "content_type": "text",
  "text": "This is the post content...",
  "user_id": "user-uuid",
  "region": "US",
  "community_id": "comm-uuid"
}
Response:
{
  "decision": "allow",
  "scores": { "hate_speech": 0.05, "spam": 0.02, "violence": 0.01 },
  "flags": [], "review_required": false
}

Report Content

HTTP

POST /api/v1/moderation/report
{ "content_id": "post-uuid", "reporter_id": "user-uuid",
  "reason": "hate_speech", "details": "Contains racial slurs" }
-> { "report_id": "rpt-uuid", "status": "received" }

Review Decision & Appeal

HTTP

POST /api/v1/moderation/review/{content_id}/decide
{ "reviewer_id": "...", "decision": "remove",
  "violation_category": "hate_speech", "notes": "..." }
-> { "decision_id": "...", "action_taken": "removed" }

POST /api/v1/moderation/appeal
{ "content_id": "post-uuid", "user_id": "user-uuid",
  "reason": "This is a news quote, not hate speech" }
-> { "appeal_id": "app-uuid", "status": "under_review" }

Common Error Responses

400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; re-read current state before retrying, and reuse the same idempotency key for duplicate submissions
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Server Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
202 Accepted (not an error): job queued; poll GET /jobs/{id} for status
408 Request Timeout: server timed out waiting for the request; safe to resend. A job that is still processing is reported by the polling endpoint, not by 408
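
The retry guidance in this list can be sketched as a client-side helper that decides how long to wait before the next attempt. `retry_delay`, the `base` and `cap` values, and the choice of full jitter are assumptions for illustration; `retry_after` is assumed to be the Retry-After header already parsed into seconds.

```python
import random
from typing import Optional

# Statuses from the list above where a retry is reasonable.
RETRYABLE = {429, 500, 503}


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if the status is not retryable."""
    if status not in RETRYABLE:
        return None                     # 4xx client errors: fix the request instead
    if status == 429 and retry_after is not None:
        return retry_after              # the server told us when to come back
    backoff = min(cap, base * (2 ** attempt))
    return random.uniform(0, backoff)   # full jitter spreads out retry storms


# Usage: attempt numbers start at 0.
print(retry_delay(429, attempt=0, retry_after=7))
print(retry_delay(400, attempt=0))
```

Retries of write requests should carry the same idempotency key on every attempt so that a request which succeeded server-side but timed out client-side is not applied twice.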

PostgreSQL: Moderation Decisions & Policies

SQL

-- PostgreSQL has no inline ENUM(...) column syntax; enum types are created first.
CREATE TYPE content_type_t AS ENUM ('text','image','video','audio');
CREATE TYPE auto_decision_t AS ENUM ('allow','review','remove');
CREATE TYPE final_decision_t AS ENUM ('allow','remove','restrict','warning');

CREATE TABLE moderation_decisions (
    decision_id UUID PRIMARY KEY, content_id UUID NOT NULL,
    content_type content_type_t NOT NULL,
    user_id UUID NOT NULL, auto_scores JSONB,
    auto_decision auto_decision_t,
    final_decision final_decision_t,
    violation_category VARCHAR(50), reviewer_id UUID,
    policy_version VARCHAR(20), model_version VARCHAR(20),
    region VARCHAR(10), appealed BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMPTZ DEFAULT NOW(), decided_at TIMESTAMPTZ
);

CREATE TABLE moderation_policies (
    policy_id UUID PRIMARY KEY, region VARCHAR(10) NOT NULL,
    content_type VARCHAR(20) NOT NULL,
    violation_category VARCHAR(50) NOT NULL,
    auto_remove_threshold DECIMAL(4,3) DEFAULT 0.90,
    review_threshold DECIMAL(4,3) DEFAULT 0.30,
    enabled BOOLEAN DEFAULT TRUE, effective_from TIMESTAMPTZ,
    UNIQUE (region, content_type, violation_category, effective_from)
);

Redis: Real-Time State

review_queue:{category}  -> Sorted Set { content_id: priority_score }
user_trust:{user_id}     -> FLOAT (0.0 = untrusted, 1.0 = highly trusted)
strikes:{user_id}        -> INT
mod_score:{content_hash} -> JSON (scores) TTL: 3600
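
The `mod_score:{content_hash}` entry caches scores for identical content so that reposts skip the models. A minimal in-memory sketch of that behavior follows, assuming SHA-256 as the content hash and an injectable clock for testing; `ScoreCache` and `content_key` are hypothetical names, and the dictionary stands in for Redis.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def content_key(content: bytes) -> str:
    """Cache key derived from the exact bytes of the content."""
    return "mod_score:" + hashlib.sha256(content).hexdigest()


class ScoreCache:
    """Stores scores per content hash and expires them after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Dict[str, float]]] = {}

    def put(self, content: bytes, scores: Dict[str, float]) -> None:
        self._entries[content_key(content)] = (self._clock() + self._ttl, scores)

    def get(self, content: bytes) -> Optional[Dict[str, float]]:
        key = content_key(content)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, scores = entry
        if self._clock() >= expires_at:
            del self._entries[key]      # expired: force a fresh model run
            return None
        return scores


cache = ScoreCache()
cache.put(b"This is the post content...", {"hate_speech": 0.05, "spam": 0.02})
print(cache.get(b"This is the post content..."))
```

An exact-bytes hash only catches verbatim reposts; trivially altered copies produce a different key and go through the full pipeline. The short TTL also bounds how long a score from an outdated model or policy version can be served.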

Concern	Solution
Moderation service down	Fail closed for new accounts; fail open for trusted users
ML model degradation	Monitor precision/recall daily; auto-rollback to the previous model version if either drops > 5% against baseline
Review queue backlog	Auto-adjust review thresholds; bring in surge reviewers
False positive spike	Circuit breaker: switch all decisions to 'review'
Hash database unavailable	Continue with ML-only; maintain local cache + replicas
Regional policy update	Version policies with effective dates; hot-reload every 60s
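
The first row of the table, failing closed for new accounts and open for trusted users, can be sketched as a fallback used only when the scoring service is unreachable. The `trust_floor` and `min_age_days` cutoffs and the function name are hypothetical; `user_trust` is the 0.0 to 1.0 value stored under `user_trust:{user_id}`.

```python
def fallback_decision(
    user_trust: float,
    account_age_days: int,
    trust_floor: float = 0.8,
    min_age_days: int = 30,
) -> str:
    """Decision to apply when the moderation service cannot be reached.

    Returns 'allow' (fail open) or 'hold' (fail closed).
    """
    if account_age_days >= min_age_days and user_trust >= trust_floor:
        return "allow"   # publish now; scan once the service recovers
    return "hold"        # keep unpublished until it can be scored


# Usage: an established, trusted user publishes; a new account waits.
print(fallback_decision(user_trust=0.95, account_age_days=400))
print(fallback_decision(user_trust=0.95, account_age_days=2))
```

Content allowed through in this mode still needs to be queued for scoring after recovery, otherwise an outage becomes a window of unmoderated publishing.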

Handling Viral Harmful Content

Scenario: Terrorist attack live-streamed. Video going viral in real-time.

Response playbook:
  1. CSAM/terrorism hash match -> immediate auto-removal
  2. Hash all known copies (perceptual hash survives re-encoding)
  3. Propagate the hash to every upload path within 10 seconds so ALL matching uploads are blocked
  4. ML model: flag visually similar content
  5. Keyword filters: block titles/descriptions referencing the event
  6. Rate limit new account uploads
  7. Human review: "war room" mode

Prevention: shared industry hash databases (GIFCT, PhotoDNA/NCMEC)

SLOs & Error Budgets

Metric	Target	Rationale
Core user-facing availability	99.95%	Budget for planned maintenance + unplanned failures without user-visible outage.
p99 latency (critical path)	Problem-specific — state target early and tie to capacity math	Interview credibility comes from connecting SLO to architecture choices.
Error rate (5xx)	< 0.1%	Distinguishes transient blips from systemic failure requiring rollback.
Data durability	99.999999999% (11 nines) for committed writes	Define which operations require fsync/quorum vs async replication.
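
An availability target translates directly into an error budget, which is easier to reason about during an incident. The sketch below shows the arithmetic; the 30-day window and the function names are assumptions for illustration.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO."""
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(observed_error_rate: float, slo: float) -> float:
    """How fast the budget is being consumed; 1.0 means exactly on budget."""
    return observed_error_rate / (1.0 - slo)


print(f"{error_budget_minutes(0.9995):.1f} minutes per 30 days at 99.95%")
print(f"burn rate at 0.1% errors: {burn_rate(0.001, 0.9995):.1f}x")
```

At 99.95% the budget is about 21.6 minutes per 30 days, so a sustained 0.1% error rate consumes it at roughly twice the sustainable pace.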

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Primary database unavailable	Health check failures, connection pool exhaustion alerts, elevated 5xx	Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists
Traffic spike (10× normal)	RPS anomaly alert, autoscaling lag, latency SLO burn rate	Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations
Bad deploy causing elevated errors	Canary metric regression, error budget burn, deployment correlation	Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility

Cost Drivers (Staff lens)

Egress bandwidth and CDN (often dominates media/data-heavy systems)
Database storage + IOPS at scale (plan compaction, TTL, tiering)
Compute for async pipelines (right-size workers, spot instances for batch)
Managed service premiums vs operational headcount trade-off

Multi-Region & DR

Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.
