Design Top K Most Shared Articles

Interview Prompt

Design Design Top K Most Shared Articles.

Clarifying Questions (ask before designing)

Question	Why it matters
Which of these is highest priority: Lossy counting, Time-window leaderboards, Map-reduce aggregation?	Forces scope negotiation — senior candidates trim before drawing boxes.
What scale should we design for — DAU, QPS, data volume?	Drives every capacity decision; shows structured thinking.
What are the read vs write patterns on the critical path?	Determines caching, DB choice, and replication topology.
What consistency and durability guarantees are required?	Separates strong-consistency paths from eventual ones — a senior differentiator.

Scope

In scope

Lossy counting
Time-window leaderboards
Map-reduce aggregation
Approximation algorithms
Capacity estimation with shown math

Out of scope (state explicitly)

Detailed frontend/UI pixel implementation
Org structure, staffing, and hiring plan

Assumptions

Clarify scale (DAU, QPS, data volume) for top k shared articles in the first 5 minutes
Standard reliability target 99.9%–99.99% unless problem implies higher (payments, booking)
Managed cloud services (RDS, S3, Kafka, Redis) are acceptable building blocks

Track shares: Record every time an article/URL is shared
Top-K ranking: Return the K most shared articles in a given time window (1h, 24h, 7d, all-time)
Real-time: Rankings update within 1-5 minutes of new shares
Category filtering: Top K per category (tech, sports, politics)
Regional: Top K per country/region
Trending detection: Highlight articles with fastest share velocity

Metric	Calculation	Value
Share events / day	Given	1B
Share events / sec	1B ÷ 86400	~12K (peak 100K)
Unique articles shared / day	Given	10M
Top-K returned	K	100 typically
Share event size	Given (assumption documented in value)	100 bytes
Time windows	Given (assumption documented in value)	1h, 24h, 7d, 30d

Loading...

Multi-Window Counting with Panes

Pane-based approach ⭐:
  Divide time into 5-minute "panes"
  Each pane stores: count per article in that 5-min period
  
  1-hour count = sum of last 12 panes
  24-hour count = sum of last 288 panes
  7-day count = sum of last 2016 panes
  
  Optimization: Only track articles with > 10 shares in any pane
    → ~100K active articles × 2016 panes = 800 MB → manageable

Approximate Top-K: Space-Saving Algorithm

Maintain a fixed-size map of K=1000 entries: {article_id → count}

On each share event:
  If article already in map → increment count
  If article not in map AND map has < K entries → add with count=1
  If article not in map AND map is full:
    Find entry with MINIMUM count (= min_count)
    Replace it: article_id = new_article, count = min_count + 1

Guarantee: The true top-K is always contained in the result
Memory: O(K) = O(1000) regardless of how many articles exist

Viral Article: Single Key Hot Spot

One article goes viral → 1M shares in 10 minutes → all INCR operations 
hit ONE Flink key/partition → backpressure

Solutions:
  a. Pre-aggregate at service layer:
     Each API server buffers share counts in-memory for 1 second
     Flush batch count to Kafka: {article_id: "art-123", count: 342}
     
  b. Use sub-keys → partition by hash(article_id + random(0..7))
     Flink: aggregate across sub-keys in a second stage

Event Bus Design (Kafka)

Topic: top_k_shared_articles-events
  Partitions: 64 (scale consumers horizontally)
  Partition key: entity_id (user_id / order_id — preserves per-entity ordering)
  Retention: 7 days (compliance) or 24h (high-volume telemetry)
  Replication factor: 3, min.insync.replicas: 2

Producer: idempotent producer enabled (enable.idempotence=true)
Consumer: consumer group "top_k_shared_articles-processors"
  - At-least-once delivery + idempotent handlers (dedup by event_id)
  - DLQ topic: top_k_shared_articles-events-dlq (poison messages after 3 retries)
  - Lag alert: consumer lag > 60s → scale workers

Design Top K Most Shared Articles: async side effects MUST NOT block the synchronous API response.
  Sync path: validate → persist source of truth → publish event → return 201
  Async path: consumers update caches, indexes, notifications, aggregates

HTTP

GET /api/v1/top-articles?window=24h&region=US&category=tech&limit=50
→ 200 OK
{
  "articles": [
    {"article_id": "art-123", "url": "https://...", "title": "...", 
     "share_count": 152340, "rank": 1}
  ],
  "window": "24h",
  "computed_at": "2026-03-14T10:05:00Z"
}

Common Error Responses

400 Bad Request: invalid input, missing fields, or malformed JSON
401 Unauthorized: missing or invalid auth token or API key
403 Forbidden: authenticated but insufficient permissions
404 Not Found: resource ID does not exist
409 Conflict: duplicate write or version conflict; retry with idempotency key
422 Unprocessable Entity: valid syntax but invalid business logic
429 Too Many Requests: rate limit exceeded; honor Retry-After header
500 Internal Error: unexpected server fault; retry with idempotency key
503 Service Unavailable: dependency down or overloaded; use exponential backoff
504 Gateway Timeout: index shard slow; narrow query or retry

Redis

Key:    topk:{window}:{region}:{category}
Type:   Sorted Set
Score:  share_count
Member: article_id
Ops:    ZREVRANGE for top-K, ZADD for updates

Key:    article_meta:{article_id}
Type:   Hash
Fields: url, title, image_url, publisher, category

ClickHouse

SQL

CREATE TABLE share_events (
    article_id   String,
    user_id      String,
    platform     LowCardinality(String),
    region       LowCardinality(String),
    category     LowCardinality(String),
    shared_at    DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(shared_at)
ORDER BY (region, category, shared_at);

CREATE MATERIALIZED VIEW share_counts_hourly
ENGINE = SummingMergeTree()
ORDER BY (article_id, region, category, hour)
AS SELECT
    article_id, region, category,
    toStartOfHour(shared_at) AS hour,
    count() AS share_count
FROM share_events
GROUP BY article_id, region, category, hour;

Concern	Solution
Flink failure	Checkpoint to S3 every 60s; resume from checkpoint
Redis loss	Flink recomputes full top-K within 60 seconds from pane state
Late events	Flink watermark allows 5-min late arrivals
Article metadata missing	Async enrichment; display URL if title unavailable
Spam shares	Rate limit per user (max 100 shares/hour); bot detection

URL Deduplication for Articles

URL normalization pipeline:
  1. Expand shortened URLs (follow redirects)
  2. Strip tracking params (utm_source, utm_medium, fbclid)
  3. Normalize: lowercase, remove trailing slash, remove www prefix
  4. Compute canonical_url_hash = SHA256(normalized_url)
  5. Use canonical_url_hash as the article_id for counting

Click-Bait and Spam Filtering

Detection signals:
  1. Share source diversity: if >80% of shares from <50 unique users → spam
  2. Article domain reputation: known spam domains get score penalty
  3. Share velocity anomaly: organic articles grow gradually; spam jumps 0→100K
  4. User account age: if >50% of sharers have accounts <7 days old → bot

Action: Apply penalty multiplier (0.1×) or suppress entirely

Virality Detection

Track share velocity per article (shares per minute, sliding window)
  Stage 1: "Emerging" — velocity > 50 shares/min
  Stage 2: "Viral"    — velocity > 500 shares/min
  Stage 3: "Mega-viral" — velocity > 5000 shares/min

On stage transition → publish Kafka event for:
  Editorial dashboard, push notification, CDN pre-warming

Interview Walkthrough

Normalize URLs before counting: strip tracking params, expand short links, lowercase — otherwise the same article splits across multiple IDs.
Use a sliding time window (last 24h) with decay, not lifetime totals — stale evergreen articles must not dominate the leaderboard.
Apply Space-Saving or Count-Min Sketch for approximate per-article counts — O(1) per share event instead of O(N log N) global sort.
Run spam detection on share velocity anomalies: organic growth is gradual, bot campaigns jump from 0 to 100K shares in minutes.
Penalize articles where >80% of shares come from <50 unique accounts or from accounts younger than 7 days.
Publish virality stage transitions (Emerging → Viral → Mega-viral) via Kafka to trigger CDN pre-warming and editorial alerts.
Quantify: 100K shares/sec × 200 bytes ≈ 20 MB/s ingest — partition Kafka by canonical URL hash for ordered per-article counting.
Common pitfall: running an exact global sort on every share event — memory and CPU explode as the article catalog grows past 10M entries.

Exact vs Approximate Top-K

Approach	Memory	Accuracy	Speed
Exact (Full Count)	O(N) = 10M+ entries	Perfect	Expensive sort
Space-Saving ⭐	O(K) = 1000 entries	Approximate	O(1) per event
Hybrid (Count-Min + HashMap + Sort)	O(10K)	Exact top-100	Fast

MapReduce vs Stream Processing

Batch (Spark): scan all events → aggregate → rank. Latency: hours. Exact.
Stream (Flink) ⭐: continuous processing. Latency: seconds. Near real-time.

Best: Flink for real-time + hourly Spark job to reconcile exact counts

Fixed Tumbling Window vs Exponential Decay

Fixed tumbling window: cliff effect — article drops off suddenly at T+1h
Exponential decay: score = shares × e^(-λ × age), no hard cliff

Recommended hybrid:
  Real-time: Redis INCR in 1-minute buckets for last 24 hours
  Batch: Spark/Flink hourly computes decay-weighted score
  Serve: Redis sorted set ZREVRANGE for top-K query

SLOs & Error Budgets

Metric	Target	Rationale
Core user-facing availability	99.95%	Budget for planned maintenance + unplanned failures without user-visible outage.
p99 latency (critical path)	Problem-specific — state target early and tie to capacity math	Interview credibility comes from connecting SLO to architecture choices.
Error rate (5xx)	< 0.1%	Distinguishes transient blips from systemic failure requiring rollback.
Data durability	99.999999999% (11 nines) for committed writes	Define which operations require fsync/quorum vs async replication.

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Primary database unavailable	Health check failures, connection pool exhaustion alerts, elevated 5xx	Failover to replica / promote standby; enable read-only degraded mode if writes impossible; queue writes if async path exists
Traffic spike (10× normal)	RPS anomaly alert, autoscaling lag, latency SLO burn rate	Rate limit non-critical endpoints; scale read path horizontally; pre-warm caches; shed load on expensive operations
Bad deploy causing elevated errors	Canary metric regression, error budget burn, deployment correlation	Automated rollback within 5 minutes; feature flag kill switch; maintain N-1 compatibility

Cost Drivers (Staff lens)

Egress bandwidth and CDN (often dominates media/data-heavy systems)
Database storage + IOPS at scale (plan compaction, TTL, tiering)
Compute for async pipelines (right-size workers, spot instances for batch)
Managed service premiums vs operational headcount trade-off

Multi-Region & DR

Start single-region with cross-AZ redundancy. Add read replicas in secondary region for DR. Move to active-active only when latency SLO or data residency requires it — accept conflict resolution complexity explicitly.