System Design Problem

Design a User Analytics Pipeline (like Google Analytics)

Commonly Asked By: Google, Mixpanel, Amplitude, Segment

  • Event tracking: Capture client-side page views, link clicks, scroll events, and custom events across web and mobile.
  • Real-time dashboard: Show currently active users, global events per second, and top trending pages in near real time.
  • Historical reports: Run aggregations (e.g. pageviews over 30 days grouped by country or device category).
  • Funnel conversion: Trace user conversion journeys (e.g. Landing Page → Registration → Checkout).
  • Cohort retention: Group users into cohorts by registration date to measure long-term retention.
  • Custom dimensions: Allow developers to attach arbitrary key-value metadata to events.
  • Session management: Stitch independent event streams into discrete user sessions with a 30-minute inactivity threshold.
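The session rule in the last requirement can be sketched in a few lines. This is a minimal in-memory illustration, not how a production job would run: the `Event` fields and the `sessionize` helper are hypothetical names, and it assumes that a gap strictly greater than 30 minutes starts a new session.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# 30-minute inactivity threshold taken from the requirements above.
SESSION_GAP_SECONDS = 30 * 60


@dataclass
class Event:
    # Field names are assumptions made for this sketch.
    user_id: str
    timestamp: float  # Unix time in seconds
    name: str
    properties: Dict[str, str] = field(default_factory=dict)  # custom dimensions


def sessionize(events: Iterable[Event]) -> Dict[str, List[List[Event]]]:
    """Group each user's events into sessions, splitting on inactivity gaps."""
    sessions: Dict[str, List[List[Event]]] = {}
    # Sort so that each user's events are visited in time order.
    for event in sorted(events, key=lambda e: (e.user_id, e.timestamp)):
        user_sessions = sessions.setdefault(event.user_id, [])
        last_seen = user_sessions[-1][-1].timestamp if user_sessions else None
        if last_seen is not None and event.timestamp - last_seen <= SESSION_GAP_SECONDS:
            user_sessions[-1].append(event)   # still within the active session
        else:
            user_sessions.append([event])     # first event, or gap exceeded
    return sessions
```

A streaming engine would typically keep only the last-seen timestamp per user as state rather than sorting a full batch, but the splitting rule is the same.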

Web and mobile SDKs batch events and forward them to a geo-distributed Edge Collector Cluster, which performs validation, GeoIP enrichment, and bot filtering. Enriched events are published to Kafka topics. Apache Flink consumes them to aggregate real-time metrics (written to Redis), while Spark jobs build session models and load the results into ClickHouse, a columnar database, for historical OLAP queries.
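To make the collector and real-time stages more concrete, here is a rough single-process sketch. Everything in it is an assumption for illustration: the required fields, the user-agent bot heuristic, the `GEOIP_TABLE` stand-in for a real GeoIP database, and the `collect` and `events_per_second` helpers. It uses no Kafka, Flink, or Redis client; it only shows the shape of the logic those components would carry.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Assumed minimal event schema for this sketch.
REQUIRED_FIELDS = ("user_id", "name", "timestamp")
# Naive user-agent heuristic; real bot filtering is considerably more involved.
BOT_MARKERS = ("bot", "crawler", "spider")
# Stand-in for a GeoIP database. The addresses come from documentation
# ranges and the country codes are made up.
GEOIP_TABLE = {"203.0.113.7": "AU", "198.51.100.4": "US"}


def collect(raw: Dict, client_ip: str, user_agent: str) -> Optional[Dict]:
    """Validate, bot-filter, and enrich one event; return None to drop it."""
    if any(name not in raw for name in REQUIRED_FIELDS):
        return None  # fails validation
    if any(marker in user_agent.lower() for marker in BOT_MARKERS):
        return None  # looks like automated traffic
    enriched = dict(raw)  # copy so the caller's dict is not mutated
    enriched["country"] = GEOIP_TABLE.get(client_ip, "unknown")
    return enriched


def events_per_second(events: Iterable[Dict]) -> Counter:
    """Count events in one-second tumbling windows keyed by window start."""
    counts: Counter = Counter()
    for event in events:
        counts[int(event["timestamp"])] += 1
    return counts
```

In the described architecture, the output of `collect` would be published to Kafka, and the windowed count would run as a streaming aggregation with results written to Redis for the dashboard.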
