System Design Problem

Design a Distributed Stream Processing System (Apache Flink)

Commonly Asked By: Netflix, Uber, LinkedIn, Stripe

  • Process unbounded (infinite) event streams in real time with sub-second latency
  • Support windowed aggregations: tumbling, sliding, session, and global windows
  • Handle event-time semantics with watermarks for out-of-order data
  • Provide exactly-once processing guarantees even across failures
  • Support stateful operators: keyed state (per-key counters, aggregates, ML features) persisted durably
  • Stream-to-stream joins: join two event streams on a key within a time window
  • Stream-to-table enrichment joins: enrich streaming events with slowly-changing dimension data
  • SQL interface for analysts (Flink SQL) alongside programmatic DataStream API for engineers
  • Connectors to sources (Kafka, Kinesis, files) and sinks (Kafka, Elasticsearch, PostgreSQL, S3, ClickHouse)
  • Savepoints: manually triggered consistent snapshots for version upgrades and job migration
  • Backpressure handling: slow operators should not cause data loss
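
To make the windowing and event-time requirements above concrete, here is a minimal single-process sketch of a tumbling-window count driven by a bounded-out-of-orderness watermark. It is an illustration under stated assumptions, not Flink's implementation: the class name `TumblingWindowCounter` and its parameters are hypothetical, state lives in a plain in-memory dict (no durability or checkpointing), and the watermark is simplified to "largest timestamp seen minus the allowed lateness".

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Hypothetical sketch: per-key event counts in fixed event-time windows.

    A window [start, start + size) is emitted once the watermark reaches its end.
    Events whose window has already closed are counted as late and dropped.
    """

    def __init__(self, window_size_ms, max_out_of_orderness_ms):
        self.window_size_ms = window_size_ms
        self.max_out_of_orderness_ms = max_out_of_orderness_ms
        self.max_timestamp_seen = None
        self.counts = defaultdict(int)  # (window_start, key) -> count
        self.late_events = 0

    def watermark(self):
        # Simplified watermark: we assume no event arrives more than
        # max_out_of_orderness_ms behind the largest timestamp seen so far.
        if self.max_timestamp_seen is None:
            return float("-inf")
        return self.max_timestamp_seen - self.max_out_of_orderness_ms

    def process(self, key, timestamp_ms):
        """Ingest one event; return a list of (window_start, key, count) results."""
        window_start = timestamp_ms - (timestamp_ms % self.window_size_ms)
        window_end = window_start + self.window_size_ms

        # The window for this event was already emitted: treat the event as late.
        if window_end <= self.watermark():
            self.late_events += 1
            return []

        self.counts[(window_start, key)] += 1
        if self.max_timestamp_seen is None or timestamp_ms > self.max_timestamp_seen:
            self.max_timestamp_seen = timestamp_ms
        return self._fire_ready_windows()

    def _fire_ready_windows(self):
        # Emit and clear every window whose end is at or before the watermark.
        wm = self.watermark()
        ready = sorted(k for k in self.counts if k[0] + self.window_size_ms <= wm)
        return [(start, key, self.counts.pop((start, key))) for start, key in ready]
```

With 1-second windows and 200 ms of allowed out-of-orderness, an event stamped 800 that arrives after one stamped 1100 still lands in the first window, because the watermark (900) has not yet reached that window's end. Once an event stamped 1300 moves the watermark to 1100, the first window is emitted, and a later event stamped 500 is counted as late. A real system would also need to route late events somewhere (a side output, for example) and persist `counts` so that results survive failures.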