System Design Problem

Design a Change Data Capture (CDC) Pipeline

Commonly Asked By: Netflix, Debezium, Stripe, LinkedIn

  • Capture every INSERT, UPDATE, and DELETE from source databases in real time
  • Propagate changes to downstream consumers with < 5 second latency
  • Maintain strict ordering of changes per row/primary key
  • Initial snapshot of existing data, followed by ongoing incremental streaming
  • Schema evolution: handle column adds, renames, type changes
  • Multi-source: PostgreSQL, MySQL, MongoDB, SQL Server, Oracle
  • Multi-sink: Kafka, Elasticsearch, S3/Parquet, Redis, ClickHouse
  • Effectively-once delivery via at-least-once capture + idempotent consumers
  • Filtering and transformations: rename fields, mask PII, enrich
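
Two of the requirements above, per-key ordering and effectively-once delivery, can be sketched together in a few lines. The following is a minimal illustration in Python, not a reference implementation: the names ChangeEvent, partition_for, and IdempotentSink are hypothetical, and the sketch assumes every change carries a monotonically increasing log position (for example an LSN or binlog offset) that the sink can compare.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; field names are assumptions for this sketch."""
    table: str
    key: str                # primary key of the changed row
    op: str                 # "insert", "update", or "delete"
    position: int           # assumed monotonically increasing log position
    after: Optional[dict]   # row image after the change; None for deletes


def partition_for(event: ChangeEvent, num_partitions: int) -> int:
    """Route all changes for one row to the same partition, preserving per-key order."""
    digest = hashlib.sha256(f"{event.table}:{event.key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class IdempotentSink:
    """In-memory sink that tolerates at-least-once redelivery of events."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], dict] = {}
        # Last applied position per row; kept after deletes so that a stale
        # redelivered insert cannot resurrect a deleted row.
        self.applied: Dict[Tuple[str, str], int] = {}

    def apply(self, event: ChangeEvent) -> bool:
        """Apply the event unless it is a duplicate or older than what was applied."""
        row_id = (event.table, event.key)
        if event.position <= self.applied.get(row_id, -1):
            return False  # duplicate or stale redelivery: skip
        if event.op == "delete":
            self.rows.pop(row_id, None)
        else:
            self.rows[row_id] = dict(event.after or {})
        self.applied[row_id] = event.position
        return True


# Usage: the third event is a redelivery of the second and is ignored.
events = [
    ChangeEvent("users", "42", "insert", 100, {"name": "Ada"}),
    ChangeEvent("users", "42", "update", 101, {"name": "Ada L."}),
    ChangeEvent("users", "42", "update", 101, {"name": "Ada L."}),
]
sink = IdempotentSink()
results = [sink.apply(e) for e in events]
```

Hashing on table plus primary key keeps every change to a given row in one partition, so a single consumer sees them in log order. Comparing positions at the sink is what turns at-least-once capture into an effectively-once result. A production sink would persist the last applied position atomically with the row write rather than hold it in memory.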