System Design Problem

Design a Distributed Job Scheduler (Quartz / Airflow)

Commonly Asked By: Google, Netflix, Airbnb, Uber

  • Submit jobs: Schedule a job to run at a specific time or on a recurring schedule (cron-like)
  • One-time jobs: Execute once at a specific time (e.g., "send email at 3pm tomorrow")
  • Recurring jobs: Execute on a schedule (e.g., "every 5 minutes", "daily at midnight", cron expressions)
  • Job priorities: Support priority levels (critical, high, normal, low)
  • Job dependencies: Job B runs only after Job A completes (DAG execution)
  • Retry on failure: Configurable retry policy (max retries, backoff strategy)
  • Job status: Track status (pending, scheduled, running, completed, failed, cancelled)
  • Job cancellation: Cancel a pending or scheduled job
  • Effectively-once per schedule: At-least-once dispatch combined with dispatch locks and idempotent workers, so a duplicate dispatch of the same scheduled run produces its side effect only once
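
The effectively-once requirement can be illustrated with a small sketch. It assumes each scheduled run is identified by a (job_id, scheduled_at) pair and that the worker records finished runs under that key. The class name, field names, and the in-memory set are hypothetical stand-ins for a durable deduplication table, not part of the original design.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentWorker:
    """Applies a job's side effect at most once per (job_id, scheduled_at) key."""
    completed: set = field(default_factory=set)       # stand-in for a durable dedup table
    side_effects: list = field(default_factory=list)  # records what actually ran

    def handle(self, job_id: str, scheduled_at: str) -> bool:
        key = (job_id, scheduled_at)
        if key in self.completed:
            return False  # duplicate delivery: skip the side effect
        self.side_effects.append(f"ran {job_id} for {scheduled_at}")
        self.completed.add(key)
        return True


worker = IdempotentWorker()
# At-least-once dispatch may deliver the same run twice, e.g. after a scheduler failover.
first = worker.handle("send-report", "2024-01-01T00:00:00Z")
second = worker.handle("send-report", "2024-01-01T00:00:00Z")
```

In a real deployment the check and the write would need to be atomic (for example, a unique constraint on the key), since two workers could otherwise pass the check at the same time.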

The system splits jobs between PostgreSQL (cold/durable) and Redis (hot/fast). A leader-elected Schedule Manager scans Redis buckets every second and dispatches due jobs to Kafka priority topics.
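
The scan-and-dispatch loop described above might look roughly like the following sketch. It assumes buckets are keyed by epoch second and that there is one topic per priority level. The topic names, the `ScheduleManager` class, and the in-memory dictionaries standing in for Redis and Kafka are all hypothetical.

```python
from collections import defaultdict

# Hypothetical topic names, one per priority level listed in the requirements.
TOPIC_BY_PRIORITY = {
    "critical": "jobs.critical",
    "high": "jobs.high",
    "normal": "jobs.normal",
    "low": "jobs.low",
}


class ScheduleManager:
    def __init__(self):
        self.buckets = defaultdict(list)  # epoch second -> [(job_id, priority)]; stand-in for Redis
        self.topics = defaultdict(list)   # topic name -> [job_id]; stand-in for Kafka
        self.is_leader = True             # assume leader election has already happened

    def schedule(self, job_id, run_at, priority="normal"):
        self.buckets[int(run_at)].append((job_id, priority))

    def tick(self, now):
        """Dispatch every job whose bucket is at or before `now`; return the count."""
        if not self.is_leader:
            return 0
        dispatched = 0
        # Include earlier buckets too, so a missed tick does not strand jobs.
        for second in sorted(s for s in self.buckets if s <= now):
            for job_id, priority in self.buckets.pop(second):
                self.topics[TOPIC_BY_PRIORITY[priority]].append(job_id)
                dispatched += 1
        return dispatched


manager = ScheduleManager()
manager.schedule("backup", run_at=100, priority="critical")
manager.schedule("digest", run_at=101)
manager.schedule("cleanup", run_at=200, priority="low")
count = manager.tick(now=101)
```

Calling `tick` once per second from the leader matches the one-second scan interval. Non-leader instances return immediately, so only one instance dispatches at a time.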