System Design Problem

Design a Circuit Breaker

Commonly Asked By: Netflix, Microsoft, Google, AWS

  • Monitor health of downstream service calls (success/failure rates)
  • Automatically stop sending requests to unhealthy services (circuit OPEN)
  • Periodically probe to check if service has recovered (HALF-OPEN state)
  • Resume traffic when service recovers (circuit CLOSED)
  • Configurable thresholds: failure rate, slow-call rate, minimum call volume
  • Support per-service, per-endpoint, per-client circuit breakers
  • Dashboard showing circuit breaker states across all services
  • Integration with service mesh (Envoy/Istio sidecar) or application library
  • Fallback responses when circuit is open (cached response, default value, degraded mode)
  • Manual override: force-open or force-close circuits
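The configuration-related requirements above (thresholds, per-service/per-endpoint/per-client scoping, manual override) could be modeled roughly as follows. This is a minimal sketch, not a reference implementation: the names `BreakerConfig`, `BreakerRegistry`, and `Override`, every default value, and the "most specific key wins" lookup rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterator, Optional, Tuple


class Override(Enum):
    NONE = "none"
    FORCE_OPEN = "force_open"
    FORCE_CLOSED = "force_closed"


@dataclass(frozen=True)
class BreakerConfig:
    # All defaults are illustrative assumptions, not recommended values.
    failure_rate_threshold: float = 0.5    # trip at >= 50% failed calls
    slow_call_rate_threshold: float = 0.8  # trip at >= 80% slow calls
    slow_call_seconds: float = 2.0         # a call slower than this counts as slow
    minimum_calls: int = 20                # do not evaluate rates below this volume
    open_seconds: float = 30.0             # time to stay OPEN before probing


Key = Tuple[str, Optional[str], Optional[str]]  # (service, endpoint, client)


def _candidates(service: str, endpoint: Optional[str],
                client: Optional[str]) -> Iterator[Key]:
    # Most specific scope first, falling back to the whole service.
    yield (service, endpoint, client)
    yield (service, endpoint, None)
    yield (service, None, None)


class BreakerRegistry:
    def __init__(self, default: BreakerConfig):
        self._default = default
        self._configs: Dict[Key, BreakerConfig] = {}
        self._overrides: Dict[Key, Override] = {}

    def set_config(self, service, endpoint=None, client=None, *, config):
        self._configs[(service, endpoint, client)] = config

    def set_override(self, service, endpoint=None, client=None, *, override):
        self._overrides[(service, endpoint, client)] = override

    def resolve(self, service, endpoint=None, client=None) -> BreakerConfig:
        for key in _candidates(service, endpoint, client):
            if key in self._configs:
                return self._configs[key]
        return self._default

    def override_for(self, service, endpoint=None, client=None) -> Override:
        for key in _candidates(service, endpoint, client):
            if key in self._overrides:
                return self._overrides[key]
        return Override.NONE
```

With this layout, a service-wide setting applies to every endpoint unless a narrower entry exists, and an operator can force a whole service open with a single `set_override("payments", override=Override.FORCE_OPEN)` call.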

The design is an in-process circuit breaker library that wraps calls to downstream services. Its state machine transitions between CLOSED (normal operation), OPEN (rejecting requests), and HALF-OPEN (probing for recovery). Metrics are emitted to Prometheus for dashboarding.
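The three-state machine described above can be sketched as a small wrapper class. This is a simplified illustration under stated assumptions: it is single-threaded, uses a count-based sliding window of recent results, tracks only failures (not slow calls), and closes the circuit after a single successful probe. The class name, parameter names, and defaults are all hypothetical, and Prometheus metric emission is omitted.

```python
import time
from collections import deque
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, failure_rate=0.5, min_calls=4, window=10,
                 open_seconds=30.0, clock=time.monotonic):
        self.failure_rate = failure_rate
        self.min_calls = min_calls
        self.open_seconds = open_seconds
        self.clock = clock                   # injectable so tests can control time
        self.results = deque(maxlen=window)  # recent outcomes; True means failure
        self.state = State.CLOSED
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("circuit is open")
            self.state = State.HALF_OPEN     # wait elapsed: let a probe through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record(failed=True)
            raise
        self._record(failed=False)
        return result

    def _record(self, failed):
        if self.state is State.HALF_OPEN:
            if failed:
                self._trip()                 # probe failed: back to OPEN
            else:
                self.results.clear()         # probe succeeded: start fresh
                self.state = State.CLOSED
            return
        self.results.append(failed)
        if (len(self.results) >= self.min_calls
                and sum(self.results) / len(self.results) >= self.failure_rate):
            self._trip()

    def _trip(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
```

A caller would wrap each downstream request as `breaker.call(fetch_profile, user_id)` (where `fetch_profile` is a placeholder for any downstream call) and catch `CircuitOpenError` to return a fallback such as a cached response or default value. A production version would additionally need locking, a limit on concurrent HALF-OPEN probes, and slow-call tracking.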
