System Design Problem

Design a Distributed Consensus System (Raft / Paxos)

Commonly Asked By:GoogleCoreOSHashiCorpMicrosoft

  • Implement a replicated state machine across N nodes (typically 3, 5, or 7)
  • Leader election: automatically elect a leader; re-elect on failure
  • Log replication: leader replicates log entries to followers in order
  • Linearizable reads and writes: clients see the most recent committed value
  • Membership changes: add/remove nodes without downtime (joint consensus)
  • Snapshot support: compact log by snapshotting state machine
  • Client request forwarding: followers redirect to leader
Loading...

Raft Leader Election

Normal operation:
  - Leader sends heartbeats (empty AppendEntries) every 100ms
  - Followers reset election timer on heartbeat receipt

Leader failure:
  1. Follower's election timer expires (random: 150-300ms)
  2. Follower increments term, transitions to CANDIDATE
  3. Votes for self, sends RequestVote to all peers
  4. Includes: candidate's term, last log index, last log term
  5. Other nodes vote YES if: candidate's term >= voter's term, voter hasn't voted,
     candidate's log is at least as up-to-date
  6. If candidate gets majority → becomes LEADER
  7. If another leader discovered (higher term) → revert to FOLLOWER

Log Replication

1. Client sends write request to leader
2. Leader appends entry to local log: {term, index, command}
3. Leader sends AppendEntries RPC to all followers
4. Follower checks log consistency at prev_log_index + prev_log_term
5. If mismatch → follower rejects; leader backtracks until logs match
6. When majority reply success → leader commits entry, applies to state machine
7. Commit index propagated to followers in next heartbeat