Chandy-Lamport Algorithm(2) - In depth

Migo·2025년 1월 3일
0

Distributed system

목록 보기
9/12

The Chandy–Lamport algorithm is a snapshot algorithm that is used in distributed systems for recording a consistent global state of an asynchronous system.

Before we begin...

Interesting thing about Chandy-Lamport algorithm is that

  • ANY process can start the snapshotting.
  • It doesn't have to announce to other processes before the beginning.
  • It has NOTHING to do with total order so it does NOT prevent total order anomaly.
  • It assumes messages arrive in FIFO order.
  • Marker messages will be sent N*(N-1) times where N is the number of processes in the system
  • This is decentralized snapshot algorithm so even if snapshoting is initiated by two processes concurrenly, it still works

At a glance

Initiator process

As mentioned, there is no cooridnator that ochestrates snapshotting process. It's just any random coordinator that begins the process and let's call this initiator process. It does:

  • Records its own state.
  • Sends a marker message out on all its outgoing channels.
  • Starts recording the messages it receives on all its incoming channels.

Receiver of marker message

When a process Pi gets a marker message on channel Cki(channel from Pk to Pi) :

1) If the marker message is the first marker Pi has seen:

  • Pi records its state
  • Pi marks channel Cki as empty
  • Pi sends a marker out on every outgoing channel(Cij)
  • Pi starts recoding incoming messages on all its incoming channels except Cki which was the one on which the first marker came in.

(TL;DR;record your start & start recoding incoming messages)

2) If the process has already seen the marker message:

What counds as seeing? If you have done anything at all, you've seen this.

  • Pi stops recoding recoding on Cki and sets the channel, Cki's final state as the sequence of all the incoming messages that arrived on Cki since recoding began.

The reason FIFO delivery guarantee must be assumed

What if the marker message arrives before C in the following?

In this case, s2 gets smaller, capturing only B and not C. And marker message would sent back like this:

Remember that before doing anything else like sending C, P2 has to send marker messages out on all of its outgoing channels. If cross over happens between marker message going back to p1 and C as follows, P1 will include D in its snapshot.

More examples

Let's say we have the following processes and events.

1) Say A is the initiator and it starts snapshotting right after event B. Suppose it just happens that sending marker message for s1 takes quite a long and it lands on P2 after I and on P2 after H.

2) P1 starts recoding on C21 and C31. And P3 receives marker message as follows, marking C13 as empty and recording C23.

3) P1 got marker message back which it has seen before. So it stops recording on C31, marking it as empty:

4) P2 gets to record its state(right after H), mark empty on the channel that the marker message arrived on and send marker message on all of its outgoing channels.

Now, P1 gets P2 which was waiting for the marker message to come in to finish up the snapshot. And from the time P1 sent the marker message, P1 received D from P2 so it's included in snapshot.

On P3, it hasn't received any message through C23 so when it receives marker message from P2, it just marks the channel as empty.

Therefore the end result will be:

Questions

When does a given process know when snapshot is done?

It knows its done when it has recorded its own state and the state of all of its incoming channel.

When does the entire system know when snapshot is done?

This is a problem that gets solved EXTERNALLY to the Chandy-Lamport algorithm. What the algorithm does is establish a way for every process to take its own individual snapshot.

Why doesn't the algorithm record the internal event like C, J?

They would, actually. The only reason it didn't get recorded was because when, for example, P1 decided to take snapshot of its own state, C hadn't happened yet. This is important to make sure that every process has the snapshot of causally-related snapshot.

Is this algorithm guaranteed to be terminated?

There are a lot of assumptions involved:

  • Messages are reliably delivered
  • Processes don't crash
  • FIFO delivery is guaranteed

In the next post, I'll talk more about this assumptions this algorithm makes.

profile
Dude with existential crisis

0개의 댓글