Two-phase commit in distributed systems

Migo · January 30, 2025

1) Language issue: Leader / coordinator

The 2PC "coordinator" (or "leader") shares its name with two other roles:

the actor that manages consensus in a consensus algorithm
the actor that manages replication within a shard

However, the "cohorts" in 2PC are usually partitions (shards) that operate over disjoint datasets.

The implication is that 2PC often has nothing to do with replication: it coordinates one atomic transaction across partitions that hold different data, not copies of the same data.

2) Votes

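During the first phase, the coordinator collects votes from the cohorts. This, again, may mislead people into thinking that a vote here is the same as a vote in consensus. It is not: 2PC does NOT tolerate any rejection. A single "no" vote aborts the whole transaction, so the outcome is all or nothing, whereas consensus algorithms typically need only a majority.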
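The difference between the two kinds of vote can be sketched in a few lines. This is only an illustration: `Vote`, `two_phase_decision`, and `majority_decision` are hypothetical names, and the majority rule is a simplified stand-in for what a consensus algorithm does.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


def two_phase_decision(votes):
    """2PC rule: commit only if every cohort voted YES."""
    if votes and all(v is Vote.YES for v in votes):
        return "commit"
    return "abort"


def majority_decision(votes):
    """Quorum-style rule, shown only for contrast with 2PC."""
    yes = sum(1 for v in votes if v is Vote.YES)
    return "commit" if yes > len(votes) / 2 else "abort"


votes = [Vote.YES, Vote.YES, Vote.NO]
print(two_phase_decision(votes))  # abort: one NO is enough
print(majority_decision(votes))   # commit: two out of three
```

With the same three votes, the 2PC rule aborts while the majority rule commits, which is why treating the two as the same kind of "vote" is misleading.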

3) Transaction log

Given the "usual" fact that each cohort is responsible for a disjoint dataset, it is not surprising that each one manages its own transaction log. These logs are used to recover from failures.

Say one of the cohorts fails after the first phase. When it comes back, its transaction log tells it which transactions it had prepared, so it can ask the coordinator for their final result.

By the same logic, even a cohort that has not failed can do this: if it has not received the result (commit or abort) from the coordinator within some timeout, it can ask the coordinator for the result itself.
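A minimal sketch of this recovery path, assuming an in-memory list stands in for the durable transaction log and that the coordinator can be queried directly. The `Coordinator` and `Cohort` classes and their method names are hypothetical, not taken from any particular system.

```python
class Coordinator:
    def __init__(self):
        self.decisions = {}  # txn_id -> "commit" | "abort"

    def decide(self, txn_id, outcome):
        self.decisions[txn_id] = outcome

    def query(self, txn_id):
        # Returns None while the transaction is still undecided.
        return self.decisions.get(txn_id)


class Cohort:
    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.log = []  # append-only (txn_id, state); stand-in for durable storage

    def prepare(self, txn_id):
        # Phase one: record the promise before voting yes.
        self.log.append((txn_id, "prepared"))
        return "yes"

    def last_state(self, txn_id):
        for logged_id, state in reversed(self.log):
            if logged_id == txn_id:
                return state
        return None

    def in_doubt(self):
        # Transactions that were prepared but never got a final result.
        ids = dict.fromkeys(t for t, _ in self.log)
        return [t for t in ids if self.last_state(t) == "prepared"]

    def resolve(self):
        """Run after a restart, or when waiting for phase two times out."""
        for txn_id in self.in_doubt():
            outcome = self.coordinator.query(txn_id)
            if outcome is not None:
                self.log.append((txn_id, outcome))


coord = Coordinator()
cohort = Cohort(coord)
cohort.prepare("t1")
coord.decide("t1", "commit")    # assume this message never reached the cohort
cohort.resolve()                # cohort asks the coordinator instead
print(cohort.last_state("t1"))  # commit
```

The same `resolve` call covers both cases in the text: a cohort that restarted and a cohort that simply waited too long for the result.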

4) 2PC in MSA

Think of each subdomain (service) in a microservice architecture (MSA) as a cohort.

The process manager is practically the same as the coordinator (yet another language problem: same concept, different term).

Voting may be either synchronous or asynchronous.

There should be a transaction log.

However, 2PC has an inherent problem with coordinator failure, and so do distributed transactions in MSA: cohorts that have already voted yes stay blocked until the coordinator comes back with a decision.
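The mapping can be sketched as follows, assuming synchronous voting and an in-memory list in place of a durable log. `Service`, `ProcessManager`, and the `crash_before_decision` flag are hypothetical and exist only to illustrate the roles and the blocking problem.

```python
class Service:
    """Hypothetical microservice acting as a 2PC cohort."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self, txn_id):
        # Vote yes (True) or no (False) and remember the local state.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, txn_id, outcome):
        self.state = outcome


class ProcessManager:
    """Plays the coordinator role."""

    def __init__(self, services):
        self.services = services
        self.log = []  # stand-in for a durable transaction log

    def run(self, txn_id, crash_before_decision=False):
        self.log.append((txn_id, "started"))
        # Phase one: synchronous voting.
        votes = [s.prepare(txn_id) for s in self.services]
        if crash_before_decision:
            # Simulated coordinator failure: prepared services stay blocked.
            return None
        outcome = "committed" if all(votes) else "aborted"
        # Log the decision before telling anyone about it.
        self.log.append((txn_id, outcome))
        # Phase two: broadcast the result.
        for s in self.services:
            s.finish(txn_id, outcome)
        return outcome


order, payment = Service("order"), Service("payment")
pm = ProcessManager([order, payment])
print(pm.run("t1"))  # committed

pm.run("t2", crash_before_decision=True)
print(order.state)   # prepared: stuck until the process manager recovers
```

In the second run both services remain in the "prepared" state with no way to decide on their own, which is the coordinator-failure problem described above.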
