MVCC 공부해보기 - 2

이동훈·2023년 1월 25일

데이터베이스

목록 보기

2/4

Intro

이번 블로그 시리즈는 CMU 2019 Intro to database system에서 MVCC(Multiversion Concurrency Control) 부분에 대한 요약 정리로 진행하려고 합니다. 저도 대학교 때 같은 이름의 수업을 들었는데 저희 학교는 MVCC쪽은 많이 다루지 않았고 초점이 SQL 쿼리를 잘 작성하는데 더 많은 시간을 썼던 것 같습니다.
그래서 이 기회에 MVCC 복습할 겸, 관련된 부분을 강의를 듣게 되고 요약 정리 목적으로 이 글을 작성합니다.
CMU 강의로는 17번째 강의입니다.

들어가며

저번 블로그에서 ACID 중에서 I(Isolation)에 대해서 간단히 알아보면서 I를 보장하기 위해서 Concurrency Protocol이 존재하고 이는 크게 Optimistic, Pessimistic 두 종류로 나뉜다고 설명을 했습니다. 이번 글에서는 Pessimistic Concurrency Protocol중 하나인 Two Phase Locking(2PL)에 대해서 알아보겠습니다.

Locks

먼저 Lock의 종류에 대해서 알아봅시다.

S-LOCK: Shared locks for reads.
X-LOCK: Exclusive locks for writes.

당연한 이야기겠지만 하나의 객체에 대해 동시에 여러개의 Shared Locks가 존재할수는 있지만 Exclusive Lock 생기게 되는 순간 다른 객체들은 접근이 되지 않고 그러므로 Lock또한 추가적으로 발급될수 없습니다.
일반적으로 Lock들을 관리하는 Lock Manager들이 존재하면 트랜스액션은 Lock Manager들한테 필요한 Lock을 받아서 진행한후 operation들이 끝나면서 Lock을 반납하는 방식으로 이루어집니다.

2PL

먼저 2PL의 정의를 보자면
Two-phase locking (2PL) is a concurrency control
protocol that determines whether a txn can access
an object in the database on the fly.
The protocol does not need to know all the queries
that a txn will execute ahead of time.

2PL은 두 단계로 이루어져 있고 이는 다음과 같습니다.

Phase #1: Growing

Each txn requests the locks that it needs from the DBMS’s lock manager.
The lock manager grants/denies lock requests.

Phase #2: Shrinking

The txn is allowed to only release locks that it previously acquired. It cannot acquire new locks.

The txn is not allowed to acquire/upgrade locks after the growing phase finishes.
즉, 2PL의 핵심은 Lock이 꾸준히 증가하는 Growing 구간과 Lock이 감소만 하는 Shrinking 구간으로만 나뉜다는 점입니다.

직관적으로 이해하기는 어렵지만 2PL on its own is sufficient to guarantee conflict serializability. It generates schedules whose precedence graph is acyclic. 이를 조금 더 생각을 해보면 당연한게 여러개의 트랜스액션이 존재할때 Conflict들은 여러 개의 트랜스액션이 하나의 객체에 대해서 동시에 접근을 하게 되면서 생기게 되는데 각각의 트랜스액션이 Lock을 일방적으로 증가시키게 되면 같은 객체에 접근하려는 트랜스액션은 기다릴수 밖에 없게 됩니다.

2PL 단점

There are potential schedules that are serializable but would not be allowed by 2PL.
→ Locking limits concurrency.
Cascading Abort && May still have "dirty reads".
→ Solution: Strong Strict 2PL (aka Rigorous 2PL)
May lead to deadlocks.
→ Solution: Detection or Prevention

1번은 Locking의 고질적인 단점이어서 해결하기 어렵지만 2번과 3번 같은 경우 해결책이 존재하고(물론 Tradeoff는 존재!)이를 뒤에서 알아보도록 하겠습니다.

Strong Strict 2PL

Definition: A schedule is strict if a value written by a txn is not read or overwritten by other txns until that txn finishes.

Advantages:
1. Does not incur cascading aborts.
2. Aborted txns can be undone by just restoring original values of modified tuples.

SS2PL releases all the locks at the end of tx. SS2PL Allows only conflict serializable schedules, but it is often stronger than needed for some apps.

SS2PL의 핵심은 "내가 Exclusive Lock 가지고 있으니까 내 트랜스액션 끝날때까지 접근 자체(읽는것도) 하지마. 내 Lock들은 트랜스액션 끝날때까지 한번에 release 할꺼야" 입니다. 그렇기 때문에 당연히(?) dirty read는 방지됩니다.
당연히 Stronger guarantee기 때문에 concurrency는 더 제한되기 때문에 상황에 맞게 사용을 해야 합니다.

Deadlocks

데드락의 정의는: A deadlock is a cycle of transactions waiting for
locks to be released by each other.

Two ways of dealing with deadlocks:

Deadlock Detection
Deadlock Prevention

Deadlock Detection

The DBMS creates a waits-for graph to keep track of what locks each txn is waiting to acquire:
1. Nodes are transactions
2. Edge from Ti to Tj if Ti is waiting for Tj to release a lock.

The system periodically checks for cycles in waitsfor graph and then decides how to break it.

When the DBMS detects a deadlock, it will select a
"victim" txn to rollback to break the cycle.
The victim txn will either restart or abort(more
common) depending on how it was invoked.
There is a trade-off between the frequency of
checking for deadlocks and how long txns have to
wait before deadlocks are broken.

Deadlock Prevention

When a txn tries to acquire a lock that is held by
another txn, the DBMS kills one of them to
prevent a deadlock.
This approach does not require a waits-for graph
or detection algorithm.

먼저 Assign priorities based on timestamps: Older Timestamp = Higher Priority (e.g., T1 > T2)

Wait-Die ("Old Waits for Young")
1. If requesting txn has higher priority than holding txn, then
requesting txn waits for holding txn.
2. Otherwise requesting txn aborts.
Wound-Wait ("Young Waits for Old")
1. If requesting txn has higher priority than holding txn, then
holding txn aborts and releases lock.
2. Otherwise requesting txn waits.

일반적으로 deadlock prevention은 작은 database에서 주로 사용되고 큰 database 같은 경우에는 deadlock detection을 사용한다! (PostgreSQL 같은 경우에는 deadlock_timeout 이 있어서 해당 시간안에 lock을 얻지 못하면 deadlock detection이 실행됩니다)

Lock Hierarchy

An intention lock allows a higher level node to
be locked in shared or exclusive mode without
having to check all descendent nodes.
If a node is in an intention mode, then explicit
locking is being done at a lower level in the tree.

Intention-Shared (IS): Indicates explicit locking at a lower level with shared
locks.
Intention-Exclusive (IX): Indicates locking at lower level with exclusive or shared
locks.
Shared+Intention-Exclusive (SIX)
→ The subtree rooted by that node is locked explicitly in
shared mode and explicit locking is being done at a
lower level with exclusive-mode locks.

Each txn obtains appropriate lock at highest level of the database hierarchy.
To get S or IS lock on a node, the txn must hold at least IS on parent node.
To get X, IX, or SIX on a node, must hold at least IX on parent node.

Compatibility Matrix

사실 이런 Lock Hierarchy는 Locking 최적화를 위해 만들어진 방법이기 때문에 실제 DB 들에서는 이보다 더 다양한 종류의 Lock들이 존재합니다. 추가로 파보고 싶다면 각 db의 문서를 찾아보면 됩니다!