Normalization[5]

임승섭·2023년 5월 19일

Database system

목록 보기

17/22

BCNF

모든 functional dependency å -> ß에 대해

trivial (ß ⊂ å)
å가 superkey

Testing for BCNF

어떤 non-trivial한 dependency å->ß가 BCNF를 violate하는지 체크하기 위해서는

compute å+ (the attribute closure of å)
and
verify that it includes all attributes of R
(that is, it is superkey of R)

Simplified Test

relation R이 BCNF인지 확인하기 위해서,
F에 있는 given dependency들만 BCNF를 violate하는지 체크하면 된다.
굳이 F+에 있는 것들까지 체크할 필요는 없다.
If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.

However,

simplified test using only F
is incorrect when testing a relation in a decomposition of R
이럴 때는 F+ 를 요구된다.
하지만 계속 나왔듯이 F+는 exponential time이 걸리기 때문에 비효율적이다

Example

Consider R = (A, B, C, D, E)
with F = { A->B, BC->D}
Decompose R into R1 = (A, B) and R2 = (A, C, D, E)

R1은 BCNF이다 (A가 superkey니까?)
R2에는 적용할 수 있는 dependency가 없다.
그렇다면 trivial한 것 밖에 없다.
그러니까 BCNF이구나
라고 하는 게 incorrect

F closure에 있는 dependency에 대해 검사해보자.
A->B => AC->BC + BC->D => AC->D
(AC)+ = ACD이므로 AC는 NOT superkey
=> AC->D is nontrivial dependency
따라서 R2는 BCNF가 아니다

Testing Decomposition for BCNF

R의 decomposition중 Ri가 BCNF를 만족하는지 확인하기 위해서는

test Ri for BCNF w.r.t the restriction of F+ to Ri
(that is, all FDs in F+ that contain only attributes from Ri)
or
Use the original set dependencies F that hold on R, but with the following test :
- For every set of attributes å ⊂ Ri,
  check that å+
  either includes no attribute of (Ri - å),
  or includes all attributes of Ri
- 이 둘 중 하나에 걸리면 Ri는 BCNF이다
  
  시간 면으로 생각해보면,
  å는 Ri의 부분집합이다. 따라서 개수는 $2^{|R_i|}$
  만약 F+ 가지고 하려면 $2^{|F|} * 2^{|F|} = 2^{2|F|}$ . 얘보단 낫다.
  첫 번째 조건은 결국
  $å^+ ∩ (R_i - å) = ∅$ 을 의미하는데, 이건
  1. å에서 어디로 가는 어떠한 dependency도 Ri에 없다
  2. å가 Ri의 superkey이다
    를 의미한다

만약 이 조건이 어떤 å 때문에 안된다면,
å -> (å+ - å) ∩ Ri
can be shown to hold on Ri(???),
어쨌든 얘가 Ri violates BCNF

Example

아까 했던 예시를 사용해보자

R2 = (A, C, D, E), F = {A->B, BC->D}
R2의 subset å들을 다 찾으려면 $2^4 = 16$ 번에 대해 찾아야 한다.
여기서 약간의 overhead가 있지만, F+ 찾고 있는 것 보다는 낫다
å = {A, C}라고 하면,
(AC)+ = ABCD이고,
이건 조건 두개를 모두 만족시키지 못한다
따라서 AC -> D가 R2가 BCNF 못되게 하는 nontrivial한 dependency이다

BCNF Decomposition Algorithm

result  = {R}	// 초기값은 R (-> BCNF로 분할하겠다)
done = false;

while (not done) do
	if (there is a schema Ri in result that is NOT in BCNF)		// 이거 확인하는 건 이전 슬라이드 참고
    then begin
          let å–>ß be a nontrivial functional dependency that holds on Ri
              such that å does not contain Ri and å ∩ ß = ∅
          result = (result - Ri) ∪ (Ri - ß) ∪ (å, ß);
		 end
     else done = true

Note : each Ri is in BCNF, and decomposition is lossless-join

Example

class(~~course_id~~, title, dept_name, credits, ~~sec_id~~, ~~semester~~, ~~year~~, building, room_number, capacity, time_slot_id)
Functional dependencies :
- course_id -> title, dept_name, credits
- building, room_number -> capacity
- course_id, sec_id, semester, year -> building, room_number, time_slot_id
Candidate key = {~~course_id~~, ~~sec_id~~, ~~semester~~, ~~year~~}
BCNF Decomposition 시작

course_id -> title, dept_name, credits 에서,
course_id는 superkey가 아니다
=> 얘가 BCNF 아닌 원인 dependency이다.

result = ∅ ∪ {coure_id, sec_id, semester, year, building, room_number, capacity, time_slot_id} ∪ {course_id, title, dept_name, credits

We replace class by
- course(course_id, title, dept_name, credits)
- class-1(course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id)

course is in BCNF

이걸 어떻게 알까?
모든 subset에 대해 test를 진행해야 할까?
음.. 여기에 있는 dependency에 대해 å+를 구하고, 그게 모두를 포함하고 있으면 넘어가는 것 같다 일단?...

class-1 is NOT in BCNF
building, room_number -> capacity에서,
{building, room_number} is NOT a superkey for class-1

We replace class-1 by

classroom(building, room_number, capacity)
section(course_id, sec_id, semester, year, building, room_number, time_slot_id)

classroom and section are in BCNF

3NF

BCNF의 장단점
- 장 : strong한 정규화
- 단 : dependency preserving이 안될 수도 있다. dependency check를 하기 위해서는 반드시 join이 필요하고, 그럼 update할 때마다 join을 해야 해서 overhead가 발생한다

Solution : 조금 더 약한 normal form을 만든다. 3NF
- Allows some redundancy
- 각각의 relation에 대해 join 없이도 dependency check가 가능하다
- There is always a lossless-join, dependency preserving decomposition into 3NF

Example
- R = dept_advisor(s_ID ,i_ID, dept_name)
  F = {s_ID, dept_name -> i_ID, i_ID -> dept_name}
  Candidate key = {s_ID, dept_name} and {i_ID, s_ID}
- R is in 3NF
  - s_ID, dept_name -> i_ID
    - (s_ID, dept_name)이 superkey이다
  - i_ID -> dept_name
    - ß - å = dept_name이 candidate key에 포함되어 있다

Testing for 3NF

F에 있는 dependency들만 check하면 된다.
Need not to check all FDs in F+
å가 superkey라면, 각 dependency å->ß를 check할 때, attribute closure를 이용한다

å가 superkey가 아니라면, ß에 있는 each attribute가 candidate key에 포함되어 있는지 verify해야 한다.
- 이 testsms cadidate key를 찾아야 하기 때문에 좀 expensive하다
- Testing for 3NF has been shown to be NP-hard
- 신기하게도, decomposition into 3NF는 polynomial time에 실행될 수 있다.

3NF Decomposition Algorithm

Let Fc be a canonical cover for F;
i = 0;

for each functional dependency å -> ß in Fc	do
	begin 
    	i++;
        Ri = åß
    end

if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
	then begin
    	i++;
        Ri = any candidate key for R
    end
repeat

if any schema Rj is contained in another shcema Rk
	then 
    	Rj = Ri
        i--;

return (R1, R2, ..., Ri)

각각의 Ri는 3NF이다
Decomposition은 dependency preserving하고, lossless-join하다
이에 대한 증명은 넘어가자

Example

Relation schema :
cust_banker_branch = (~~customer_id~~, ~~employee_id~~, branch_name, type)

functional dependency :
- customer_id, employee_id -> branch_name, type
- employee_id -> branch_name
- customer_id, branch_name -> employee_id

compute the canonical cover

branch_name is extraneous in first depenency
- (right side case)
- F' = {
  customer_id, employee_id -> type,
  employee_id -> branch_name,
  customer_id, branch_name -> employee_id }
- (customer_id, employee_id)+
  = customer_id, employee_id, type, branch_name
- done
No other attribute is extraneous

Go to for loop
then
R1 = (customer_id, employee_id, type)
R2 = (employee_id, branch_name)
R3 = (customer_id, branch_name, employee_id)

모두 candidate key를 포함하고 있기 때문에 추가적인 schema(candidate key)를 만들 필요는 없다

R2는 R3에 모두 포함되어 있기 때문에 제거한다
The resultant simplified 3NF schema is :
(customer_id, employee_id, type)
(cumtomer_id, branch_name, employee_id)

BCNF vs. 3NF

It is always possible to decompose a relation
into a set of relation that are in 3NF such that :
- The decomposition is lossless
- The dependencies are preserved

It is alway possible to decompose a relation
into a set of relation that are in BCNF such that :
- The decomposition is lossless
- It may not be possible to preservee dependencies

Design Goals

Goal for a relational database design is :
- BCNF
- Lossless join
- Dependency preservation

만약 이렇게 하기가 어렵다면, 다음 중 하나까지는 accept한다
- Lack of depenency preservation (BCNF를 유지하거나)
- Redundancy due to use of 3NF (dependency preserving을 유지)

임승섭

이전 포스트

Normalization[4]

다음 포스트

Normalization[5]

Database system

BCNF

Testing for BCNF

Example

Testing Decomposition for BCNF

Example

BCNF Decomposition Algorithm

Example

3NF

Testing for 3NF

3NF Decomposition Algorithm

Example

BCNF vs. 3NF

Design Goals

Normalization[4]

Normalization[6]

0개의 댓글

관련 채용 정보