Normalization[6]

임승섭·2023년 5월 19일

Database system

목록 보기

18/22

Multivalued Dependency (MVD)

instructor의 children name, phone number를 위한 스키마를 만든다고 하자
- inst_child(ID, child_name)
- inst_phone(ID, phone_number)
- 이 두 스키마는 BCNF이고(trivial한 것 밖에 없기 때문), 중복도 없다.

만약 이 둘을 합치면
- inst_info(ID, child_name, phone_number)
- example data :
  (99999, David, 1234)
  (99999, William, 4321)
  (99999, Wiliam, 1234)
  (99999, David, 4321)
- 근데 이 스키마도 BCNF이다
  - ID -> c_name 은 ID가 같은데 name이 다른 게 있으므로 존재할 수 없다
  - 결국 dependency를 찾아보면 trivial한 dependency밖에 존재하지 않는다. => BCNF

그렇다면 아래에 있는 중복있는 BCNF를 어떻게 위처럼 깔끔하게 바꿀 수 있을까?
Multivalued Dependency

Multivalued dependency

Let R be a relation schema and let å ⊂ R and ß ⊂ R.
The multivalued dependency å->->ß holds on R if
in any legal relation r(R), for all pairs for tuples t1 and t2 in r such that t1[å] = t2[å], there exist tuples t3 and t4 in r such that :

t1[å] = t2[å] = t3[å] = t4[å]
t3[ß] = t1[ß]
t3[R - ß] = t2[R - ß]
t4[ß] = t2[ß]
t4[R - ß] = t1[R - ß]
으 읽기 어려우니까 수식으로 외우려고 하진 말자

위에 instructor info에 적용해서 생각하면
å = ID, ß = child_name, R - ß - å = phone_num
t1 = (99999, David, 1234)
t2 = (99999, Wiliam, 4321)
t3 = (99999, David, 4321)
t4 = (99999, William, 1234)
(필기한 내용)
원래 tuple들(t1, t2)만 있을 때는 child_name에서 phone_num으로 가는 dependency가 있을 수도 있어보이는데,
새 tuple들(t3, t4)를 만듦으로써 그 dependency가 없도록 원천 봉쇄를 시켜버렸다.

Let R be a relation schema with a set of attributes
that are partitioned into 3 nonempty subsets
Y, Z, W (각각 nonempty set of attributes)

We say that Y ->-> Z (Y multidetermines Z)
if and only if
for all possible relations r(R),
<y1, z1, w1> ∈ r and <y1, z2, w2> ∈ r
then
<y1, z1, w2> ∈ r and <y1, z2, w1> ∈ r

Note that since the behavior of Z and W are identical
it follows that
Y ->-> Z if Y ->->W

Example

처음에 들었던 예제를 그대로 가보자
ID ->-> child_name
ID ->-> phone_number

위 식은 다음을 의미한다 :
given a particular value of Y(ID),
it has associated with a set of values of Z(child_name)
and a set of values of W(phone_number),
and these two sets are in some sense independent of each other

Note :
If Y -> Z then Y ->-> Z (모든 FD는 MD)

R(A, B, C)에 대해 A ->-> B 이고,
기본 tuple이 (a1, b1, c1), (a2, b2, c2), (a3, b3, c3)라면,

추가로 있어야 하는 tuple을 찾아보자
1-2
(a1, b2, c1) (a1, b1, c2)
1-3
(a1, b3, c1) (a1, b1, c3)
2-3
(a2, b3, c2) (a2, b2, c3)

추가로 6개의 tuple이 존재한다

Use of MVD

in two ways :

주어진 FD 또는 MVD 집합에서 relation들이 legal한지 test한다
legal realtion의 집합에서 제약조건(constraints)을 specify한다.
We shall concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies (???)

만약 어떤 relation r이 multivalued dependency를 만족하는 걸 fail했다면, tuple들을 r에 추가하여 dependency를 만족하는 새로운 relation r'을 construct할 수 있다.

Theory of MVD

If å -> ß, then å ->-> ß
That is, every functional dependency is also a multivalued dependency
the set of all functional and multivalued dependencies = D.
We can compute D+ (closure) using the formal definitions of functional and multivalued dependencies

Fourth Normal Form

A relation schema R is in 4NF w.r.t
a set D of functional and multivalued dependencies
if
for all multivalued dependencies in D+ of the form å ->-> ß, where å ⊂ R and ß ⊂ R, at least one of the following hold :

å ->-> ß is trivial (i.e. ß ⊂ å)
å is a superkey for schema R

If a relation is in 4NF, it is in BCNF

Restriction of MVD

똑같다 그냥
The restriction of D of Ri = Di consisting of

All FDs in D+ that include only attributes of Ri
All MVDs of the form å ->-> (ß ∩ Ri)
where å ⊂ Ri and å ->-> ß is in D+

4NF Decomposition Algorithm

BCNF와 굉장히 유사하다

result = {R}
done = false
compute D+		// D closure를 먼저 구해라
Let Di denote the restriction of D+ to Ri

while (not done)
	if (there is a schema Ri in result that is NOT in 4NF) then
    	begin
        	let å ->-> ß be a nontrivial MVD that holds on Ri such that
             å -> Ri in NOT in Di, and å ∩ ß = ∅;
            result = (result - Ri) ∪  (Ri - ß) ∪ (å, ß);
        end
    else done = true

Each Ri is in 4NF, and decomposition is lossless-join

Example

R = (A, B, C, G, H, I)
D = {
A->->B
B->->HI
CG->->H } // FD랑 MVD랑 같이 있는데 그냥 MD 형식으로 적는다
R is NOT in 4NF since A->->B and A in NOT a superkey for R.
(A+ = ABHI)
Decomposition 시작

A->->B에 대해서
Ri - ß = (A, C, G, H, I) -> NOT 4NF
(since CG->->H. CH가 NOT superkey)
(å, ß) = (A, B) -> 4NF
CG->->H에 대해서
Ri - ß = (A, C, G, I) -> NOT 4NF
(since A->I (transitivity + restriction) )
(å, ß) = (C, G, H) -> 4NF
A->I에 대해서
Ri - ß = (A, C, G) -> 4NF
(å, ß) = (A, I) -> 4NF

Overall Database Design Process

4NF < BCNF < 3NF < 1NF
-> true
<- NOT always true

ER diagram을 set of tables로 전환할 때 schema R이 만들어진다
R은 단순히 모든 속성을 포함한 하나의 relation이 될 수도 있다
(universal relation)
Normalization을 통해 R을 작은 relation들로 쪼갤 수 있다.

E-R Model and Normalization

만약 E-R 모델을 이쁘게 잘 만들었으면, E-R 모델로부터 만들어진 table은 더이상 정규화가 필요 없을 것이다

하지만 실제 design에서는 그 entity의 non-key attribute에서 다른 attribute로 가는 functional dependency가 존재할 수도 있다(???)
- Example : an employee entity with
  - attributes : dept_name and building
  - FD : dept_name -> building
    Good design would have made department an entity
non-key attribute에서 가는 FD는 가능하지만, rare하다
- most relationships are binary

ER에서는 entity에 대해서는 schema가 나오지만, relationship set에 대해서는 schema가 나올 수도 있고 나오지 않을 수도 있다. 나오는 schema에 대해서는, non-trivial(non-key)한 FD가 나오는 것이 굉장히 rare하다

non-key를 non-trivial로 이해하면 될듯

Denormalization for Performance

성능을 위해 정규화를 희생한다

양 극단
1. 싹 다 1개의 table로 만든다
- update 시 일관성이 깨질 수 있다
- 검색 속도는 good
2. 다 column 2개짜리 table로 쪼개
- 관리는 good
- 검색 시 join이 계속 필요하다
3. 그래서 이 중간 지점을 정규화를 통해 찾으려 하는 것.

그런데 만약, 사용자가 질의를 정말 많이 하는 걸 계속 join해서 찾아야 하는 상황이면, 차라리 정규화를 희생하는 게 더 나을 수 있다

Example

Displaying prereqs along with course_id and title requires join of course with prereq

방법 1 : Use denormalized relation containing attributes of course as well as prereq with all above attributes
- 장점 : faster lookup
- 단점 : 정보의 중복이 발생하기 때문에 update할 때 추가적인 공간과 실행 시간이 필요하다
  또한, 일관성을 계속 체크해야 하기 때문에 programmer가 추가적인 코딩을 해야 한다

방법 2 : Use a materialized view defined by a course⋈prereq
- 장단점은 방법 1과 동일하나, 일관성을 유지하는 건 프로그래머의 역할이 아니라 DB 엔진이 해준다.

Other Design Issue

BCNF라고 항상 좋다고 할 수는 없다

예쁜 거 : earnings(company_id, year, amount)
이상한거 1 : earnings_2004(company_id, earnings), earnings_2005(company_id, earnings), ...
이상한거 2 : company_year(company_id, earnings_2004, earnings_2005, earnings_2006)

위에 있는 이상한 것들은 BCNF이지만, 절대 좋은 relation은 아니다
1은 매 해가 지날수록 table을 추가로 만들어야 하고
2는 속성을 추가해야 한다(crosstab이라고 부른다 : values for one attribute becomes column name).

임승섭

이전 포스트

Normalization[5]

다음 포스트

Normalization[6]

Database system

Multivalued Dependency (MVD)

Multivalued dependency

Example

Use of MVD

Theory of MVD

Fourth Normal Form

Restriction of MVD

4NF Decomposition Algorithm

Example

Overall Database Design Process

E-R Model and Normalization

Denormalization for Performance

Example

Other Design Issue

Normalization[5]

Data Storage Structures

0개의 댓글

관련 채용 정보