Multivalued Dependencies
다취종속
Suppose we record names of children, and phone numbers for instructors:
inst_child(ID, child_name)
inst_phone(ID, phone_number)
If we were to combine these schemas to get
inst_info(ID, child_name, phone_number)
Example data:
(99999, David, 512-555-1234)
(99999, David, 512-555-4321)
(99999, William, 512-555-1234)
(99999, William, 512-555-4321)
This relation is in BCNF
Why?
Multivalued Dependencies (MVDs)
Let R be a relation schema and let R and R. The multivalued dependency
holds on R if in any legal relation r(R), for all pairs for tuples t1 and t2 in r such that t1[] = t2 [], there exist tuples t3 and t4 in r such that:
t1[] = t2 [] = t3 [] = t4 []
t1[] = t2 []
t3[] = t4[]
t1[R – ] = t3[R – ]
t2[R – ] = t4[R – ]
강사가 여러가지 핸드폰을 가질 수 있다.
ID가 모두 결정하진 못하지만 어느정도 결정해서 화살표 두개 씀
a ->-> b
Example
ppt 참고
다취종속에서는 모든 교차집합을 만들어 anomally를 검증해야 한다?
In our example:
ID child_name
ID phone_number
The above formal definition is supposed to formalize the notion that given a particular value of Y (ID)
It has associated with a set of values of Z (child_name) and a set of values of W (phone_number), and these two sets are in some sense independent of each other.
formalize = 일반적으로 표현할 수 있다. 수학적 표기 가능?
Use of Multivalued Dependencies
We use multivalued dependencies in two ways:
To test relations to determine whether they are legal under a given set of functional and multivalued dependencies
To specify constraints on the set of legal relations. We shall thus concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies.
테스트 릴레이션. 규칙에 맞고 함수종속, 다취종속을 만족하는지 체크
제약들을 specify?하기 위해
Theory of MVDs
From the definition of multivalued dependency, we can derive the following rule:
If , then
That is, every functional dependency is also a multivalued dependency
The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D.
We can compute D+ from D, using the formal definitions of functional dependencies and multivalued dependencies.
함수종속은 다취종속에 포함된다.
X -> Y, Y->-> Z일 경우 X ->-> Z가 된다.
Fourth Normal Form
A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form , where R and R, at least one of the following hold:
is trivial (i.e., or = R)
is a superkey for schema R
If a relation is in 4NF it is in BCNF
정의는 BCNF와 비슷함.
Example
R =(A, B, C, G, H, I)
F ={ A B, B HI, CG H }
R is not in 4NF since A B and A is not a superkey for R
Decomposition
R1 = (A, B) (R1 is in 4NF)
R2 = (A, C, G, H, I) (R2 is not in 4NF, decompose into R3 and R4)
R3 = (C, G, H) (R3 is in 4NF)
R4 = (A, C, G, I) (R4 is not in 4NF, decompose into R5 and R6)
A B and B HI A HI, (MVD transitivity), and
and hence A I (MVD decomposition to R4)
R5 = (A, I) (R5 is in 4NF)
R6 = (A, C, G) (R6 is in 4NF)
Overall Database Design Process
We have assumed schema R is given
R could have been generated when converting E-R diagram to a set of tables.
R could have been a single relation containing all attributes that are of interest (called universal relation).
Normalization breaks R into smaller relations.
ER Model and Normalization
When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization.
However, in a real (imperfect) design, there can be functional dependencies from non-key attributes of an entity to other attributes of the entity
Example: an employee entity with attributes
department_name and building,
and a functional dependency
department_name building
Good design would have made department an entity
Denormalization for Performance
May want to use non-normalized schema for performance
For example, displaying prereqs(선수과목) along with course_id and title requires join of course with prereq
Alternative 1: Use denormalized relation containing attributes of course as well as prereq with all above attributes
faster lookup
extra space and extra execution time for updates
extra coding work for programmer and possibility of error in extra code
Alternative 2: use a materialized view defined as
course prereq
Benefits and drawbacks same as above, except no extra coding work for programmer and avoids possible errors
Other Design Issues
Some aspects of database design are not caught by normalization
Examples of bad database design, to be avoided:
Instead of earnings (company_id, year, amount ), if we use
Multiple relations, earnings_2004, earnings_2005, earnings_2006, etc., all on the schema (company_id, earnings).
Above are in BCNF, but make querying across years difficult and needs new table each year
company_year (company_id, earnings_2004, earnings_2005,
earnings_2006)
Also in BCNF, but also makes querying across years difficult and requires new attribute each year.
Used in spreadsheets, and in data analysis tools