[ DB ] 07. Normalization

38AΒ·2023λ…„ 4μ›” 25일
1

Database System

λͺ©λ‘ 보기
7/10
post-thumbnail

πŸ–₯️ Motivating example

  • Data is jumbled(λ’€μ„žμΈ) ➑️ create duplicates by the design
    • Repetition of data β†’ increases the size of database
    • Data consistency issues ( data anomaly )
      • Insertion anomaly: Inserting redundant data for every new record
      • Deletion anomaly: Loss of related data, when some data is deleted
      • Update anomaly: When updating certain information, every single record must be updated
  • How to avoid the repetition of information problem?
    • Decompose it into two schemas
      • Normalization = decomposition of relational schemas
      • Key idea: split relational schemas such that only directly related data composes a relation

Lossy Decomposition

  • Less redundancy β†’ Uses smaller disk storage;
    Causes less issues associated with insertion, deletion, and update anomalies
  • Lossy decomposition: a decomposition from which the original relation cannot be reconstructed➑️ result in more records
    ➑️ lost the connectivity between data components ➑️ "lossy"

Lossless Decomposition

  • A decomposition is a lossless decomposition if there is no loss of information by replacing R with the two relation schemas R1_1 U R2_2
    • Formally, ∏R1(r)β‹ˆβˆR2(r)=r∏_{R_1}(r) β‹ˆ ∏_{R_2}(r) = r
    • cf. lossy decomposition rβŠ‚βˆR1(r)β‹ˆβˆR2(r)r \subset ∏_{R_1}(r) β‹ˆ ∏_{R_2}(r)
      ➑️ create something more than the original r

Normalization

  • Database normalization: Process of structuring a database to reduce data redundancy and improve data integrity
    • Through the process ( benefit ):
      • One can decompose relations to suppress(μ–΅μ œ) data anomalies
      • One can make sure the decomposition is lossless

πŸ–₯️ Normal forms

  • Normalization process
    • Normalization is a database design technique, which is used to design a
      relational database table up to higher normal form
      - Procedurally separates logically independent (but related) data entities into multiple relations
      - Maintains(μœ μ§€) the connections using keys
  • Progressive(단계적) process
    • A higher level of database normalization cannot be achieved unless the previous levels have been satisfied
    • 1NF, 2NF, 3NF, BCNF, 4NF ...
  • Normal forms are backed by a set of normalization theories
    • Functional dependencies
      • Partial dependencies
      • Transitive dependencies
    • Multi-valued dependencies
    • These theories decide whether a particular relation R is in β€œgood form”

First Normal Form (1NF)

  • 1NF checklist
    1. Each column should contain an atomic value
      • Atomic - INT, FLOAT, DOUBLE, DECIMAL (NUMERIC), CHAR, VARCHAR, BLOB, TEXT
      • NOGO - Structure, List (array)
    2. Each column should contain values that are in the same data domain(type)
      • Do not mix different types of values in a column
    3. Each column should have a unique name(identifier)
    4. The order in which data is stored does not matter
    5. There are no duplicated rows in the table
      • Primary key (PK) ensures:
        • Attributes that are part of PK are unique, not null
  • Functional Dependencies
    • Relations require that the value for a certain set of attributes determines uniquely the value for another set of attributes
      - A functional dependency is a generalization of the notion of a key (= trivial dependency)
      - Ex_
      ID β†’ instructor
      ID β†’ student
      name β†’ student (X)
      dept_name β†’ instructot (X)
      - EX2_
      AB
      14
      15
      37
      ➑️ B β†’ A hold( Fuctional Dependendies ); A β†’ B does NOT hold (only the values of B are unique)
  • Ex_
    in_dep (ID, name, salary, dept_name, building, budget )
    • hold:
      • dept_name β†’ building
      • ID β†’ building
      • dept_name and ID are super keys = candidate keys β†’ ID, dept_name, {ID, dept_name}
    • does not hold:
      • dept_name β†’ salary
  • Ex_
  • Ex2_

Second Normal Form (2NF)

  • A relation should NOT have a non-PK that is functionally dependent on any subset of any candidate key = NO PARTIAL DEPENDENCIES!

  • Ex_

  • Ex2_

Third Normal Form (3NF)

  • A relation should NOT have transitive dependencies
  • Ex_

Boyce-Codd Normal Form (BCNF) = 3.5NF

  • For any dependency A β†’ B, A should be a super key

  • BCNF is a subset of 3NF ( BCNF is more strict than 3NF )

  • Ex_

Fourth Normal Form (4NF)

  • A relation should NOT have multi-valued dependency

    • Multi-valued dependency occurs when a relation has more than 3 attributes
    • having dependency, A β†’ B, and A β†’ C
    • B and C are independent from each other ➑️ multi-valued dependency
  • Ex_

Summary

Denormalization for Performance

  • Alternative 1: Use denormalized relation
    • faster lookup
    • extra space and extra execution time for updates
    • extra coding work for programmer and possibility of error in extra code
  • Alternative 2: Use a materialized view ( view stored in Disk )

Remaining Issues

  • normalization으둜 ν•΄κ²°λ˜μ§€ μ•ŠλŠ” 문제
  • Ex_ ( to be avoided )
    • earnings_2004, earnings_2005, earnings_2006,... all on the schema (company_id, earnings)
      • Above are well normalized (in BCNF), but make querying across years difficult and needs new table each year
    • company_year (company_id, earnings_2004, earnings_2005, earnings_2006)
      • Above are well normalized (in BCNF), but makes querying across years difficult and requires new attribute each year
    • This is an example of a crosstab
      ➑️ Better schema: earnings (company_id, year, amount)

πŸ–₯️ Normalization example

Example1

Example2

Example3

HGU μ „μ‚°μ „μžκ³΅ν•™λΆ€ 홍참길 κ΅μˆ˜λ‹˜μ˜ 23-1 Database System μˆ˜μ—…μ„ λ“£κ³  μž‘μ„±ν•œ 포슀트이며, μ²¨λΆ€ν•œ λͺ¨λ“  사진은 κ΅μˆ˜λ‹˜ μˆ˜μ—… PPT의 사진 원본에 ν•„κΈ°λ₯Ό ν•œ μˆ˜μ •λ³Έμž…λ‹ˆλ‹€.

profile
HGU - 개인 곡뢀 기둝용 λΈ”λ‘œκ·Έ

0개의 λŒ“κΈ€