[DB] Disk Storage, Basic File Structures

SUbbb·2021년 12월 1일

Database

DataBase

목록 보기

13/15

Introduction

Databases는 일반적으로 magnetic disks에 저장
물리적인 database design은 특정 data organization techniques를 선택하는 것도 포함한다.

순서대로 저장, 정렬 저장, 트리구조 저장
application의 요구에 맞춰 선택

Storage Hierarchy

Primary storage: CPU에 의해 동작
- CPU cache memory, RAM, mainboard $\rarr$ 휘발성(volatile) 존재, 빠른 속도, 크지 않은 용량
Secondary storage: non-volatile(비휘발성), primary보다 싸다.
- HDD, flash memory, SSDs
Tertiary storage: 가장 싸지만, 제일 느림
- tapes, optical disks

Storage Organization of Databases

Persistent data (nonvolatile)

대부분의 databases는 장기간에 걸쳐 유지되는 대량의 데이터를 저장
secondary storage(magnetic and/or SSD disks)에 저장, 이유는?
- main memory에 DBs 전체를 저장하기에는 턱없이 모자란 용량
- Nonvolatile, 전원이 꺼져도 저장 유지
- primary storage보다 10배는 싼 가격
처리할 데이터만 RAM에 load하여 처리하고, disk에 update
disk에 저장되는 데이터는 records의 files로서 저장
- 각 record는 ER의 사실을 표현하는 data values의 collection
- records는 효율적인 접근을 위해서 잘 위치해야 한다.

Transient data (volatile)
persistent data와 반대

Primary file organizations는 응용 프로그램에 맞춰서 선택

disk에 file records가 물리적으로 어떻게 위치할지 결정
어떻게 records에 접근할지 결정

Heap file(unordered file): 순서 없이 제일 마지막에 있는 file 뒤에 저장
Sorted file(sequential file): sort key field를 이용하여 record를 정렬하여 저장 (유지 보수 비용이 발생)
Hashed file: hash key field에 적용되는 hash function를 사용하여 disk에 저장할 위치 결정 (hash value에 따라 서로 다른 위치에 저장되어 같은 key값을 갖는 tuple끼리끼리 묶여서 저장됨)
Tree-structured file(B-trees)
Column-based file: column 단위로 record 저장 (row_id 정보도 같이 저장)

Secondary organization (or auxiliary(예비) access structure)

primary file organization에 추가적으로, file record에 효율적인 접근을 위해 alternate field를 기반으로 사용
indexes (index file 접근 $\rarr$ primary file 접근)

SECONDARY STORAGE DEVICES

Disk Storage: Nonvolatile, rotating magnetic storage

Disk: random access addressable device (block 단위로 접근 및 처리 = block addressable device $\longleftrightarrow$ byte addressable(RAM))
- RAM과 disk blocks의 unit으로 data를 주고 받는다.
- block(or sector) address
  - cylinder number, track number,(platter에) block number(track에)(or sector number(block에))
- block address는 disk I/O controller에 제공
- 대부분의 최신 disk drives는 controller에 의해 LBA(Logical Block Address)가 오른쪽 block에 자동으로 매핑된다.
- Disk read
  - 처리할 disk block의 LBA가 disk drive buffer(cache)에 복사
  - cache에 이미 해당 LBA가 있다면 copy X
- Disk write
  - buffer(main memory buffer)의 contents가 LBA와 함께 disk block로 복사됨
- Disk read와 write는 Synchronous하여 성능이 같지만, flash의 경우는 두 경우의 성능에 차이가 있다.

Disk Storage: Internal Architecture

Disk는 magnetic platters의 stack
각 platter의 표면은 여러 개의 tracks로 구성
Tracks는 sectors로 나눠진다.
- Sector: storage의 기본 unit
  - Sector ID, data (일반적으로 512bytes), Error-correction code, sync fields, or gaps
Block (or page) : sectors의 그룹 (read/write의 unit), 일반적으로 4Kbytes (= 8Sectors)
disk heads는 spindle이 원하는 sector를 찾기 위해 회전하는 동안 (동일한 cylinder내에서) 원하는 tracks를 찾기 위해 움직인다.
- disk arm을 움직여서 원하는 sector를 찾을 때 $\Rightarrow$ Seek time, rotation delay 발생

Disk Access Time

세 가지 주요 요소

Seek time: 원하는 track을 찾기 위해 disk head를 움직이는데 드는 시간
Rotational delay: 원하는 track에는 도달했고, 원하는 sector가 (read/write) head 아래에 위치할 때까지 회전하는 데 걸리는 시간
- 최소: 0, 최대: 한 바퀴 전체 $\Rightarrow$ 평균적으로 full rotation time의 절반
Transfer time: bit의 disk block을 전송하기 위해 사용되는 시간

거기에 더해, disk controller의 overhead로 인한 queuing delays로, 추가적인 latencies가 발생할 수 있다.

Examples: 512B sector, 7,200rpm, 4ms average seek time, 100MB/s transfer rate, 0.2ms controller overhead, idle disk

Q: What would be the expected “read” (or “write”) time of a sector on this disk (즉, 한 섹터 읽(쓰)는데 걸리는 평균 시간)?
- Seek time + Rotational time + Transfer time + overhead
- 4ms + (7200/60s = 120/s => 1초당 120 rotation = 1000ms당 120 rotation => 25/3ms => half이므로 1/2) = 25/6ms + (8*512B/100MB/1000ms = 4KB/100KB/ms = 4KB/100KB/ms) = 0.04ms + 0.2ms
  = 4ms + 25/6ms + 0.04ms + 0.2ms

Techniques for Efficient Data Access

Buffering of data (main memory 내에서)
- old data가 처리되는 동안, new data가 buffer에서 대기
disk에서의 적절한 data organization
- related data를 이웃한 blocks에 위치; data blocks를 head에 가깝게 위치하여 inner track에 대해 disk arm이 안까지 들어가야 하는 성능의 저하를 방지
request 이전에 data를 reading: prefetching
- disk read에 대해, track의 남은 blocks 또한 함께 read
- 곧 read될 가능성이 있기 때문
I/O requests에 대한 적절한 scheduling
- total access time을 최소화하기 위함
- elevator algorithm: disk arms는 한 방향으로만 움직임
writes 사항들을 임시적으로 보관하는 log disks의 사용
- 적혀져야 하는 모든 blocks는 write를 위한 seek time을 제거하기 위해 log disk에 저장된다.
- 모아뒀다가 한 번에 write 수행
SSDs의 사용: 기계적인 지연은 없지만 비쌈

BUFFERING OF BLOCKS

Interleaved Concurrency (on Single CPU) vs. Parallel Execution (on multi-CPUs)

Interleaved concurrency의 경우, 각 task를 쪼개어 시간대별로 수행
하지만 Parallel execution의 경우, 각 task를 동시에 수행
Buffering은 프로세스들이 동시에 실행(concurrently in parallel)될 수 있을 때 가장 유용

Concepts of Double Buffering

disk를 read하는데 A와 B 두 개의 buffer를 사용

processing time이 I/O time보다 짧은 경우

P1은 buffer의 I/O를 담당하고, P2는 buffer에 대한 processing을 담당

Double buffering

block transfer가 완료되면, CPU는 해당 block에 대한 처리를 시작할 수 있다.
동시에 disk I/O controller (processor)는 다음 block을 읽고 또 다른 buffer에 전송할 수 있다.
따라서 여러 blocks에 대한 continuous read에 용이하다

Buffer Management

Buffer manager (BM) of a DBMS

data request에 응답
(i) 어떤 buffer를 사용할 지와 (ii) buffer에서 새로운 pages를 위해 어떤 pages를 대체할 것인지를 결정 (I/O의 효율을 지키는 최적의 page를 kick out)
사용 가능한 main memory storage를 buffer pool로 view (buffer pool에는 page단위가 저장)
각 page에 대해 아래 두 type의 정보를 저장
- Pin-count: 해당 page가 요청된 횟수 (또는 해당 page를 사용 중인 user의 수)
  - Pinning: pin-count를 증가시키는 것
    pin-count가 0이 되면, unpinned. 그렇지 않으면, evict될 수 없다.
- Dirty bit: 초기는 0, program에 의해 해당 page가 update되면 1로 set
  - set to 1: Disk 상의 page 정보 $\neq$ main memory에 올라온 page 정보
    즉, Disk에 update가 필요함을 의미

특정 page가 requested되면 ...

Buffer manager는 요청된 page가 buffer pool에 있는 buffer에 이미 존재하는지 확인!

page가 존재한다면, manager는 해당 page의 pin-count를 증가시키고 (pinning) 해당 page의 내용을 copy해서 전달
존재하지 않는다면, manager는 아래의 행동을 수행
- a) replacement policy를 사용하여 대체할 page를 선택하고, 해당 page의 pin-count를 증가시킨다.
- b) 대체하려는 page의 dirty bit가 ON인 경우(update가 필요), manager는 해당 page를 disk에 write(disk에 있는 old page를 대체)하고 비움
  - dirty bit가 OFF인 경우, disk에 write할 필요가 없음: 해당 page에 update가 일어나지 않았기 때문
- c) 요청된 block을 금방 비운 page로 read
- d) 새로운 page의 main memory address가 요청자에게 전달됨

unpinned page가 없고, 요청된 page가 buffer pool에서 사용이 불가능한 경우(buffer pool의 모든 buffer가 사용 중), manager는 대기해야 함

Buffer Replacement Strategies

일반적인 buffer replacement strategies 3가지

Least recently used (LRU): 가장 오랫동안 사용되지 않은 page를 희생하는 전략, most common
First-in-first-out (FIFO): 가장 오래 있었던 page를 희생
- index의 root block은 매번 kick out 되었다가 다시 들어오고를 반복할 것 $\rarr$ 매우 비효율적
Clock policy: round-robin variant of LRU
- 0: unused, 1: used
- round-robin fashion에서 flag 0 값을 가지는 buffer를 탐색
  - clock hand는 "current buffer"를 가리키고 있음
  - 새로운 block에 대한 buffer가 요청되면, BM은 0인 buffer를 탐색하고 발견하면 해당 buffer를 사용
  - flag 1인 buffer는 0으로 변경하면서 탐색
  - 1 rotate동안 flag 0을 발견하지 못하면 flag 1을 0으로 변경하면서 처음 나오는 flag 0 buffer를 사용

PLACING FILE RECORDS ON DISK

Records and Record Type

Record: 관련있는 data values or items의 집합

Record type (format)

(i) field names와 (ii) 그들의 data types의 집합

Files, Fixed-Length Records, and Variable-Length Records

File: records의 sequence

file의 header에는 record에 대한 meta 정보가 존재,
record 또한 각자의 header 존재

Fixed-length records: 모든 record가 동일한 크기를 가짐
Variable-length records: 서로 다른 record는 서로 다른 크기를 가짐

Variable-length records를 사용하는 이유?

하나 이상의 fields가 variable-length를 가질 수 있다.
하나 이상의 fields가 반복될 수 있다.
하나 이상의 fields가 부가적일 수 있다.
File이 다양한 types의 records를 포함할 수 있다.

Three Options Formatting Records of a File of Variable-length Records: EMPLOYEE

1) 6 fields와 71 bytes의 크기를 가지는 fixed-length record: 할당이 쉬움, 하지만 공간 낭비 발생가능성이 존재

2) 2개의 variable-length fields와 3개의 fixed-length fields를 가지는 record: variable-length를 위함

Separator Characters로 field를 구분

3) 3가지 종류의 separator characters를 가지는 variable-field record: with optional fields

Record Blocking and Spanned vs. Unspanned Records

file의 records는 무조건 disk blocks에 할당되어야 한다.

block은 disk와 memory 사이의 data transfer의 기본 단위
block의 크기가 record의 크기보다 크거나 같다면, 한 block에 여러 개의 records가 저장될 수 있다.

Spanned records

record의 크기가 single block보다 큰 경우, 해당 record를 분리하여 여러 block에 저장
first block의 끝에 남은 record를 저장하고 있는 block에 대한 pointer를 저장

Unspanned records

record의 크기가 block보다 작은 경우
pointer를 사용하지 않아 빠르지만, 공간이 낭비된다.

Blocking factor

: 한 file의 block당 평균 record의 수

Unused space = $B - (bfr * R)$ bytes ( $B$ : block size, $R$ : record size, $bfr$ : blocking factor for the file)
$bfr$ 은 $r$ 개의 records의 file에 필요한 blocks의 수 계산하는데 사용될 수 있다.
- $b = ceiling((r/bfr))$ blocks ( $r$ : file의 record 수)

Q: How many blocks are needed to store a file of 1000 records?

Record size: 16Bytes
Block size: 4KBytes
Unused space: 0Bytes
- What is $bfr$ ?
  - 0 = 4096 - (bfr * 16) $\Rightarrow$ bfr = 4096/16 = 256

Allocating File Blocks on Disk

몇 가지 일반적인 block 할당 방법

Continuous allocation: file blocks는 순차적으로 할당
- 전체 file은 double buffering 시 빠르게 read 가능
Linked allocation: 각 file block은 다음 file block을 가리키는 pointer를 포함
- 확장이 쉽지만, 전체 file을 read하는데는 느리다. (pointer로 인해)
- Seek time이 많이 발생하는 경우 사용
Cluster allocation: Continuous와 Linked의 결합
- 여러 block을 결합
Indexed allocation: 하나 또는 이상의 index blocks이 실제 file blocks에 대한 pointers를 포함 (e.g., hash file)
- index file을 통해 file block에 접근

File Headers

file에 대한 정보를 담고 있음

disk addresses of the file blocks
format descriptions

OPERATIONS ON FILES

Operations on Files

Retrieval operations

어떠한 data도 변경하지 않음

Update operations

insertion, or deletion을 통해 파일을 변경

File Organization (vs. Access Method)

File organization

file의 data를 record, block 및 접근 구조로 구성하는 방식

files안의 records를 어떻게 구성할 것인가 ?

Heap: file의 어느 곳이든 record를 저장 가능
Sequential: 각 record의 search key의 값에 따라 순차적으로 record를 저장
Hashing: hash function 사용하여 해당 값에 따라 record를 저장
Multitable clustering file organization: 각 relation의 record들이 서로 다른 file에 저장될 수 있다.
- 서로 다른 relations에 있는 record가 같은 file에 저장될 수도 있다.
  - 관련있는 records를 같은 block에 저장하여 I/O를 최소화

(File Organization vs.) Access Method

Access method

file organization에 따라 access method를 적절히 적용해야 한다.

Ex) indexed access method는 index가 없는 file에는 적용이 불가능하다.

FILES OF UNORDERED RECORDS (HEAP FILES)

Heap (or Pile) File

가장 간단하고 기본적인 organization 방식

record insertion

records는 삽입되는 순서대로 저장
- 새로운 records는 file의 가장 끝에 저장
새로운 record에 대한 Insertion은 매우 효율적
- file의 마지막 disk block은 buffer로 복사
- 새로운 record 삽입
- 해당 block을 다시 disk에 write
  - 마지막 file block의 주소는 file buffer에 남아있다.

record search

record를 탐색하는 것은 linear search로 인해 매우 비효율적
- 평균적으로 $b/2$ ( $b$ : file의 blocks 수)
- 최악의 경우, 모든 block을 다 탐색
- min: buffer에 block이 있는 경우, max: 전체 block 모두 탐색

record delete

먼저 해당 block을 탐색
block을 buffer으로 복사
buffer에서 record를 삭제
block을 다시 disk에 write

또 다른 방법: delete marker

각 record에 추가 bit or byte를 붙여 사용

두 가지 방법 모두 삭제된 records의 공간 낭비를 막기 위해 주기적인 reorganization이 필요

삭제되지 않은 records에 대해 packing

세번째 방법: 삭제된 records의 공간에 insertion

FILES OF ORDERED RECORDS (SORTED FILES)

Sorted (Sequential) Files

물리적으로 key field에 대해 정렬하여 file 저장 (key field = ordering field)
- ordering key: file의 key field가 ordering field인 경우
- 정렬을 위한 key field가 필요
  - 탐색 시, Binary Search ( $log_n$ )

heap files와 비교하여 가지는 이점

정렬이 필요없음
다음 record를 찾는데 block의 마지막 record가 아니면 추가적인 block access가 필요하지 않다. (정렬되어 있기 때문에 다음 record는 같은 block 내에 존재하기 때문)
binary search 가능

Deletion

: pointer chains 또는 deletion marker

Insertion

: record가 삽입되어야 할 위치에 삽입해야 함

free space가 있는 경우, 해당 위치에 insert
free space가 없는 경우, record를 overflow block에 삽입
- pointer chain을 update해야 함

하지만 주기적으로 sequential order를 재저장하기 위해 file을 재구성해야 한다.

SUbbb

배우고 정리하고 공유하기

이전 포스트

[DB] Basics of Functional Dependencies and Normalization for Relational Databases

다음 포스트