DBMS (Database Management System)
is a module system, and consists of the following subsystem:
- Transportation layer (request)
- Query Processor
- Execution engine
- Storage engine
In-memory DBMS
Stores most of data in volatile RAM
Backups data on Disk for durability (as asynchronous batch to reduce I/O)
- Pros
- Easier to manage data over disk (through virtualization)
- Low cost of accessing data
- Cons
- Volatile data
- Expensive over SSD/HDD
Disk-based DBMS
- Uses specific data structure for accessing disk
- Usually a wide-short tree
- Uses a pointer to access data in memory
- Data stored on the disk is usually accessed with block-level
- Stores values of every column of a record in a block
- Efficient when trying to read a specific record
- Not efficient when trying to read multiple records
- Stores data by column-level continuously in memory
- Appropriate for aggregation analytics
Storage Engine
is a software component in charge of
- saving data in memomry and disk
- providing simple API to manipulate data in detail
- so that user can CRUD records
- providing useful functions like schema, query language, indexing, and transaction
consists of the following components:
- Transaction manager
- Lock manager
- Access method
- Buffer manager
- Recovery manager
Data
is saved as a sequence of bytes (for storage engine)
- can be defined as int32 or ASCII on the upper level subsystem
Buffering
Intentionally uses memory(buffer) to store the data before writing to disk
e.g) Adding In-memory buffer in B-tree node (reduce I/O)
Mutability
Means being able to write updated value at the same location
In immutable structure, data cannot be updated once it's written in the file
- Copy-On-Write
- Difference between LSM and B tree(In-place Update)
Ordering
Stores data in disk ordered by key
- Efficient for searching specific data record and range scan
- Not optimized for write operation
Data File vs Index File
Data File
Based on the following structure:
- Index-Organized Table (IOT)
- Stores actual data record in an index
- Data is sorted by key
- Heap-Organized Table
- Stores data in an order of insertion
- No need to reconfigure file once page is added
- Hash-Organized Table
- Stores records in the bucket of hash value of each key
Index File
Does mapping keys to the location of the according record in the data file
- Clustered Index
- Order of actual data record and one of the index is the same
- Data record is stored in either index file or clustered file
- Usually used for priamry key
- Non-clustered Index
- Data record is stored in other files and not ordered by the index key
- Also called as secondary index