Say you are building a cache server that has a persistence feature, like Redis.
Every time the server spins up, you have to load the snapshot and rebuild the cache storage.
The most straightforward approach would be as follows:
pub(crate) async fn load_from_filepath(filepath: String) -> anyhow::Result<Snapshot> {
    let bytes = tokio::fs::read(filepath).await?;
    Self::load_from_bytes(&bytes)
}

pub(crate) fn load_from_bytes(bytes: &[u8]) -> anyhow::Result<Snapshot> {
    let decoder: BytesDecoder<DecoderInit> = bytes.into();
    let database = decoder.load_header()?.load_metadata()?.load_database()?;
    Ok(database)
}
What it does is basically load the entire file into memory at once. In other words, tokio::fs::read allocates a Vec<u8> as large as the snapshot and copies every byte of the file into it before decoding even begins.
This becomes problematic when the hosting server has limited memory. For example, imagine loading a 20 GB snapshot on a server with 32 GB of memory.
This would require:
- roughly 20 GB just for the byte buffer holding the raw snapshot, and
- additional memory on top of that for the decoded cache entries built from those bytes.
So, this would likely result in the process blowing past available memory: heavy swapping at best, or the kernel's OOM killer taking the server down at worst.
So, how do we circumvent this problem?
Use mmap appropriately:
pub(crate) async fn load_from_filepath(filepath: String) -> anyhow::Result<Snapshot> {
    let file = tokio::fs::File::open(&filepath).await?;
    let mmap = unsafe { memmap2::Mmap::map(&file)? };
    Self::load_from_bytes(&mmap)
}

pub(crate) fn load_from_bytes(bytes: &[u8]) -> anyhow::Result<Snapshot> {
    let decoder: BytesDecoder<DecoderInit> = bytes.into();
    let database = decoder.load_header()?.load_metadata()?.load_database()?;
    Ok(database)
}
With mmap:
- the file's contents are not copied into a heap-allocated buffer up front; the mapping only reserves address space,
- pages are faulted in lazily as the decoder actually reads them, and
- because the mapping is read-only and file-backed, the kernel can simply evict clean pages under memory pressure instead of swapping.
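Since decoding a snapshot is a single front-to-back scan, it can also help to say so to the kernel. Here is a minimal sketch, assuming a Unix target (memmap2's advise is Unix-only) and that the loader methods above live on Snapshot; the function name is illustrative:

pub(crate) fn load_from_filepath_mmap(filepath: &str) -> anyhow::Result<Snapshot> {
    let file = std::fs::File::open(filepath)?;
    // SAFETY: relies on the snapshot file not being modified while mapped
    // (see the discussion below).
    let mmap = unsafe { memmap2::Mmap::map(&file)? };
    // Hint a sequential scan so the kernel reads ahead aggressively and can
    // drop pages that have already been decoded.
    mmap.advise(memmap2::Advice::Sequential)?;
    Snapshot::load_from_bytes(&mmap)
}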
Memory mapping is marked as unsafe mainly because of the file-modification risk: if another process truncates or rewrites the file while it is mapped, the byte slice handed to the decoder no longer points at stable data, which is undefined behavior (or a SIGBUS on truncation).
If you are sure that:
- the snapshot file is not modified, truncated, or replaced by any other process while it is mapped, and
- your own code only ever reads through the mapping,

then memory mapping is relatively safe and can be wrapped in a safe abstraction.
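What follows is a minimal sketch of such a wrapper; the SnapshotMmap name and its open helper are illustrative:

use std::ops::Deref;

/// Read-only mapping of a snapshot file.
/// Invariant: the file is written once by this server and is never modified
/// or truncated while an instance of this type is alive.
pub(crate) struct SnapshotMmap {
    mmap: memmap2::Mmap,
}

impl SnapshotMmap {
    pub(crate) fn open(filepath: &str) -> anyhow::Result<Self> {
        let file = std::fs::File::open(filepath)?;
        // SAFETY: relies on the invariant above; no other writer touches the
        // snapshot file for as long as the mapping exists.
        let mmap = unsafe { memmap2::Mmap::map(&file)? };
        Ok(Self { mmap })
    }
}

impl Deref for SnapshotMmap {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.mmap
    }
}

Callers then get a plain &[u8] through deref and never touch unsafe themselves, e.g. Self::load_from_bytes(&SnapshotMmap::open(&filepath)?).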