mmap

Migo · February 15, 2025

Fluent Rust

Say you are building a cache server with a persistence feature, like Redis.

Every time the server starts, it has to load the snapshot and rebuild the cache storage.

The most straightforward approach would be as follows:

    pub(crate) async fn load_from_filepath(filepath: String) -> anyhow::Result<Snapshot> {
        // Reads the entire file into a heap-allocated buffer before decoding.
        let bytes = tokio::fs::read(filepath).await?;
        Self::load_from_bytes(&bytes)
    }

    pub(crate) fn load_from_bytes(bytes: &[u8]) -> anyhow::Result<Snapshot> {
        // Decode in stages: header, then metadata, then the database itself.
        let decoder: BytesDecoder<DecoderInit> = bytes.into();
        let database = decoder.load_header()?.load_metadata()?.load_database()?;
        Ok(database)
    }

This loads the entire file into memory at once. In other words:

The OS reads the file from disk into a kernel buffer (the page cache)
The data is then copied from the kernel buffer into a new buffer in user space (your application's memory)
This involves two copies: disk → kernel buffer → user buffer
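To make the user-space copy visible, here is a minimal sketch. It uses the blocking `std::fs::read` instead of `tokio::fs::read` (an assumption made only to keep the example dependency-free), and `read_and_measure` is a hypothetical helper, not part of the original code. The point is simply that the buffer you get back is as large as the file.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns (size on disk, size of the user-space buffer).
/// `fs::read` allocates a Vec<u8> and fills it with the whole file, so the
/// two numbers match for a file that is not changing underneath us.
pub fn read_and_measure(path: &Path) -> io::Result<(u64, usize)> {
    let on_disk = fs::metadata(path)?.len();
    let bytes: Vec<u8> = fs::read(path)?;
    Ok((on_disk, bytes.len()))
}
```

For a 20 GB snapshot, that `Vec<u8>` alone would occupy 20 GB of the process's memory.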

This becomes a problem when the host has limited memory. For example, imagine loading a 20 GB snapshot on a server with 32 GB of memory.

This would require:

Space in the kernel buffer (page cache) for the file
PLUS an additional 20 GB in user space for the complete copy
PLUS memory needed for your application's other operations
PLUS memory needed by other processes and the OS
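The list above can be turned into rough arithmetic. This is only a back-of-the-envelope sketch: the function names are hypothetical, the figure for "other usage" is an assumed input, and the kernel can evict page-cache pages under pressure, so the result is an upper bound rather than a measurement.

```rust
/// Worst-case demand in GB when the whole snapshot is read into a buffer.
/// Assumes the page cache temporarily holds the full file alongside the copy.
pub fn worst_case_demand_gb(snapshot_gb: u64, other_usage_gb: u64) -> u64 {
    let page_cache = snapshot_gb; // kernel-side pages for the file
    let user_copy = snapshot_gb; // the buffer returned by read()
    page_cache + user_copy + other_usage_gb
}

/// True when the worst case still fits into physical memory.
pub fn fits_in_ram(snapshot_gb: u64, other_usage_gb: u64, ram_gb: u64) -> bool {
    worst_case_demand_gb(snapshot_gb, other_usage_gb) <= ram_gb
}
```

With the numbers from the example and an assumed 4 GB for everything else, `worst_case_demand_gb(20, 4)` gives 44, which exceeds the 32 GB available.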

So, this would likely result in:

Out of memory
Severe system performance degradation due to swapping
System instability or crash

So, how do we circumvent this problem?

mmap

Memory-map the file instead of reading it:

    pub(crate) async fn load_from_filepath(filepath: String) -> anyhow::Result<Snapshot> {
        let file = tokio::fs::File::open(&filepath).await?;
        // SAFETY: the file must not be modified or truncated while it is mapped.
        let mmap = unsafe { memmap2::Mmap::map(&file)? };
        Self::load_from_bytes(&mmap)
    }

    pub(crate) fn load_from_bytes(bytes: &[u8]) -> anyhow::Result<Snapshot> {
        let decoder: BytesDecoder<DecoderInit> = bytes.into();
        let database = decoder.load_header()?.load_metadata()?.load_database()?;
        Ok(database)
    }

With mmap:

Only the actually accessed portions get loaded into physical memory
The OS can intelligently page data in/out as needed
No double-buffering of the entire file
The virtual memory mapping doesn't require physical memory for the entire file upfront
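To illustrate the first point, here is a rough sketch that calls the `mmap` system call directly instead of going through `memmap2`, so that it needs no external crate. It assumes a 64-bit Linux or macOS target (the constant values and the `i64` offset type are specific to those platforms) and Rust 1.82 or newer for `unsafe extern`. `byte_at` is a hypothetical helper: it maps a file, touches a single byte, and unmaps again.

```rust
use std::ffi::c_void;
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;
use std::ptr;

// Assumption: values as defined on 64-bit Linux and macOS.
const PROT_READ: i32 = 1;
const MAP_PRIVATE: i32 = 2;

unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32, fd: i32, offset: i64)
        -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> i32;
}

/// Hypothetical helper: reads one byte at `offset` through a read-only mapping.
pub fn byte_at(file: &File, offset: usize) -> io::Result<u8> {
    let len = file.metadata()?.len() as usize;
    if offset >= len {
        // Also rejects empty files, which mmap cannot map.
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "offset past end of file"));
    }
    // SAFETY: valid fd and non-zero length. This call only reserves address
    // space; no file data is read yet.
    let base = unsafe { mmap(ptr::null_mut(), len, PROT_READ, MAP_PRIVATE, file.as_raw_fd(), 0) };
    if base as isize == -1 {
        return Err(io::Error::last_os_error());
    }
    // SAFETY: offset < len, so the read stays inside the mapping. The access
    // page-faults, and the kernel loads at least the page containing `offset`.
    let byte = unsafe { *(base as *const u8).add(offset) };
    // SAFETY: same address and length that were passed to / returned by mmap.
    unsafe { munmap(base, len) };
    Ok(byte)
}
```

Reading one byte from the middle of a large file this way does not require the rest of the file to be resident in physical memory.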

Safety

Memory mapping is marked as unsafe mainly because of the risk of file modification:

If the underlying file is modified or truncated while mapped, it could lead to undefined behavior or a crash (for example, a SIGBUS when reading past the new end of the file)
Rust's safety guarantees can't prevent external modifications
Rust can't guarantee thread safety across process boundaries

If you are sure that:

The file won't be modified externally
It lives on a local filesystem
The process has sufficient virtual address space
Only a single process accesses the file

Then memory mapping is relatively safe and can be wrapped in a safe abstraction.
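As a sketch of what such an abstraction might look like, the hypothetical `SnapshotMap` below owns a read-only mapping, exposes it as `&[u8]`, and unmaps it on drop. To stay dependency-free it calls `mmap` directly rather than wrapping `memmap2::Mmap` (which would play the same role in the code above), and it assumes a 64-bit Linux or macOS target and Rust 1.82 or newer. Note that the safe `open` is only sound if the deployment really upholds the conditions listed above; the type system cannot check them.

```rust
use std::ffi::c_void;
use std::fs::File;
use std::io;
use std::ops::Deref;
use std::os::unix::io::AsRawFd;
use std::path::Path;
use std::{ptr, slice};

// Assumption: values as defined on 64-bit Linux and macOS.
const PROT_READ: i32 = 1;
const MAP_PRIVATE: i32 = 2;

unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32, fd: i32, offset: i64)
        -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> i32;
}

/// Hypothetical owner of a read-only file mapping.
pub struct SnapshotMap {
    base: *mut c_void,
    len: usize,
}

impl SnapshotMap {
    /// Maps the file read-only. Sound only while nobody modifies or truncates
    /// the file for as long as the returned value is alive.
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = File::open(path)?;
        let len = file.metadata()?.len() as usize;
        if len == 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "cannot map an empty file"));
        }
        // SAFETY: valid fd, non-zero length, kernel chooses the address.
        let base =
            unsafe { mmap(ptr::null_mut(), len, PROT_READ, MAP_PRIVATE, file.as_raw_fd(), 0) };
        if base as isize == -1 {
            return Err(io::Error::last_os_error());
        }
        // The mapping stays valid after `file` is closed at the end of scope.
        Ok(SnapshotMap { base, len })
    }
}

impl Deref for SnapshotMap {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // SAFETY: base/len describe a live, readable mapping owned by self.
        unsafe { slice::from_raw_parts(self.base as *const u8, self.len) }
    }
}

impl Drop for SnapshotMap {
    fn drop(&mut self) {
        // SAFETY: unmaps exactly the region created in `open`.
        unsafe { munmap(self.base, self.len) };
    }
}
```

Because `&SnapshotMap` dereferences to `&[u8]`, a value of this type could be passed to a function like `load_from_bytes` without further changes.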
