mmap

Migo · February 15, 2025

Fluent Rust

Say you are building a cache server with a persistence feature, like Redis.

Every time the server spins up, you have to load the snapshot and rebuild the cache storage.

The most straightforward approach would be as follows:

    pub(crate) async fn load_from_filepath(filepath: String) -> anyhow::Result<Snapshot> {
        // Reads the whole file into a heap-allocated Vec<u8> before decoding.
        let bytes = tokio::fs::read(filepath).await?;
        Self::load_from_bytes(&bytes)
    }

    pub(crate) fn load_from_bytes(bytes: &[u8]) -> anyhow::Result<Snapshot> {
        let decoder: BytesDecoder<DecoderInit> = bytes.into();
        let database = decoder.load_header()?.load_metadata()?.load_database()?;
        Ok(database)
    }

This loads the entire file into memory at once. In other words:

  • The OS reads the file from disk into a kernel buffer (the page cache)
  • The data is then copied from the kernel buffer into a new buffer in user space (your application's memory)
  • The data therefore moves twice: disk → kernel buffer → user buffer
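
To make the user-space copy concrete, here is a minimal sketch that uses only the standard library (`std::fs::read` instead of `tokio::fs::read`, to stay dependency-free). The helper name `read_whole_file` is hypothetical and not part of the snapshot code above. It shows that the returned buffer is a heap allocation as large as the file itself:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: reads the whole file into a freshly allocated Vec<u8>.
/// The Vec lives in user space and is separate from the kernel's page cache.
fn read_whole_file(path: &Path) -> io::Result<Vec<u8>> {
    let file_len = fs::metadata(path)?.len();
    let bytes = fs::read(path)?;
    // The buffer length matches the file length, so a 20 GB snapshot
    // would need a 20 GB allocation here.
    println!(
        "file on disk: {file_len} bytes, user-space buffer: {} bytes",
        bytes.len()
    );
    Ok(bytes)
}
```

Calling `read_whole_file` on a snapshot path returns only after every byte has been copied into the `Vec`, which is the behavior the steps above describe.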

This becomes problematic when the host has limited memory. For example, imagine loading a 20 GB snapshot on a server with 32 GB of memory.

This would require:

  • Space in the kernel's page cache for the file
  • PLUS an additional 20 GB in user space for the complete copy
  • PLUS memory needed for your application's other operations
  • PLUS memory needed by other processes and the OS

So, this would likely result in:

  • Out of memory
  • Severe system performance degradation due to swapping
  • System instability or crash

So, how do we circumvent this problem?

mmap

Memory-map the file instead of reading it into a buffer:

    pub(crate) async fn load_from_filepath(filepath: String) -> anyhow::Result<Snapshot> {
        let file = tokio::fs::File::open(&filepath).await?;
        // SAFETY: assumes the snapshot file is not modified or truncated
        // by anyone while the mapping is alive.
        let mmap = unsafe { memmap2::Mmap::map(&file)? };
        Self::load_from_bytes(&mmap)
    }

    pub(crate) fn load_from_bytes(bytes: &[u8]) -> anyhow::Result<Snapshot> {
        let decoder: BytesDecoder<DecoderInit> = bytes.into();
        let database = decoder.load_header()?.load_metadata()?.load_database()?;
        Ok(database)
    }

With mmap:

  • Only the actually accessed portions get loaded into physical memory
  • The OS can intelligently page data in/out as needed
  • No double-buffering of the entire file
  • The virtual memory mapping doesn't require physical memory for the entire file upfront

Safety

Memory mapping is marked as unsafe mainly because of the risk of file modification:

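  • If the underlying file is modified or truncated while mapped, the bytes behind a `&[u8]` can change underneath you (undefined behavior), and touching pages past the new end of the file can kill the process with a SIGBUS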
  • Rust's safety guarantees can't prevent external modifications
  • Rust can't guarantee thread-safety across process boundaries

If you are sure that:

  • File won't be modified externally
  • It runs on a local filesystem
  • The process has sufficient virtual address space
  • Only a single process accesses the file

Then memory mapping is relatively safe and can be wrapped in a safe abstraction.
