Go 언어로 Sieve 캐시 구현하기

sangjinsu·2025년 5월 3일

Implementing a Sieve cache in Go

최근 LRU(Least Recently Used) 캐시를 구현하면서 캐시 관리의 세계에 본격적으로 발을 들이게 되었습니다. 성능 면에서 꽤 잘 작동했지만, 문득 이런 생각이 들었습니다. “더 빠르고 효율적인 캐시 제거(eviction) 알고리즘은 없을까?”

이 호기심은 저를 다양한 캐시 전략을 파고드는 리서치의 길로 이끌었고, 그 과정에서 눈에 띈 것이 바로 SIEVE 캐시였습니다.

전통적인 캐시 시스템이 LRU, LFU 같은 일반적인 정책을 기반으로 동작하는 반면, SIEVE 캐시는 필터링 규칙에 기반한 선택적이고 적응형 캐싱 전략에 초점을 둡니다. 사용 사례에 맞춰 정교하게 조정 가능한 구조라는 점이 특히 흥미로웠습니다.

📦 LRU(Least Recently Used) 캐시란?

LRU는 가장 오랫동안 사용되지 않은 데이터를 우선 제거하는 캐시 알고리즘입니다. 최근에 사용된 데이터는 캐시에 남기고, 사용되지 않은 데이터부터 제거하여 메모리를 확보합니다.

🔧 구현 방식

해시맵 + 이중 연결 리스트 조합이 일반적

O(1) 시간 복잡도로 삽입, 조회, 삭제 가능

📈 장점

최근 사용 데이터를 효과적으로 유지

구현이 비교적 단순함

⚠️ 한계

접근 빈도가 아닌 사용 시간 순서만 고려됨

빈번히 사용되지만 최근에 접근하지 않은 데이터도 제거될 수 있음

🛠 사용 예시

웹 브라우저 방문 기록

메모리 기반 세션 관리

DB 쿼리 결과 캐싱 등

🧪 Sieve 캐시란?

Sieve 캐시는 자주 접근되는 콘텐츠를 더욱 효율적이고 선택적으로 제공하기 위해 설계된 캐싱 메커니즘입니다.

이 개념은 주로 특정 기준에 따라 데이터를 선별(filtering) 하거나 분할(partitioning) 해야 하는 시스템에서 사용됩니다. 이름처럼, 체(sieve)를 통해 불필요한 데이터를 거르고 유의미한 데이터만 통과시키는 방식에서 유래했습니다.

즉, 단순히 오래되었거나 적게 사용된 데이터를 제거하는 것이 아니라, 사용 목적과 조건에 따라 필요한 데이터만 선별적으로 캐시에 담는 전략입니다.

🔍 Sieve 캐시의 핵심 특징

선택적 캐싱 (Selective Caching)
- 전체 데이터를 저장하지 않고, 가치가 높거나 자주 접근되는 데이터만 선별적으로 캐싱
- 저장 공간이나 중요도 측면에서 모든 데이터를 캐싱하는 것이 비효율적인 상황에 적합
효율적인 필터링 (Efficient Filtering)
- 특정 알고리즘을 통해 데이터를 선별적으로 걸러냄
- 접근 빈도, 최근 사용 여부, 혹은 사용자 정의 규칙 등을 기반으로 캐싱 여부를 결정
적응형 메커니즘 (Adaptive Mechanism)
- 워크로드나 사용 패턴의 변화에 따라 동적으로 캐시 데이터를 조정
- 현재 트래픽과 사용 환경에 최적화된 데이터만 유지함으로써 캐시의 실효성을 높임
낮은 지연 시간 (Low Latency)
- 실시간 처리가 중요한 시스템을 위한 빠른 응답 속도 보장
- 불필요한 데이터를 줄이고 필요한 데이터만 남기기 때문에 검색 및 조회 속도가 빠름

🛠 구현

Sieve 캐시의 핵심 로직은 무엇을 캐싱할지 결정하는 사용자 정의 필터링 규칙에 기반합니다.

이번 구현에서는 Go의 sync.Map을 사용해 동시성-safe한 캐시 저장소를 구성하고,

"이 데이터를 캐시에 담을지 말지"를 판단하는 커스터마이징 가능한 필터 함수를 정의할 예정입니다.

즉, 단순한 키-값 저장소가 아닌, 입력 조건에 따라 데이터를 선별적으로 수용하는 지능형 캐시 구조를 만드는 것입니다.

💡 Sieve 캐시 구현 예제 (Go)

아래는 Go로 구현한 Sieve 캐시의 예제 코드입니다. 이 캐시는 sync.Map을 기반으로 하고 있으며, 각 항목의 TTL(만료 시간)과 사용자 정의 필터 함수를 통해 선택적 캐싱을 구현합니다.

주요 구조

type CacheItem struct {
    Value      interface{}
    Expiration int64
}

각 캐시 항목은 값(Value)과 만료 시간(Expiration)을 포함하는 구조체입니다. 만료 시간은 유닉스 타임스탬프로 계산되며, TTL(Time-To-Live) 기반입니다.

type SieveCache struct {
    store        sync.Map
    ttl          time.Duration
    sieveFilter  func(key string, value interface{}) bool
    cleanupTicker *time.Ticker
    stopCleanup  chan struct{}
}

sync.Map: 고루틴-safe한 캐시 저장소
ttl: 기본 TTL 설정값
sieveFilter: 필터링 조건을 동적으로 지정 가능
cleanupTicker: 주기적인 만료 데이터 정리를 위한 타이머
stopCleanup: 정리 루틴 종료 신호 채널

캐시 저장 로직 (Set)

func (c *SieveCache) Set(key string, value interface{}) {
    if c.sieveFilter(key, value) {
        c.store.Store(key, CacheItem{
            Value:      value,
            Expiration: time.Now().Add(c.ttl).Unix(),
        })
    }
}

데이터를 캐시에 저장할 때는 반드시 sieveFilter 함수 조건을 통과해야 합니다.
TTL은 현재 시간에 더해져 만료 시점을 계산합니다.

캐시 조회 및 만료 처리 (Get)

func (c *SieveCache) Get(key string) (interface{}, bool) {
    ...
}

만료된 항목은 조회 시점에서 자동 제거되며, false를 반환합니다.

만료 항목 주기적 정리

func (c *SieveCache) cleanupExpiredItems() {
    ...
}

TTL의 절반 주기로 모든 캐시 항목을 순회하며, 만료된 항목을 삭제합니다.

🧪 실행 예제


filter := func(key string, value interface{}) bool {
    str, ok := value.(string)
    return ok && len(str) > 3
}

위 예제에서는 문자열 길이가 3자를 초과할 경우에만 캐시하도록 필터를 정의했습니다.

cache.Set("key1", "short")         // 캐싱 안됨
cache.Set("key2", "longerString")  // 캐싱됨

이후, key2 항목은 10초 후 만료되며, 조회 시 사라집니다.

package main

import (
 "fmt"
 "sync"
 "time"
)

// CacheItem represents an individual item in the cache
type CacheItem struct {
 Value      interface{}
 Expiration int64 // Unix timestamp for expiration
}

// SieveCache represents the sieve cache
type SieveCache struct {
 store        sync.Map
 ttl          time.Duration
 sieveFilter  func(key string, value interface{}) bool // Custom filter logic
 cleanupTicker *time.Ticker
 stopCleanup  chan struct{}
}

// NewSieveCache creates a new instance of SieveCache
func NewSieveCache(ttl time.Duration, sieveFilter func(key string, value interface{}) bool) *SieveCache {
 cache := &SieveCache{
  ttl:         ttl,
  sieveFilter: sieveFilter,
  stopCleanup: make(chan struct{}),
 }
 cache.cleanupTicker = time.NewTicker(ttl / 2) // Cleanup runs at half the TTL interval
 go cache.cleanupExpiredItems()
 return cache
}

// Set adds a new item to the cache if it passes the sieve filter
func (c *SieveCache) Set(key string, value interface{}) {
 if c.sieveFilter(key, value) {
  c.store.Store(key, CacheItem{
   Value:      value,
   Expiration: time.Now().Add(c.ttl).Unix(),
  })
 }
}

// Get retrieves an item from the cache if it exists and is not expired
func (c *SieveCache) Get(key string) (interface{}, bool) {
 item, exists := c.store.Load(key)
 if !exists {
  return nil, false
 }

 cacheItem := item.(CacheItem)
 if time.Now().Unix() > cacheItem.Expiration {
  c.store.Delete(key)
  return nil, false
 }
 return cacheItem.Value, true
}

// Delete removes an item from the cache
func (c *SieveCache) Delete(key string) {
 c.store.Delete(key)
}

// cleanupExpiredItems removes expired items from the cache periodically
func (c *SieveCache) cleanupExpiredItems() {
 for {
  select {
  case <-c.cleanupTicker.C:
   c.store.Range(func(key, value interface{}) bool {
    cacheItem := value.(CacheItem)
    if time.Now().Unix() > cacheItem.Expiration {
     c.store.Delete(key)
    }
    return true
   })
  case <-c.stopCleanup:
   return
  }
 }
}

// Stop stops the cleanup goroutine
func (c *SieveCache) Stop() {
 close(c.stopCleanup)
 c.cleanupTicker.Stop()
}

func main() {
 // Custom sieve filter: only cache strings with length > 3
 filter := func(key string, value interface{}) bool {
  str, ok := value.(string)
  return ok && len(str) > 3
 }

 // Create a sieve cache with a 10-second TTL
 cache := NewSieveCache(10*time.Second, filter)
 defer cache.Stop()

 // Add items to the cache
 cache.Set("key1", "short")         // Won't be cached (length <= 3)
 cache.Set("key2", "longerString") // Will be cached

 // Retrieve items from the cache
 if value, ok := cache.Get("key2"); ok {
  fmt.Println("Found key2:", value)
 } else {
  fmt.Println("key2 not found")
 }

 // Wait for 11 seconds to test expiration
 time.Sleep(11 * time.Second)
 if _, ok := cache.Get("key2"); !ok {
  fmt.Println("key2 expired")
 }
}

Python

import time
import threading

class CacheItem:
    def __init__(self, value, expiration):
        self.value = value
        self.expiration = expiration

class SieveCache:
    def __init__(self, ttl_seconds, sieve_filter):
        self.ttl = ttl_seconds
        self.sieve_filter = sieve_filter  # 사용자 정의 필터 함수
        self.store = {}
        self.lock = threading.Lock()
        self.cleanup_interval = ttl_seconds / 2
        self.stop_event = threading.Event()

        self.cleanup_thread = threading.Thread(target=self._cleanup_loop, daemon=True)
        self.cleanup_thread.start()

    def set(self, key, value):
        if self.sieve_filter(key, value):
            with self.lock:
                expiration = time.time() + self.ttl
                self.store[key] = CacheItem(value, expiration)

    def get(self, key):
        with self.lock:
            item = self.store.get(key)
            if not item:
                return None

            if time.time() > item.expiration:
                del self.store[key]
                return None
            return item.value

    def delete(self, key):
        with self.lock:
            if key in self.store:
                del self.store[key]

    def stop(self):
        self.stop_event.set()
        self.cleanup_thread.join()

    def _cleanup_loop(self):
        while not self.stop_event.is_set():
            time.sleep(self.cleanup_interval)
            now = time.time()
            with self.lock:
                keys_to_delete = [key for key, item in self.store.items() if now > item.expiration]
                for key in keys_to_delete:
                    del self.store[key]

# --- 사용 예시 ---

def custom_filter(key, value):
    return isinstance(value, str) and len(value) > 3

cache = SieveCache(ttl_seconds=10, sieve_filter=custom_filter)

cache.set("key1", "hi")           # 캐싱 안 됨 (길이 <= 3)
cache.set("key2", "longerData")   # 캐싱 됨

val = cache.get("key2")
print("key2 found:", val) if val else print("key2 not found")

time.sleep(11)
val = cache.get("key2")
print("key2 expired") if not val else print("key2 still valid")

cache.stop()

Java

import java.util.Map;
import java.util.concurrent.*;
import java.util.function.BiPredicate;

public class SieveCache<K, V> {

    private static class CacheItem<V> {
        final V value;
        final long expiration; // Unix timestamp in milliseconds

        CacheItem(V value, long expiration) {
            this.value = value;
            this.expiration = expiration;
        }
    }

    private final ConcurrentHashMap<K, CacheItem<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final BiPredicate<K, V> sieveFilter;
    private final ScheduledExecutorService scheduler;
    
    public SieveCache(long ttlSeconds, BiPredicate<K, V> sieveFilter) {
        this.ttlMillis = ttlSeconds * 1000;
        this.sieveFilter = sieveFilter;
        this.scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::cleanupExpiredItems, ttlMillis / 2, ttlMillis / 2, TimeUnit.MILLISECONDS);
    }

    public void set(K key, V value) {
        if (sieveFilter.test(key, value)) {
            long expiration = System.currentTimeMillis() + ttlMillis;
            store.put(key, new CacheItem<>(value, expiration));
        }
    }

    public V get(K key) {
        CacheItem<V> item = store.get(key);
        if (item == null) return null;

        if (System.currentTimeMillis() > item.expiration) {
            store.remove(key);
            return null;
        }
        return item.value;
    }

    public void delete(K key) {
        store.remove(key);
    }

    private void cleanupExpiredItems() {
        long now = System.currentTimeMillis();
        for (Map.Entry<K, CacheItem<V>> entry : store.entrySet()) {
            if (now > entry.getValue().expiration) {
                store.remove(entry.getKey());
            }
        }
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    // --- 테스트용 main ---
    public static void main(String[] args) throws InterruptedException {
        SieveCache<String, String> cache = new SieveCache<>(10, (key, value) -> value.length() > 3);

        cache.set("key1", "hi");            // 캐싱 안 됨
        cache.set("key2", "longerData");    // 캐싱 됨

        String val = cache.get("key2");
        System.out.println(val != null ? "key2 found: " + val : "key2 not found");

        Thread.sleep(11_000);
        val = cache.get("key2");
        System.out.println(val != null ? "key2 still valid" : "key2 expired");

        cache.stop();
    }
}

Typescript

type CacheItem<T> = {
  value: T;
  expiration: number; // Unix timestamp (ms)
};

export class SieveCache<T> {
  private store: Map<string, CacheItem<T>> = new Map();
  private ttl: number;
  private sieveFilter: (key: string, value: T) => boolean;
  private cleanupInterval: NodeJS.Timer;

  constructor(ttlSeconds: number, sieveFilter: (key: string, value: T) => boolean) {
    this.ttl = ttlSeconds * 1000;
    this.sieveFilter = sieveFilter;

    // 주기적 만료 정리 (TTL의 절반 주기)
    this.cleanupInterval = setInterval(() => {
      const now = Date.now();
      for (const [key, item] of this.store.entries()) {
        if (now > item.expiration) {
          this.store.delete(key);
        }
      }
    }, this.ttl / 2);
  }

  set(key: string, value: T): void {
    if (this.sieveFilter(key, value)) {
      const expiration = Date.now() + this.ttl;
      this.store.set(key, { value, expiration });
    }
  }

  get(key: string): T | undefined {
    const item = this.store.get(key);
    if (!item) return undefined;

    if (Date.now() > item.expiration) {
      this.store.delete(key);
      return undefined;
    }

    return item.value;
  }

  delete(key: string): void {
    this.store.delete(key);
  }

  stop(): void {
    clearInterval(this.cleanupInterval);
  }
}

🔄 LRU 캐시와의 비교

🧠 LRU 캐시

가장 최근에 사용된 데이터가 다시 사용될 가능성이 높다는 가정하에 동작
새로운 항목을 저장하기 위해 가장 오래된 항목부터 제거
시간 기반이 아닌, 접근 순서를 기준으로 관리

🧪 Sieve 캐시

*사용자 정의 필터(sieve filter)를 기반으로, 캐싱 전에 데이터를 선별**
캐시에 들어가는 모든 데이터가 조건을 통과해야만 저장됨
제거 정책은 LRU와 달리 TTL(Time-to-Live) 기반으로 만료됨

💼 Sieve 캐시의 활용 사례

CDN (콘텐츠 전송 네트워크)
- 자주 접근되는 리소스 또는 지역 기반 맞춤 콘텐츠를 선별하여 캐싱
- 예: 특정 지역에서만 인기 있는 이미지/스크립트 리소스만 저장
애드테크 (AdTech)
- 유저 타겟팅 규칙, 광고 크리에이티브, 입찰 응답 등 반복 사용되는 데이터만 선택적으로 캐싱
- 비효율적인 전체 캐싱을 피하면서 광고 응답 속도 향상
데이터베이스 쿼리 최적화
- 결과 값이 큰 쿼리 중 일부만 캐싱, DB 부하를 줄이면서도 효율성 확보
- 예: 조건부 캐싱 – 특정 테이블 조합에서만 결과 저장
스트리밍 서비스
- 사용자의 시청/청취 패턴에 따라 필요한 데이터 조각(chunk)만 캐싱
- 예: 최근 10초 구간, 자주 시청된 클립 등만 선별적으로 유지

💭 개인적인 고민 – Sieve 캐시와 Redis를 함께 활용할 수 있을까?

이 부분은 블로그 본문과는 별도로, 제가 실제로 고민했던 지점입니다.

Sieve 캐시는 애플리케이션 레벨에서 특정 데이터를 선별적으로 캐싱하는 구조이고,

Redis는 TTL 기반의 고성능 분산 캐시로 많이 사용됩니다.

그렇다면, Sieve 캐시의 로직은 앱에서 처리하고, Redis를 캐시 저장소로 활용할 수 있지 않을까?

이런 구조라면 캐싱 대상도 줄이고, TTL 관리도 Redis에 위임할 수 있어 유지 보수도 쉬워집니다.

나아가 멀티 노드 환경에서는 Redis를 통해 선별된 데이터를 공유하는 다단계 캐시 구조도 가능하겠다는 생각이 들었습니다.

이 아이디어는 아직 실험 단계지만, 실제로 적용해본다면 추후 별도 글로도 정리해보고 싶습니다.