python lru cache 코드 살펴보기

Matthew Woo·2022년 5월 29일

Python

목록 보기

1/1

앞으로 근무하면서 조만간 로컬캐시를 도입할 상황이 생길터라
캐시 코드를 좀 보았다. 호오? 흠터레스팅한 부분들이 있었다.

def ttl_cache(maxsize=128, ttl=600, timer=time.monotonic, typed=False):
    """Decorator to wrap a function with a memoizing callable that saves
    up to `maxsize` results based on a Least Recently Used (LRU)
    algorithm with a per-item time-to-live (TTL) value.
    """
    if maxsize is None:
        return _cache(_UnboundTTLCache(ttl, timer), typed) # 흠터레스팅 1
    elif callable(maxsize): # 흠터레스팅 2
        return _cache(TTLCache(128, ttl, timer), typed)(maxsize)
    else:
        return _cache(TTLCache(maxsize, ttl, timer), typed)

첫 번째는, maxsize를 None으로 주었을 경우,

class _UnboundTTLCache(TTLCache):
    def __init__(self, ttl, timer):
        TTLCache.__init__(self, math.inf, ttl, timer)

    @property
    def maxsize(self):
        return None

math.inf 값을 바로 변환하여 넣어주는 것이 아니라 클래스 상속 받아서 별도 클래스를 처리하면서 maxsize를 조회할 경우에는 None 임으로 처리될 수 있도록 한 코드가 인상적이었다. input으로 들어온 것을 다른 값으로 처리하고 그 값을 조회할 경우에는 input 받았던 값으로 return시키는게 인상적.

두번째는 maxsize 를 주는데 상수일 경우와 callable 일 경우 처리하는 로직이 들어가있는데 이 callable처리는 비슷한 로직으로 구현되어 있는것들이 많이있고 유용하게 쓸 수 있을거같은데 처음 보는 것이라 좀 더 봐야겠다. 이 코드들은 python 내장된 함수들은 아니어서 우선 파이썬 내장 @lru_cache 를 살표보고자 하였다.

코드를 보면서 좋았던 부분은 LRU를 여러 번 접하면서도 어떻게 구현할지 궁금하다정도만 접할 때 마다 궁금하다 정도로 생각하고 넘어갔는데 LRU를 어떻게 구현하는지 볼 수 있었다.
또, lock을 사용하면서 자칫하면 놓칠 수 있는 포인트를 챙길 수 있었다.

보면서 이건 뭐지 싶었던 부분,

# cpython/Lib/functools.py
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # Constants shared by all lru cache instances:
    sentinel = object()          # unique object used to signal cache misses
    make_key = _make_key         # build a key from the function arguments
    
    ...
    
    lock = RLock()           #오 파이썬에서는 첨 보게 되는 lock이다. # because linkedlist updates aren't threadsafe
    root = []                # root of the circular doubly linked list
    root[:] = [root, root, None, None]     # initialize by pointing to self

캐싱을 사용하는 함수를 wrapping 하면서 캐싱을 지원하는 함수이다.
초기 캐시설정을 위한 값들을 설정해주는 것 같은데 root를 보면 root 안에 root. 안에 계속 들어가게끔 되어 있는데 왜 이런게 필요한지?

에 대한 답은 바로 아래 쪽 코드를 보면 나온다.

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # Constants shared by all lru cache instances:
    sentinel = object()          # unique object used to signal cache misses
    make_key = _make_key         # build a key from the function arguments
    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # names for the link fields

    cache = {}
    hits = misses = 0
    full = False
    cache_get = cache.get    # bound method to lookup a key or return None
    cache_len = cache.__len__  # get cache size without calling len()
    '''
    파이썬은 락을 이렇게 쓰는구나? lock = RLock()
    잠깐, c 에서 봤던건 왜 기억이 안나?  0, 양의 int 있고 점유해가면 
    카운트를 낮추고 0이면 어떻게하더라 생각보다 꽤나 심플했는데 다시 봐야겠다.
    '''
    lock = RLock()           # because linkedlist updates aren't threadsafe
    root = []                # root of the circular doubly linked list
    root[:] = [root, root, None, None]     # initialize by pointing to self

    if maxsize == 0:

        def wrapper(*args, **kwds):
            # No caching -- just a statistics update
            nonlocal misses
            misses += 1
            result = user_function(*args, **kwds)
            return result

    elif maxsize is None:

        def wrapper(*args, **kwds):
            # Simple caching without ordering or size limit
            nonlocal hits, misses
            key = make_key(args, kwds, typed)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            misses += 1
            result = user_function(*args, **kwds)
            cache[key] = result
            return result

    else:

        def wrapper(*args, **kwds):
            # Size limited caching that tracks accesses by recency
            nonlocal root, hits, misses, full
            key = make_key(args, kwds, typed)
            with lock:
            	'''
                캐싱해둔 것이 있을 경우, 
                '''
                link = cache_get(key)
                if link is not None:
                    # Move the link to the front of the circular queue
                    '''linked-list에서 캐싱해둔 것의 앞뒤의 것을 연결해준다'''
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    
                    ''' root 와 마지막 사이에 넣어준다. (최신조회)'''
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
                misses += 1
            result = user_function(*args, **kwds)
            with lock: 
                if key in cache: 
                '''
                이 부분은 내가 생각하지 못했던 부분이다.
                당연히 키 세팅해줄 때 lock을 걸어야하는 부분.
                동시에 시도할 때 락이 걸리고 그 다음 점유권을 가져올 경우가 있기에 그럼 키가 생성되어 있을 테고 그럼 pass.
                '''
                    # Getting here means that this same key was added to the
                    # cache while the lock was released.  Since the link
                    # update is already done, we need only return the
                    # computed result and update the count of misses.
                    pass
                elif full:
                '''
                LRU를 이렇게 구현해놨구나 싶은 부분. ㅗㅜㅑ 
                
                ㅁ 
                ㅁ <- Latest. 가장 최근 조회
                ㅁ <- root. [prev, next, null, null] 
                ㅁ <- oldest
                ㅁ
                
                새로 캐싱을 넣어줘야하는데 풀일 경우, root는 비어있으니 채워주고,
                root가 oldest를 참조하면서 oldest 의 키로 해쉬테이블에 저장되어있는 데이터를 지우고 null, null을 만든 뒤 root 로 변경된다.
                '''
                
                    # Use the old root to store the new key and result.
                    ''' 새로 들어오는 녀석이 root가 참조하는 부분을 같이 참조하면서 안에 key, value를 채워넣는다'''
                    oldroot = root
                    oldroot[KEY] = key
                    oldroot[RESULT] = result
                    # Empty the oldest link and make it the new root.
                    # Keep a reference to the old key and old result to
                    # prevent their ref counts from going to zero during the
                    # update. That will prevent potentially arbitrary object
                    # clean-up code (i.e. __del__) from running while we're
                    # still adjusting the links.
                    '''root는 이제 다음에 연결된 가장 참조가 되지 않았던 것을 참조한다.'''
                    root = oldroot[NEXT]
                    oldkey = root[KEY]
                    oldresult = root[RESULT]
                    root[KEY] = root[RESULT] = None '''비워버림'''
                    # Now update the cache dictionary.
                    del cache[oldkey] '''캐시에서도 지움'''
                    # Save the potentially reentrant cache[key] assignment
                    # for last, after the root and links have been put in
                    # a consistent state.
                    cache[key] = oldroot '''새로 들어온 것을 바로 접근하여 사용할 수 있게 hash table에 넣어줌'''
                else:
                    # Put result in a new link at the front of the queue.
                    last = root[PREV]
                    link = [last, root, key, result]
                    last[NEXT] = root[PREV] = cache[key] = link
                    # Use the cache_len bound method instead of the len() function
                    # which could potentially be wrapped in an lru_cache itself.
                    full = (cache_len() >= maxsize) '''하나 더 늘리고 풀인지 아닌지 체크'''
            return result

    def cache_info():
        """Report cache statistics"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

    def cache_clear():
        """Clear the cache and cache statistics"""
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper

이거 그림이 처음에는 많이 혼란스러웠다. 화살표가 hash map에서 list로만 가리키고있는데, 양방향이 되어야한다. 삭제할 때 List에서 가장 오래된 것을 확인한 후 해당 키 값으로 hash map 을 삭제하기 때문.

키를 생성하는 방법은 나중에 보련다. 읽단 긁어만 놔야지.

def _make_key(args, kwds, typed,
             kwd_mark = (object(),),
             fasttypes = {int, str},
             tuple=tuple, type=type, len=len):
    """Make a cache key from optionally typed positional and keyword arguments

    The key is constructed in a way that is flat as possible rather than
    as a nested structure that would take more memory.

    If there is only a single argument and its data type is known to cache
    its hash value, then that argument is returned without a wrapper.  This
    saves space and improves lookup speed.

    """
    # All of code below relies on kwds preserving the order input by the user.
    # Formerly, we sorted() the kwds before looping.  The new way is *much*
    # faster; however, it means that f(x=1, y=2) will now be treated as a
    # distinct call from f(y=2, x=1) which will be cached separately.
    key = args
    if kwds:
        key += kwd_mark
        for item in kwds.items():
            key += item
    if typed:
        key += tuple(type(v) for v in args)
        if kwds:
            key += tuple(type(v) for v in kwds.values())
    elif len(key) == 1 and type(key[0]) in fasttypes:
        return key[0]
    return _HashedSeq(key)

원본 코드: https://github.com/python/cpython/blob/3.10/Lib/functools.py

이미지 REF
https://realpython.com/lru-cache-python/

Matthew Woo

지속가능하고 안정적인 시스템을 만들고자 합니다.

python lru cache 코드 살펴보기

Python

0개의 댓글