[Claude AI Agent] Prompt caching

DH.J · January 30, 2025

Building your own AI agents with Claude

Prompt caching

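When a request is sent, the part of the prompt marked for caching is stored in a cache. What gets reused is the processed prompt prefix, not the response.
When a later request starts with the same prefix, the API does not reprocess every token: a cache hit occurs and the prefix is read from the cache.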

What is a cache hit?
It is the case where the requested data is successfully found in the cache.
Check the cache: if the data is there, it is a cache hit; if not, it is a cache miss.
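
The hit/miss distinction above can be sketched with a toy lookup table. This is only an illustration of the concept, not how the Claude API implements caching; `PrefixCache` and its methods are hypothetical names made up for this example.

```python
# Toy illustration of cache hit vs. cache miss (NOT the Claude API's implementation).
import hashlib


class PrefixCache:
    """Remembers which prompt prefixes it has already seen."""

    def __init__(self):
        self._seen = set()
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix: str) -> str:
        # Hash the prefix so that a very long document still makes a small key.
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self._seen:
            self.hits += 1
            return "hit"
        # Not found: remember it so the next identical prefix is a hit.
        self._seen.add(key)
        self.misses += 1
        return "miss"


cache = PrefixCache()
book = "<book>...long text...</book>"
print(cache.lookup(book))              # first time this prefix is seen
print(cache.lookup(book))              # identical prefix again
print(cache.lookup(book + " edited"))  # any change produces a different key
```

The three lookups print `miss`, `hit`, `miss`: only an exact repeat of the prefix is found in the cache.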

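The more often requests repeat the same long prefix, the more caching pays off: the API is still called every time, but the shared prefix is not reprocessed, so responses come back faster.
Ex) FAQ systems, data summarization, translation services, and other workloads that repeat a simple operation over the same context.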

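Note that the cache does not last forever. By default an entry lives for about 5 minutes, and that lifetime is refreshed each time the cached content is used.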
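
To make the expiry behavior concrete, here is a minimal sketch of a cache with a time-to-live. It assumes a 300-second lifetime that restarts on every use; `TTLCache` is a hypothetical name, and the clock is passed in explicitly so the example runs instantly instead of waiting five minutes.

```python
# Toy time-to-live cache (an illustration only, not the service's implementation).
class TTLCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._expires_at = {}  # key -> timestamp at which the entry expires

    def lookup(self, key: str, now: float) -> str:
        expires = self._expires_at.get(key)
        is_hit = expires is not None and now < expires
        # Both a fresh write and a hit restart the lifetime window.
        self._expires_at[key] = now + self.ttl
        return "hit" if is_hit else "miss"


cache = TTLCache(ttl_seconds=300)
for t in (0, 200, 450, 800):
    print(t, cache.lookup("book-prefix", now=t))
```

The lookups at 200 s and 450 s are hits because each use pushes the expiry forward; the lookup at 800 s is a miss because more than 300 s passed since the previous use.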

Now let's read the text of a file and write a function that sends it with caching enabled.

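import time

import anthropic

# Assumption: the post defines neither `client` nor `MODEL_NAME`.
# The client reads ANTHROPIC_API_KEY from the environment; the model ID
# below is a placeholder, so substitute the one you actually use.
client = anthropic.Anthropic()
MODEL_NAME = "claude-3-5-sonnet-20241022"

# Load the whole book once; it becomes the cached prompt prefix.
with open('files/frankenstein.txt', 'r', encoding='utf-8') as file:
    book_content = file.read()

def make_cached_api_call():
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<book>" + book_content + "</book>",
                    # Cache breakpoint: everything up to and including this block is cached.
                    "cache_control": {"type": "ephemeral"}
                },
                {
                    # The question comes after the breakpoint, so it is not cached.
                    "type": "text",
                    "text": "What happens in chapter 5?"
                }
            ]
        }
    ]

    # Measure the wall-clock time of the API call.
    start_time = time.time()
    response = client.messages.create(
        model=MODEL_NAME,
        max_tokens=500,
        messages=messages,
    )
    end_time = time.time()

    return response, end_time - start_time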
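
The post later compares against a non-cached version of this function without showing it. A minimal sketch of how the two variants might differ follows: the only change is whether the book block carries `cache_control`. The helper names `build_messages` and `timed_call` are hypothetical, and the demo uses a stand-in `send` function so it runs without an API key.

```python
import time


def build_messages(book_content: str, question: str, use_cache: bool) -> list:
    """Build the user message, optionally marking the book as a cache breakpoint."""
    book_block = {"type": "text", "text": "<book>" + book_content + "</book>"}
    if use_cache:
        book_block["cache_control"] = {"type": "ephemeral"}
    question_block = {"type": "text", "text": question}
    return [{"role": "user", "content": [book_block, question_block]}]


def timed_call(send, messages):
    """Call send(messages) and return (result, elapsed_seconds)."""
    start = time.time()
    result = send(messages)
    return result, time.time() - start


# With a real client (assumed to be anthropic.Anthropic(), as in the function above):
# send = lambda m: client.messages.create(model=MODEL_NAME, max_tokens=500, messages=m)

# Stand-in for the API so this sketch is runnable offline.
def fake_send(messages):
    return "cached" if "cache_control" in messages[0]["content"][0] else "not cached"


for flag in (True, False):
    result, seconds = timed_call(fake_send, build_messages("...", "What happens in chapter 5?", flag))
    print(flag, result)
```

Keeping message construction separate from the timed call makes it easy to run both variants against the same question and compare their durations.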

Running it gives:

response1, duration1 = make_cached_api_call()
response1.usage

Usage(cache_creation_input_tokens=0, cache_read_input_tokens=108427, input_tokens=11, output_tokens=353)

duration1: 10.412367820739746

response2, duration2 = make_cached_api_call()
response2.usage

Usage(cache_creation_input_tokens=0, cache_read_input_tokens=108427, input_tokens=11, output_tokens=309)

duration2: 8.788889646530151

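The version of the function without caching reported Non-cached time: 33.14 seconds.
Both calls above report cache_read_input_tokens=108427 with cache_creation_input_tokens=0 and finished in roughly 9 to 10 seconds, so both were cache hits. (Since even the first call read from the cache, the cache was presumably written by an earlier run within its lifetime.)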
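
As a rough sketch, the same conclusion can be read off the usage fields programmatically. The `Usage` dataclass below is a stand-in with the same field names as the printed output, and `cache_status` is a hypothetical helper; a real request can both read and write the cache, which this simplification reports as a hit.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Stand-in mirroring the field names in the usage output shown above.
    cache_creation_input_tokens: int
    cache_read_input_tokens: int
    input_tokens: int
    output_tokens: int


def cache_status(usage) -> str:
    """Classify a response by its cache counters (simplified)."""
    if usage.cache_read_input_tokens > 0:
        return "hit"        # prefix was read from the cache
    if usage.cache_creation_input_tokens > 0:
        return "write"      # prefix was processed and stored for next time
    return "uncached"       # no cache breakpoint took effect


# Values copied from the two runs in this post.
run1 = Usage(0, 108427, 11, 353)
run2 = Usage(0, 108427, 11, 309)
print(cache_status(run1), cache_status(run2))

non_cached_seconds = 33.14
for name, seconds in [("duration1", 10.412367820739746), ("duration2", 8.788889646530151)]:
    print(f"{name}: {non_cached_seconds / seconds:.2f}x faster than the non-cached call")
```

With the numbers from this post, the cached calls come out roughly 3.2x and 3.8x faster than the non-cached one.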
