[LLM] OpenAI Realtime API Function Calling 구현 예시

Sngmng·2026년 2월 13일

LLM

개요

본 글은 OpenAI Realtime API를 사용하여
WebSocket 환경에서 function calling을 처리하는 최소 예제를 정리한 글이다.

Speech-to-Speech 전체 구현이 목적이 아니라,
Realtime API에서 function calling이 어떤 흐름으로 동작하는지 이해하는 것을 목표로 한다.

참고 자료

1. Realtime API Function Calling 전체 흐름

Realtime API에서 function calling은 이벤트 기반으로 동작한다.

전체 흐름은 다음과 같다.

Client (WebSocket)
└─ response.create (tools 전달)
↓
LLM이 함수 호출 필요 여부 판단
↓
response.function_call.delta
↓
response.function_call.arguments.done
↓
로컬 함수 실행
↓
response.function_call.output

2. WebSocket Client 구현

OpenAI Realtime API는 WebSocket 연결을 사용한다.

import json
import os
import time
import logging
from websocket import create_connection, WebSocketConnectionClosedException
from dotenv import load_dotenv
from tools import TOOLS

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
load_dotenv()

class Socket:
    def __init__(self, api_key, ws_url):
        self.api_key = api_key
        self.ws_url = ws_url
        self.ws = None

    def connect(self):
        self.ws = create_connection(
            self.ws_url,
            header=[
                f"Authorization: Bearer {self.api_key}",
                "OpenAI-Beta: realtime=v1"
            ]
        )
        logging.info("WebSocket 연결 완료")

    def send(self, data):
        try:
            self.ws.send(json.dumps(data))
        except WebSocketConnectionClosedException:
            logging.error("WebSocket 연결 종료")
        except Exception as e:
            logging.error(f"Send 오류: {e}")

    def recv(self):
        try:
            return self.ws.recv()
        except WebSocketConnectionClosedException:
            logging.error("WebSocket 연결 종료")
        except Exception as e:
            logging.error(f"Recv 오류: {e}")
        return None

    def close(self):
        try:
            self.ws.close()
            logging.info("WebSocket 종료")
        except Exception as e:
            logging.error(f"Close 오류: {e}")

3. 함수 스키마 (Function Schema) 정의

Function schema는 LLM이 호출할 수 있는 함수 정보를 JSON 형식으로 정의한 명세이다.
모델은 이 정보를 기반으로 어떤 함수를 호출할지와 어떤 인자를 전달할지를 판단한다.

구성 요소

name : 모델이 호출할 함수 이름
description : 함수의 역할 설명 (호출 여부 판단에 사용)
parameters : 함수 입력 파라미터 정의
required : 필수 파라미터 지정

TOOLS = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "지정된 위치의 현재 날씨를 조회한다.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "도시 이름 (예: Seoul, South Korea)"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius"],
                    "description": "온도 단위"
                }
            },
            "required": ["location"]
        }
    }
]

4. LLM에 Tool(Function) 등록

response.create 이벤트를 통해 LLM에게
실제로 사용할 수 있는 function 목록을 전달한다.

socket.send({
    "type": "response.create",
    "response": {
        "modalities": ["text"],
        "instructions": "서울 날씨 알려줘",
        "tools": TOOLS
    }
})

5. 함수 호출 처리 (Function Call Handling)

LLM이 대화를 분석한 결과,
사전에 등록된 tool(function)을 호출해야 한다고 판단하면
Realtime API에서는 이벤트 기반으로 함수 호출이 진행된다.

5-1. response.function_call.delta

함수 호출이 시작될 때 스트리밍 형태로 전달되는 이벤트
함수 이름(name)과 arguments가 조각(delta) 단위로 전달됨
이 단계에서는 실제 실행을 하지 않고, 로그 확인 용도로만 사용

if event_type == "response.function_call.delta":
    delta = parsed.get("delta", {})
    name = delta.get("name")
    arguments = delta.get("arguments")
    logging.info(f"🛠️ Tool 호출 감지: {name}, args={arguments}")

5-2. response.function_call.arguments.done

함수 호출에 필요한 arguments 전달이 모두 완료되었을 때 발생
이 시점부터 실제 Python 함수 실행이 가능
call_id는 이후 결과를 LLM에게 전달할 때 필요

elif event_type == "response.function_call.arguments.done":
    call_id = parsed.get("call_id")
    arguments = json.loads(parsed.get("arguments", "{}"))
    name = parsed.get("name")

    logging.info(
        f"✅ Tool 실행 준비: {name}, call_id={call_id}, args={arguments}"
    )

    output = run_tool(name, arguments)

    socket.send({
        "type": "response.function_call.output",
        "call_id": call_id,
        "output": output
    })

    logging.info(f"📤 Tool 결과 전송: {output}")

5-3. response.function_call.output

로컬에서 실행한 함수 결과를 LLM에게 다시 전달
LLM은 해당 결과를 컨텍스트로 활용해 최종 응답을 생성
call_id가 일치하지 않으면 LLM이 결과를 인식하지 못함

6. 전체 이벤트 흐름 정리

Realtime API 기반 Function Calling의 전체 흐름은 다음과 같다.

response.create
- instructions와 tools 등록
response.function_call.delta
- 함수 호출 시작 (스트리밍)
response.function_call.arguments.done
- 함수 이름 및 arguments 확정
로컬 Python 함수 실행
- run_tool()을 통해 실제 함수 호출
response.function_call.output
- 실행 결과를 LLM에게 전달
LLM 최종 응답 생성

정리

Realtime API의 Function Calling은 이벤트 기반(Event-driven) 구조로 동작한다.

Chat Completions API와 달리
함수 호출 과정이 여러 이벤트로 분리되어 스트리밍 형태로 전달된다.

핵심 포인트

tools는 response.create 시점에 반드시 등록해야 함

함수 실행은 response.function_call.arguments.done 이후에 수행

실행 결과는 response.function_call.output 이벤트로 반환

call_id는 요청과 응답을 매칭하기 위한 필수 값

이런 경우에 사용

Speech-to-Speech 기반 Agent

실시간 사용자 인터랙션

Tool 호출 과정을 세밀하게 제어해야 하는 경우

WebSocket 기반 장시간 연결 서비스

Sngmng

"Engineering Notes" → Here "Research Notes" → https://lifes-ng.tistory.com/ "Code" → github.com/sngmng6506