OPRO: Large Language Models as Optimizers

Southgiri · February 3, 2025

Paper Review

1. Introduction

Optimize prompts by using LLMs as optimizers

Describe the optimization problem in natural language, then instruct the LLM to iteratively generate new solutions based on the problem description and the previously found solutions

Optimizing the prompt for accuracy on a small training set is sufficient to reach high performance on the test set

Meta-prompt

The prompt sent to the LLM (the meta-prompt) serves as a call to the optimizer

Two core pieces

Previously generated prompts with their corresponding training accuracies
Optimization problem description, which includes several exemplars randomly selected from the training set

Each optimization step generates new prompts based on a trajectory of previously generated prompts

OPRO enables the LLM to gradually generate new prompts that improve task accuracy, starting from initial prompts with low task accuracy

2. OPRO

At each optimization step, the LLM generates candidate solutions

New solutions are evaluated and added to the meta-prompt for the subsequent optimization process

The optimization process terminates when the LLM is unable to propose new solutions with better optimization scores, or when a maximum number of optimization steps is reached
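The loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `propose` and `score` are hypothetical callables standing in for the optimizer LLM and the objective function, and the `patience` counter is one possible way to decide that the LLM is "unable to propose better solutions" (the notes do not specify the exact criterion).

```python
from typing import Callable, List, Tuple

Trajectory = List[Tuple[str, float]]  # (solution, score) pairs


def opro_loop(
    propose: Callable[[Trajectory], List[str]],  # hypothetical stand-in for the optimizer LLM
    score: Callable[[str], float],               # hypothetical stand-in for the objective function
    initial: List[str],
    max_steps: int = 200,
    patience: int = 3,                           # assumption: stop after this many steps without improvement
) -> Trajectory:
    # Evaluate the starting solutions so the first meta-prompt has a trajectory.
    trajectory = [(s, score(s)) for s in initial]
    best = max(sc for _, sc in trajectory)
    stale = 0
    for _ in range(max_steps):
        # The optimizer sees all previously scored solutions and proposes new ones.
        scored = [(c, score(c)) for c in propose(trajectory)]
        trajectory.extend(scored)
        step_best = max((sc for _, sc in scored), default=float("-inf"))
        if step_best > best:
            best, stale = step_best, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no better solution for `patience` consecutive steps
    return trajectory
```

With a real LLM, `propose` would build the meta-prompt from `trajectory` and parse the model's output into candidate strings.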

2.1. Desirables of Optimization

Task can be described with a high-level text summary

Trading off exploration and exploitation

Exploit promising areas of the search space where good solutions are already found
Explore new regions of the search space so as not to miss potentially better solutions

2.2. Meta-prompt Design

Optimization problem description

Text description of the optimization problem, including the objective function and solution constraints

Optimization trajectory

Optimization trajectory includes past solutions and their optimization scores, sorted in ascending order
This lets the LLM construct potentially better solutions without explicitly defining how a solution should be updated
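A minimal sketch of how the two parts might be combined into one meta-prompt string. The `text:` / `score:` layout, the closing instruction sentence, and the function names are assumptions made for illustration, not the exact format used in the paper. It also assumes higher scores are better, so ascending order places the best solution last.

```python
from typing import List, Tuple


def format_trajectory(pairs: List[Tuple[str, float]]) -> str:
    # Sort ascending by score so the best solutions appear last.
    ordered = sorted(pairs, key=lambda p: p[1])
    return "\n\n".join(
        f"text:\n{solution}\nscore:\n{score:g}" for solution, score in ordered
    )


def build_meta_prompt(problem_description: str, pairs: List[Tuple[str, float]]) -> str:
    # Problem description first, then the trajectory, then a generic request.
    return (
        f"{problem_description}\n\n"
        f"{format_trajectory(pairs)}\n\n"
        "Write a new solution that scores higher than all of the above."
    )


print(build_meta_prompt("Maximize accuracy.", [("Think step by step.", 0.7), ("Solve it.", 0.4)]))
```

Note that nothing in the prompt tells the model how to change a solution; it only sees scored examples and is asked for a better one.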

2.3. Solution Generation

Optimization stability

Not all solutions achieve high scores or monotonically improve over prior ones
To improve stability, the LLM generates multiple solutions at each optimization step,
allowing it to simultaneously explore multiple possibilities and quickly discover promising directions

Exploration-exploitation trade off

A lower temperature encourages the LLM to exploit the solution space
while a higher temperature allows the LLM to explore more aggressively
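To see why temperature controls this trade-off, here is a small sketch of temperature-scaled softmax, the usual way sampling temperature is defined. The logits are made-up numbers, and actual LLM APIs may apply further sampling rules on top of this.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    if temperature <= 0:
        raise ValueError("temperature must be positive (0 is usually handled as greedy decoding)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
print(softmax_with_temperature(logits, 0.5))  # sharper: mass concentrates on the top token
print(softmax_with_temperature(logits, 2.0))  # flatter: lower-ranked tokens get sampled more often
```

A sharper distribution keeps generations close to what the model already prefers (exploitation), while a flatter one produces more varied candidates (exploration).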

4. Application : Prompt Optimization

4.1. Problem Setup

Both input and output are in the text format
Training set is used to calculate the training accuracy as the objective value during the optimization process
Compute the test accuracy on the test set after the optimization finishes
A small number or fraction of training samples is sufficient
Scorer LLM
- LLM for objective function evaluation
Optimizer LLM
- LLM for optimization
- Output is an instruction
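As a rough sketch of the objective function, training accuracy for one instruction could be computed like this. The `scorer` callable is a hypothetical stand-in for the scorer LLM, and exact string match is an assumed (simplified) correctness check.

```python
from typing import Callable, List, Tuple


def training_accuracy(
    instruction: str,
    examples: List[Tuple[str, str]],     # (question, gold answer) pairs from the training set
    scorer: Callable[[str, str], str],   # hypothetical: (instruction, question) -> predicted answer
) -> float:
    if not examples:
        raise ValueError("need at least one training example")
    # Count predictions that match the gold answer after trimming whitespace.
    correct = sum(
        scorer(instruction, question).strip() == answer.strip()
        for question, answer in examples
    )
    return correct / len(examples)
```

The optimizer LLM never sees this function directly; it only sees the resulting score next to each instruction in the meta-prompt.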

Example of the meta-prompt

Blue : solution-score pairs
Purple : optimization task and output format
Orange : meta-instructions

4.2. Meta-prompt Design

Optimization problem examples

Problem description includes a few examples taken from the training set
Essential for the optimizer LLM to generate instructions of the same style
At each optimization step, add several training examples to the meta-prompt

Optimization trajectory

Includes instructions generated in past optimization steps, along with their scores
- Sorted by the score in ascending order
Only the instructions with the highest scores are kept in the meta-prompt
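One possible way to implement "keep only the top instructions, shown in ascending order" is sketched below. The function name and the sample data are made up for illustration.

```python
import heapq
from typing import List, Tuple


def top_k_ascending(scored: List[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    # nlargest returns the k best pairs in descending score order...
    best = heapq.nlargest(k, scored, key=lambda pair: pair[1])
    # ...so reverse them to get ascending order, best instruction last.
    return best[::-1]


history = [("Solve it.", 10.0), ("Think step by step.", 50.0), ("Be careful.", 30.0)]
print(top_k_ascending(history, 2))
```

Truncating the trajectory keeps the meta-prompt within the model's context length while still showing the strongest instructions found so far.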

Meta-instructions

Explain the optimization goal and instruct the model how to use the above information

5. Prompt Optimization Experiments

5.1. Evaluation Setup

Implementation details

Temperature 0 for the scorer LLM when evaluating generated instructions
Default temperature of 1.0 for the optimizer LLM
At each optimization step, generate 8 instructions
Meta-prompt contains the best 20 instructions so far and 3 randomly picked exemplars from the training set
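These settings could be gathered in a small config object, as in the sketch below. The class, field, and function names are hypothetical; only the numeric defaults come from the notes above.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OproConfig:
    scorer_temperature: float = 0.0       # temperature used when evaluating instructions
    optimizer_temperature: float = 1.0    # default temperature for generating instructions
    instructions_per_step: int = 8        # candidates generated at each optimization step
    max_instructions_in_meta_prompt: int = 20
    exemplars_in_meta_prompt: int = 3


def pick_exemplars(
    training_set: List[Tuple[str, str]],
    config: OproConfig,
    rng: random.Random,
) -> List[Tuple[str, str]]:
    # Sample without replacement; cap at the training-set size for tiny datasets.
    k = min(config.exemplars_in_meta_prompt, len(training_set))
    return rng.sample(training_set, k)


config = OproConfig()
data = [(f"question {i}", f"answer {i}") for i in range(10)]
print(pick_exemplars(data, config, random.Random(0)))
```

Passing an explicit `random.Random` instance makes the exemplar choice reproducible across runs.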

5.2. Main Results

A leap in the optimization curve happens once the meta-prompt triggers only higher-quality instructions
