Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

김지원 · September 10, 2024

Series: 1일1논문 (One Paper a Day)


Paper: https://arxiv.org/abs/2409.03271v1?utm_source=substack&utm_medium=email

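Summary

Existing problem: CoT has been widely successful at improving the reasoning ability of LLMs, but the quality of the generated reasoning paths is inconsistent, so performance on complex reasoning tasks is suboptimal.

New method: To address this, the authors propose Strategic Chain-of-Thought (SCoT). SCoT is a two-stage approach that first elicits a problem-solving strategy and then uses it to generate higher-quality reasoning paths and more accurate answers.

Method: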

While CoT has been successful in improving LLM reasoning, it often suffers from inconsistent quality in the reasoning paths generated, leading to suboptimal performance in complex reasoning tasks.

SCoT introduces a two-stage process that integrates strategic knowledge to improve reasoning accuracy:

① Strategy Elicitation: In the first stage, the model identifies and elicits a problem-solving strategy before generating the reasoning path. This involves recognizing effective methods or principles that guide reasoning towards a correct and stable solution. For example, in mathematical reasoning, instead of adding numbers sequentially, the model may identify the arithmetic sequence sum formula as a more efficient method. This step reduces the likelihood of errors by guiding the model to follow a more structured and optimal approach.

② Answer Generation: After eliciting the strategy, the model uses it to guide the generation of the reasoning path and produce the final answer. By applying the identified strategy, the model can generate high-quality CoT paths, leading to more accurate and stable results.

SCoT distinguishes itself from other methods for improving CoT by being a single-query approach that does not rely on external knowledge sources or multiple queries, which makes it more efficient. Voting-based methods require multiple queries, and retrieval-augmented generation (RAG) requires external data, whereas SCoT avoids both by building strategy-based reasoning directly into a single prompt.

The effectiveness of SCoT is demonstrated in various domains, including mathematical, commonsense, physical, spatial, and multi-hop reasoning tasks. For example, experiments using the Llama3-8B model showed a 21.05% improvement on the GSM8K dataset and a 24.13% improvement on the Tracking Objects dataset. Moreover, SCoT was extended into a few-shot version that automatically matches relevant demonstrations based on the elicited strategy, further enhancing performance in reasoning tasks.
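To make the few-shot idea concrete, here is a minimal sketch of picking demonstrations whose strategy resembles the elicited one. Everything in it is an assumption for illustration: the names `match_demonstrations` and `demo_pool`, the sample demonstrations, and above all the similarity measure. Token-overlap (Jaccard) similarity is used here only as a simple stand-in and should not be read as the matching method used in the paper.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a real text representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_demonstrations(strategy: str, demo_pool: list[dict], k: int = 1) -> list[dict]:
    """Return the k demonstrations whose stored strategy is most similar."""
    ranked = sorted(
        demo_pool,
        key=lambda demo: jaccard(strategy, demo["strategy"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical demonstration pool (not taken from the paper).
demo_pool = [
    {"strategy": "Use the arithmetic sequence sum formula",
     "example": "Sum the integers from 1 to 100."},
    {"strategy": "Track the position of every object after each swap",
     "example": "Three people trade books twice; who holds which book?"},
    {"strategy": "Set up and solve a linear equation",
     "example": "Twice a number plus 3 is 11; find the number."},
]

elicited = "Apply the arithmetic sequence sum formula instead of adding term by term"
best = match_demonstrations(elicited, demo_pool, k=1)
print(best[0]["example"])
```

The selected demonstrations would then be placed in the prompt ahead of the actual question.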

In summary, SCoT enhances the quality of reasoning in LLMs by eliciting and applying strategic knowledge within a single prompt, resulting in more accurate and reliable outputs for complex reasoning tasks.

Explanation of the single-query and multiple-query approaches:

In the context of the SCoT (Strategic Chain-of-Thought) method and traditional CoT (Chain-of-Thought) methods, the terms "single-query" and "multiple queries" refer to how many times the language model is asked to generate a reasoning path or answer during the problem-solving process.

□ Single-query Approach

A single-query approach means that the language model is asked to solve the problem in one step. All the reasoning and answer generation happens in a single interaction with the model. In the case of SCoT, the model first elicits a problem-solving strategy and then directly applies that strategy to generate the reasoning path and final answer within one prompt.

Example of a Single-query Approach (SCoT):

Suppose you ask the model to solve this math problem: "What is the sum of all integers between -26 and 24?"
In a single query, SCoT first identifies that the arithmetic sequence sum formula is the most efficient strategy. It then applies this formula directly and gives you the final answer.
The entire process of identifying the strategy and solving the problem happens within one query.
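The strategy in this example can be checked with a few lines of Python. This is only a sketch, and it assumes "between" excludes both endpoints (that is, -26 < s < 24, so the terms run from -25 to 23); under an inclusive reading the result would differ. The helper name `arithmetic_series_sum` is made up for illustration.

```python
def arithmetic_series_sum(first: int, last: int) -> int:
    """Sum of the consecutive integers first..last (inclusive): n * (first + last) / 2."""
    if last < first:
        return 0
    n = last - first + 1  # number of terms
    # n * (first + last) is always even for consecutive integers,
    # so integer division is exact here.
    return n * (first + last) // 2


# Assumption: both endpoints are excluded, so the terms are -25, -24, ..., 23.
formula_result = arithmetic_series_sum(-25, 23)

# Cross-check against adding the numbers one by one.
brute_force = sum(range(-25, 24))

print(formula_result, brute_force)  # -49 -49
```

With 49 terms and a first-plus-last value of -2, the formula gives 49 × (-2) / 2 = -49 in one step, where sequential addition needs 48 additions and has 48 chances to slip.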

□ Multiple-query Approach

A multiple-query approach means that the language model is asked to generate reasoning paths or answers multiple times and then the results are combined to reach a final answer. This is often done to improve accuracy by generating various solutions and then selecting the most consistent or reliable one, but it can be computationally expensive.

Example of a Multiple-query Approach (CoT with voting, such as Self-Consistency):

Let's use the same math problem: "What is the sum of all integers between -26 and 24?"
In a multiple-query CoT approach, you might ask the model several times to solve the problem, generating different reasoning paths in each query. For instance, the model might try pairing numbers in one attempt, and in another, it might try the arithmetic series formula.
After generating multiple different solutions (queries), you might combine or vote on the answers to find the most consistent or correct one.
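The voting step can be sketched as a simple majority vote over the final answers. The sampled answers below are hypothetical placeholders rather than real model output, and `majority_vote` is an illustrative name; in practice each entry would come from a separate query to the model.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers from five separate queries on the same problem.
sampled_answers = ["-49", "-49", "-51", "-49", "-50"]

final_answer = majority_vote(sampled_answers)
print(final_answer)  # -49
```

Note the cost: five queries were needed to produce one answer, which is the overhead a single-query method avoids.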

Key Difference:

In single-query (SCoT), the model generates the reasoning path and final answer in one step, which is efficient and fast.

In multiple-query approaches (for example, CoT combined with voting), the model is asked to generate multiple possible solutions (often using different reasoning paths), and these solutions are then aggregated or selected for the final answer. This process can lead to higher accuracy but requires more computational resources and time.

The numbers in the table represent accuracy percentages. The accuracy is measured as follows:

1. For datasets with multiple-choice questions (MathQA, AQuA, MMLU, ARC, StrategyQA, CommonsenseQA, and Tracking Objects):

Accuracy is calculated by comparing the model's predicted choice with the correct (gold) choice.
The percentage represents how often the model selects the correct answer out of all questions in the dataset.

2. For GSM8K, which has numerical answers:

Accuracy is determined by checking if the model's predicted numerical answer exactly matches the correct (gold) answer.
The percentage represents how often the model produces the exact correct numerical result.

3. Measurement process:

The paper mentions that for all experiments (except Self-Consistency), they conducted three independent inference runs and calculated the average results.
For Self-Consistency, due to its high computational cost, only a single inference was performed.
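As a rough sketch of this measurement process, the snippet below scores three runs on numerical answers and averages the results. All data is hypothetical, and the function names are made up. The normalization (stripping thousands separators and treating "18" and "18.0" as the same number) is my own assumption about what "exact match" could mean in practice, not the paper's scoring script.

```python
from decimal import Decimal, InvalidOperation
from statistics import mean


def numeric_match(predicted: str, gold: str) -> bool:
    """True when both strings parse to the same number."""
    try:
        return Decimal(predicted.replace(",", "")) == Decimal(gold.replace(",", ""))
    except InvalidOperation:
        # Unparseable output counts as a wrong answer.
        return False


def run_accuracy(predictions: list[str], golds: list[str]) -> float:
    """Accuracy of one inference run, as a percentage."""
    correct = sum(numeric_match(p, g) for p, g in zip(predictions, golds))
    return 100.0 * correct / len(golds)


# Hypothetical gold answers and three independent runs (placeholder data).
golds = ["18", "3", "70000", "540"]
runs = [
    ["18", "3", "70,000", "540"],   # 4 of 4 correct
    ["18", "4", "70000", "540"],    # 3 of 4 correct
    ["18.0", "3", "65000", "n/a"],  # 2 of 4 correct
]

per_run = [run_accuracy(run, golds) for run in runs]
reported = mean(per_run)
print(per_run, reported)  # [100.0, 75.0, 50.0] 75.0
```

Averaging over several runs smooths out sampling noise; a method evaluated with a single run, as Self-Consistency is here, reports just one such per-run value.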

Explanation of the accuracy concept
Let's say we have a multiple-choice question from one of these datasets:

Question: "What is the capital of France?"
Options:
A) London
B) Berlin
C) Paris
D) Rome
The correct answer, or "gold choice," is C) Paris.
Now, let's say the AI model is given this question and it predicts that the answer is C) Paris.

To calculate accuracy:
We compare the model's prediction (C) with the correct answer (C).
In this case, they match, so this would count as a correct prediction.

Now, imagine we have 100 such questions in our dataset. The model answers all 100 questions, and gets 75 of them correct.
The accuracy would then be:
(Number of correct predictions / Total number of questions) × 100
(75 / 100) × 100 = 75%

So for this hypothetical dataset, we would say the model has an accuracy of 75%.
In the table from the paper, each number represents this kind of percentage - how often the model correctly answered questions in that particular dataset using that specific method. The higher the percentage, the more questions the model answered correctly.
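The same calculation, written as a small Python sketch for multiple-choice predictions. The 100 questions below are synthetic placeholders that mirror the 75-out-of-100 example above, and `accuracy_percent` is an illustrative name.

```python
def accuracy_percent(predictions: list[str], golds: list[str]) -> float:
    """Share of predictions equal to the gold choice, as a percentage."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        raise ValueError("need at least one question")
    correct = sum(p == g for p, g in zip(predictions, golds))
    return 100.0 * correct / len(golds)


# Synthetic data: 100 questions whose gold choice is "C";
# the model picks "C" on 75 of them and "A" on the other 25.
golds = ["C"] * 100
predictions = ["C"] * 75 + ["A"] * 25

print(accuracy_percent(predictions, golds))  # 75.0
```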

The Strategic Chain-of-Thought (SCoT) approach is primarily evident in the "Workflow" section of this prompt template. Specifically, these elements demonstrate the SCoT approach:

Step 1 of the Workflow: "Analyze the problem and identify any relevant mathematical formulas, or approaches that might be helpful, and select the approaches that can solve the problem." This step is about eliciting strategic knowledge before solving the problem.
Step 2 of the Workflow: "Choose the most efficient and practical approach." This explicitly instructs the model to select a strategic approach, which is a key aspect of SCoT. The example given (using a summation formula instead of adding numbers one by one) illustrates the preference for more elegant and efficient problem-solving strategies.
Step 3 of the Workflow: "Solve the problem step by step following the selected approach carefully." This step applies the chosen strategy to solve the problem.

These steps embody the two-stage SCoT process: first eliciting an effective problem-solving strategy (steps 1 and 2), and then using this strategy to guide the solution (step 3). This structured approach to problem-solving, focusing on identifying and applying efficient strategies, is what distinguishes SCoT from standard Chain-of-Thought prompting.
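To show how the three workflow steps fit into a single query, here is a minimal prompt-building sketch. The step wording is quoted from the steps above; the section labels, layout, and the name `build_scot_prompt` are assumptions for illustration and do not reproduce the paper's full template.

```python
# Workflow steps as quoted in the post.
WORKFLOW_STEPS = [
    "Analyze the problem and identify any relevant mathematical formulas, "
    "or approaches that might be helpful, and select the approaches that "
    "can solve the problem.",
    "Choose the most efficient and practical approach.",
    "Solve the problem step by step following the selected approach carefully.",
]


def build_scot_prompt(question: str) -> str:
    """Assemble one prompt holding both strategy elicitation and answer generation."""
    lines = ["# Workflow"]
    for number, step in enumerate(WORKFLOW_STEPS, start=1):
        lines.append(f"{number}. {step}")
    # Assumed layout: the question follows the workflow in the same prompt.
    lines += ["", "# Question", question]
    return "\n".join(lines)


prompt = build_scot_prompt("What is the sum of all integers between -26 and 24?")
print(prompt)
```

Because steps 1 and 2 (strategy) and step 3 (solution) live in the same prompt, the model is queried only once.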
