Paper: https://arxiv.org/abs/2409.03271v1?utm_source=substack&utm_medium=email

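Existing problem: CoT has been widely successful at improving the reasoning ability of LLMs, but the quality of the generated reasoning paths is inconsistent, so performance on complex reasoning tasks is suboptimal.
New methodology: To address this, the authors propose a new method, Strategic Chain-of-Thought (SCoT). SCoT uses a two-stage approach: it first elicits a problem-solving strategy, then uses that strategy to generate higher-quality reasoning paths and more accurate answers.
Method: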

While CoT has been successful in improving LLM reasoning, it often suffers from inconsistent quality in the reasoning paths generated, leading to suboptimal performance in complex reasoning tasks.
SCoT introduces a two-stage process that integrates strategic knowledge to improve reasoning accuracy:
① Strategy Elicitation: In the first stage, the model identifies and elicits a problem-solving strategy before generating the reasoning path. This involves recognizing effective methods or principles that guide reasoning toward a correct and stable solution. For example, in mathematical reasoning, instead of adding numbers sequentially, the model may identify the arithmetic sequence sum formula as a more efficient method. This step reduces the likelihood of errors by guiding the model to follow a more structured and optimal approach.
② Answer Generation: After eliciting the strategy, the model uses it to guide the generation of the reasoning path and produce the final answer. By applying the identified strategy, the model can generate high-quality CoT paths, leading to more accurate and stable results.
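The arithmetic example from the first stage can be made concrete. The sketch below is only an illustration of why the formula counts as the better strategy; the function names are made up for this note and do not come from the paper. It assumes `last >= first` and a positive `step`.

```python
def sum_sequential(first: int, last: int, step: int = 1) -> int:
    """Add the terms one at a time, as an unguided reasoning path might."""
    total = 0
    for term in range(first, last + 1, step):
        total += term
    return total


def sum_by_formula(first: int, last: int, step: int = 1) -> int:
    """Apply the arithmetic series formula n * (a_1 + a_n) / 2."""
    n_terms = (last - first) // step + 1
    final_term = first + (n_terms - 1) * step
    # n * (a_1 + a_n) is always even for integer sequences, so // is exact.
    return n_terms * (first + final_term) // 2


print(sum_sequential(1, 100))  # 5050
print(sum_by_formula(1, 100))  # 5050
```

Both functions agree, but the sequential version needs one addition per term (one chance of a slip per step in a written reasoning path), while the formula needs a fixed handful of operations.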
SCoT distinguishes itself from many CoT-enhancement methods by being a single-query approach that does not rely on external knowledge sources or multiple queries, which makes it more resource-efficient. Other methods, such as voting-based approaches or retrieval-augmented generation (RAG), often require multiple queries or external data, whereas SCoT avoids this overhead by building strategy elicitation directly into the prompt.
The effectiveness of SCoT is demonstrated in various domains, including mathematical, commonsense, physical, spatial, and multi-hop reasoning tasks. For example, experiments using the Llama3-8B model showed a 21.05% improvement on the GSM8K dataset and a 24.13% improvement on the Tracking Objects dataset. Moreover, SCoT was extended into a few-shot version that automatically matches relevant demonstrations based on the elicited strategy, further enhancing performance in reasoning tasks.
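To picture the few-shot extension, here is a rough sketch of matching demonstrations to an elicited strategy. Everything in it is an assumption made for illustration: the demo pool, the function names, and especially the token-overlap (Jaccard) similarity, which is a simple stand-in and not necessarily how the paper performs the matching.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strategy descriptions."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical pool: each demonstration is stored with the strategy it used.
DEMO_POOL = [
    {
        "strategy": "use the arithmetic series sum formula",
        "demo": "Q: 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10? A: 10 * (1 + 10) / 2 = 55",
    },
    {
        "strategy": "track each object swap step by step in a table",
        "demo": "Q: Alice holds red, Bob holds blue; they swap. Who holds red? A: Bob",
    },
]


def match_demos(elicited_strategy: str, pool: list[dict], k: int = 1) -> list[str]:
    """Return the k demonstrations whose stored strategy is most similar."""
    ranked = sorted(
        pool,
        key=lambda item: jaccard(elicited_strategy, item["strategy"]),
        reverse=True,
    )
    return [item["demo"] for item in ranked[:k]]


print(match_demos("apply the arithmetic series sum formula", DEMO_POOL))
```

The selected demonstrations would then be placed in the prompt ahead of the new question.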
In summary, SCoT enhances the quality of reasoning in LLMs by eliciting and applying strategic knowledge within a single prompt, resulting in more accurate and reliable outputs for complex reasoning tasks.


Explanation of single-query vs. multiple-query:
In the context of the SCoT (Strategic Chain-of-Thought) method and traditional CoT (Chain-of-Thought) methods, the terms "single-query" and "multiple queries" refer to how many times the language model is asked to generate a reasoning path or answer during the problem-solving process.
A single-query approach means that the language model is asked to solve the problem in one step. All the reasoning and answer generation happens in a single interaction with the model. In the case of SCoT, the model first elicits a problem-solving strategy and then directly applies that strategy to generate the reasoning path and final answer within one prompt.
Example of a Single-query Approach (SCoT):
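A minimal sketch of what a single-query call could look like. The template wording is invented for this note and is not the paper's actual prompt; `fake_llm` is a hypothetical stand-in for a real model call so the example runs on its own.

```python
from typing import Callable

# Hypothetical template: both SCoT stages are requested in ONE prompt.
SCOT_TEMPLATE = (
    "Solve the problem below in two stages.\n"
    "Stage 1 (strategy elicitation): name the most effective strategy for this problem.\n"
    "Stage 2 (answer generation): follow that strategy step by step and state the final answer.\n"
    "\n"
    "Problem: {question}\n"
)


def build_scot_prompt(question: str) -> str:
    """Fill the question into the two-stage template."""
    return SCOT_TEMPLATE.format(question=question)


def solve_single_query(question: str, llm: Callable[[str], str]) -> str:
    """Send exactly one prompt to the model and return its full response."""
    return llm(build_scot_prompt(question))


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in model: records the prompt and returns a canned response."""
    calls.append(prompt)
    return "Strategy: arithmetic series formula.\nAnswer: 5050"


response = solve_single_query("What is 1 + 2 + ... + 100?", fake_llm)
print(len(calls))  # 1
```

The point of the sketch is the call count: strategy and answer come back from the same single interaction.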
A multiple-query approach means that the language model is asked to generate reasoning paths or answers multiple times and then the results are combined to reach a final answer. This is often done to improve accuracy by generating various solutions and then selecting the most consistent or reliable one, but it can be computationally expensive.
Example of a Multiple-query Approach (Traditional CoT):
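A minimal sketch of a voting-style multiple-query approach, assuming the final answers can be compared as strings. `fake_llm` and its canned answers are hypothetical and only simulate a model that returns different answers across samples.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def solve_multi_query(question: str, llm: Callable[[str], str], n_samples: int = 5) -> str:
    """Query the model n_samples times and return the most frequent answer."""
    answers = [llm(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


# Stand-in model: cycles through canned final answers and counts its calls.
_canned_answers = cycle(["18", "18", "20", "18", "17"])
call_count = 0


def fake_llm(prompt: str) -> str:
    global call_count
    call_count += 1
    return next(_canned_answers)


final_answer = solve_multi_query("A hypothetical word problem", fake_llm, n_samples=5)
print(final_answer, call_count)  # 18 5
```

Here five model calls are spent to produce one answer, which is the extra cost the single-query design avoids.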
Key Difference:
- In single-query (SCoT), the model generates the reasoning path and final answer in one step, which is efficient and fast.
- In multiple-query approaches (for example, voting over several sampled CoT paths), the model is asked to generate multiple candidate solutions, often along different reasoning paths, and these are then aggregated or filtered to select the final answer. This can improve accuracy but requires more computational resources and time.
The numbers in the table represent accuracy percentages. The accuracy is measured as follows:
3. Measurement process:
Explanation of the accuracy concept:
Let's say we have a multiple-choice question from one of these datasets:
Question: "What is the capital of France?"
Options:
A) London
B) Berlin
C) Paris
D) Rome
The correct answer, or "gold choice," is C) Paris.
Now, let's say the AI model is given this question and it predicts that the answer is C) Paris.
To calculate accuracy:
We compare the model's prediction (C) with the correct answer (C).
In this case, they match, so this would count as a correct prediction.
Now, imagine we have 100 such questions in our dataset. The model answers all 100 questions, and gets 75 of them correct.
The accuracy would then be:
(Number of correct predictions / Total number of questions) × 100
(75 / 100) × 100 = 75%
So for this hypothetical dataset, we would say the model has an accuracy of 75%.
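The same calculation as a short sketch. The function name and the toy data are made up for this note; the 100-question dataset mirrors the hypothetical example above.

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Percentage of predictions that exactly match the gold choices."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(pred == answer for pred, answer in zip(predictions, gold))
    return correct / len(gold) * 100


# 100 questions whose gold choice is "C"; the model gets 75 of them right.
gold = ["C"] * 100
predictions = ["C"] * 75 + ["A"] * 25
print(accuracy(predictions, gold))  # 75.0
```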
In the table from the paper, each number represents this kind of percentage - how often the model correctly answered questions in that particular dataset using that specific method. The higher the percentage, the more questions the model answered correctly.

The Strategic Chain-of-Thought (SCoT) approach is primarily evident in the "Workflow" section of this prompt template. Specifically, these elements demonstrate the SCoT approach:
These steps embody the two-stage SCoT process: first eliciting an effective problem-solving strategy (steps 1 and 2), and then using this strategy to guide the solution (step 3). This structured approach to problem-solving, focusing on identifying and applying efficient strategies, is what distinguishes SCoT from standard Chain-of-Thought prompting.