[PAPER REVIEW] PS Prompting

SOOH · May 12, 2024


Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

https://arxiv.org/pdf/2305.04091.pdf


Background

Drawbacks of Zero-shot-CoT

  1. Calculation errors
  2. Missing-step errors
    These occur when some intermediate reasoning step(s) are left out, especially in complex, multi-step reasoning.
    Solution: PS prompting
  3. Semantic misunderstanding errors

Related concepts

  • pre-trained language models (PTMs) ↔ LLMs: LLMs are typically served through APIs, so there is no access to model parameters

PS Prompting

[ To solve 2: missing-step errors ]

“Let’s first understand the problem and devise a plan to solve the problem. Then, let’s carry out the plan and solve the problem step by step.”

  1. devise a plan to divide the entire task into smaller subtasks
  2. carry out the subtasks according to the plan
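The trigger sentence above simply replaces Zero-shot-CoT's “Let's think step by step” in the answer slot of a Q/A template. A minimal sketch of the prompt assembly (the Q/A layout with the A:[T] slot follows the paper's answering template; the trigger wording is quoted from the paper):

```python
# Trigger sentences: the PS trigger is quoted from the paper; the generic
# CoT trigger is shown for comparison.
ZERO_SHOT_COT_TRIGGER = "Let's think step by step."
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the "
    "problem. Then, let's carry out the plan and solve the problem step by step."
)

def build_prompt(question: str, trigger: str) -> str:
    # The trigger fills the [T] slot of the answering template "A: [T]".
    return f"Q: {question}\nA: {trigger}"

prompt = build_prompt("What is 2 + 3?", PS_TRIGGER)
```

Swapping the trigger string is all that distinguishes Zero-shot-CoT from Zero-shot-PS at the prompt level.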

[ To solve 1: calculation errors, and to improve the quality of generated reasoning steps ]

PS+ prompting (more detailed instructions)

extract relevant variables and their corresponding numerals

calculate intermediate results (pay attention to calculation and commonsense)

In arithmetic reasoning, PS+ prompting performs comparably to 8-shot CoT prompting.
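The PS+ instructions above can be folded into a single extended trigger. The wording below is an illustrative reconstruction from the fragments listed, not necessarily the paper's exact prompt:

```python
# Illustrative reconstruction of the PS+ trigger from the instruction
# fragments above; consult the paper for the exact wording.
PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and "
    "their corresponding numerals, and devise a plan. Then, let's carry out "
    "the plan, calculate intermediate results (pay attention to calculation "
    "and commonsense), solve the problem step by step, and show the answer."
)

def build_ps_plus_prompt(question: str) -> str:
    # Same Q/A template as before, with the richer PS+ trigger in the answer slot.
    return f"Q: {question}\nA: {PS_PLUS_TRIGGER}"
```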

[ Two steps to Zero-shot PS prompting ]

  1. make an inference using the proposed prompting template → generate the reasoning process and the answer to a problem

    1. Have the LLM generate subtasks and accomplish them
      → “Let’s first understand the problem and devise a plan to solve the problem. Then, let’s carry out the plan and solve the problem step by step” in the answering template ( A:[T] )
    2. Make the LLM focus more on calculation and derive intermediate results accurately
      → “pay attention to calculation” + “extract relevant variables and their corresponding numerals”

      💡 [ hypothesis ]
      If the LLM leaves out relevant and important variables, it is more likely to miss out relevant reasoning steps.

    3. To improve the quality of the LLM's reasoning steps
      → “calculate intermediate results”
  2. extract the answer for evaluation by using the answer extraction prompting

    → “Therefore, the answer # is”
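The two-step procedure above can be sketched as a small pipeline, assuming a generic `generate` callable standing in for any LLM API. The answer-extraction trigger is simplified here (the paper fills the “#” slot with the expected answer format, e.g. arabic numerals):

```python
from typing import Callable

def zero_shot_ps(question: str, generate: Callable[[str], str]) -> str:
    trigger = (
        "Let's first understand the problem and devise a plan to solve the "
        "problem. Then, let's carry out the plan and solve the problem step by step."
    )
    # Step 1: generate the reasoning process and a tentative answer.
    step1_prompt = f"Q: {question}\nA: {trigger}"
    reasoning = generate(step1_prompt)
    # Step 2: feed the reasoning back with an answer-extraction trigger
    # (simplified; the paper's template specifies the answer format).
    step2_prompt = f"{step1_prompt}\n{reasoning}\nTherefore, the answer is"
    return generate(step2_prompt).strip()

# Usage with a stubbed model that answers "5" at the extraction step:
answer = zero_shot_ps(
    "What is 2 + 3?",
    lambda p: "5" if "Therefore" in p else "Plan: add the numbers. 2 + 3 = 5.",
)
```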

[ Experiments ]

[ Benchmarks ]

evaluate PS prompting on ten benchmark datasets from three categories of reasoning problems.

  1. Arithmetic Reasoning
    1. GSM8K dataset : high-quality, linguistically diverse grade-school math word problems (created by human problem writers)
    2. SVAMP benchmark : one-unknown arithmetic word problems for students up to grade 4, created by making simple changes to problems from an existing dataset
    3. MultiArith dataset : math word problems requiring multiple reasoning steps and operations
    4. AddSub dataset : addition and subtraction arithmetic word problems
    5. AQUA dataset : algebraic word problems with natural language rationales
    6. SingleEq dataset : single-equation grade-school algebra word problems with multiple math operations over non-negative rational numbers and one variable
  2. Commonsense Reasoning
    1. CSQA benchmark dataset : multiple-choice questions that require different types of commonsense knowledge to obtain the correct answers
    2. StrategyQA benchmark dataset : questions requiring multi-step reasoning but the reasoning steps are not given.
  3. Symbolic Reasoning
    1. Last Letter Concatenation dataset : questions requiring the last letters of words in a name to be concatenated
    2. Coin Flip dataset : questions on whether a coin is still heads up after it is flipped or not flipped based on steps given in the questions

[ Baselines ]

  1. Zero-shot baselines

    • Zero-shot CoT (“Let’s think step by step”)
    • Zero-shot PoT (uses an LLM, mainly OpenAI Codex, to generate a Python program and then derives the answer by executing the generated program on a Python interpreter)
  2. Few-shot with manual demonstrations

    Manual-CoT creates eight hand-crafted examples as demonstrations.

  3. Few-shot with automatic demonstrations

    Auto-CoT automatically selects examples by clustering for diversity and generates reasoning chains with Zero-shot-CoT to construct demonstrations.
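In the spirit of Auto-CoT, demonstration selection can be sketched as clustering question embeddings and taking one representative question per cluster. The 2-D embeddings and plain k-means below are illustrative stand-ins (Auto-CoT uses Sentence-BERT embeddings and k-means over a real question pool):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    # Toy k-means: good enough to illustrate diversity-based selection.
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # Recompute each center as its cluster mean (keep old center if empty).
        centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers

def pick_demonstrations(questions, embeddings, k):
    # One demonstration per cluster: the question closest to each center.
    centers = kmeans(embeddings, k)
    demos = []
    for c in centers:
        j = min(range(len(embeddings)), key=lambda idx: math.dist(embeddings[idx], c))
        demos.append(questions[j])
    return demos

# Usage with hypothetical questions and hand-placed 2-D "embeddings":
questions = ["apples left?", "sum of 2 and 3?", "coin still heads?", "last letters?"]
embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
demos = pick_demonstrations(questions, embeddings, k=2)  # one per cluster
```

Each selected question would then be answered with Zero-shot-CoT to form a (question, reasoning chain) demonstration.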
