beforesunset.log

[LLM] Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in LLMs

nnnahyunnn · April 7, 2024

LLM_apply

Authors: Anonymous ACL submission
Year: 2023
Link: https://openreview.net/pdf?id=lnPP2TO3jW7

Goals

Measure the solve rate on mathematical reasoning problems using self-consistency.
Raise the solve rate by varying the surface form of the problem.

Design

Self-Consistency-over-Paraphrase (SCoP): paraphrase the prompt to vary its surface form.

SCoP

concept
- The LLM generates K paraphrases of each problem.
- For each paraphrased prompt, it generates N/K reasoning paths.
  - The total number of answers is fixed at N.
  - This separates the effect of increasing the diversity of reasoning paths from the effect of increasing their number.
- The most consistent answer is taken as the final answer.
  - The same LLM is used for both paraphrasing and reasoning, which rules out any cross-sharing of knowledge between different LLMs.
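The steps above can be sketched roughly as follows. This is only a minimal illustration, not the paper's implementation: `paraphrase` and `solve` are hypothetical callables standing in for LLM calls, the toy stubs exist only so the example runs, and `n_total=8`, `k=4` are arbitrary values.

```python
from collections import Counter
from typing import Callable, List


def scop(problem: str,
         paraphrase: Callable[[str, int], List[str]],
         solve: Callable[[str, int], List[str]],
         n_total: int,
         k: int) -> str:
    """Majority vote over n_total answers split evenly across k paraphrases."""
    if n_total % k != 0:
        raise ValueError("n_total must be divisible by k")
    per_paraphrase = n_total // k  # N/K reasoning paths per paraphrase
    answers: List[str] = []
    for variant in paraphrase(problem, k):
        answers.extend(solve(variant, per_paraphrase))
    # The most frequent answer wins; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-ins for the LLM calls (assumptions for this demo).
def toy_paraphrase(problem: str, k: int) -> List[str]:
    return [f"{problem} (variant {i})" for i in range(k)]


def toy_solve(prompt: str, n: int) -> List[str]:
    # Pretend variant 0 is a poor surface form that always yields "7".
    return ["7"] * n if "variant 0" in prompt else ["12"] * n


result = scop("What is 5 + 7?", toy_paraphrase, toy_solve, n_total=8, k=4)
```

Keeping `n_total` fixed while varying `k` is what lets a comparison isolate the effect of paraphrase diversity from the effect of simply sampling more paths.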
paraphrase
- Naive
  - Simply generate K paraphrases.
  - Some paraphrases may have a worse solve rate than the original form.
- In-Context-Learning
  - Keep only paraphrases that raise the solve rate above the original's by at least a preset margin.
  - The drawback is that this is hard to use at test time.
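A minimal sketch of the margin-based selection, under the assumption that solve rate means the fraction of sampled answers that match the gold answer. The function names, the `margin=0.2` default, and the sample data are all hypothetical.

```python
from typing import Dict, List


def solve_rate(answers: List[str], gold: str) -> float:
    """Fraction of sampled answers that match the gold answer."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return sum(a == gold for a in answers) / len(answers)


def select_paraphrases(original_answers: List[str],
                       paraphrase_answers: Dict[str, List[str]],
                       gold: str,
                       margin: float = 0.2) -> List[str]:
    """Keep paraphrases whose solve rate beats the original by >= margin."""
    base = solve_rate(original_answers, gold)
    return [
        text
        for text, answers in paraphrase_answers.items()
        if solve_rate(answers, gold) - base >= margin
    ]


# Made-up sampled answers for one problem whose gold answer is "12".
original = ["12", "7", "7", "7"]
candidates = {
    "paraphrase A": ["12", "12", "12", "7"],
    "paraphrase B": ["7", "7", "7", "12"],
    "paraphrase C": ["7", "7", "7", "7"],
}
kept = select_paraphrases(original, candidates, gold="12")
```

Note that the filter needs the gold answer, which is presumably why this approach is hard to apply at test time.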
reasoning
- zero-shot CoT: "Let's think step by step."
- four-shot CoT
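For illustration, the two reasoning prompts might be assembled like this. The `Q:`/`A:` layout and the helper names are assumptions, not the paper's exact templates; only the trigger phrase comes from the notes above.

```python
from typing import List, Tuple

ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Append the trigger phrase so the model writes out its reasoning."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_cot(question: str, exemplars: List[Tuple[str, str]]) -> str:
    """Prepend worked (question, rationale) exemplars before the new question."""
    blocks = [f"Q: {q}\nA: {rationale}" for q, rationale in exemplars]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Four made-up exemplars for the four-shot setting.
demo_exemplars = [
    ("What is 1 + 1?", "1 plus 1 is 2. The answer is 2."),
    ("What is 2 + 3?", "2 plus 3 is 5. The answer is 5."),
    ("What is 4 * 2?", "4 times 2 is 8. The answer is 8."),
    ("What is 9 - 4?", "9 minus 4 is 5. The answer is 5."),
]
zero_shot_prompt = zero_shot_cot("What is 5 + 7?")
four_shot_prompt = few_shot_cot("What is 5 + 7?", demo_exemplars)
```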

Experiments

dataset
- GSM8K, AQuA, MATH, MMLU
model
- LLaMA-2, GPT-3.5-turbo, GPT-4
Results
- Comparison against the baseline in the zero-shot CoT setting
- Paraphrasing with ICL_parameter in the four-shot CoT setting
Discussion
- With the Naive method, good and bad paraphrases are used together, so why does performance still improve?
  - Good paraphrases sharpen the answer distribution, while bad paraphrases flatten it.
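One way to picture this is with made-up answer samples. The data and the use of entropy as a measure of sharpness are illustrative assumptions, not results from the paper. A flattened distribution spreads its votes thinly over many answers, so it is unlikely to outvote the peak contributed by a sharpened one.

```python
import math
from collections import Counter
from typing import List


def answer_entropy(answers: List[str]) -> float:
    """Shannon entropy (in bits) of the empirical answer distribution."""
    total = len(answers)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(answers).values()
    )


# Made-up samples: the gold answer is "12".
good_paraphrase = ["12", "12", "12", "7"]   # sharpened: votes concentrate
bad_paraphrase = ["7", "9", "12", "15"]     # flattened: votes spread out

sharp = answer_entropy(good_paraphrase)
flat = answer_entropy(bad_paraphrase)

# Pooling both sets, as the Naive method would, still lets the peak win.
pooled_winner = Counter(good_paraphrase + bad_paraphrase).most_common(1)[0][0]
```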
