[๐Ÿ“–๋…ผ๋ฌธ ๋ฆฌ๋ทฐ] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)

Becky's Study Lab · December 9, 2024

PaperReview

๋ชฉ๋ก ๋ณด๊ธฐ
24/26

์ตœ๊ทผ์— Prompting, Chain-of-Thought๋ฅผ ํ™œ์šฉํ•œ Few-shot reasoning์„ ํ†ตํ•œ ๋…ผ๋ฌธ์„ ๋‚ด๋ ค๊ณ  ์ž‘์—…ํ•˜๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋ฉด์„œ ์ œ๋Œ€๋กœ ๊ผผ๊ผผํžˆ ์ฝ์œผ๋ ค๊ณ  ๋ณด๊ณ  ์žˆ๊ณ , ์ •๋ฆฌํ•ด ๋ณด์•˜๋‹ค.

๋ณธ ๋…ผ๋ฌธ์€ NeurIPS 2022 Main Conference Track์— publish ๋œ ๋…ผ๋ฌธ์œผ๋กœ ์•„๋งˆ ๋งŽ์€ ๋ถ„๋“ค์ด CoT๋ผ๊ณ  ์•Œ๊ณ  ์žˆ๋Š” ๋…ผ๋ฌธ์ด๋‹ค.

โœ’๏ธChain-of-Thought Prompting Elicits Reasoning in Large Language Models(โญNeurIPS-2022-Main)

43ํŽ˜์ด์ง€๊ฐ€ ๋˜๋Š” Appendix๊นŒ์ง€ ๋งค์šฐ ์ž์„ธํ•˜๊ฒŒ ์ •๋ฆฌ๋œ ๋…ผ๋ฌธ์œผ๋กœ์„œ ๊ณต๋ถ€ํ•˜๋Š” ๋ถ„๋“ค์€ ํ•˜๋‚˜ํ•˜๋‚˜ ๋ฒˆ์—ญํ•˜๋ฉด์„œ ๋๊นŒ์ง€ ์ฝ์–ด๋ณด๊ธธ ์ถ”์ฒœํ•œ๋‹ค. ํŠนํžˆ Appendix์— ์‹ค์ œ๋กœ ์–ด๋–ป๊ฒŒ few-shot exampler text(์œ„์˜ ๊ทธ๋ฆผ์—์„œ ํŒŒ๋ž‘์ƒ‰ ๋ถ€๋ถ„)์„ ๊ตฌ์„ฑํ–ˆ๋Š”์ง€ ๋‹ค ๊ณต๊ฐœํ•ด์„œ ์ •๋ง ์ข‹์•˜๋‹ค.


Abstract

[์ฃผ์š” ๋ฐฉ๋ฒ•๋ก ] Chain of Thought (CoT)

  • ์ค‘๊ฐ„ ์ถ”๋ก  ๋‹จ๊ณ„๋ฅผ ์ œ๊ณตํ•ด few-shot inference ์‹œ ์ถ”๋ก ์˜ ๋ฐฉํ–ฅ์„ฑ์„ ๊ฐ„์ ‘์ ์œผ๋กœ ์ „๋‹ฌํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ, ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ์˜ ๋ณต์žกํ•œ ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ํฌ๊ฒŒ ํ–ฅ์ƒ.
  • ๋ช‡ ๊ฐœ์˜ Chain of Thought ์˜ˆ์ œ ํ…์ŠคํŠธ์ธ "Exemplars"๋ฅผ ํ”„๋กฌํ”„ํŠธ์— ์ œ๊ณตํ•˜์—ฌ ๋ชจ๋ธ์˜ ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ๊ฐ•ํ™”

[์‹คํ—˜ ๊ฒฐ๊ณผ]

  1. ์„ธ ๊ฐ€์ง€ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ์—์„œ ์‹คํ—˜ ์ง„ํ–‰ : GPT-3(InstructGPT) - [350M, 1.3B, 6.7B, 175B]
  2. Chain of Thought Prompting์ด Arithmetic Reasoning, Commonsense Reasoning, Symbolic Reasoning ์ž‘์—…์—์„œ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋ณด์ž„
  3. ์—ฌ๋Ÿฌ Benchmark์—์„œ Stardard Prompt๋ฅผ ๋Šฅ๊ฐ€ํ•จ
    • PaLM 540B ๋ชจ๋ธ์— 8๊ฐœ์˜ ์ฒด์ธ ์˜ค๋ธŒ ์˜ํŠธ ์˜ˆ์ œ๋ฅผ ํ”„๋กฌํ”„ํŒ…์œผ๋กœ ์ œ๊ณต
    • ์ˆ˜ํ•™ ๋‹จ์–ด ๋ฌธ์ œ(GSM8K) ๋ฒค์น˜๋งˆํฌ์—์„œ ์ตœ์‹  ์„ฑ๋Šฅ(SOTA)์„ ๋‹ฌ์„ฑ
    • Fine-tuned GPT-3 with a verifier๋ฅผ ๋Šฅ๊ฐ€ํ•˜๋Š” ์ •ํ™•๋„

Chain-of-Thought Prompting: ⟨input, chain of thought, output⟩

์œ„์˜ ํ•˜์ด๋ผ์ดํŠธ๋œ ๋ถ€๋ถ„์ด chain of thought์— ํ•ด๋‹น๋˜๋Š” prompt text์ด๋‹ค. ์œ„์™€ ๊ฐ™์€ ํ…์ŠคํŠธ๋“ค์€ ๋ชจ๋ธ์ด ์ถ”๋ก ํ•˜๋Š” ๊ณผ์ •์—์„œ ๊ทธ ์ถ”๋ก  ๊ณผ์ •์„ ์œ ๋„ํ•˜๋„๋ก ํ•˜๋Š” ์—ญํ• ์„ ํ•˜๊ณ  ์žˆ๋‹ค.

Chain-of-Thought ํ”„๋กฌํ”„ํŠธ์˜ ์ฃผ์š” ํŠน์ง• ๋ฐ ์žฅ์ 

1. ๋ณต์žกํ•œ ๋ฌธ์ œ๋ฅผ ์ค‘๊ฐ„ ๋‹จ๊ณ„๋กœ ๋ถ„ํ•ด ๊ฐ€๋Šฅ
์ฒด์ธ ์˜ค๋ธŒ ์†ŒํŠธ๋Š” ๋‹ค๋‹จ๊ณ„ ๋ฌธ์ œ๋ฅผ ์ค‘๊ฐ„ ๋‹จ๊ณ„๋กœ ๋ถ„ํ•ดํ•˜๋„๋ก ๋ชจ๋ธ์„ ์œ ๋„ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ถ”๊ฐ€์ ์ธ ๊ณ„์‚ฐ ๋ฆฌ์†Œ์Šค๋ฅผ ํ• ๋‹นํ•˜์—ฌ ๋” ๋ณต์žกํ•œ ์ถ”๋ก ์ด ํ•„์š”ํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
2. ๋ชจ๋ธ์˜ ํ–‰๋™์„ ํ•ด์„ํ•  ์ˆ˜ ์žˆ๋Š” ์ฐฝ ์ œ๊ณต
์ฒด์ธ ์˜ค๋ธŒ ์†ŒํŠธ๋Š” ๋ชจ๋ธ์ด ํŠน์ • ๋‹ต์— ๋„๋‹ฌํ•œ ๊ฒฝ๋กœ๋ฅผ ํ•ด์„ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ค๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ถ”๋ก  ๊ณผ์ •์—์„œ ์–ด๋””์„œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ๋Š”์ง€ ๋””๋ฒ„๊น…ํ•  ์ˆ˜ ์žˆ๋Š” ๊ธฐํšŒ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. (๋‹ค๋งŒ, ๋ชจ๋ธ์˜ ๊ณ„์‚ฐ ๊ณผ์ •์„ ์™„์ „ํžˆ ํŠน์„ฑํ™”ํ•˜๋Š” ๊ฒƒ์€ ์—ฌ์ „ํžˆ ํ•ด๊ฒฐ๋˜์ง€ ์•Š์€ ๊ณผ์ œ์ž…๋‹ˆ๋‹ค.)
3. ๋‹ค์–‘ํ•œ ์ž‘์—…์— ์ ์šฉ ๊ฐ€๋Šฅ
์ฒด์ธ ์˜ค๋ธŒ ์†ŒํŠธ ์ถ”๋ก ์€ ์ˆ˜ํ•™ ๋ฌธ์ œ, ์ƒ์‹์  ์ถ”๋ก , ์‹ฌ๋ณผ๋ฆญ ์กฐ์ž‘๊ณผ ๊ฐ™์€ ์ž‘์—…์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์›์น™์ ์œผ๋กœ ์ธ๊ฐ„์ด ์–ธ์–ด๋กœ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๋ชจ๋“  ์ž‘์—…์— ์ ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
4. ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ์—์„œ ๊ฐ„๋‹จํžˆ ํ™œ์šฉ ๊ฐ€๋Šฅ
์ถฉ๋ถ„ํžˆ ํฐ ํฌ๊ธฐ์˜ ๊ธฐ์กด ์–ธ์–ด ๋ชจ๋ธ์— ์ฒด์ธ ์˜ค๋ธŒ ์†ŒํŠธ ์˜ˆ์ œ๋ฅผ ํฌํ•จ์‹œํ‚ค๋Š” ๊ฒƒ๋งŒ์œผ๋กœ ์ด๋Ÿฌํ•œ ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ์‰ฝ๊ฒŒ ์œ ๋„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


Experiment

โœ… Model + Setting

  • ์‚ฌ์šฉํ•œ pre-trained LLM ์ข…๋ฅ˜ (ํฌ๊ธฐ ๋‹ค์–‘)
    • Instruct GPT(GPT3) [350M, 1.3B, 6.7B, 175B]
    • LaMDA [422M, 2B, 8B, 68B, 137B]
    • PaLM [8B, 62B, 540B]
    • UL2 [20B]
    • Codex
  • ๋ชจ๋“  ๊ฒฝ์šฐ์— ๋Œ€ํ•ด์„œ, greedy decoding ์‚ฌ์šฉ, ๋‹ค์Œ ํ† ํฐ์ด ๋  ํ™•๋ฅ ์ด ๊ฐ€์žฅ ๋†’์€ ํ† ํฐ์„ ์„ ํƒํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•จ

Standard prompting & Chain-of-thought prompting

์‚ฐ์ˆ ์ถ”๋ก  Task์— CoT๋ฅผ ์ ์šฉํ•ด ์‹คํ—˜ํ•˜์˜€๊ณ , SoTA ๋‹ฌ์„ฑ์„ ํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค.๋…ผ๋ฌธ์—์„œ๋Š” Few-Shot Prompt๋ฅผ Standard Prompting์ด๋ผ๊ณ  ๋ถ€๋ฅด๋ฉด์„œ Base Prompt๋กœ ํ•˜์˜€๋‹ค. ๊ทธ๋ฆฌ๊ณ  CoT Prompt๋ฅผ ์ถ”๊ฐ€ํ•œ ๊ฒฝ์šฐ๋ฅผ Chain-of-thought Prompting์ด๋ผ๊ณ  ํ•˜์—ฌ ์‹คํ—˜์„ ํ•˜์˜€๋‹ค.

์œ„์˜ ๊ฒฝ์šฐ๊ฐ€ 1shot์˜ ์˜ˆ์‹œ, ์ฆ‰ input-output ์˜ˆ์‹œ๊ฐ€ 1๊ฐœ ๋“ค์–ด๊ฐ„ ๊ฒฝ์šฐ์ด๋‹ค. ์ด๋ ‡๊ฒŒ ๊ตฌ์„ฑํ•œ ๊ฒฝ์šฐ๊ฐ€ ๊ฐ€์žฅ Base Prompt์ด๋‹ค.

์œ„์˜ ๊ฒฝ์šฐ๊ฐ€ ๊ฐ™์€ ์ผ๋‹จ 1shot ์ด์ง€๋งŒ, chain-of-thought text๊ฐ€ ๋“ค์–ด๊ฐ„ ๊ฑฐ๋กœ Chain-of-thought Prompting์ด๋‹ค.

๐Ÿค” Chain-of-thought Prompt์— ํ•ด๋‹น๋˜๋Š” ์ถ”๋ก  ๊ณผ์ •์„ ๋‹ด์€ ์˜ˆ์‹œ, text๋Š” ์–ด๋–ป๊ฒŒ ๋งŒ๋“ ๊ฑฐ์ง€? ์–ด๋””์„œ ์™”์„๊นŒ?

์ €๋ ‡๊ฒŒ ์ถ”๋ก ์„ ์œ ๋„ํ•˜๊ธฐ ์œ„ํ•ด chain-of-thought text๋ฅผ ๋„ฃ์–ด์ฃผ๋Š” ๊ฑด ์•Œ๊ฒ ๋Š”๋ฐ, ๋ฌธ์ œ๋Š” "์–ด๋””์„œ ์ € CoT Text๋ฅผ ๊ฐ€์ ธ์™”์„๊นŒ?"๋ผ๋Š” ๊ฑฐ๋‹ค. ์‹ค์ œ๋กœ CoT ๋ฐฉ๋ฒ•๋ก ์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ๋ฅผ ์ƒ๊ฐํ•˜๋”๋ผ๋„ ์ € CoT Text๋ฅผ ๊ตฌํ•˜๋Š”๊ฒŒ ๋ฌธ์ œ์ด๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.
์ •๋‹ต์€ ๋…ผ๋ฌธ์— ์ ํ˜€ ์žˆ๋Š”๋ฐ, ๊ทธ๋ƒฅ ์‚ฌ๋žŒ์ด ์ง์ ‘ ๋งŒ๋“ค์—ˆ๋‹ค(manually composed)๋ผ๊ณ  ์ ํ˜€์žˆ๋‹ค. ์‹ค์ œ๋กœ 8๊ฐœ์˜ ์˜ˆ์‹œ์— ๋Œ€ํ•œ CoT Text๋ฅผ ๋งŒ๋“ค์—ˆ๋Š”๋ฐ Appendix์— ์•„๋ž˜์™€ ๊ฐ™์ด ๊ธฐ์žฌ๋˜์–ด ์žˆ์—ˆ๋‹ค.

์œ„์˜ 8๊ฐœ์˜ example์— ์žˆ๋Š” CoT ์˜ˆ์‹œ๊ธ€์„ ๋งŒ๋“ค์–ด์„œ 8shot ์œผ๋กœ prompt์— ๋„ฃ์–ด์ค€ ๊ฒƒ์ด๋‹ค. ์‚ฐ์ˆ ์ถ”๋ก ์— ํ•ด๋‹น๋˜๋Š” benchmark๋Š” ๋ชจ๋‘ ์ € 8๊ฐœ์˜ few shot eaxampler ๊ฐ€ ๋“ค์–ด๊ฐ„๊ฑฐ๋‹ค

๋ฌผ๋ก  AQuA๊นŒ์ง€ ํ•ด์„œ 4๊ฐœ ๋” ์žˆ๋‹ค.


(1) Arithmetic Reasoning

์šฐ๋ฆฌ๊ฐ€ ์•„๋Š” ์‚ฐ์ˆ  ๋ฌธ์ œ์ธ๋ฐ, ์•ฝ๊ฐ„ ์ˆ˜์‹์ ์ธ ๋Š๋‚Œ๋ณด๋‹ค๋Š” ํ…์ŠคํŠธ๋กœ ์ƒํ™ฉ์„ ๋งํ•˜๋ฉด ๊ณ„์‚ฐ๋œ ๋‹ต์„ ๋งํ•˜๋„๋ก ํ•˜๋Š” QAํ˜•์‹์˜ ๋ฐ์ดํ„ฐ์…‹์„ ์ฃผ๋กœ ๊ฐ€์ง€๊ณ  ์‹คํ—˜ํ–ˆ๋‹ค.

Benchmark

์•„๋ž˜์™€ ๊ฐ™์€ Benchmark ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ–ˆ๋‹ค.

Results

์ผ๋‹จ, ๋‹น์—ฐํžˆ CoT๋ฅผ ์ ์šฉํ•œ ํ”„๋กฌํ”„ํŠธ๊ฐ€ ๋” ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋‚ธ๊ฑด ๋งž๋‹ค. ํ•˜์ง€๋งŒ, ์ด ๋…ผ๋ฌธ์˜ ์‚ฐ์ˆ  ์ถ”๋ก  ์‹คํ—˜ ๊ฒฐ๊ณผ์—์„œ ๊ผญ ๋ด์•ผ ํ•  ๋ถ€๋ถ„์ด ์žˆ๋‹ค.

"First, Figure 4 shows that chain-of-thought prompting is an emergent ability of model scale (Wei et al., 2022b). That is, chain-of-thought prompting does not positively impact performance for small models, and only yields performance gains when used with models of โˆผ100B parameters. We qualitatively found that models of smaller scale produced fluent but illogical chains of thought, leading to lower performance than standard prompting."

โžœ 100B ์ด์ƒ์˜ ๋ชจ๋ธ ํŒŒ๋ผ๋ฏธํ„ฐ ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ๊ฒฝ์šฐ์— ํ•œ ํ•ด์„œ, CoT Prompting์ด Standard Prompting ๋ณด๋‹ค ํšจ๊ณผ๊ฐ€ ์žˆ๋‹ค๋Š” ๊ฒƒ
โžœ CoT Prompting์€ ํฐ LLM ๋ชจ๋ธ์— ์ ์šฉํ•œ ๊ฒฝ์šฐ์— ์‚ฐ์ˆ  ์ถ”๋ก ์„ ํ•˜๋Š”๋ฐ ์ข‹์€ ํผํฌ๋จผ์Šค๋ฅผ ๋ณด์ธ๋‹ค๋Š” ์ 

"Second, chain-of-thought prompting has larger performance gains for more-complicated problems. For instance, for GSM8K (the dataset with the lowest baseline performance), performance more than doubled for the largest GPT and PaLM models. On the other hand, for SingleOp, the easiest subset of MAWPS which only requires a single step to solve, performance improvements were either negative or very small(see Appendix Table 3)."

โžœ CoT Prompting์€ ๋ณต์žกํ•œ, ๋” ์–ด๋ ค์šด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๊ฒฝ์šฐ์— ๊ฐ•์ ์ด ์žˆ์Œ

"Third, chain-of-thought prompting via GPT-3
175B and PaLM 540B compares favorably to prior state of the art, which typically finetunes a task-specific model on a labeled training dataset.
Figure 4 shows how PaLM 540B uses chain-ofthought prompting to achieve new state of the art on GSM8K, SVAMP, and MAWPS (though note that standard prompting already passed the prior best for SVAMP). On the other two datasets, AQuA and ASDiv, PaLM with chain-of-thought prompting reaches within 2% of the state of the art (Appendix Table 2)."

โžœ CoT Prompting์„ ์ ์šฉํ•œ PaLM(540B), GPT3(175B)๋Š” Fine-tuning ๋ชจ๋ธ๋ณด๋‹ค ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋‚ด SoTA๋ฅผ ๋‹ฌ์„ฑ

๋˜ํ•œ ์–ด๋–ค ๊ฒฝ์šฐ์— ์˜ค๋‹ต์„ ๋งํ–ˆ๋Š”์ง€๋„ ์ฒดํฌํ•˜๊ณ ์ž GSM8K ๋ฐ์ดํ„ฐ์…‹ ์‹คํ—˜ ๊ฒฐ๊ณผ๋ฅผ ๋ฝ‘์•„ ๋ณด์•˜๋Š”๋ฐ,

  • ์ •๋‹ต Tracking : 50๋ฌธ์ œ ์ค‘ ์ผ๋‹จ ์ •๋‹ต์€ 50๊ฐœ ๋‹ค ๋งž์•˜์ง€๋งŒ, ๊ทธ ์ค‘ 2๊ฐœ๋Š” ์šฐ์—ฐํžˆ ๋งž์€ ์ผ€์ด์Šค์˜€์Œ
  • ์˜ค๋‹ต Tracking : 50๋ฌธ์ œ ์ค‘ ์ผ๋‹จ ์˜ค๋‹ต์€ 50๊ฐœ๋กœ ๋‹ค ํ‹€๋ ธ๋Š”๋ฐ, 46%๋Š” ๊ณ„์‚ฐ ์˜ค๋ฅ˜, ๊ธฐํ˜ธ ๋งคํ•‘ ์˜ค๋ฅ˜, ์ถ”๋ก  ๋‹จ๊ณ„ missing ์ด์˜€๊ณ , 54%๋Š” semantic understanding or coherence ๋ผ๋Š”๋ฐ ์•ฝ๊ฐ„ ๋…ผ๋ฆฌ์ ์œผ๋กœ ๋ง์ด ์•ˆ๋˜๋Š” ์ถ”๋ก ์„ ํ•œ ๊ฒฝ์šฐ๋ผ๊ณ  ๋ณด๋ฉด ๋  ๊ฑฐ ๊ฐ™๋‹ค.

๋˜, ๋…ผ๋ฌธ์—์„œ๋„ ์„ค๋ช…ํ•˜์ง€๋งŒ PaLM 62B์—์„œ ํ‹€๋ ธ๋˜ ๋ฌธ์ œ๋Š” ๋ชจ๋ธ ์‚ฌ์ด์ฆˆ๋ฅผ ํ‚ค์›Œ์„œ PaLM 540B์—์„œ๋Š” ๋งž์ท„๋‹ค๊ณ  ํ•œ๋‹ค. ํŠนํžˆ one-step missing ์ด๋‚˜ semantic understanding error ๋“ค์„ ํฐ ๋ชจ๋ธ์€ ํ•ด๊ฒฐํ–ˆ๋‹ค๊ณ  ํ•œ๋‹ค.

Ablation Study

๋‹ค๋ฅธ ์ข…๋ฅ˜์˜ prompting ๋„ ํ•ด๋ณด๋Š” ์‹คํ—˜์„ ํ–ˆ๋‹ค.

1) Equation only

  • ์ˆ˜์‹๋งŒ prompting์— ์ฃผ๋Š” ์‹คํ—˜
  • ์ข‹์ง€ ์•Š์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๋Š”๋ฐ, ๊ฒฐ๊ตญ ์ž์—ฐ์–ด์ ์ธ ์ถ”๋ก  ๋‹จ๊ณ„๋ฅผ ๋„ฃ์–ด์ฃผ๋Š”๊ฒŒ ํ•ต์‹ฌ์ž„์„ ๋ณด์ž„

2) Variable compute only

  • CoT์˜ ์„ฑ๊ณต ์š”์ธ์ด ๋‹จ์ˆœํžˆ "๋” ๋งŽ์€ ๊ณ„์‚ฐ ๋ฆฌ์†Œ์Šค๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ" ๋•Œ๋ฌธ์ธ์ง€, ์•„๋‹ˆ๋ฉด "์ค‘๊ฐ„ ๊ณ„์‚ฐ ๊ณผ์ •์„ ์ž์—ฐ์–ด๋กœ ํ‘œํ˜„ํ•˜๋Š” ๊ฒƒ"์— ์žˆ๋Š”์ง€๋ฅผ ๊ตฌ๋ถ„ํ•˜๋ ค๋Š” ๋ชฉ์ ์ž„
  • ๋ชจ๋ธ์—๊ฒŒ ์ฒด์ธ ์˜ค๋ธŒ ์ƒ๊ฐ ๋ฐฉ์‹ ๋Œ€์‹ , ๋ฌธ์ œ๋ฅผ ํ‘ธ๋Š” ๋ฐ ํ•„์š”ํ•œ ๋ฌธ์ž ์ˆ˜๋งŒํผ์˜ ์  (.)์„ ์ถœ๋ ฅํ•˜๊ฒŒ ํ–ˆ๋Š”๋ฐ, ์˜ˆ๋ฅผ ๋“ค์–ด, ์–ด๋–ค ์ˆ˜์‹์„ ํ‘ธ๋Š” ๋ฐ ํ•„์š”ํ•œ ๋ฌธ์ž ์ˆ˜๊ฐ€ 10์ด๋ผ๋ฉด, ๋ชจ๋ธ์ด ..........(10๊ฐœ์˜ ์ )์„ ์ถœ๋ ฅํ•˜๊ฒŒ ํ•จ
  • ๊ฒฐ๊ณผ์ ์œผ๋กœ, ์ (.)๋งŒ ์ถœ๋ ฅํ•˜๋„๋ก ํ•œ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ๊ธฐ์กด์˜ ๋ฒ ์ด์Šค๋ผ์ธ ๋ชจ๋ธ๊ณผ ๊ฑฐ์˜ ๋™์ผํ•˜๊ฒŒ ๋‚˜์™”๊ณ , ์ด ๊ฒฐ๊ณผ๋Š” ๋‹จ์ˆœํžˆ ๊ณ„์‚ฐ ๋ฆฌ์†Œ์Šค(์  ์ถœ๋ ฅ = ํ† ํฐ ์ˆ˜)๊ฐ€ ๋งŽ์•„์ง„๋‹ค๊ณ  ํ•ด์„œ ์„ฑ๋Šฅ์ด ํ–ฅ์ƒ๋˜๋Š” ๊ฒƒ์€ ์•„๋‹ˆ๋ผ๋Š” ์ ์„ ๋ณด์—ฌ์คŒ

3) Chain of thought after answer

  • CoT ๋ฐฉ์‹์ด "๋‹จ๊ณ„๋ณ„ ์‚ฌ๊ณ  ๊ณผ์ •"์ด ์•„๋‹ˆ๋ผ, ๋ชจ๋ธ์ด ์ด๋ฏธ ํ•™์Šตํ•œ ์ง€์‹์„ ๋” ์ž˜ ํ™œ์„ฑํ™”์‹œํ‚ค๋Š” ํšจ๊ณผ ๋•Œ๋ฌธ์ธ์ง€ ํ™•์ธํ•˜๊ณ ์ž ํ•จ
  • ์ด ์‹คํ—˜์—์„œ๋Š” CoT์˜ ์ˆœ์„œ๋ฅผ ๋ฐ”๊พธ์–ด, ๋‹ต์„ ๋จผ์ € ์ถœ๋ ฅํ•œ ๋’ค ์ฒด์ธ ์˜ค๋ธŒ ์ƒ๊ฐ์„ ์ž‘์„ฑํ•˜๋„๋ก ๋ชจ๋ธ์„ ์„ค์ •ํ–ˆ์Šต๋‹ˆ๋‹ค.
    • ๊ธฐ์กด ๋ฐฉ์‹: ๋ฌธ์ œ โ†’ CoT(์ค‘๊ฐ„ ๊ณ„์‚ฐ) โ†’ ์ตœ์ข… ๋‹ต
    • ๋ณ€ํ˜• ๋ฐฉ์‹: ๋ฌธ์ œ โ†’ ์ตœ์ข… ๋‹ต โ†’ CoT(์ค‘๊ฐ„ ๊ณ„์‚ฐ)
  • ๋‹ต์„ ๋จผ์ € ์ถœ๋ ฅํ•˜๊ณ  CoT๋ฅผ ๋’ค์— ์ž‘์„ฑํ•œ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ๊ธฐ์กด ๋ฒ ์ด์Šค๋ผ์ธ ๋ชจ๋ธ๊ณผ ๊ฑฐ์˜ ๋™์ผํ•˜๊ฒŒ ๋‚˜์™”๋Š”๋ฐ,CoT ๋ฐฉ์‹์ด ๋‹จ์ˆœํžˆ ์‚ฌ์ „ ํ•™์Šต๋œ ์ง€์‹์„ ํ™œ์„ฑํ™”ํ•˜๊ธฐ ์œ„ํ•œ ๋„๊ตฌ๋กœ๋งŒ ์ž‘๋™ํ•˜๋Š” ๊ฒƒ์€ ์•„๋‹˜

Robustness of Chain of Thought

  • 8๊ฐœ์˜ CoT prompting์˜ ๊ธ€์„ ์ž‘์„ฑํ•œ annotator๋ฅผ ๋ณ€๊ฒฝํ•ด๋„, ์ผ๋‹จ ๊ธฐ์กด Standard Prompting ๋ณด๋‹ค๋Š” ๋‚˜์€ ๊ฒฐ๊ณผ๋ฅผ ๋ชจ๋“  ๊ฒฝ์šฐ์—์„œ ๋ณด์ธ๋‹ค, ๊ทธ๋ ‡๊ธฐ์— ์–ธ์–ด์  ์Šคํƒ€์ผ์— ํฌ๊ฒŒ ์˜์กดํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ธํ•จ

(2) Commonsense Reasoning

์‚ฌ์ „ํ•™์Šต๋œ ๋ชจ๋ธ์˜ ์ง€์‹์„ ๋ฐ”ํƒ•์œผ๋กœ, ์ผ์ƒ ์ƒํ™œ์—์„œ ๊ฒฝํ—˜ํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ„๋‹จํ•œ ๋ฌธ๋‹ต์„ ์–ด๋Š์ •๋„๋กœ ๋‹ต๋ณ€ํ•  ์ˆ˜ ์žˆ๋Š”์ง€๋ฅผ ๋ณด๋Š” ๊ฑฐ๋‹ค.

Benchmark

์•„๋ž˜์™€ ๊ฐ™์€ Benchmark ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ–ˆ๋‹ค.

  • CSQA

  • StrategyQA

  • Date understanding and sports understanding from BIG-Bench

  • SayCan

  • math๋Š” exampler๋ฅผ 8๊ฐœ+4๊ฐœ ๋งŒ๋“ค์–ด์„œ 1shot ์˜ˆ์ œ๋กœ ๋„ฃ์—ˆ๋‹ค๋ฉด, ์—ฌ๊ธฐ์„  ๋ฒค์น˜๋งˆํฌ๋ณ„๋กœ 6~10๊ฐœ๋ฅผ ๋งŒ๋“ค์—ˆ๊ณ , ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

Results

๊ฒฐ๋ก ์ ์œผ๋กœ, Commonsense Reasoning ์—์„œ๋„ CoT Prompt๊ฐ€ ํšจ๊ณผ๊ฐ€ ์žˆ์Œ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋˜ํ•œ, ๋ชจ๋ธ ์Šค์ผ€์ผ์ด ํด ๋•Œ ํผํฌ๋จผ์Šค๊ฐ€ ๋‚˜์˜จ๋‹ค๋Š” ๊ฒƒ๋„ ๋™์ผํ–ˆ๋‹ค.

(3) Symbolic Reasoning

Symbolic Reasoning ์ธ๊ฐ„์—๊ฒŒ๋Š” ๋น„๊ต์  ์‰ฌ์šด ๋ฌธ์ œ(์˜ˆ: ์ˆ˜์‹ ๊ณ„์‚ฐ, ๋…ผ๋ฆฌ ์—ฐ์‚ฐ ๋“ฑ)์ง€๋งŒ, ์–ธ์–ด ๋ชจ๋ธ์—๊ฒŒ๋Š” ๊นŒ๋‹ค๋กœ์šธ ์ˆ˜ ์žˆ๋Š” ์ž‘์—…์„ ๋งํ•œ๋‹ค. ๊ธฐ์กด์˜ Standard Prompting ๋ฐฉ์‹์—์„œ๋Š” ์ด๋Ÿฌํ•œ ์‹ฌ๋ณผ๋ฆญ ์ถ”๋ก  ์ž‘์—…์—์„œ ์–ธ์–ด ๋ชจ๋ธ์ด ์ข…์ข… ํ•œ๊ณ„๋ฅผ ๋ณด์ด๋Š”๋ฐ, ๋‹จ์ˆœํžˆ ์ •๋‹ต์„ ๋งžํžˆ๊ธฐ๋ณด๋‹ค ๋‹จ๊ณ„๋ณ„ ์‚ฌ๊ณ ๊ฐ€ ํ•„์š”ํ•œ ๋ฌธ์ œ๋“ค์ด ๋งŽ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ํ•˜์ง€๋งŒ CoT๋Š” ์ด๋ฅผ ํ•ด๊ฒฐํ–ˆ๋‹ค.

Tasks

์•„๋ž˜์˜ ๋‘๊ฐ€์ง€ Task๋กœ Symbolic Reasoning ์‹คํ—˜์„ ์ˆ˜ํ–‰ํ•จ

(1) Last Letter Concatenation

  • ๋ฌธ์ œ ์ •์˜: ์ฃผ์–ด์ง„ ์ด๋ฆ„์—์„œ ๊ฐ ๋‹จ์–ด์˜ ๋งˆ์ง€๋ง‰ ๊ธ€์ž๋ฅผ ์ถ”์ถœํ•ด ์—ฐ๊ฒฐํ•˜๋Š” ์ž‘์—… (์˜ˆ: "Amy Brown" โ†’ "yn")
  • ๊ธฐ์กด์˜ "์ฒซ ๊ธ€์ž ์ถ”์ถœ(First Letter Concatenation)" ์ž‘์—…๋ณด๋‹ค ๋” ์–ด๋ ค์›€ (์˜ˆ: "Amy Brown" โ†’ "AB"๋Š” ์ฒด์ธ ์˜ค๋ธŒ ์ƒ๊ฐ(CoT) ์—†์ด๋„ ์–ธ์–ด ๋ชจ๋ธ์ด ์‰ฝ๊ฒŒ ์ˆ˜ํ–‰ ๊ฐ€๋Šฅ)
  • ๋งˆ์ง€๋ง‰ ๊ธ€์ž๋ฅผ ์ถ”์ถœํ•˜๋Š” ์ž‘์—…์€ ๋” ๋งŽ์€ ์ถ”๋ก  ๊ณผ์ •๊ณผ ์ž‘์—… ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•„์š”ํ•˜๋ฏ€๋กœ ์–ด๋ ค์šด ๋ฌธ์ œ๋กœ ๊ฐ„์ฃผ
  • ๋ฐ์ดํ„ฐ ์ƒ์„ฑ ๋ฐฉ๋ฒ•: ๋ฏธ๊ตญ ์ธ๊ตฌ์กฐ์‚ฌ ๋ฐ์ดํ„ฐ(name census data)์˜ ์ƒ์œ„ 1,000๊ฐœ์˜ ์ด๋ฆ„์—์„œ ๋ฌด์ž‘์œ„๋กœ ์ด๋ฆ„(์ด๋ฆ„๊ณผ ์„ฑ)์„ ๊ฒฐํ•ฉํ•ด ์ƒ์„ฑ

(2) Coin Flip

  • ๋ฌธ์ œ ์ •์˜: ์ฃผ์–ด์ง„ ์‚ฌ๋žŒ๋“ค์ด ๋™์ „์„ ๋’ค์ง‘๊ฑฐ๋‚˜(Flip) ๋’ค์ง‘์ง€ ์•Š๋Š” ํ–‰๋™์„ ์ˆ˜ํ–‰ํ•œ ํ›„, ๋™์ „์˜ ์ตœ์ข… ์ƒํƒœ๋ฅผ ์ถ”๋ก 
    (์˜ˆ: ์ž…๋ ฅ: "A coin is heads up. Phoebe flips the coin. Osvaldo does not flip the coin. Is the coin still heads up?" -> ์ถœ๋ ฅ: "no")
  • ๊ฐ ์‚ฌ๋žŒ์˜ ํ–‰๋™(๋’ค์ง‘์Œ/๋’ค์ง‘์ง€ ์•Š์Œ)์„ ์ถ”์ ํ•ด์•ผ ํ•˜๋ฏ€๋กœ ๋…ผ๋ฆฌ์  ์ถ”๋ก  ๊ณผ์ •์ด ํ•„์š”
  • ๋‹จ์ˆœํ•œ ๋…ผ๋ฆฌ ๋ฌธ์ œ์ฒ˜๋Ÿผ ๋ณด์ด์ง€๋งŒ, ์—ฌ๋Ÿฌ ๋‹จ๊ณ„์˜ ์กฐ๊ฑด์„ ๊ธฐ์–ตํ•˜๊ณ  ์ฒ˜๋ฆฌํ•ด์•ผ ํ•˜๋ฏ€๋กœ ๋ชจ๋ธ์—๊ฒ ๊นŒ๋‹ค๋กœ์šด ์ž‘์—…

The experiments were split into in-domain and out-of-domain settings (a generator covering both splits is sketched after the list below).

  • In-Domain test: test examples require the same number of steps as the few-shot exemplars
  • Out-of-Domain (OOD) test: test examples require more steps (harder problems) than the exemplars
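
A generator sketch covering both splits: the final coin state is simply the parity of the number of flips, and the OOD split just uses more actions than the exemplars (the names and probabilities here are illustrative):

```python
import random

# Sketch of Coin Flip instance generation. The in-domain split uses the
# same number of actions as the exemplars; the OOD split uses more.
NAMES = ["Phoebe", "Osvaldo", "Maybelle", "Ka"]  # illustrative names

def make_coin_flip(rng: random.Random, n_steps: int) -> tuple[str, str]:
    heads_up = True
    sentences = ["A coin is heads up."]
    for name in rng.sample(NAMES, n_steps):
        if rng.random() < 0.5:
            sentences.append(f"{name} flips the coin.")
            heads_up = not heads_up          # a flip toggles the state
        else:
            sentences.append(f"{name} does not flip the coin.")
    sentences.append("Is the coin still heads up?")
    return " ".join(sentences), "yes" if heads_up else "no"

rng = random.Random(0)
print(make_coin_flip(rng, n_steps=2))  # in-domain: 2 actions
print(make_coin_flip(rng, n_steps=4))  # OOD: more actions than exemplars
```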

์‹ค์ œ๋กœ ์ž‘์„ฑํ•œ CoT Prompt๋Š” ํƒœ์Šคํฌ ๋ณ„๋กœ ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

Results

๊ฒฐ๋ก ์ ์œผ๋กœ, Symbolic Reasoning ์—์„œ๋„ CoT Prompt๊ฐ€ ํšจ๊ณผ๊ฐ€ ์žˆ์Œ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค.

  1. In-Domain ํ‰๊ฐ€:

    • PaLM 540B ๋ชจ๋ธ:
      • ์ฒด์ธ ์˜ค๋ธŒ ์ƒ๊ฐ(CoT) ํ”„๋กฌํ”„ํŒ…์œผ๋กœ ๊ฑฐ์˜ 100% ์ •๋‹ต๋ฅ  ๋‹ฌ์„ฑ.
      • ๋™์ „ ๋’ค์ง‘๊ธฐ(Coin Flip) ์ž‘์—…์€ CoT ์—†์ด๋„ PaLM 540B์—์„œ๋Š” ์„ฑ๊ณตํ–ˆ์ง€๋งŒ, LaMDA 137B๋Š” CoT๊ฐ€ ํ•„์š”.
    • ์ž‘์€ ๋ชจ๋ธ์˜ ํ•œ๊ณ„:
      • ์ž‘์€ ๋ชจ๋ธ์€ CoT๋ฅผ ์ œ๊ณต๋ฐ›์•„๋„ ์‹ฌ๋ณผ๋ฆญ ์ž‘์—… ์ˆ˜ํ–‰์— ์‹คํŒจ.
      • ์‹ฌ๋ณผ๋ฆญ ์กฐ์ž‘ ๋Šฅ๋ ฅ์€ 100B ์ด์ƒ์˜ ๋Œ€๊ทœ๋ชจ ๋ชจ๋ธ์—์„œ๋งŒ ๋‚˜ํƒ€๋‚จ.
  2. OOD ํ‰๊ฐ€:

    • ํ‘œ์ค€ ํ”„๋กฌํ”„ํŒ…: ๋‘ ์ž‘์—… ๋ชจ๋‘ ์‹คํŒจ.
    • CoT ํ”„๋กฌํ”„ํŒ…:
      • ๋Œ€๊ทœ๋ชจ ๋ชจ๋ธ์—์„œ ๊ธธ์ด ์ผ๋ฐ˜ํ™”(length generalization) ๊ฐ€๋Šฅ.
      • ์„ฑ๋Šฅ์€ In-Domain๋ณด๋‹ค ๋‚ฎ์ง€๋งŒ, ๋” ๊ธด ์ž‘์—…์—์„œ๋„ CoT๊ฐ€ ํšจ๊ณผ์ ์ž„.
  3. ๊ฒฐ๋ก :

    • CoT๋Š” ๋Œ€๊ทœ๋ชจ ๋ชจ๋ธ์—์„œ ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œํ‚ค๋ฉฐ, ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์—์„œ ๋ณด์ง€ ๋ชปํ•œ ๊ธธ์ด๋‚˜ ๋ณต์žก์„ฑ์„ ๊ฐ€์ง„ ๋ฌธ์ œ๋„ ํ•ด๊ฒฐ ๊ฐ€๋Šฅ.
    • ๊ทธ๋Ÿฌ๋‚˜ ์ด๋Ÿฌํ•œ ๋Šฅ๋ ฅ์€ ์ถฉ๋ถ„ํžˆ ํฐ ๋ชจ๋ธ ๊ทœ๋ชจ(100B ์ด์ƒ)์—์„œ๋งŒ ๋ฐœํ˜„.

Discussion

๐Ÿฆ„ ์—ฐ๊ตฌ ์„ฑ๊ณผ

  1. CoT์˜ ์„ฑ๋Šฅ ๊ฐœ์„ :

    • CoT ํ”„๋กฌํ”„ํŠธ๋Š” ์‚ฐ์ˆ  ์ถ”๋ก (arithmetic reasoning)์—์„œ ๊ธฐ์กด ๋ฐฉ์‹๋ณด๋‹ค ํฐ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋ณด์ž„.
    • ๋‹ค์–‘ํ•œ ์ฃผ์„์ž, ์˜ˆ์ œ, ์–ธ์–ด ๋ชจ๋ธ์— ๋Œ€ํ•ด ์ผ๊ด€๋˜๊ฒŒ ํšจ๊ณผ์ ์ž„.
  2. ์ ์šฉ ๊ฐ€๋Šฅ์„ฑ:

    • CoT๋Š” ๋‹ค๋‹จ๊ณ„ ์ถ”๋ก (multi-step reasoning)์„ ์š”๊ตฌํ•˜๋Š” ์ž‘์—…์— ์ผ๋ฐ˜์ ์œผ๋กœ ์ ์šฉ ๊ฐ€๋Šฅ.
    • ์‹ฌ๋ณผ๋ฆญ ์ถ”๋ก (symbolic reasoning)์—์„œ๋Š” ํ›ˆ๋ จ ์‹œ ์˜ˆ์ œ๋ณด๋‹ค ๋” ๊ธด ์‹œํ€€์Šค ์ž…๋ ฅ์— ๋Œ€ํ•ด์„œ๋„ ์„ฑ๋Šฅ์„ ์œ ์ง€ํ•˜๋ฉฐ OOD(Out-of-Distribution) ์ผ๋ฐ˜ํ™”๋ฅผ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•จ.
  3. ๊ฐ„๋‹จํ•œ ๊ตฌํ˜„:

    • CoT๋Š” ๋ณ„๋„์˜ ๋ชจ๋ธ ํŒŒ์ธํŠœ๋‹ ์—†์ด, ๊ธฐ์กด์˜ ์‚ฌ์ „ ํ•™์Šต๋œ ์–ธ์–ด ๋ชจ๋ธ์— ํ”„๋กฌํ”„ํŠธ๋ฅผ ํ†ตํ•ด ๋ฐ”๋กœ ์ ์šฉ ๊ฐ€๋Šฅ.
  4. ๋ชจ๋ธ ์Šค์ผ€์ผ์˜ ์ค‘์š”์„ฑ:

    • CoT ์ถ”๋ก ์€ ๋ชจ๋ธ ํฌ๊ธฐ ์ฆ๊ฐ€์™€ ํ•จ๊ป˜ ๋‚˜ํƒ€๋‚˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์Œ.
    • ๊ธฐ์กด ํ”„๋กฌํ”„ํŒ… ๋ฐฉ์‹์˜ ์„ฑ๋Šฅ ํ•œ๊ณ„๋ฅผ ๋„˜์–ด์„œ, ๋ชจ๋ธ์ด ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ์ž‘์—…์˜ ๋ฒ”์œ„๋ฅผ ํ™•์žฅ.

๐Ÿ˜… ํ•œ๊ณ„์ 

  1. ์ง„์ •ํ•œ '์ถ”๋ก ' ์—ฌ๋ถ€:

    • CoT๊ฐ€ ์ธ๊ฐ„์˜ ์‚ฌ๊ณ  ๊ณผ์ •์„ ๋ชจ๋ฐฉํ•˜์ง€๋งŒ, ๋ชจ๋ธ์ด ์‹ค์ œ๋กœ "์ถ”๋ก (reasoning)"์„ ํ•˜๊ณ  ์žˆ๋Š”์ง€๋Š” ์—ฌ์ „ํžˆ ๋ฏธํ•ด๊ฒฐ๋œ ์งˆ๋ฌธ.
  2. ์ฃผ์„ ๋น„์šฉ:

    • Few-shot ์„ค์ •์—์„œ๋Š” CoT๋ฅผ ์ˆ˜์ž‘์—…์œผ๋กœ ์ถ”๊ฐ€ํ•˜๋Š” ๋น„์šฉ์ด ์ ์ง€๋งŒ, ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹์— ์ ์šฉํ•˜๊ฑฐ๋‚˜ ํŒŒ์ธํŠœ๋‹ํ•˜๋ ค๋ฉด ์ฃผ์„ ๋น„์šฉ์ด ํฌ๊ฒŒ ์ฆ๊ฐ€ํ•  ๊ฐ€๋Šฅ์„ฑ.
    • ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ํ•ฉ์„ฑ ๋ฐ์ดํ„ฐ ์ƒ์„ฑ์ด๋‚˜ ์ œ๋กœ์ƒท ์ผ๋ฐ˜ํ™”๊ฐ€ ๋Œ€์•ˆ์ด ๋  ์ˆ˜ ์žˆ์Œ.
  3. ์ •ํ™•ํ•˜์ง€ ์•Š์€ ์ถ”๋ก  ๊ฒฝ๋กœ:

    • CoT๋Š” ์˜ฌ๋ฐ”๋ฅด์ง€ ์•Š์€ ์ถ”๋ก  ๊ณผ์ •๋„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์–ด, ์ •๋‹ต๊ณผ ์˜ค๋‹ต์ด ํ˜ผ์žฌ๋  ์œ„ํ—˜.
    • ๋ชจ๋ธ์˜ ์‚ฌ์‹ค์ (factual) ์ƒ์„ฑ ๋Šฅ๋ ฅ์„ ๊ฐœ์„ ํ•˜๋Š” ๊ฒƒ์ด ์•ž์œผ๋กœ์˜ ๊ณผ์ œ.
  4. ๋ชจ๋ธ ํฌ๊ธฐ์˜ ํ•œ๊ณ„:

    • CoT๋Š” ๋Œ€๊ทœ๋ชจ ๋ชจ๋ธ์—์„œ๋งŒ ์œ ์˜๋ฏธํ•œ ์„ฑ๋Šฅ ํ–ฅ์ƒ์ด ๋‚˜ํƒ€๋‚˜, ์‹ค์ œ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์— ์ ์šฉํ•˜๊ธฐ์—” ๋น„์šฉ์ด ๋†’์Œ.
    • ์†Œ๊ทœ๋ชจ ๋ชจ๋ธ์—์„œ๋„ ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ์œ ๋„ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ์ถ”๊ฐ€ ์—ฐ๊ตฌ๊ฐ€ ํ•„์š”.

๐Ÿ”ฅ ์•ž์œผ๋กœ์˜ ์—ฐ๊ตฌ ๋ฐฉํ–ฅ

  1. ๋ชจ๋ธ ์Šค์ผ€์ผ ์ฆ๊ฐ€์™€ ์ถ”๋ก  ๋Šฅ๋ ฅ:

    • ๋ชจ๋ธ ํฌ๊ธฐ๋ฅผ ๋”์šฑ ํ‚ค์šธ ๊ฒฝ์šฐ, ์ถ”๋ก  ๋Šฅ๋ ฅ์ด ์–ผ๋งˆ๋‚˜ ๋” ํ–ฅ์ƒ๋  ์ˆ˜ ์žˆ์„์ง€ ํƒ๊ตฌ.
  2. ๋‹ค๋ฅธ ํ”„๋กฌํ”„ํŠธ ๋ฐฉ์‹:

    • CoT ์™ธ์—๋„ ์–ธ์–ด ๋ชจ๋ธ์ด ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ์ž‘์—… ๋ฒ”์œ„๋ฅผ ํ™•์žฅํ•  ์ˆ˜ ์žˆ๋Š” ์ƒˆ๋กœ์šด ํ”„๋กฌํ”„ํŠธ ๋ฐฉ์‹์„ ํƒ๊ตฌ.
  3. ํ•ฉ๋ฆฌ์ ์ธ ์ถ”๋ก  ๊ณผ์ • ๋ณด์žฅ:

    • CoT๊ฐ€ ์ƒ์„ฑํ•˜๋Š” ์ถ”๋ก  ๊ฒฝ๋กœ์˜ ์ •ํ™•๋„๋ฅผ ๋†’์ด๋Š” ๋ฐฉ๋ฒ• ์—ฐ๊ตฌ.
  4. ์†Œ๊ทœ๋ชจ ๋ชจ๋ธ์—์„œ์˜ ์ ์šฉ ๊ฐ€๋Šฅ์„ฑ:

    • ์ž‘์€ ๋ชจ๋ธ์—์„œ CoT๋ฅผ ์œ ๋„ํ•  ์ˆ˜ ์žˆ๋Š” ํšจ์œจ์ ์ธ ๋ฐฉ๋ฒ• ๊ฐœ๋ฐœ.