Deep Learning - COCONUT 🥥🥥

์•ˆ ํ˜•์ค€ · December 25, 2024


COCONUT - a training method that beats CoT in both efficiency and performance (Chain of Continuous Thoughts)

Summary


CoT๋ฅผ ํฌํ•จํ•œ ํ˜„์žฌ๊นŒ์ง€์˜ ํ•™์Šต ๋ฐฉ๋ฒ•์€ LLM์˜ ๋‹ค์Œ ๋‹จ๊ณ„ ์˜ˆ์ธก์— ์ž์—ฐ์–ด ํ† ํฐ์„ ํ™œ์šฉํ•จ

๊ทธ๋Ÿฌ๋‚˜ ๊ผญ ๊ทธ๋ ‡๊ฒŒ ํ•  ํ•„์š”๋Š” ์—†์Œ. LLM์—๊ฒŒ ๋‹ค์Œ ๋‹จ๊ณ„ ํ† ํฐ์œผ๋กœ ์ž์—ฐ์–ด ํ† ํฐ ๋Œ€์‹ , ๋งˆ์ง€๋ง‰ ์€๋‹‰์ธต์˜ ๊ฒฐ๊ณผ๋ฅผ ๋„ฃ์„ ์ˆ˜ ์žˆ๋Š” ์ž์œ ๋ฅผ ์ฃผ๋ฉด ๋”์šฑ ํšจ์œจ์ ์ด๊ณ , ์„ฑ๋Šฅ์ด ์ข‹์•„์ง

Quick results:

ProntoQA:

CoT: 98.8 % Acc., 92.5 tokens

COCONUT: 99.8 % Acc., 9.0 tokens

Introduction


LLM์ด ๋…ผ์ฆํ•  ๋•Œ, ์ž์—ฐ์–ด ํ† ํฐ์„ ์‚ฌ์šฉํ•˜๋ฉด

  • ๋…ผ์ฆ๋งˆ๋‹ค ํ•„์š”ํ•œ ๊ณ„์‚ฐ๋Ÿ‰์ด ๋‹ค๋ฅด๋‹ค๋Š” ์‚ฌ์‹ค์„ ๋ฐ˜์˜ํ•˜์ง€ ๋ชปํ•จ
  • ๋…ผ์ฆ์‚ฌ์Šฌ(reasoning chain)์˜ ๋Œ€๋‹ค์ˆ˜ ํ† ํฐ์€ ๋ฌธ์žฅ์„ ๋งค๋„๋Ÿฝ๊ฒŒ ๋งŒ๋“œ๋Š” ์—ญํ• ์„ ํ•  ๋ฟ, ๋…ผ์ฆ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์€ ๋ฏธ๋ฏธํ•จ

See Related Work for prior results.

  • Related Work
    • Chain of Thought (and its variants):
      • An umbrella term for reasoning before producing the answer; covers training, prompting, and reinforcement learning. Effective, but its autoregressive nature becomes a weakness on complex tasks.
      • To overcome this weakness, adding tree search or training the search dynamics has been proposed (still using language tokens).
    • Latent reasoning in LLMs:
      • In prior work, latent reasoning refers to the computation in the hidden states of intermediate tokens.
      • Analyses of a Transformer's intermediate hidden states show that even when the model generates a CoT, its internal reasoning can differ from the generated CoT.
        • (unfaithfulness of CoT reasoning) → latent reasoning is not being exploited properly
      • Research on making better use of latent reasoning:
        • pause (Think before you speak)
          • Generate pause tokens before the reasoning tokens to let the model "think", and use only the output after the last pause token
          • (requires pretraining and finetuning)
        • implicit-CoT (From Explicit CoT to Implicit CoT)
          • Reduce the reasoning tokens stage by stage during training → also adopted in COCONUT

COCONUT์€ LLM์—๊ฒŒ ์›ํ•  ๋•Œ ์ž์—ฐ์–ด ํ† ํฐ์œผ๋กœ ๋ณ€ํ™˜ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•˜๋Š” ๋ฐฉ๋ฒ•

  • ์–ธ์–ด ๋ชจ๋“œ์™€ ์€๋‹‰ ๋ชจ๋“œ ์‚ฌ์ด์˜ ๋ณ€ํ™˜์ด ๊ฐ€๋Šฅ
  • ๋ณ€ํ™˜์€ <bot>, <eot> ์˜ ํŠน์ˆ˜ ํ† ํฐ์„ ์‚ฌ์šฉ
  • ์€๋‹‰ ๋ชจ๋“œ์—์„œ๋Š” ์€๋‹‰์ธต(Latent)์˜ ๊ฒฐ๊ณผ๋ฅผ ๋‹ค์Œ ํ† ํฐ์œผ๋กœ ์‚ฌ์šฉ
  • ์–ธ์–ด ๋ชจ๋“œ์—์„œ๋Š” ์ผ๋ฐ˜์ ์ธ LLM์œผ๋กœ ์ž‘๋™
  • ๋‹ค๋‹จ๊ณ„ ํ•™์Šต๋ฒ• ์ ์šฉ
  • ๋ถ„์„ ๊ฒฐ๊ณผ, ์€๋‹‰ ๋ชจ๋“œ์˜ ํ† ํฐ์€ ๊ฐ€๋Šฅํ•œ ๋‹ค์Œ ์ƒํƒœ๋ฅผ ์ค‘์ฒฉํ•ด์„œ encodeํ•จ
    • ์ด๋Š” CoT์—์„œ๋Š” ๋ถˆ๊ฐ€๋Šฅ
    • ๋…ผ์ฆ ๊ณผ์ •์„ BFS์™€ ๋น„์Šทํ•œ ๊ตฌ์กฐ๋กœ ๋งŒ๋“ฆ
  • ์žฅ๊ธฐ ๊ณ„ํš์ด ํ•„์š”ํ•œ ์ž‘์—…์ผ์ˆ˜๋ก, CoT๋ณด๋‹ค ํšจ์œจ์ ์ด๋ฉด์„œ๋„ ๋” ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋ƒ„

COCONUT: Chain of Continuous Thought


LLM in a nutshell

์ผ๋ฐ˜์ ์ธ LLM ๊ตฌ์กฐ์—์„œ

  1. ์ž…๋ ฅ sequence๋Š” ํ† ํฐ๋ณ„๋กœ ๋ถ„ํ•ด๋˜์–ด embedding function e ๋ฅผ ๊ฑฐ์นœ๋‹ค.

    E_t: ํ† ํฐ embedding sequence = [e(x1), โ€ฆ , e(x_t)]

  2. ์ดํ›„ ํŠธ๋žœ์Šคํฌ๋จธ ๊ธฐ๋ฐ˜ ๋ชจ๋ธ์„ ๊ฑฐ์ณ hidden state seqence H_t๊ฐ€ ๋œ๋‹ค

    h_t: ๋งˆ์ง€๋ง‰ ํ† ํฐ์˜ hidden state

  3. ์ตœ์ข…์ ์œผ๋กœ language model head W๋ฅผ ๊ฑฐ์นœ ๊ฒฐ๊ณผ์— softmax๋ฅผ ์ทจํ•˜๋ฉด ๋‹ค์Œ ์ž์—ฐ์–ด ํ† ํฐ์— ๋Œ€ํ•œ ํ™•๋ฅ ๋ถ„ํฌ๊ฐ€ ๋œ๋‹ค.
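The three steps above can be sketched end to end. Everything here (vocab size, hidden dimension, the `transformer` stub) is an illustrative assumption, not the paper's model:

```python
# End-to-end sketch of the standard LM forward pass described above.
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                          # vocab size, hidden dimension (toy values)
E = rng.normal(size=(V, d))           # embedding table: e(x) = E[x]
W = rng.normal(size=(V, d))           # language-model head

def transformer(E_t):
    """Stand-in for the transformer stack: one hidden state per token."""
    # Causal mixing so h_t depends on tokens x_1..x_t; a real model uses
    # attention + MLP blocks here.
    return np.cumsum(E_t, axis=0) / np.sqrt(np.arange(1, len(E_t) + 1))[:, None]

tokens = [3, 17, 42]                  # x_1..x_t
E_t = E[tokens]                       # 1. token embedding sequence
H_t = transformer(E_t)                # 2. hidden state sequence
h_t = H_t[-1]                         # hidden state of the last token
logits = W @ h_t                      # 3. language model head ...
probs = np.exp(logits - logits.max())
probs /= probs.sum()                  # ... + softmax -> next-token distribution
```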

COCONUT ๊ตฌ์กฐ์—์„œ

  • <bot>, <eot> ์˜ ํŠน์ˆ˜ ํ† ํฐ์„ ์‚ฌ์šฉํ•˜์—ฌ ์€๋‹‰ ๋ชจ๋“œ๋ฅผ ํ‘œ๊ธฐํ•œ๋‹ค.
  • ์€๋‹‰ ๋ชจ๋“œ์—์„œ๋Š” e(xk) ๋Œ€์‹  h{k-1}์„ ์‚ฌ์šฉํ•œ๋‹ค.
    • i๋ฒˆ์งธ ํ† ํฐ์„ ์ฒ˜๋ฆฌํ•  ๋•Œ, ๋Š” embedding์„ ๊ฑฐ์ณ์„œ(e(x_i)) ์€๋‹‰ ๊ฒฐ๊ณผ h_i๋ฅผ ๋‚ธ๋‹ค.
      • (i+1)๋ฒˆ์งธ ํ† ํฐ์„ ์ฒ˜๋ฆฌํ•  ๋•Œ, h_i์— W๋ฅผ ์ ์šฉํ•˜์ง€ ์•Š๊ณ  (W h_i ์—†์Œ)
      • softmax๋ฅผ ์ ์šฉํ•˜์ง€ ์•Š๊ณ  (Softmax (W h_i) ์—†์Œ)
      • embedding์„ ์ ์šฉํ•˜์ง€ ์•Š๊ณ  (e(Softmax (W h_i)) ์—†์Œ)
      • h_i๋ฅผ ๋‹ค์Œ ํŠธ๋žœ์Šคํฌ๋จธ ์ž…๋ ฅ sequence์˜ ์ผ๋ถ€๋กœ ํ™œ์šฉํ•œ๋‹ค
    • ์ž…๋ ฅ sequence์—์„œ ๊ฐ€ i๋ฒˆ์งธ(x_i = )์— ์žˆ๊ณ , ๊ฐ€ j๋ฒˆ์งธ์— ์žˆ๋Š” ๊ฒฝ์šฐ, E_k(i < k < j)๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค
      • Ek = [e(x_1), โ€ฆe(x_i), h_i, h{i+1}, โ€ฆ, h_{k-1}]
    • h_t๋Š” ์ตœ์ข… normalization layer๋ฅผ ๊ฑฐ์นœ ๊ฒฐ๊ณผ์ด๋ฏ€๋กœ, ํฌ๊ธฐ๊ฐ€ ํฌ์ง„ ์•Š๋‹ค
    • ๋‹ค์Œ ์–ธ์–ด ํ† ํฐ ํ™•๋ฅ ๋ถ„ํฌ Softmax (W h_t)๋ฅผ ๊ตฌํ•  ํ•„์š”๋Š” ์—†์œผ๋‚˜, ๋ถ„์„ ๋ชฉ์ ์œผ๋กœ๋Š” ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‹ค
    • ๋งˆ์ง€๋ง‰ ์€๋‹‰ ํ† ํฐ์€ ํ˜„์žฌ์˜ ๋…ผ์ฆ ์ƒํ™ฉ์„ ํ‘œํ˜„ํ•˜๊ณ , ์ด๋ฅผ continous thought๋ผ๊ณ  ๋ช…๋ช…ํ•œ๋‹ค.
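A minimal sketch of the latent-mode loop just described, assuming the same kind of toy `transformer` stub (all shapes and names are illustrative): after <bot>, each new input position receives the previous hidden state directly, skipping W, the softmax, and the embedding.

```python
# Toy version of the latent-mode input build-up: h replaces e(x).
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 8                          # vocab size, hidden dimension (toy values)
E = rng.normal(size=(V, d))           # embedding table

def transformer(inputs):
    """Stand-in for the transformer: one hidden state per input position."""
    x = np.stack(inputs)
    return np.cumsum(x, axis=0) / np.sqrt(np.arange(1, len(x) + 1))[:, None]

prompt = [3, 17, 5]                   # ..., with x_i = <bot> as the last token
inputs = [E[t] for t in prompt]       # [e(x_1), ..., e(x_i)]
for _ in range(3):                    # three continuous thoughts
    h = transformer(inputs)[-1]       # last hidden state
    inputs.append(h)                  # fed back directly: no W, softmax, or embedding
# The input sequence is now [e(x_1), ..., e(x_i), h_i, h_{i+1}, h_{i+2}],
# matching the E_k expression above.
```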

Training

Using CoT data as scaffolding, the model is trained to take a question as input, go through reasoning steps, and produce the answer.

Training procedure

  • Stage 0 trains on the CoT data as-is.
  • At each stage, one more CoT reasoning step from the front is replaced with c continuous-thought tokens.
  • The optimizer state is reset at the start of each stage.
  • Negative log-likelihood loss is used, computed only on the natural-language reasoning, not on the question or the latent thoughts.
  • The latent thoughts are trained not to restore or compress the removed language tokens, but to predict the future reasoning steps well.
    • "It is important to note that the objective does not encourage the continuous thought to compress the removed language thought, but rather to facilitate the prediction of future reasoning."
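The staged replacement could look roughly like this; the token names (`<bot>`, `<eot>`, `<thought>`) follow the note, while the helper name and the exact data layout are assumptions:

```python
# Sketch of the multi-stage curriculum: at stage s, the first s language
# reasoning steps are replaced by s*c continuous-thought slots between
# <bot> and <eot>. Not the paper's code; a toy data-construction helper.
def make_stage_example(question, cot_steps, answer, stage, c=1):
    """Build the training token sequence for one curriculum stage."""
    latent = ["<bot>"] + ["<thought>"] * (stage * c) + ["<eot>"]
    remaining = cot_steps[stage:]        # later reasoning steps stay in language
    # NLL loss would be computed only on `remaining` + answer,
    # not on the question or the latent thoughts.
    return question + latent + remaining + [answer]

question = ["Q:", "...?"]
steps = ["step1", "step2", "step3"]
stage0 = make_stage_example(question, steps, "A", stage=0)       # plain CoT (empty latent span)
stage2 = make_stage_example(question, steps, "A", stage=2, c=2)  # 4 thought slots
```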

Training details

  • Continuous thoughts are differentiable, so gradients can be computed by back-propagation → easy to train.
  • A stage that inserts n latent thoughts requires n+1 forward passes.
    • Each latent token's value is unknown until it is produced (no ground truth), so the usual teacher forcing cannot be used.
    • Producing the latent tokens is sequential → parallelization is a problem left to solve.

Inference

The <bot> token comes immediately after the question, and the <eot> token appears after a fixed number of continuous thoughts.

For the number of continuous-thought tokens, both letting a separate model infer when to stop and using a fixed count worked well, so the simpler method is adopted.

Experiments

3 ๊ฐœ์˜ dataset ํ™œ์šฉ / GT์™€์˜ ๋น„๊ต๋กœ ์ •ํ™•๋„ ๊ณ„์‚ฐ / ์ •๋‹ต ๋‚ด๊ธฐ๊นŒ์ง€ ํ•„์š”ํ–ˆ๋˜ ์ถ”๊ฐ€ ํ† ํฐ๋Ÿ‰ ํ‘œ๊ธฐ

Datasets

  1. GSM8k: math reasoning
  2. ProntoQA: logical reasoning
    1. Randomly generated tree-structured contexts, given in natural language
  3. ProsQA: logical reasoning
    1. Built because ProntoQA is too easy to properly assess reasoning ability
    2. Randomly generated DAG-structured contexts, given in natural language
    3. Each problem is structured as a binary question: โ€œIs [Entity] a [Concept A] or [Concept B]?โ€
    4. The graph is constructed such that a path exists from [Entity] to [Concept A] but not to [Concept B].

Experimental Setup

base: pre-trained GPT-2

Math reasoning: c = 2. Through stage 3, one reasoning language step is removed per stage; at stage 4, all remaining reasoning language tokens are removed while the number of continuous-thought tokens is kept. 3 epochs/stage → makes the model robust to the long tail of lengthy explanations.

Logical reasoning: c = 1. Both datasets have at most 6 reasoning steps, so training is set to 6 stages. 5 epochs/stage.

When there is no further stage to advance to, training continues in the final stage up to epoch 50.

At inference, the number of continuous-thought tokens is matched to the count used in the final training stage.

Results & Discussion

Chaining continuous thoughts enhances reasoning

GSM8k์—์„œ, COCONUT์ด iCoT๋ณด๋‹ค ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์—ˆ๊ณ , pause as thought๋ณด๋‹ค๋Š” ์›”๋“ฑํžˆ ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์—ˆ๋‹ค. ์ด๋Š” COCONUT์ด ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์—์„œ ๋” ๋‚ซ๋‹ค๋Š” ์˜๋ฏธ. (pause ํ† ํฐ์ด ๋ณ‘๋ ฌํ™”์— ์œ ๋ฆฌํ•˜์ง€๋งŒ)

c๋ฅผ 0 โ†’ 1 โ†’ 2๋กœ ๋ณ€ํ™”์‹œํ‚ฌ ๋•Œ ์„ฑ๋Šฅ์ด ๊พธ์ค€ํžˆ ์˜ฌ๋ผ๊ฐ”๋Š”๋ฐ, ์ด๋Š” COCONUT๋„ CoT์—์„œ์˜ ์—ฐ์‡„ ํšจ๊ณผ๋ฅผ ๋ณด๊ณ  ์žˆ๋Š” ๊ฒƒ์œผ๋กœ ํ•ด์„ ํ•  ์ˆ˜ ์žˆ๋‹ค

  • ํ† ํฐ์„ ์—ฐ์‡„์ ์œผ๋กœ ์—ฐ๊ฒฐํ•˜๋ฉด ๊ณ„์‚ฐ๋Ÿ‰์— ๋”ฐ๋ผ ์„ฑ๋Šฅ์ด ์˜ฌ๋ผ๊ฐ€๋Š” ํšจ๊ณผ

๋…ผ๋ฆฌ ๋…ผ์ฆ ๋ฐ์ดํ„ฐ์…‹์—์„œ๋Š” COCONUT๊ณผ iCoT๊ฐ€ ๋ชจ๋‘ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋Š”๋ฐ, ์ด ๋ฐ์ดํ„ฐ์…‹์—์„œ๋Š” ๊ณ„์‚ฐ๋Ÿ‰์ด ๋ณ‘๋ชฉ์ด ์•„๋‹ˆ๋ผ๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค€๋‹ค

Latent reasoning outperforms language reasoning in planning-intensive tasks

๋ณต์žกํ•œ ๋…ผ์ฆ์˜ ๊ฒฝ์šฐ, ๋” ์žฅ๊ธฐ์ ์ธ ๊ด€์ ์—์„œ ๊ฐ step์„ ํ‰๊ฐ€ํ•  ํ•„์š”๊ฐ€ ์žˆ๋Š”๋ฐ, ProsQA์˜ ๋ณต์žกํ•œ DAG๋Š” ๊ณ„ํš ๋Šฅ๋ ฅ์„ ์š”๊ตฌํ•œ๋‹ค. CoT๋Š” ๊ฑฐ์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ์ด ์—†๋Š” ๋ฐ˜๋ฉด, COCONUT๊ณผ iCoT๋Š” ์ƒ๋‹นํ•œ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋ณด์—ฌ์ค€๋‹ค

The LLM still needs guidance to learn latent reasoning

๋‹จ๊ณ„์ ์œผ๋กœ ๋…ผ์ฆ ์–ธ์–ด ํ† ํฐ์„ ์ค„์—ฌ ๋‚˜๊ฐ€๋Š” ๊ฒฝ์šฐ๊ฐ€ ์•„๋‹Œ, ๋ชจ๋“  ๋…ผ์ฆ ์–ธ์–ด ํ† ํฐ์„ ์—†์• ๋Š” ๋ฐฉ์‹ฑ(no-curriculum)์œผ๋กœ ํ•™์Šตํ•˜๋ฉด, no-CoT์™€ ๋น„์Šทํ•˜๋‹ค (continuous thought์˜ ์˜๋ฏธ๊ฐ€ ์—†๋‹ค)

Continuous thoughts are efficient representations of reasoning

Decoding the first latent token into language by passing it through the LM head yields a distribution with high mass on "180" and "9". This reflects the intermediate step (3×3×60 = 9×60 = 540, or 3×3×60 = 3×180 = 540), and confirms that multiple reasoning directions are superposed.

Understanding the Latent Reasoning in Coconut

์ด ์žฅ์—์„œ๋Š” ์€๋‹‰ ๋…ผ์ฆ ๊ณผ์ •์„ ๋ถ„์„. ์ด๋ฅผ ์œ„ํ•ด ์–ธ์–ด ๋ชจ๋“œ์™€ ์€๋‹‰ ๋ชจ๋“œ๋ฅผ ๋” ์ž์œ ๋กญ๊ฒŒ ์™”๋‹ค๊ฐ”๋‹ค ํ•  ์ˆ˜ ์žˆ๋Š” COCONUT ๋ณ€์ด๋ฅผ ์‚ฌ์šฉํ•˜๋Š”๋ฐ, ์ผ๋ฐ˜ COCONUT๊ณผ๋Š” ๋‹ค์Œ์˜ ์ฐจ์ด๋ฅผ ๊ฐ€์ง.

์ผ๋ฐ˜ COCONUT

  • ํ•™์Šต์˜ ๋งˆ์ง€๋ง‰ ๋‹จ๊ณ„์—์„œ๋Š” โ€œ์ •ํ•ด์ง„ ์ตœ๋Œ€์˜ continuous thought ํ† ํฐโ€์„ ๊ฐ€์ง
  • ์ถ”๋ก  ์‹œ์— โ€œ์ •ํ•ด์ง„ ์ตœ๋Œ€์˜ continuous thought ํ† ํฐโ€๋งŒํผ ์ƒ๊ฐ ํ›„ ์–ธ์–ด ๋ชจ๋“œ๋กœ ์ „ํ™˜

์€๋‹‰ ๋…ผ์ฆ ๋ถ„์„์šฉ COCONUT

  • ํ•™์Šต์˜ ๋ชจ๋“  ๋‹จ๊ณ„์—์„œ 0.3์˜ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฅธ ๋‹จ๊ณ„์˜ ํ•™์Šต ๋ฐ์ดํ„ฐ๋กœ ๋Œ€์ฒด โ†’ ์ด์ „ ๋‹จ๊ณ„๋ฅผ ๊นŒ๋จน์ง€ ์•Š๊ฒŒ ๋จ
  • ์ถ”๋ก  ์‹œ์— k ํ† ํฐ๋งŒํผ ์ƒ๊ฐ ํ›„ ์–ธ์–ด ๋ชจ๋“œ๋กœ ์ „ํ™˜

์€๋‹‰ ๋…ผ์ฆ ๋ถ„์„์šฉ COCONUT์„ ์‚ฌ์šฉํ•˜์—ฌ ์™„์ „ ์€๋‹‰ ๋ชจ๋“œ์™€ ์™„์ „ ์–ธ์–ด ๋ชจ๋“œ๋ฅผ ์‚ฌ์ด๋ฅผ ๋น„๊ต(์„ฑ๋Šฅ ๋“ฑ)ํ•ด ๋ณผ ์ˆ˜ ์žˆ์Œ.
์ด ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์€๋‹‰ ๋…ผ์ฆ ๊ณผ์ •์ด tree search์™€ ์œ ์‚ฌํ•จ์„ ๋ฐํžˆ๊ณ , ์€๋‹‰ ๋…ผ์ฆ์ด LLM์˜ ํŒ๋‹จ์— ์™œ ๋„์›€์„ ์ฃผ๋Š”์ง€ ๋ถ„์„ํ•จ

Experimental Setup

๋ณ€์ด COCONUT(k in {0, 1, 2, 3, 4 ,5, 6})์„ ProsQA๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ‰๊ฐ€.

ํ‰๊ฐ€ ํ•ญ๋ชฉ:

  1. ์ •ํ™•๋„: ์ตœ์ข… ๋‹ต์ด ๋งž์•˜๋Š”์ง€ ํ‰๊ฐ€
  2. ๋…ผ์ฆ ๊ณผ์ •:
    1. ProsQA๋Š” DAG๋กœ ์ด๋ฃจ์–ด์ง„ ๋ฐ์ดํ„ฐ์…‹์ด๋ฏ€๋กœ, ๋ชจ๋ธ์ด ์ถœ๋ ฅํ•˜๋Š” ์–ธ์–ด ๋…ผ์ฆ๋„ ๊ทธ๋ž˜ํ”„์—์„œ์˜ ๊ฒฝ๋กœ๊ฐ€ ๋จ
    2. ์–ธ์–ด ๋…ผ์ฆ์€ ๋ฐฐํƒ€์ ์ธ ๋‹ค์Œ 6๊ฐ€์ง€ ๋ฒ”์ฃผ๋กœ ๋ถ„๋ฅ˜ ๊ฐ€๋Šฅ
      1. Correct Path: ์ •๋‹ต์„ ๋งž์ถ”์—ˆ๊ณ , ๊ฐ€์žฅ ์งง์€ ๊ฒฝ๋กœ์ž„
      2. Longer Path: ์ •๋‹ต์„ ๋งž์ถ”์—ˆ์ง€๋งŒ, ๋” ์งง์€ ๊ฒฝ๋กœ๊ฐ€ ์กด์žฌํ•จ
      3. Hallucination: ๊ทธ๋ž˜ํ”„์— ์กด์žฌํ•˜์ง€ ์•Š๋Š” edge๋ฅผ ์ด์šฉํ•˜๊ฑฐ๋‚˜, ๋Š์–ด์ ธ ์žˆ๋Š” ๊ฒฝ๋กœ์ž„
      4. Wrong Target: ์œ ํšจํ•œ ๊ฒฝ๋กœ์ด์ง€๋งŒ, ์˜ค๋‹ต์„ ๋ƒˆ์Œ
      5. Correct Label: ๋…ผ์ฆ ํ† ํฐ ์—†์ด ์ •๋‹ต์„ ๋ƒˆ์Œ (no-CoT or large k)
      6. Incorrect Label: ๋…ผ์ฆ ํ† ํฐ ์—†์ด ์˜ค๋‹ต์„ ๋ƒˆ์Œ (no-CoT or large k)

Interpolating between Latent and Language Reasoning

  • As k increases, accuracy and correct reasoning (Correct Path, Correct Label) go up, while Hallucination and Wrong Target go down.
  • Comparing k=0 with CoT: although both operate entirely in language mode, k=0 COCONUT has higher accuracy, a higher Correct Path rate, and a lower Hallucination rate (Wrong Target looks higher, though..).
  • This is attributed to mixing training data across stages: later-stage data omits the earlier reasoning steps, which improves the model's planning ability. CoT, by contrast, always trains the model to predict the immediately following token, which makes it shortsighted.

  • In this example, CoT falls into Hallucination and k=1 COCONUT into Wrong Target, but k=2 COCONUT produces the Correct Path.
  • This shows that at the first latent thought tokens, the model still finds it hard to decide which edge to take.
  • Unlike language mode, which must commit to one language token at every step, latent mode can defer the decision to the end of the reasoning; as latent reasoning proceeds, wrong answers are progressively filtered out and accuracy rises.

Interpreting the Latent Search Tree

  • ์€๋‹‰ ์ƒ๊ฐ ํ† ํฐ์ด ๋‹ค์Œ step์ด ๋  ์ˆ˜ ์žˆ๋Š” ์—ฌ๋Ÿฌ ํ›„๋ณด๋ฅผ ์ค‘์ฒฉํ•˜์—ฌ encodeํ•œ๋‹ค๋Š” ์ ์—์„œ, search tree๋กœ ํ•ด์„ํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ๋ชจ๋“  frontier node๋ฅผ ๊ฐ™์€ ๋น„์ค‘์œผ๋กœ ๋‹ค๋ฃจ๋Š” BFS์™€๋Š” ๋‹ค๋ฅด๊ฒŒ, ๋ชจ๋ธ์€ ๋” ๊ฐ€์น˜์žˆ๋Š” node๋ฅผ ์šฐ์„ ํ•˜๋Š” ๋Šฅ๋ ฅ์ด ์žˆ๋‹ค.
    • frontier node: ํ˜„์žฌ ์ˆœํšŒ์—์„œ ๋ฐฉ๋ฌธํ•œ node์™€ ์ง์ ‘ ์—ฐ๊ฒฐ๋œ, ๋ฐฉ๋ฌธํ•˜์ง€ ์•Š์€ node
  • Figure 6. ์—์„œ ์ฒซ ๋ฒˆ์งธ step์€ Alex์˜ ์ž์‹ node๋ฅผ ๊ณ ๋ฅด๋Š” ๊ณผ์ •์ด๊ณ , ๋‘ ๋ฒˆ์งธ step์˜ frontier node๋Š” ์†์ž node์ด๋‹ค.
    • frontier node: ํ˜„์žฌ ์ˆœํšŒ์—์„œ ๋ฐฉ๋ฌธํ•œ node์™€ ์ง์ ‘ ์—ฐ๊ฒฐ๋œ, ๋ฐฉ๋ฌธํ•˜์ง€ ์•Š์€ node
  • ์ด ๊ณผ์—…์—์„œ k ํ† ํฐ ํ›„ ์–ธ์–ด ๋ชจ๋“œ๋กœ ๋ชจ๋ธ์„ ์ „ํ™˜ํ•˜๋ฉด Every |Concept A| is a |Concept B| ์˜ ๊ทœ๊ฒฉํ™”๋œ ๋ฌธ์žฅ์ด ์—ฐ๋‹ฌ์•„ ์ถœ๋ ฅ๋˜๋Š”๋ฐ, Concept A๊ฐ€ ๋“ฑ์žฅํ•  ๊ฐ€๋Šฅ์„ฑ์„ ๊ณ„์‚ฐํ•˜๋ฉด ๋ชจ๋ธ์ด ํ•ด๋‹น node์— ์–ผ๋งˆ๋‚˜ ํฐ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌํ–ˆ๋Š”์ง€ ๋ถ„์„ ๊ฐ€๋Šฅํ•˜๋‹ค

  • Figure 7.์˜ ์ขŒ์ธก์„ ์˜ˆ์‹œ๋กœ ํ•˜๋ฉด, Every๊นŒ์ง€ ์–ธ์–ด ๋ชจ๋“œ๋กœ ์ถœ๋ ฅํ•œ ๋‹ค์Œ, Every๋ฅผ Embedding Layer์— ํ†ต๊ณผ์‹œํ‚ค๊ณ  Transformer์— ํ†ต๊ณผ์‹œํ‚ค๊ณ  LM head, Softmax๊นŒ์ง€ ํ†ต๊ณผ์‹œํ‚ค๋ฉด ๊ฐ ์–ธ์–ด ํ† ํฐ๋ณ„ ํ™•๋ฅ ๋ถ„ํฌ๋ฅผ ์–ป๋Š”๋ฐ, ์ด ์‹œ์ ์—์„œ โ€œlempusโ€์˜ ๊ฐ€์น˜๋ฅผ โ€œlempusโ€๋ฅผ ์ด๋ฃจ๋Š” ๊ฐ ํ† ํฐ(โ€leโ€, โ€œmpโ€, โ€œusโ€)๊ฐ€ ๋“ฑ์žฅํ•  ํ™•๋ฅ ์˜ ๊ณฑ์œผ๋กœ ํ‰๊ฐ€ํ•  ์ˆ˜ ์žˆ๋‹ค
    • ํ™•๋ฅ ์„ ๊ตฌํ•˜๋ ค๋ฉด p(โ€le")p(โ€mpโ€|โ€le)p(โ€usโ€|โ€leโ€,โ€mpโ€)์—ฌ์•ผ ํ• ํ…๋ฐ, ๋ชจ๋‘ forward pass๋ฅผ ์‹œ์ผœ๋ณด์ง€ ์•Š๋Š” ํ•œ, ๊ณ„์‚ฐ์ด ๋ถˆ๊ฐ€๋Šฅํ•  ๊ฒƒ ๊ฐ™๋‹ค
    • ๊ฐ€์น˜๋Š” ํ™•๋ฅ ์˜ ์ •์˜๋ฅผ ๋งŒ์กฑํ•˜์ง€ ์•Š์„ ๊ฒƒ ๊ฐ™๋‹ค
  • ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, Figure 7.์˜ ์šฐ์ธก์€ ์†์ž node์˜ ๊ฐ€์น˜๋ฅผ ํ‰๊ฐ€ํ•˜๋Š” ๊ณผ์ •์ด๋‹ค. ์ขŒ์ธก๊ณผ ๋น„๊ตํ•ด ๋ณด๋ฉด, ์ขŒ์ธก์—์„œ๋Š” โ€œsterpusโ€๋ฅผ ๊ฑธ๋Ÿฌ๋‚ด๊ธด ํ–ˆ์ง€๋งŒ ๋‚˜๋จธ์ง€ node์—์„œ๋Š” ํ—›๊ฐˆ๋ฆฌ๋Š” ๊ฒฝํ–ฅ์„ ๋ณด์ธ๋‹ค. ์€๋‹‰ ์ƒ๊ฐ์„ 1 ํ† ํฐ ๋” ๊ฑฐ์นœ ์šฐ์ธก์—์„œ๋Š” rorphus์— ์ง‘์ค‘(๊ฐ€์น˜=0.87)ํ•œ๋‹ค๋Š” ์ ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

  • ์€๋‹‰ ๋…ผ์ฆ ๋‹จ๊ณ„๋ฅผ ๊ฑฐ์น ์ˆ˜๋ก top-1์˜ ๊ฐ€์น˜๊ฐ€ top-2, top-3๋ฅผ ์••๋„ํ•˜๋Š” ๊ฒƒ์œผ๋กœ ๋ชจ๋ธ์ด ์ ์ฐจ ๋†’์€ ๊ฐ€์น˜๋ฅผ ๊ฐ–๋Š” node์— ์ง‘์ค‘ํ•˜๋Š” ์ ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.
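The node-value estimate above (the product over a concept's subword tokens, with one forward pass per subword, as the note suggests would be needed for a true probability) might be sketched like this; the toy vocabulary and the `next_token_probs` stub are assumptions:

```python
# Sketch of scoring a multi-token concept such as "lempus" as
# p("le") * p("mp" | "le") * p("us" | "le", "mp"), one forward pass per subword.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["le", "mp", "us", "ro", "rp", "hus", "Every", "is", "a"]

def next_token_probs(prefix):
    """Stand-in for Softmax(W h): next-token distribution given a prefix."""
    logits = rng.normal(size=len(vocab))   # a real model would condition on prefix
    e = np.exp(logits - logits.max())
    return e / e.sum()

def concept_value(prefix, concept_tokens):
    """Chain rule over the concept's subword tokens, one forward pass each."""
    value = 1.0
    for tok in concept_tokens:
        value *= next_token_probs(prefix)[vocab.index(tok)]
        prefix = prefix + [tok]            # condition the next factor on this token
    return value

v = concept_value(["Every"], ["le", "mp", "us"])
```

Because each factor conditions on the tokens chosen so far, this chained product is a proper probability; scoring all subwords from a single distribution, without the extra forward passes, would only give a heuristic value, as the note points out.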

Why is a Latent Space Better for Planning?

์€๋‹‰ ๋…ผ์ฆ ๊ณผ์ •์ด search tree๋กœ ๋™์ž‘ํ•œ๋‹ค๋Š” ์ ์„ ํ†ตํ•ด ์€๋‹‰ ๋…ผ์ฆ ๊ณผ์ •์ด ์–ด์งธ์„œ ๋ชจ๋ธ์˜ ๊ณ„ํš ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š”์ง€ ๊ฐ€์„ค์„ ์„ธ์šธ ์ˆ˜ ์žˆ๋‹ค.

Figure 6.์˜ ์˜ˆ์‹œ์—์„œ ๊ฐ€์น˜๊ฐ€ ๋‚ฎ๊ฒŒ ํ‰๊ฐ€๋œ โ€œsterpusโ€ (Figure 7. ์ขŒ์ธก)์™€ ๋‚˜๋จธ์ง€ ์„ธ node์˜ ์ฐจ์ด์ ์€ ๋†’์ด์ด๋‹ค.

โ€œsterpusโ€๋Š” leaf node๋กœ target์— ์ด๋ฅด์ง€ ๋ชปํ•œ๋‹ค๋Š” ์‚ฌ์‹ค์„ ๋ฐ”๋กœ ํ™•์ธํ•  ์ˆ˜ ์žˆ๊ณ , ๋‹ค๋ฅธ node๋Š” ๋” ๋†’์•„ ๋ฐฉ๋ฌธํ•  ์ž์‹์ด ์•„์ง ๋‚จ์•„์žˆ์–ด, ํ‰๊ฐ€ํ•˜๊ธฐ ๋” ์–ด๋ ต๋‹ค.

๐Ÿ’ก ๊ฐ€์„ค: ๋” ๋‚ฎ์€ node๋Š” ์ •ํ™•ํžˆ ํ‰๊ฐ€ํ•˜๊ธฐ ๋” ์‰ฝ๋‹ค. (nodes with lower heights are easier to evaluate accurately)

Figure 9. ๋Š” ์ด ๊ฐ€์„ค์— ์ผ์น˜ํ•˜๋Š” ํŒจํ„ด์„ ๋ณด์—ฌ์ค€๋‹ค. ์˜ค๋‹ต node์™€ ์ •๋‹ต node์˜ ๊ฐ€์น˜์˜ ์ฐจ์ด๊ฐ€ ํด ์ˆ˜๋ก ๋ชจ๋ธ์ด ๊ฐ€์น˜๋ฅผ ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ํ‰๊ฐ€ํ•˜๊ณ  ์žˆ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋Š”๋ฐ, ๋†’์ด๊ฐ€ ๋‚ฎ์€ node์ผ์ˆ˜๋ก ๋‘ ๊ทธ๋ž˜ํ”„์˜ ์ฐจ์ด๊ฐ€ ํฌ๋‹ค.

๊ฒฐ๋ก ์ ์œผ๋กœ, ๊ฒฝ๋กœ ํ™•์ •์„ ๋’ค๋กœ ๋ฏธ๋ฃฐ์ˆ˜๋ก search ๋ฒ”์œ„๋ฅผ ํ™•์žฅํ•˜์—ฌ ์ข…๊ฒฐ ์ƒํƒœ๊นŒ์ง€ ํƒ์ƒ‰ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋˜๊ณ , ๋ชจ๋ธ์˜ ์ •๋‹ต node์™€ ์˜ค๋‹ต node๋ฅผ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋Šฅ๋ ฅ์ด ํ–ฅ์ƒ๋œ๋‹ค

Conclusion

COCONUT, which reasons in a continuous latent space, improves LLM reasoning performance. Its latent thoughts also exhibit a structure resembling a search tree.

  • CoT collapses the probability distribution at each step into a single language token → discontinuous

TODO

  • Efficiency improvements
  • Using COCONUT in pretraining as well