[Background]
[Limitations of Prior Work]
[Graph of Thoughts]
[Results & Contribution]
Self-Consistency Improves Chain of Thought Reasoning in Language Models (ICLR ’23)
When solving a math problem, sample several different reasoning paths and take the most consistent result (majority vote) as the final answer.
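A minimal sketch of the self-consistency idea, assuming a hypothetical sample_answer(question) helper that runs one chain-of-thought pass at non-zero temperature and returns only the final answer (names are illustrative, not from the paper's code):

from collections import Counter

def self_consistency(question: str, sample_answer, n_samples: int = 10) -> str:
    """Sample several independent chain-of-thought solutions and return
    the answer that occurs most often (majority vote)."""
    # sample_answer is a hypothetical helper: one CoT pass, final answer only.
    answers = [sample_answer(question) for _ in range(n_samples)]
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer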
The two papers below came out almost simultaneously, and the concept is essentially identical. The second one was published at NeurIPS and has far more citations.
(1) Large Language Model Guided Tree-of-Thought (arXiv ’23), submitted 2023.05.15
(2) Tree of Thoughts: Deliberate Problem Solving with Large Language Models (NeurIPS ’23), submitted 2023.05.17
Goal:
Represent the LLM's reasoning process as an arbitrary graph, so that more complex processes, closer to how humans actually solve problems, can be expressed!
Why?:
Human problem solving is non-linear: on the way to a final conclusion, existing thoughts get combined (Aggregation) and new ideas get created from existing ones (Generation). CoT and ToT cannot express these characteristics.
How?
Apply thought transformations (generation, aggregation, refining, etc.) to represent the reasoning process as a graph.
Contribution
Aggregation: merge multiple thoughts into a single thought
Refining: a self-loop that improves a thought based on itself
Generation: split a thought into several subtasks, or generate new thoughts from it
Prompter: prepares the messages sent to the LLM
Parser: extracts information from the LLM's thoughts
Scoring & validation: verifies (and scores) the LLM's thoughts
Controller: coordinates the entire reasoning process and decides how to proceed
Depending on the task, this has to be defined in advance (the order of the transformations and their dependencies). I had assumed it would handle this flexibly on its own, but it does not. This seems to be a limitation of GoT; it would probably be hard to apply to tasks that cannot be decomposed..??
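As a concrete illustration of both the thought transformations above and this limitation, here is a minimal sketch of a hand-wired graph of operations, assuming the authors' graph-of-thoughts package (spcl/graph-of-thoughts); the operation names match the repository, but the exact constructor arguments and branch counts are assumptions:

from graph_of_thoughts import operations

gop = operations.GraphOfOperations()
# Generation: sample several candidate thoughts per predecessor
# (1 prompt per predecessor, 5 responses -- illustrative numbers).
gop.append_operation(operations.Generate(1, 5))
# Aggregation: merge the candidate thoughts into a single thought.
gop.append_operation(operations.Aggregate())
# Refining (self-loop): ask the LM to improve the aggregated thought.
gop.append_operation(operations.Improve())
# This ordering and its dependencies are fixed per task; at run time the
# Controller walks the graph with a task-specific Prompter and Parser.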
Example code: the Generate operation (excerpt from the library's operations module; Operation, OperationType, Thought, Prompter, and Parser are defined elsewhere in the package)
class Generate(Operation):
    """
    Operation to generate thoughts.
    """

    operation_type: OperationType = OperationType.generate

    def __init__(
        self, num_branches_prompt: int = 1, num_branches_response: int = 1
    ) -> None:
        """
        Initializes a new Generate operation.

        :param num_branches_prompt: Number of responses that each prompt should generate (passed to prompter). Defaults to 1.
        :type num_branches_prompt: int
        :param num_branches_response: Number of responses the LM should generate for each prompt. Defaults to 1.
        :type num_branches_response: int
        """
        super().__init__()
        self.num_branches_prompt: int = num_branches_prompt
        self.num_branches_response: int = num_branches_response
        self.thoughts: List[Thought] = []

    def get_thoughts(self) -> List[Thought]:
        """
        Returns the thoughts associated with the operation.

        :return: List of generated thoughts.
        :rtype: List[Thought]
        """
        return self.thoughts

    def _execute(
        self, lm: AbstractLanguageModel, prompter: Prompter, parser: Parser, **kwargs
    ) -> None:
        """
        Executes the Generate operation by generating thoughts from the predecessors.
        The thoughts are generated by prompting the LM with the predecessors' thought states.
        If there are no predecessors, the kwargs are used as a base state.

        :param lm: The language model to be used.
        :type lm: AbstractLanguageModel
        :param prompter: The prompter for crafting prompts.
        :type prompter: Prompter
        :param parser: The parser for parsing responses.
        :type parser: Parser
        :param kwargs: Additional parameters for execution.
        """
        previous_thoughts: List[Thought] = self.get_previous_thoughts()

        if len(previous_thoughts) == 0 and len(self.predecessors) > 0:
            return

        if len(previous_thoughts) == 0:
            # no predecessors, use kwargs as base state
            previous_thoughts = [Thought(state=kwargs)]

        # If there are previous thoughts: since this operation is Generate,
        # the generate prompt is fetched for each of them
        for thought in previous_thoughts:
            base_state = thought.state
            prompt = prompter.generate_prompt(self.num_branches_prompt, **base_state)
            self.logger.debug("Prompt for LM: %s", prompt)
            responses = lm.get_response_texts(
                lm.query(prompt, num_responses=self.num_branches_response)
            )
            self.logger.debug("Responses from LM: %s", responses)
            for new_state in parser.parse_generate_answer(base_state, responses):
                new_state = {**base_state, **new_state}
                self.thoughts.append(Thought(new_state))
                self.logger.debug(
                    "New thought %d created with state %s",
                    self.thoughts[-1].id,
                    self.thoughts[-1].state,
                )
        if (
            len(self.thoughts)
            > self.num_branches_prompt
            * self.num_branches_response
            * len(previous_thoughts)
            and self.num_branches_prompt > 0
        ):
            self.logger.warning(
                "Generate operation %d created more thoughts than expected",
                self.id,
            )
        self.logger.info(
            "Generate operation %d created %d new thoughts", self.id, len(self.thoughts)
        )
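To make the contract in _execute concrete: generate_prompt must return a prompt string built from a thought's state, and parse_generate_answer must return one state-update dict per new thought. A hypothetical, duck-typed pair for a toy task (not the repository's classes):

class ToyPrompter:
    """Hypothetical prompter showing what Generate expects."""

    def generate_prompt(self, num_branches: int, **state) -> str:
        # `state` is the predecessor thought's state dict (or the kwargs base state).
        return (
            f"Propose {num_branches} possible next step(s) for this partial solution:\n"
            f"{state.get('current', '')}"
        )


class ToyParser:
    """Hypothetical parser turning raw LM responses back into thought states."""

    def parse_generate_answer(self, state: dict, responses: list) -> list:
        # One state update per response; Generate merges each into the base state.
        return [{"current": text.strip()} for text in responses]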
The Improve operation (the core loop of its _execute)
for thought in previous_thoughts:
    improve_prompt = prompter.improve_prompt(**thought.state)
    self.logger.debug("Prompt for LM: %s", improve_prompt)
    responses = lm.get_response_texts(lm.query(improve_prompt, num_responses=1))
    self.logger.debug("Responses from LM: %s", responses)
    state_update = parser.parse_improve_answer(thought.state, responses)
    self.thoughts.append(Thought({**thought.state, **state_update}))
generate_prompt (the task-specific Prompter for the sorting example)
=> The code confirms it: everything really does have to be predefined per task!
if current is None or current == "":
    return self.got_split_prompt.format(input=input)
# if current is just a sublist of the original input, return the sort prompt
if kwargs["phase"] == 1:
    return self.sort_prompt.format(input=current)
if (
    "unsorted_sublist" in kwargs
    and kwargs["unsorted_sublist"] != ""
    and len(kwargs["unsorted_sublist"]) < len(original) - 5
):
    original = kwargs["unsorted_sublist"]
return self.tot_improve_prompt.format(
    input=original,
    incorrectly_sorted=current,
    length=len(utils.string_to_list(original)),
)
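The templates referenced above could look roughly like this; these are illustrative stand-ins whose placeholders match the .format() calls, not the repository's actual wording:

# Illustrative stand-ins for the sorting example's prompt templates.
got_split_prompt = (
    "<Instruction> Split the following list of numbers into sublists of "
    "roughly equal length and output them as JSON. </Instruction>\n"
    "Input: {input}"
)

sort_prompt = (
    "<Instruction> Sort the following list of numbers in ascending order. "
    "Output only the sorted list. </Instruction>\n"
    "Input: {input}"
)

tot_improve_prompt = (
    "<Instruction> The following list of {length} numbers was sorted "
    "incorrectly. Fix any mistakes and output the corrected sorted list. "
    "</Instruction>\n"
    "Input: {input}\n"
    "Incorrectly sorted: {incorrectly_sorted}"
)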
Better quality and lower cost than ToT
Still performs well as the task gets more complex
When P = 128 (problem size of 128 elements)
Results by task
Sorting
Set intersection
Keyword Counting
Document Merging
It was great to learn about the new concept of GoT!! Really fun material!! 🤩