KCTS - Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection

ingeol · December 6, 2023

NLP


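Problem: reducing hallucination -> changing the decoding method can mitigate hallucination. However, there has been no way to check the output against knowledge while it is being generated.
The paper uses MCTS for this.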

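KCTS: a discriminator-guided decoding method that constrains generation based on knowledge
RIPA: a hallucination detector that catches the point where hallucination starts
KCD: an auxiliary knowledge classifier. RIPA starts from sequence-level supervision, and this is the component that supports it (it makes adaptation to the token level possible).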

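Terms:

guided decoding: a way to align the output with a target sentiment, style, or relevant knowledge at decoding time, using constraints, guidance, or a discriminator (it samples from the token-level distribution with techniques such as re-weighted token probabilities) -> mitigates hallucination and enables more reliable text generation.
Right-hand-side coherence (a type of constraint): a text-generation approach that keeps the output consistent with the context -> it measures constraint satisfaction, which can only be measured after the text has been fully generated (a search-based method).
groundedness: $f(y,k)=P(a_{k} = 1 \mid y,k)$. The function $f$ measures how faithful $y$ is to $k$ -> this is the function used to constrain generation.
inflection point: the point where the text turns into hallucination
reward = groundedness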

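weighted decoding: the score can be measured during generation. RIPA: the score can be measured during generation.
According to the authors, unlike weighted decoding, KCTS is designed so that tokens can be selected with future reward taken into account.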
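The re-weighting idea behind weighted decoding can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `reweight`, the candidate tokens, and all the numbers are made up, and the groundedness scores stand in for the discriminator output $f(y,k)$ for each candidate continuation.

```python
def reweight(token_probs, groundedness):
    """Multiply each candidate's LM probability by its groundedness score, then renormalize."""
    weighted = {tok: p * groundedness.get(tok, 0.0) for tok, p in token_probs.items()}
    total = sum(weighted.values())
    if total == 0.0:
        # Every candidate scored zero: fall back to the unmodified LM distribution.
        return dict(token_probs)
    return {tok: w / total for tok, w in weighted.items()}


# Hypothetical next-token probabilities and discriminator scores.
lm_probs = {"Paris": 0.5, "Rome": 0.5}
scores = {"Paris": 0.9, "Rome": 0.1}
reweighted = reweight(lm_probs, scores)
print(reweighted)
```

Note that this only looks one step ahead: a token is scored by the current prefix alone, which is the limitation the tree search is meant to address.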

4.

$y$: the generated tokens, $x$: the input text, $k$: the knowledge that the tokens generated in $y$ are constrained to
root node: $y_{<t}$, the sequence generated so far
each node: $v$
parent node: $\rho (v)$
The tokens $y_{<v}$ have all already been explored by the search.

1. Selection

PUCT algorithm

$$
\operatorname{puct}(i)=\frac{V\left(s_i\right)}{n_i}+c_{puct}\, P\left(y_{s_i} \mid x, y_{<s_i}\right) \frac{\sqrt{N_i}}{1+n_i}
$$

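$V(s_i)$: the groundedness value estimated at $s_i$
$s_i$: notation for a node
$n_i$: the number of simulations through $s_i$
$N_i$: the visit count of the parent node of $s_i$
$c_{puct}$: the exploration/exploitation hyperparameter -> a higher value means more exploration
The search descends by picking the child node with the highest $\operatorname{puct}(i)$ value.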
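The selection rule above can be written out directly. This is a minimal sketch: the dictionary-based node layout and the names `puct_score` and `select_child` are hypothetical, and treating the first term as 0 for an unvisited node ($n_i = 0$) is an assumption made here to avoid division by zero.

```python
import math


def puct_score(value, visits, prior, parent_visits, c_puct=1.0):
    """V(s_i)/n_i + c_puct * P(y_{s_i} | x, y_{<s_i}) * sqrt(N_i) / (1 + n_i)."""
    # Assumption: an unvisited node contributes no exploitation term.
    exploit = value / visits if visits > 0 else 0.0
    explore = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return exploit + explore


def select_child(children, parent_visits, c_puct=1.0):
    """Return the child with the highest PUCT score."""
    return max(
        children,
        key=lambda ch: puct_score(ch["V"], ch["n"], ch["P"], parent_visits, c_puct),
    )


# Hypothetical children: "A" has been visited and looks good, "B" is still unvisited.
children = [
    {"token": "A", "V": 2.0, "n": 4, "P": 0.5},
    {"token": "B", "V": 0.0, "n": 0, "P": 0.3},
]
best = select_child(children, parent_visits=16)
print(best["token"])
```

With a smaller `c_puct` the exploration bonus shrinks and the already-visited child wins instead, which is the exploration/exploitation trade-off the hyperparameter controls.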

2. Expansion

If the selected node is a leaf node and is not EOS, it is expanded with the candidates chosen by top-k.
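A minimal sketch of this expansion step, assuming the language model's next-token distribution is available as a plain dictionary. The node layout, the `expand` helper, and the `<eos>` string are hypothetical choices for illustration.

```python
import heapq


def expand(node, next_token_probs, k=2, eos_token="<eos>"):
    """Attach the k most probable next tokens as children of a non-EOS leaf."""
    if node["token"] == eos_token or node["children"]:
        # Terminal nodes and already-expanded nodes are left untouched.
        return []
    top = heapq.nlargest(k, next_token_probs.items(), key=lambda kv: kv[1])
    node["children"] = [
        {"token": tok, "P": p, "V": 0.0, "n": 0, "children": []} for tok, p in top
    ]
    return node["children"]


# Hypothetical leaf and next-token distribution.
leaf = {"token": "The", "P": 1.0, "V": 0.0, "n": 0, "children": []}
probs = {"cat": 0.6, "dog": 0.3, "<eos>": 0.1}
new_children = expand(leaf, probs, k=2)
print([c["token"] for c in new_children])
```

Each child stores its LM probability as the prior `P`, which is the value the PUCT formula uses during selection.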

3. Rollout(Evaluation)

Generation continues from the leaf node until EOS is produced, and then the groundedness of the generated sequence, $f(y,k)$, is computed. The value of the node $s$ then becomes $V(s) = f(y,k)$. However, a full rollout is too costly and has high variance, so a token-level groundedness score is used instead: $V(s_i) = f(y_{<s_i},k)$

4. Backpropagation

$$
V\left(\rho\left(s_i\right)\right) \leftarrow \frac{N_i \cdot V\left(\rho\left(s_i\right)\right)+f\left(y_{<s_i}, k\right)}{n_i}
$$

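This is the update rule. $s_i$ ranges over the nodes on the path from the leaf node $s$ up to the root.
The updated value is used again by Selection in the next simulation. Steps 1-4 are repeated for a predefined number of simulations -> among the root's child nodes, the one with the highest visit count is chosen as the next token.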
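The backup and the final token choice can be sketched as below. This is a simplified bookkeeping scheme and not necessarily the paper's exact normalization: it assumes each node stores an accumulated score `V` and a visit count `n`, so that `V / n` in the PUCT formula is the average groundedness seen through that node. The node layout and function names are hypothetical.

```python
def backpropagate(path, score):
    """Add the leaf's groundedness score to every node on the root-to-leaf path."""
    for node in reversed(path):
        node["V"] += score  # accumulated score (assumption: a sum, not a mean)
        node["n"] += 1      # visit count


def pick_next_token(root):
    """After all simulations, emit the root child with the highest visit count."""
    return max(root["children"], key=lambda ch: ch["n"])["token"]


# Hypothetical tree with two candidate next tokens.
a = {"token": "A", "V": 0.0, "n": 0, "children": []}
b = {"token": "B", "V": 0.0, "n": 0, "children": []}
root = {"token": None, "V": 0.0, "n": 0, "children": [a, b]}

# Three simulations with made-up groundedness scores.
backpropagate([root, a], 0.8)
backpropagate([root, a], 0.6)
backpropagate([root, b], 0.2)
print(pick_next_token(root))
```

Choosing by visit count rather than by value is the usual MCTS convention: a child that keeps getting selected has a value estimate backed by more simulations.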

4.2 Token-Level Hallucination Detection

The paper describes two kinds of prior work
and points out their drawbacks.

RIPA - Reward Inflection Point Approximation

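RIPA provides token-level labels for groundedness and computes an approximation of $f$ for unfinished token sequences.

The paper hypothesizes that identifying the inflection point of groundedness is a more effective way to approximate the future score.

Any continuation past the first hallucination token contains at least one hallucination token (?)
As a result, RIPA does not tie the benign tokens before the inflection point to the hallucination label -> this enables more stable training (?)
In addition, setting every token after one predicted as 0 to 0 reduces future exploration in MCTS.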
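The labeling scheme described above can be sketched as follows, assuming the position of the first hallucinated token (the inflection point) is already known for a training sequence. The function name `ripa_labels` and the index convention are hypothetical; 1 means grounded and 0 means hallucinated.

```python
def ripa_labels(num_tokens, inflection_index=None):
    """Label tokens before the inflection point 1 and every token from it onward 0."""
    if inflection_index is None:
        # No hallucination anywhere in the sequence: every prefix is grounded.
        return [1] * num_tokens
    return [1 if i < inflection_index else 0 for i in range(num_tokens)]


# A 5-token sequence whose third token (index 2) is the first hallucinated one.
labels = ripa_labels(5, inflection_index=2)
print(labels)
```

Because the labels never return to 1 after the inflection point, a detector trained on them gives low reward to every descendant of a hallucinated node, so the search has little reason to keep exploring that subtree.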
