DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

jihyelee · October 5, 2023

DetectGPT

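Compares the log probability of the original text with the average log probability of slightly modified (= perturbed) versions of that text
- If the perturbed texts have a lower average log probability than the original, the text was likely generated by a language model
- For text generated by a language model, the curvature of the model's log probability is more negative than for human-written text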
  - ... perturbation discrepancy approximates a measure of the local curvature of the log probability function near the candidate passage
  - ... proportional to the negative trace of the Hessian of the log probability function
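
The comparison above can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation: `perturbation_discrepancy`, `toy_log_prob`, and `toy_perturb` are hypothetical names, and the toy functions are stand-ins for the real components (a language model that scores the text, and a mask-filling model such as T5 that rewrites small spans).

```python
import random
from statistics import mean
from typing import Callable


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    n_perturbations: int = 20,
) -> float:
    """log p(text) minus the mean log p over perturbed copies of text."""
    perturbed = [perturb(text) for _ in range(n_perturbations)]
    return log_prob(text) - mean([log_prob(p) for p in perturbed])


# --- Toy stand-ins (assumptions, for illustration only) ---
_VOCAB = ["the", "cat", "dog"]
_rng = random.Random(0)


def toy_log_prob(text: str) -> float:
    # Pretend the "model" only likes the word "the":
    # every other word costs one unit of log probability.
    return -float(sum(w != "the" for w in text.split()))


def toy_perturb(text: str) -> str:
    # Replace one randomly chosen word with a different vocabulary word.
    words = text.split()
    i = _rng.randrange(len(words))
    words[i] = _rng.choice([w for w in _VOCAB if w != words[i]])
    return " ".join(words)


# A text sitting at a peak of the toy log probability: every perturbation hurts.
d_peak = perturbation_discrepancy("the the the the", toy_log_prob, toy_perturb)
# A text away from the peak: perturbations never hurt and sometimes help.
d_flat = perturbation_discrepancy("cat dog cat dog", toy_log_prob, toy_perturb)
```

In this toy setup `d_peak` is positive and `d_flat` is not, which mirrors the idea that machine-generated text sits near a local maximum of the model's log probability. A detector would then compare the discrepancy against a threshold; how that threshold is chosen is left open here.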

Datasets
- XSum dataset (news)
- Wikipedia paragraphs from SQuAD contexts
- prompted stories from the Reddit WritingPrompts
- English and German data from WMT16
- PubMedQA
Models
- T5-3B, T5-11B, and mT5-3B are used to apply the perturbations
- Text is generated with GPT-NeoX, GPT-2, GPT-J, GPT-Neo-2.7B, GPT-3, and Jurassic-2 Jumbo
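Baselines
- token log probabilities
  - A passage with a high average log probability is more likely to be model-generated
- token ranks
  - A passage with a low average (log-)rank is more likely to be model-generated
- predictive entropy
  - Hypothesis: low entropy means the model is overconfident in its predictions, so the passage is more likely to be model-generated
  - In practice, predictive entropy is more often positively correlated with the fakeness of a passage, so the paper's baseline treats high average entropy as the signal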
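
The three baseline scores can be computed from the model's per-position next-token distributions. The sketch below assumes those distributions are already available as plain lists of probabilities; `baseline_scores` and its argument names are hypothetical, and tied probabilities are given the same rank as a simplification.

```python
import math
from typing import Dict, Sequence


def baseline_scores(
    token_dists: Sequence[Sequence[float]],
    token_ids: Sequence[int],
) -> Dict[str, float]:
    """Average log probability, log-rank, and entropy over a passage.

    token_dists[t] is the model's next-token distribution at position t,
    token_ids[t] is the index of the token that actually appears there.
    """
    log_probs, log_ranks, entropies = [], [], []
    for dist, tok in zip(token_dists, token_ids):
        p = dist[tok]
        log_probs.append(math.log(p))
        # Rank 1 = most likely token; count how many tokens are strictly more likely.
        rank = 1 + sum(q > p for q in dist)
        log_ranks.append(math.log(rank))
        # Shannon entropy of the full predictive distribution at this position.
        entropies.append(-sum(q * math.log(q) for q in dist if q > 0))
    n = len(token_ids)
    return {
        "mean_log_prob": sum(log_probs) / n,
        "mean_log_rank": sum(log_ranks) / n,
        "mean_entropy": sum(entropies) / n,
    }


dists = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
likely = baseline_scores(dists, [0, 1])    # the model's top choice at each position
unlikely = baseline_scores(dists, [2, 0])  # low-probability tokens at each position
```

Here `likely` has the higher mean log probability and a mean log-rank of 0, while both passages share the same mean entropy, because entropy depends only on the predictive distributions and not on which token was actually observed.
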
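Evaluation metric
- AUROC (area under the receiver operating characteristic curve)
  - The probability that the classifier correctly ranks a randomly chosen model-generated example higher than a randomly chosen human-written example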
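
The ranking interpretation of AUROC can be checked directly by counting pairs. This is a minimal sketch for small inputs (it is quadratic in the number of examples); the function name `auroc` and the convention of counting ties as half a win are assumptions for illustration.

```python
from typing import Sequence


def auroc(machine_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Fraction of (machine, human) pairs where the machine example scores higher."""
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5  # ties count as half
    return wins / (len(machine_scores) * len(human_scores))


# Detector scores: higher should mean "more likely machine-generated".
score = auroc([0.9, 0.8, 0.4], [0.5, 0.3])
```

In this example five of the six pairs are ranked correctly, so `score` is 5/6. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.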