[Papers] Efficient Intent Detection with Dual Encoders 🔫

KwanHong · December 22, 2020

NLP

Papers


🎊 Overview

This post summarizes a research paper from PolyAI (https://www.polyai.com/), a conversational AI service.
Paper - https://arxiv.org/pdf/2003.04807.pdf

❔ Introduction

Intent detection in task-oriented conversational system

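A dialogue system uses an intent detector to classify user utterances in order to understand the user's current goal.
❗ Extending an intent detector to support new domains and tasks is difficult and resource-intensive.
- It requires domain experts and domain-specific datasets, which makes it hard to deploy intent detectors quickly and broadly.
- The detector must recognize intents effectively even in low-data scenarios where only a few examples per intent are available.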

Pretraining methods in few-shot scenarios

To address the lack of domain data, transfer learning from pretrained encoders has become the dominant approach.
However, applying a general-purpose sentence encoder such as BERT as-is may not be the best choice.
- For conversational tasks, general language-modeling pretraining can be less effective than conversational pretraining based on the response selection task.
- Fine-tuning BERT or one of its variants adapts the entire model to the domain, which is resource-intensive.
  - Furthermore, this approach can lead to overfitting in few-shot scenarios.
- These properties lead to a very slow, complicated, and expensive development cycle.

Dual sentence encoders

The paper proposes intent detectors built on dual sentence encoders, that is, neural architectures that model sentence pairs, such as USE (Universal Sentence Encoder) and ConveRT.

Advantages
- Intent detectors based on USE (Universal Sentence Encoder) and ConveRT outperform BERT-based ones, even in few-shot scenarios.
- The models are relatively small and inexpensive to train (compactness).
- Performance varies little across hyperparameter settings, which reduces the cost of hyperparameter tuning.

🎣 Methodology: Intent Detection with Dual Sentence Encoders

Pretrained Sentence Encoders

Pretrained sentence encoders such as BERT require a fine-tuning step that adapts the entire model to a specific task or domain.
Fine-tuning is costly, and in few-shot scenarios it can overfit or yield suboptimal results.

Dual Sentence Encoders and Conversational Pretraining

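Conversational pretraining is better suited than conventional language-model pretraining to conversational tasks such as dialogue act prediction and next utterance generation.
A dual model is a dual-encoder architecture that learns the relationship between an input sentence or context and its corresponding response.
This work focuses on USE (Universal Sentence Encoder) and ConveRT, both trained on the response selection task.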
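
To make the response selection objective a little more concrete, here is a minimal sketch of how a dual encoder can score contexts against responses. It assumes dot-product similarity and treats the other responses in the same batch as negatives; both choices, the function name `response_selection_loss`, and the toy vectors are assumptions for illustration, not details taken from the paper summary above.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def response_selection_loss(context_vecs, response_vecs):
    """Mean cross-entropy over a batch of (context, response) pairs.

    Assumption: response i is the correct one for context i, and every
    other response in the batch serves as a negative example.
    """
    total = 0.0
    for i, ctx in enumerate(context_vecs):
        scores = [dot(ctx, resp) for resp in response_vecs]
        # Numerically stable log-sum-exp over all candidate responses.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[i]
    return total / len(context_vecs)


# Toy 2-d vectors standing in for encoder outputs (hypothetical values).
responses = [[2.0, 0.0], [0.0, 2.0]]
aligned_contexts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_contexts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = response_selection_loss(aligned_contexts, responses)
shuffled_loss = response_selection_loss(shuffled_contexts, responses)
print(aligned_loss, shuffled_loss)
```

When each context points in the same direction as its own response, the loss is lower than when the pairs are mismatched, which is the signal the two encoders would be trained on.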

Intent Detection with Dual Encoders

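Fixed sentence representations encoded by USE and ConveRT are used as input features.
A softmax layer for multi-class classification is stacked on top of a Multi-Layer Perceptron (MLP) with a single hidden layer and ReLU activation.
The sentence vectors from the individual dual encoders can also be concatenated and fed in together.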
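
The classifier described above can be sketched as a forward pass in plain Python. This is only an illustration: `fake_encoder` is a hypothetical stand-in for the frozen USE and ConveRT encoders (it returns deterministic pseudo-random vectors, not real embeddings), and the dimensions, intent labels, and weight initialization are assumptions. The weights are untrained, so the printed probabilities carry no meaning beyond showing the shapes involved.

```python
import math
import random


def fake_encoder(sentence, dim, name):
    """Hypothetical stand-in for a frozen sentence encoder such as USE or ConveRT."""
    rng = random.Random(f"{name}:{sentence}")
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def linear(x, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (an arbitrary choice for the sketch)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(sentence, params):
    # Concatenate the fixed sentence vectors from both encoders.
    features = fake_encoder(sentence, 8, "use") + fake_encoder(sentence, 8, "convert")
    # Single hidden layer with ReLU, then a softmax over the intent classes.
    hidden = relu(linear(features, *params["hidden"]))
    return softmax(linear(hidden, *params["output"]))


INTENTS = ["check_balance", "transfer_money", "card_lost"]  # hypothetical labels
rng = random.Random(0)
params = {
    "hidden": init_layer(rng, 16, 16),
    "output": init_layer(rng, len(INTENTS), 16),
}
probs = classify("I lost my card", params)
print(dict(zip(INTENTS, probs)))
```

Because the encoders stay fixed, only the small `hidden` and `output` layers would need training, which is where the low training cost comes from.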

🔬 Results and discussion

Combining the two dual models (USE+ConveRT) captures complementary information and yields higher performance.
Because BERT is pretrained with a different objective, it is the fine-tuned BERT-TUNED model that shows meaningful performance.

Few-Shot Scenarios

When only a few data samples are available (the few-shot scenario), models using dual encoders perform better than BERT-TUNED.
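
A few-shot training set of this kind can be simulated by keeping only k examples per intent. The sketch below shows one way to do that; the helper name `sample_few_shot`, the toy utterances, and the intent labels are hypothetical, and the paper's exact sampling procedure is not described in this summary.

```python
import random
from collections import defaultdict


def sample_few_shot(examples, k, seed=0):
    """Keep at most k (text, intent) examples for each intent."""
    by_intent = defaultdict(list)
    for text, intent in examples:
        by_intent[intent].append((text, intent))
    rng = random.Random(seed)
    subset = []
    for intent in sorted(by_intent):  # sorted for a reproducible order
        pool = by_intent[intent]
        subset.extend(rng.sample(pool, min(k, len(pool))))
    return subset


# Hypothetical toy data: 25 utterances for each of 3 intents.
full_train = [
    (f"utterance {i} about {intent}", intent)
    for intent in ("card_lost", "check_balance", "transfer_money")
    for i in range(25)
]
few_shot_train = sample_few_shot(full_train, k=10)
print(len(few_shot_train))
```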

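Ideally, an intent detector used in a few-shot scenario should work off-the-shelf, without depending on hyperparameter tuning against a validation set.
- To keep the intent detector reliable and prevent overfitting, the paper uses aggressive dropout (i.e. a dropout rate of 0.75) and many training iterations (500).
Performance was also tested while varying the hyperparameter settings step by step.
- The dual-based models show little variation in performance across hyperparameter changes (robust).
- In the few-shot scenario, outliers were also observed where the best performance of the BERT-FIXED model deviates strongly from its average performance.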
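
To give a feel for how aggressive a dropout rate of 0.75 is, here is a small sketch of dropout applied to a vector of hidden activations. It assumes the common "inverted dropout" formulation, where surviving units are rescaled by 1 / (1 - rate) during training; the summary above does not say which variant the paper uses, and the `dropout` helper is written here only for illustration.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
hidden = [1.0] * 10000  # toy activations
dropped = dropout(hidden, rate=0.75, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print(zero_fraction, mean_activation)
```

Roughly three out of four hidden units are silenced on each training step, while the rescaling keeps the expected activation unchanged, so no adjustment is needed at inference time (`training=False`).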

Resource Efficiency

Training and evaluation time in the 10-example few-shot scenario
Instead of large models that need GPU or TPU resources, an effective dual-encoder-based intent detector can be built and trained even on a CPU.

🎉 Conclusion

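Dual encoder models such as USE and ConveRT achieve high performance on the intent classification task.
When only a small set of annotated samples is available, as is typical in real business settings, the paper's approach pays off more than adapting a BERT-based classifier every time.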
