two generative models for vanilla seq2seq system
Maximum Likelihood Estimation (MLE): finds the distribution that best explains the data via the likelihood; the output is a probability distribution over the vocabulary
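As a concrete picture of the MLE objective, here is a toy sketch (pure Python, made-up shapes) of the per-step negative log-likelihood a seq2seq model minimizes over its vocabulary distributions:

```python
import math

def nll_loss(step_dists, target_ids):
    """Average negative log-likelihood of the gold tokens under the
    model's per-step distributions over the vocabulary (toy sketch)."""
    # step_dists: one probability distribution over the vocab per decoding step
    # target_ids: gold token index at each step
    return -sum(math.log(d[t]) for d, t in zip(step_dists, target_ids)) / len(target_ids)

# two decoding steps over a 3-token vocabulary
step_dists = [[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]]
loss = nll_loss(step_dists, [0, 1])  # gold tokens: 0, then 1
```

Minimizing this loss is exactly maximizing the likelihood of the training responses.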
Likert (Likert scale)
Dynamic Likert
A/B
Dynamic A/B
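The four human-evaluation protocols above differ in how judgments are collected (static transcripts vs. live "dynamic" interaction), but the aggregation is simple; a toy sketch with hypothetical scores:

```python
def likert_mean(ratings):
    """Mean of per-conversation scores on a 1-5 Likert scale."""
    return sum(ratings) / len(ratings)

def ab_win_rate(choices, system="A"):
    """Fraction of paired A/B comparisons in which annotators preferred `system`."""
    return choices.count(system) / len(choices)

likert = likert_mean([4, 5, 3, 4])            # → 4.0
winrate = ab_win_rate(["A", "A", "B", "A"])   # → 0.75
```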
Lack of diversity ⇒ Diversify Responses
a. training/decoding strategy: Maximum Mutual Information
b. model architecture: conditional variational autoencoder
c. more data, larger model: large-scale pre-training
d. decoding strategy: top-k sampling, nucleus sampling
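Strategy (d) is easy to make concrete; a minimal sketch of top-k and nucleus (top-p) filtering over a toy vocabulary distribution, assuming plain Python lists instead of tensors:

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def nucleus_filter(probs, p=0.85):
    """Keep the smallest set of tokens whose cumulative mass reaches p,
    then renormalize (top-p / nucleus sampling)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

probs = [0.5, 0.3, 0.1, 0.1]
topk = top_k_filter(probs, 2)          # mass concentrated on the two best tokens
nucleus = nucleus_filter(probs, 0.85)  # keeps tokens until 85% mass is covered
```

Sampling from the filtered distribution (instead of greedy/beam decoding) is what restores diversity.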
vanilla seq2seq problems: small conversational corpora, overfits easily, generates repetitive responses, often ungrammatical
Solution: use pre-trained models (two approaches)
Lack of consistency
a. learning speaker embeddings: a feature vector of speaker attributes
b. conditioning on persona descriptions
Both require a specific kind of dataset: human-to-human conversations annotated with persona features
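Approach (a) can be sketched as conditioning the decoder on a per-speaker vector; the speaker table and dimensions below are made-up stand-ins for learned parameters:

```python
import random

random.seed(0)
EMB_DIM = 4

# hypothetical speaker table: one learned attribute vector per speaker id
# (random stand-ins here; in training these are learned end-to-end)
speaker_emb = {sid: [random.gauss(0, 1) for _ in range(EMB_DIM)]
               for sid in ["speaker_A", "speaker_B"]}

def condition_on_speaker(token_vec, speaker_id):
    """Concatenate the speaker's attribute vector onto a decoder input
    so generated responses stay consistent with that persona (sketch)."""
    return token_vec + speaker_emb[speaker_id]

step_input = condition_on_speaker([0.1, 0.2], "speaker_A")
# the decoder now sees a (2 + EMB_DIM)-dimensional input at this step
```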
Lack of knowledge
a. Textual
b. Graph
dialogue history → knowledge graph → subgraph → encoder → decoder → response
1-hop reasoning: all needed knowledge is directly attached to entities in the dialogue
multi-hop reasoning: follows chains of relations, e.g. with a neural retriever
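A toy illustration of the 1-hop vs. multi-hop distinction over a hand-made graph (a real system would use a neural retriever to score which paths to expand rather than expanding all of them):

```python
# toy knowledge graph as adjacency: head -> [(relation, tail), ...]
graph = {
    "Inception": [("directed_by", "Christopher Nolan")],
    "Christopher Nolan": [("also_directed", "Interstellar")],
}

def one_hop(entity):
    """1-hop: facts directly attached to an entity mentioned in the dialogue."""
    return graph.get(entity, [])

def multi_hop(entity, hops=2):
    """Multi-hop: follow chains of relations up to `hops` steps,
    collecting every path found (exhaustive toy version)."""
    frontier, paths = [(entity, [])], []
    for _ in range(hops):
        nxt = []
        for node, path in frontier:
            for rel, tail in graph.get(node, []):
                new_path = path + [(node, rel, tail)]
                paths.append(new_path)
                nxt.append((tail, new_path))
        frontier = nxt
    return paths

direct = one_hop("Inception")         # only the director fact
chains = multi_hop("Inception")       # also reaches "Interstellar" in 2 hops
```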
c. Tabular
convert tabular knowledge into triples
KVR, Mem2Seq, Neural Assistant
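The tabular-to-triples conversion can be sketched as flattening each row around its key column, in the spirit of KVR-style key-value memories (the example table is invented):

```python
def table_to_triples(rows, key_column):
    """Flatten table rows into (subject, relation, object) triples:
    each non-key cell becomes one triple anchored on the row's key (sketch)."""
    triples = []
    for row in rows:
        subject = row[key_column]
        for col, val in row.items():
            if col != key_column:
                triples.append((subject, col, val))
    return triples

rows = [{"restaurant": "Sushi Ko", "cuisine": "japanese", "rating": "4.5"}]
triples = table_to_triples(rows, "restaurant")
# → [("Sushi Ko", "cuisine", "japanese"), ("Sushi Ko", "rating", "4.5")]
```

Once flattened, the same triple-encoding machinery used for knowledge graphs applies to tables.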
d. Service API interaction
an API call may be needed to fulfill the user's request
the model must generate both the response and the API call
Lack of empathy
a. emotional response generation
- MojiTalk
- Emotional Chatting Machine
b. understand the user's emotion + response generation
- Empathetic Dialogues
- MoEL
- Cairebot
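MoEL's core idea can be caricatured as a mixture of emotion-specific "listener" decoders, weighted by the predicted user-emotion distribution; a toy sketch with made-up shapes:

```python
import math

def softmax(xs):
    z = sum(math.exp(x) for x in xs)
    return [math.exp(x) / z for x in xs]

def mixture_of_listeners(expert_dists, emotion_logits):
    """MoEL-style sketch: each emotion-specific decoder proposes a
    distribution over the vocab; the predicted user-emotion distribution
    weights and combines them (assumed toy shapes, not the real model)."""
    weights = softmax(emotion_logits)
    vocab = len(expert_dists[0])
    return [sum(w * d[i] for w, d in zip(weights, expert_dists))
            for i in range(vocab)]

# two emotion experts over a 3-token vocab; the classifier leans toward expert 0
mixed = mixture_of_listeners([[0.8, 0.1, 0.1],
                              [0.1, 0.8, 0.1]], [2.0, 0.0])
```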
Lack of controllability
a. low-level attribute ⇒ Conditional Training + Weighted Decoding
b. fine-tuning ⇒ arXiv style and Holmes style
fine-tune with a special loss function at the word/sentence level
inject the target style into the model
c. perturbation ⇒ PPLM (Plug and Play Conversational Models) + Residual Adapters
control the style and topic of the responses
d. conditioned generation ⇒ Retrieve & Refine + PPLM + CTRL
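Weighted decoding (option a) has a simple core: add a bonus to the logits of tokens carrying the target attribute before the softmax. A sketch with hypothetical logits and attribute tokens:

```python
import math

def weighted_decoding(logits, attribute_tokens, weight):
    """Weighted-decoding sketch: boost the logits of tokens associated with
    the target attribute (e.g. question words for 'asks more questions'),
    then renormalize with a softmax."""
    boosted = [l + (weight if i in attribute_tokens else 0.0)
               for i, l in enumerate(logits)]
    z = sum(math.exp(l) for l in boosted)
    return [math.exp(l) / z for l in boosted]

# uniform logits; token 2 is tagged with the target attribute
probs = weighted_decoding([1.0, 1.0, 1.0], attribute_tokens={2}, weight=2.0)
# token 2's probability rises from 1/3 to e^2 / (2 + e^2)
```

Conditional training bakes the same control into the model at train time; weighted decoding applies it purely at inference.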
Lack of versatility(+task or more)
ToDs + Chit-Chat
Beyond task-oriented and chit-chat: the architectures all look broadly similar, so with analogous models the output can vary without limit depending on the input