Reinforcement Learning / Self-Chat ⇒ Human In the Loop
hard to learn directly from human interaction
Few-Shot / Zero-Shot Learning ⇒ Better Strategy
collecting datasets is hard
less data-intensive learning is crucial
how do we learn to converse in a new domain by transferring from another domain?
Lifelong Learning ⇒ User experience
relatively unexplored area
Mitigating Inappropriate Response ⇒ On the Model
build classifiers to filter out inappropriate examples (preprocessing)
block n-grams from a sensitive word list (decoding)
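
A minimal sketch of the decoding-time idea above, assuming a greedy decoder over per-token scores. The names `banned_next_tokens`, `mask_scores`, `greedy_decode`, and the scripted `toy_step` model are hypothetical and only illustrate one way n-gram blocking could work; they are not taken from any particular library.

```python
import math

def banned_next_tokens(prefix, banned_ngrams):
    """Return tokens that would complete a banned n-gram if appended to prefix."""
    banned = set()
    for ngram in banned_ngrams:
        *context, last = ngram
        n_ctx = len(context)
        # A banned unigram is always blocked; longer n-grams need a matching context.
        if n_ctx == 0 or (len(prefix) >= n_ctx and list(prefix[-n_ctx:]) == context):
            banned.add(last)
    return banned

def mask_scores(scores, prefix, banned_ngrams):
    """Set the score of every banned continuation to -inf."""
    blocked = banned_next_tokens(prefix, banned_ngrams)
    return {tok: (-math.inf if tok in blocked else s) for tok, s in scores.items()}

def greedy_decode(step_fn, banned_ngrams, max_len=10, eos="</s>"):
    """Greedy decoding that applies the n-gram mask at every step."""
    prefix = []
    for _ in range(max_len):
        scores = mask_scores(step_fn(prefix), prefix, banned_ngrams)
        token = max(scores, key=scores.get)
        if token == eos:
            break
        prefix.append(token)
    return prefix

def toy_step(prefix):
    """Hypothetical stand-in for a model: it prefers a fixed script, with "kind" as runner-up."""
    script = ["you", "are", "stupid", "</s>"]
    vocab = ["you", "are", "stupid", "kind", "</s>"]
    want = script[len(prefix)] if len(prefix) < len(script) else "</s>"
    scores = {tok: 0.0 for tok in vocab}
    scores[want] = 2.0
    scores["kind"] = 1.0
    return scores

# With the bigram ("are", "stupid") on the block list, the decoder
# falls back to the next-best token at that step.
print(greedy_decode(toy_step, {("are", "stupid")}))
print(greedy_decode(toy_step, set()))
```

Masking scores rather than filtering finished outputs keeps the decoder producing a response instead of discarding it, at the cost of missing paraphrases that are not on the list.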
Multimodal ⇒ still a challenge
talk and be able to see
conversations grounded in images or VR environments
Evaluation ⇒ Better Automatic Evaluation
N-gram based ⇒ fails to capture semantic meaning
turn-level ⇒ fails to capture repetition and consistency across turns
perplexity vs human evaluation
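
To make the evaluation points above concrete, here is a small sketch under stated assumptions: `ngram_precision` is a simplified, BLEU-style clipped precision against a single reference (not a full BLEU implementation), and `perplexity` takes per-token probabilities that a model is assumed to have already produced. The example sentences are made up for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(hypothesis, reference, n=1):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp = Counter(ngrams(hypothesis, n))
    ref = Counter(ngrams(reference, n))
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / total

def perplexity(token_probs):
    """exp of the average negative log-probability per token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

reference = "i am doing great thanks".split()
paraphrase = "pretty good thank you".split()          # same meaning, different words
contradiction = "i am doing terrible thanks".split()  # opposite meaning, same words

# The paraphrase shares no unigram with the reference, while the
# contradicting reply shares four of its five.
print(ngram_precision(paraphrase, reference))
print(ngram_precision(contradiction, reference))

# A model that assigns probability 0.25 to every token has a perplexity of about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

The overlap score ranks the contradicting reply above the paraphrase, which is the failure mode noted for n-gram metrics. Perplexity only measures how well a model predicts the reference tokens, so it does not by itself say whether a generated reply would be judged good by a human.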
Shared Tasks & Datasets ⇒ need more