This is a long paper accepted at ACL 2022; it proposes a new task called multimodal chat translation and builds a dataset for it. It was fun to learn about this new task of dialogue translation!
Paper link: https://aclanthology.org/2022.acl-long.186/
Goal: generate more accurate translations with the help of the associated dialogue history and visual context.
There has been little research on multimodal machine translation in dialogue (no dataset existed).
propose a new task: multimodal chat translation, named MCT, to advance multimodal chat translation research.
we are the first to contribute a human-annotated multimodal sentiment chat translation dataset.
implement multiple transformer-based baselines and provide benchmarks for the new task.
conduct comprehensive analysis and ablation study to offer more insights.
As a by-product, our MSCTD also facilitates the development of multimodal dialogue sentiment analysis.
(multiple Transformer-based models are used as baselines)
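Below is a minimal sketch (my own assumption, not the paper's exact preprocessing) of one common way to feed dialogue history to a Transformer-based translation model: prepending the most recent turns to the current source utterance with a separator token. The token name `<sep>` and the history window size are illustrative choices, not values from the paper.

```python
from typing import List

SEP_TOKEN = "<sep>"        # hypothetical separator between dialogue turns
MAX_HISTORY_TURNS = 3      # hypothetical context window size

def build_context_aware_source(history: List[str], current_utterance: str) -> str:
    """Prepend the most recent dialogue turns to the utterance to be translated."""
    recent = history[-MAX_HISTORY_TURNS:]
    if not recent:
        return current_utterance
    return f" {SEP_TOKEN} ".join(recent + [current_utterance])

# example usage
history = ["How was the movie?", "It was amazing, you should watch it."]
print(build_context_aware_source(history, "Which theater did you go to?"))
# -> "How was the movie? <sep> It was amazing, you should watch it. <sep> Which theater did you go to?"
```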
The dialogue history is indeed beneficial for better translations.
Modeling the coherence characteristic of conversation is crucial for higher results.
The models with image features incorporated achieve higher results than the corresponding text-based models.
The dialogue history and the image features yield significant cumulative benefits.
Among these image-based models, different fusion manners of text and image features show large differences in effectiveness → there is much room for further improvement with other, more advanced fusion methods (a toy fusion sketch follows below).
Using FOV image features is generally better than the coarse counterpart CSV, which demonstrates that the fine-grained object elements may offer more specific and effective information for better translations.
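To make the "fusion manner" point more concrete, here is a minimal sketch (my own assumption, not the paper's actual architecture) of one simple strategy: a gated residual sum of the Transformer text states and a projected global image feature. Names and dimensions (`text_dim`, `image_dim`) are illustrative; the paper compares several fusion strategies and image feature types (e.g., fine-grained object features vs. coarse scene features).

```python
import torch
import torch.nn as nn

class GatedImageFusion(nn.Module):
    """Fuse a single image feature into per-token text states with a learned gate."""

    def __init__(self, text_dim: int = 512, image_dim: int = 2048):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, text_dim)  # map image feature into the text space
        self.gate = nn.Linear(text_dim * 2, text_dim)     # per-dimension fusion gate

    def forward(self, text_states: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # text_states: (batch, seq_len, text_dim) Transformer encoder outputs
        # image_feat:  (batch, image_dim) a global scene feature or pooled object features
        img = self.image_proj(image_feat).unsqueeze(1)                      # (batch, 1, text_dim)
        img = img.expand(-1, text_states.size(1), -1)                       # broadcast over tokens
        gate = torch.sigmoid(self.gate(torch.cat([text_states, img], -1)))  # how much image to mix in
        return text_states + gate * img                                     # gated residual fusion

# example usage with random tensors
fusion = GatedImageFusion()
text = torch.randn(2, 10, 512)    # fake encoder states
image = torch.randn(2, 2048)      # fake CNN image feature
print(fusion(text, image).shape)  # torch.Size([2, 10, 512])
```

Other fusion manners (e.g., plain concatenation, or cross-attention over per-object features instead of a single pooled vector) would plug in at the same point, which is where the differences in effectiveness noted above would show up.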