This is a long paper accepted at ACL 2022; it proposes a new task called multimodal chat translation and builds a dataset for it. It was fun to learn about this new task of dialogue translation!
Paper link: https://aclanthology.org/2022.acl-long.186/
Goal: generate more accurate translations with the help of the associated dialogue history and visual context.
There has been little research on multimodal machine translation in dialogue (no dataset existed).
propose a new task: multimodal chat translation, named MCT, to advance multimodal chat translation research.
we are the first to contribute a human-annotated multimodal sentiment chat translation dataset.
implement multiple transformer-based baselines and provide benchmarks for the new task.
conduct comprehensive analysis and ablation study to offer more insights.
As a by-product, our MSCTD also facilitates the development of multimodal dialogue sentiment analysis.
(multiple Transformer-based models are used as baselines)
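Below is a minimal sketch (my own assumption, not the paper's exact preprocessing) of one common way to feed dialogue history to a Transformer-based translation model: prepending the most recent turns to the current source utterance with a separator token. The token name `<sep>` and the history window size are illustrative choices, not values from the paper.

```python
from typing import List

SEP_TOKEN = "<sep>"        # hypothetical separator between dialogue turns
MAX_HISTORY_TURNS = 3      # hypothetical context window size

def build_context_aware_source(history: List[str], current_utterance: str) -> str:
    """Prepend the most recent dialogue turns to the utterance to be translated."""
    recent = history[-MAX_HISTORY_TURNS:]
    if not recent:
        return current_utterance
    return f" {SEP_TOKEN} ".join(recent + [current_utterance])

# example usage
history = ["How was the movie?", "It was amazing, you should watch it."]
print(build_context_aware_source(history, "Which theater did you go to?"))
# -> "How was the movie? <sep> It was amazing, you should watch it. <sep> Which theater did you go to?"
```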
The dialogue history is indeed beneficial for better translations.
Modeling the coherence characteristic of conversation is crucial for higher results.
The models with image features incorporated achieve higher results than the corresponding text-based models.
The dialogue history and the image features yield significant cumulative benefits.
Among these image-based models, different fusion manners of text and image features show large differences in effectiveness → there is much room for further improvement with other, more advanced fusion methods (a toy fusion sketch follows below).
Using FOV image features is generally better than the coarse counterpart CSV, which demonstrates that the fine-grained object elements may offer more specific and effective information for better translations.
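To make the "fusion manner" point more concrete, here is a minimal sketch (my own assumption, not the paper's actual architecture) of one simple strategy: a gated residual sum of the Transformer text states and a projected global image feature. Names and dimensions (`text_dim`, `image_dim`) are illustrative; the paper compares several fusion strategies and image feature types (e.g., fine-grained object features vs. coarse scene features).

```python
import torch
import torch.nn as nn

class GatedImageFusion(nn.Module):
    """Fuse a single image feature into per-token text states with a learned gate."""

    def __init__(self, text_dim: int = 512, image_dim: int = 2048):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, text_dim)  # map image feature into the text space
        self.gate = nn.Linear(text_dim * 2, text_dim)     # per-dimension fusion gate

    def forward(self, text_states: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # text_states: (batch, seq_len, text_dim) Transformer encoder outputs
        # image_feat:  (batch, image_dim) a global scene feature or pooled object features
        img = self.image_proj(image_feat).unsqueeze(1)                      # (batch, 1, text_dim)
        img = img.expand(-1, text_states.size(1), -1)                       # broadcast over tokens
        gate = torch.sigmoid(self.gate(torch.cat([text_states, img], -1)))  # how much image to mix in
        return text_states + gate * img                                     # gated residual fusion

# example usage with random tensors
fusion = GatedImageFusion()
text = torch.randn(2, 10, 512)    # fake encoder states
image = torch.randn(2, 2048)      # fake CNN image feature
print(fusion(text, image).shape)  # torch.Size([2, 10, 512])
```

Other fusion manners (e.g., plain concatenation, or cross-attention over per-object features instead of a single pooled vector) would plug in at the same point, which is where the differences in effectiveness noted above would show up.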