시리즈

Multimodal Study

1.[Multimodal_01] Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision(ICML 2021)

Pre-trained representations가 NLP와 perception task에 더 중요해지고 있다NLP에서 representation을 학습하는 것은, human annotations 없이 raw text에서 train하는 것으로 transition 되었지

2023년 2월 14일

2.[Multimodal_02 ] Zero-Shot Text-to-Image Generation(2021, OpenAI)

4 Conclusion

2023년 2월 27일

3.[Multimodal_03] Multimodal Abstractive Summarization for How2 Videos(ACL, 2019)

open-domain videos에 대한 abstractive summarization 연구지금까지 연구되온 text news summarization과 달리, 목표는 video, audio transcript(text)를 fluent textual sumary 형태로

2023년 3월 9일

4.[논문리뷰] Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (ICML21)

Text-to-speech (TTS)는 여러 가지 구성요소로, 주어진 텍스트의 원시 음성 파형 (raw speech wavefrom)을 합성함심층신경망의 빠른 발전으로, TTS 시스템 파이프라인은 텍스트 전처리 (정규화, 음소화 등)와 별개로 이중 생성 모델링으로 단순

2023년 9월 14일