Multi-modal

1. [Paper Review] Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks (BEiT-3)

2. [Paper Review] BEiT: BERT Pretraining of Image Transformers

3. [Paper Review] BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

4. [Paper Review] CLIP

5. [Paper Review] e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce

6. [Paper Review] Video-Text Representation Learning via Differentiable Weak Temporal Alignment