220620_sally

강영어 · June 19, 2022

Daily English Article

Original article

How does DALL-E 2 work?

Abstract

DALL-E 2 is an AI system capable of generating realistic, high-resolution images from a description written in natural language.

Below is an example description given to DALL-E 2.

Description: A bowl of soup that is a portal to another dimension as digital art.

DALL-E 2 consists of two parts:
1. prior: converts the caption (the text input) into a representation of an image.
2. decoder: converts this representation of an image into an actual image.
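
The two-stage flow above can be sketched in a few lines of Python. This is only a toy illustration of how the pieces connect: `toy_text_embedding`, `toy_prior`, and `toy_decoder` are hypothetical stand-ins made up for this note, not OpenAI's models or any real API, and the arithmetic inside them is arbitrary.

```python
from typing import List

Vector = List[float]
Image = List[List[float]]


def toy_text_embedding(caption: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for a text encoder: fold characters into a small vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(caption):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def toy_prior(text_emb: Vector) -> Vector:
    """Stage 1 (prior): map the text embedding to an image representation.

    The real prior is a learned model; this fixed formula is an assumption.
    """
    return [2.0 * x + 1.0 for x in text_emb]


def toy_decoder(image_emb: Vector, size: int = 2) -> Image:
    """Stage 2 (decoder): turn the image representation into a size x size pixel grid."""
    n = len(image_emb)
    return [[image_emb[(r * size + c) % n] for c in range(size)] for r in range(size)]


def generate(caption: str) -> Image:
    """Caption -> text embedding -> prior -> decoder -> image."""
    text_emb = toy_text_embedding(caption)
    image_emb = toy_prior(text_emb)
    return toy_decoder(image_emb)


image = generate("A bowl of soup that is a portal to another dimension as digital art.")
```

The point of the sketch is the order of the calls in `generate`: the decoder never sees the caption directly, only the representation produced by the prior.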

The text and image embeddings are created by CLIP (Contrastive Language-Image Pre-training). CLIP is a contrastive model, so it is trained to match each image with its corresponding caption.
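
To make "matching an image with its caption" concrete, here is a minimal sketch that assumes the embeddings already exist. The vectors are made-up numbers, and `best_caption_for_each_image` is a hypothetical helper, not part of CLIP. It shows only the matching step (pick the caption whose embedding has the highest cosine similarity), not the contrastive training that produces the embeddings.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption_for_each_image(image_embs: List[Vector], text_embs: List[Vector]) -> List[int]:
    """For each image embedding, return the index of the most similar caption embedding."""
    return [
        max(range(len(text_embs)), key=lambda j: cosine_similarity(img, text_embs[j]))
        for img in image_embs
    ]


# Made-up embeddings for illustration only (assumption, not real CLIP output).
image_embs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.95]]
text_embs = [[0.1, 0.0, 1.0], [1.0, 0.0, 0.1]]

matches = best_caption_for_each_image(image_embs, text_embs)
```

During contrastive training, the similarity of true image-caption pairs is pushed up and the similarity of mismatched pairs is pushed down, which is what makes this kind of lookup meaningful.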

The research scientists at OpenAI tried two types of prior for DALL-E 2: an autoregressive prior and a diffusion prior. After experimentation, they concluded that the diffusion prior performed better than the autoregressive prior.

The decoder is also a diffusion model: a tuned, modified version of GLIDE.
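
For readers new to diffusion models, the sketch below shows the generic idea in its simplest form: noise is mixed into clean data, and the data can be recovered if the noise is predicted well. This is not GLIDE or the DALL-E 2 decoder. The function names and the value `alpha_bar=0.5` are assumptions for illustration, and the "prediction" here simply reuses the true noise, where a real model would use a trained neural network.

```python
import math
import random
from typing import List, Tuple


def add_noise(x0: List[float], alpha_bar: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, eps)
    ]
    return xt, eps


def remove_noise(xt: List[float], eps_pred: List[float], alpha_bar: float) -> List[float]:
    """Estimate x0 from x_t by inverting the forward formula with a noise prediction."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for x, e in zip(xt, eps_pred)
    ]


rng = random.Random(0)
x0 = [0.2, 0.8, 0.5]  # stand-in for clean pixel values
xt, eps = add_noise(x0, alpha_bar=0.5, rng=rng)

# A trained model would predict eps from xt; here the true noise is reused (assumption).
x0_hat = remove_noise(xt, eps, alpha_bar=0.5)
```

With a perfect noise prediction, `x0_hat` equals `x0` up to floating-point error; the quality of a real diffusion model depends on how well its network approximates that prediction.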

Thoughts

I was surprised that the model was simpler than I expected. It was also interesting that the resulting images were higher resolution than I expected. However, I should study the concept of the diffusion prior more.
It's far from the project I'm working on at my company, but I think it's necessary to look at other models like this. By doing so, I can maintain my interest in artificial intelligence.
