[Paper Review] DemoFusion: Democratising High-Resolution Image Generation With No 📝

최지우 · July 18, 2024

Any_size_diffusion cvpr2023 paper-review generative-model


CVPR 2024
DemoFusion
Any size diffusion task
paper

Abstract

High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls. This paper aims to democratise high-resolution GenAI by advancing the frontier of high-resolution generation while remaining accessible to a broad audience. We demonstrate that existing Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution image generation. Our novel DemoFusion framework seamlessly extends open-source GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. The progressive nature of DemoFusion requires more passes, but the intermediate results can serve as “previews”, facilitating rapid prompt iteration.

In short: generate at multiple scales rather than one fixed scale, and get high-resolution output.

Introduction

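Running GenAI takes a lot of resources: data, hardware, energy, and so on. The usual story.
So the idea is: push a pre-trained diffusion model's high-quality generation to even higher resolutions. The remarkable part is that the method is training-free and takes only a few lines of code. The authors claim that, because it runs multiple passes, it takes somewhat longer, but the resulting resolution is good.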

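Off-the-shelf SDXL fails when asked to generate images at a higher resolution than it was trained for. A text-to-image LDM encounters cropped images during training, whether they are inherent in the training data or introduced by augmentation. As a result, when pushed to a higher resolution, SDXL produces outputs that focus on small, localised parts of the subject. The flip side is that the model already holds enough prior knowledge to generate high-resolution patches, provided those patches can be fused into a complete scene.
But patch-wise HR generation is hard. DemoFusion builds on the same idea as MultiDiffusion, namely fusing multiple denoising paths from a pre-trained SDXL model. (MultiDiffusion fuses overlapping denoising paths to generate panoramic images, but it lacks the global context needed for semantic coherence.)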

Progressive Upscaling : improves the image through an upsample-diffuse-denoise loop, using the noise-inverted lower-resolution image as the initialisation for the next, higher-resolution phase.
Skip Residual : uses the intermediate noise-inverted representations as skip residuals during denoising, which maintains global consistency between the HR and LR images.
Dilated Sampling : adds dilated sampling of denoising paths to increase global semantic coherence. (Feels like an improvement on MultiDiffusion.)

https://kimjy99.github.io/%EB%85%BC%EB%AC%B8%EB%A6%AC%EB%B7%B0/multidiffusion/
I should probably read this MultiDiffusion review for reference. What exactly is crop sampling? For now, the goal is just to understand the method.
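
As a rough answer to the crop sampling question, here is a minimal sketch of the MultiDiffusion-style idea: cut the latent into overlapping windows, run one denoising step per window, and average the results wherever windows overlap. The names `crop_windows`, `fuse_denoising_paths`, and `toy_denoiser` are hypothetical, the window size and stride are illustrative, and the toy denoiser only stands in for a real UNet step.

```python
import numpy as np

def crop_windows(height, width, size, stride):
    """Top-left corners of overlapping size x size windows covering the latent."""
    ys = list(range(0, height - size + 1, stride))
    xs = list(range(0, width - size + 1, stride))
    # Make sure the last row/column of windows reaches the border.
    if ys[-1] != height - size:
        ys.append(height - size)
    if xs[-1] != width - size:
        xs.append(width - size)
    return [(y, x) for y in ys for x in xs]

def fuse_denoising_paths(latent, denoise_patch, size, stride):
    """Denoise every crop separately, then average the overlapping outputs."""
    value = np.zeros_like(latent)
    count = np.zeros_like(latent)
    _, h, w = latent.shape
    for y, x in crop_windows(h, w, size, stride):
        patch = latent[:, y:y + size, x:x + size]
        value[:, y:y + size, x:x + size] += denoise_patch(patch)
        count[:, y:y + size, x:x + size] += 1.0
    return value / count

def toy_denoiser(patch):
    # Hypothetical stand-in for one denoising step of a pre-trained model.
    return 0.9 * patch

latent = np.random.default_rng(0).standard_normal((4, 16, 16))
fused = fuse_denoising_paths(latent, toy_denoiser, size=8, stride=4)
print(fused.shape)
```

Each window only ever sees a local crop, which is why this kind of fusion on its own has no global context.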

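Progressive Upscaling makes the runtime longer, but memory consumption stays low, and because generation is progressive, the intermediate results work as previews: you can keep waiting only until the result looks satisfying. That seems to be the claimed advantage.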

Method

Progressive Upscaling
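If the diffusion process and the denoising process are defined as

$$q(z_T \mid z_0) = \prod_{t=1}^{T} q(z_t \mid z_{t-1}), \qquad p_\theta(z_0 \mid z_T) = \prod_{t=1}^{T} p_\theta(z_{t-1} \mid z_t),$$

then a higher-resolution image is obtained through

$$p_\theta(z_0^{S} \mid z_T^{1}) = p_\theta(z_0^{1} \mid z_T^{1}) \prod_{s=2}^{S} \Big( q({z'}_T^{s} \mid {z'}_0^{s}) \, p_\theta(z_0^{s} \mid {z'}_T^{s}) \Big), \qquad {z'}_0^{s} = \mathrm{inter}(z_0^{s-1}),$$

where $s$ indexes the upscaling phase and $\mathrm{inter}(\cdot)$ is the interpolation used for upsampling. Taking the terms on the right-hand side in order: the first is the ordinary denoising of an LDM such as SDXL, the second is the diffusion process applied to the image upsampled with bicubic interpolation, and the third is the denoising process of that same upsampled image.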
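
The upsample-diffuse-denoise loop can be sketched as control flow only. This is a toy under stated assumptions, not the authors' implementation: `resize_nearest` is a nearest-neighbour stand-in for bicubic interpolation, `denoise` is a placeholder for the pre-trained LDM, the `alpha_bar` value is illustrative, and the sketch assumes phase `s` works at `s` times the base side length.

```python
import numpy as np

rng = np.random.default_rng(0)

def resize_nearest(latent, new_h, new_w):
    # Nearest-neighbour stand-in for bicubic interpolation (assumption of this sketch).
    _, h, w = latent.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return latent[:, rows][:, :, cols]

def diffuse(z0, alpha_bar=0.01):
    # Closed-form forward process: shrink the signal and add Gaussian noise.
    noise = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * noise

def denoise(z_T):
    # Hypothetical placeholder for the pre-trained LDM's reverse process.
    return np.tanh(z_T)

def progressive_upscaling(channels, base_size, num_phases):
    # Phase 1: ordinary generation at the base resolution.
    z0 = denoise(rng.standard_normal((channels, base_size, base_size)))
    previews = [z0]
    for s in range(2, num_phases + 1):
        side = s * base_size
        z0_up = resize_nearest(previews[-1], side, side)  # upsample
        z_T = diffuse(z0_up)                               # diffuse
        previews.append(denoise(z_T))                      # denoise
    return previews

previews = progressive_upscaling(channels=4, base_size=8, num_phases=3)
print([p.shape for p in previews])
```

Every entry of `previews` is a usable result at its own resolution, which is what makes the intermediate outputs work as previews.
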
Skip Residual (optimisation and noise reduction)

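When re-noising an upsampled image, it is common not to diffuse all the way to T but only up to an intermediate step t, and then denoise from there. The catch is that the larger t is, the more information is lost, and the smaller t is, the stronger the noise introduced by upsampling remains.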

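Skip Residual is applied to get around this. From what I can tell, at each step t the noise-inverted version of the upsampled latent, ${z'}_t^{s}$, is mixed into the latent currently being denoised, $z_t^{s}$:

$$\hat{z}_t^{s} = c_1 \cdot {z'}_t^{s} + (1 - c_1) \cdot z_t^{s}, \qquad c_1 = \left( \frac{1 + \cos\left( \frac{T - t}{T} \pi \right)}{2} \right)^{\alpha_1}$$

$c_1$ is a scaled cosine decay factor. Through the exponent $\alpha_1$, it controls how much of the previous phase's result is used at each denoising step.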
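
To get a feel for how the cosine decay behaves, here is a small sketch. It assumes the blend has the form `c1 * noise_inverted + (1 - c1) * denoising` with `c1 = ((1 + cos(pi * (T - t) / T)) / 2) ** alpha_1`; the function names and the value of `alpha_1` are illustrative choices, not taken from the paper.

```python
import math

def cosine_decay(t, T, alpha):
    """Scaled cosine decay: 1 at t = T (start of denoising), 0 at t = 0."""
    return (0.5 * (1.0 + math.cos(math.pi * (T - t) / T))) ** alpha

def skip_residual(z_denoising, z_noise_inverted, t, T, alpha_1):
    """Blend the previous phase's noise-inverted latent into the current latent."""
    c1 = cosine_decay(t, T, alpha_1)
    return c1 * z_noise_inverted + (1.0 - c1) * z_denoising

T = 1000
for t in (1000, 750, 500, 250, 0):
    # Illustrative alpha_1 = 3: a larger exponent makes the guidance fade sooner.
    print(t, round(cosine_decay(t, T, alpha=3.0), 4))
```

Early steps (t close to T) lean almost entirely on the previous phase, so the global layout is inherited, while late steps are left to the current phase to refine detail. The same functions work unchanged on array-valued latents, since they only use elementwise arithmetic.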

Dilated Sampling
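The purpose is to enlarge the receptive field and so capture more global context. Naturally, it is applied inside the diffusion model's sampling steps (since the features have to be extracted there and passed on).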

Since the model works on latents, the authors naturally apply dilated sampling directly in the latent representation.

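If the set of global (dilated) samples is written as

$$Z_t^{global} = S_{global}(z_t),$$

and $Z_t^{local}$ is the set of patches from MultiDiffusion's shifted crop sampling, then fusing the two gives the next latent of the denoising pass, $z_{t-1}$:

$$z_{t-1} = (1 - c_2) \cdot R_{local}(Z_{t-1}^{local}) + c_2 \cdot R_{global}(Z_{t-1}^{global}),$$

where $R_{local}$ and $R_{global}$ reassemble the denoised samples into a full latent and $c_2$ is a cosine decay factor of the same kind as $c_1$. Applying dilated sampling directly can make the image grainy; this is resolved by applying a Gaussian filter to the latent before sampling.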
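
Dilated sampling in latent space can be sketched with strided slicing. This is a minimal illustration, assuming a `(C, H, W)` latent and a dilation equal to the upscaling factor, so that each global sample has the base resolution while spanning the whole image. The helper names and the `c2`-weighted `fuse` are assumptions of the sketch.

```python
import numpy as np

def dilated_samples(latent, dilation):
    """Split a (C, H, W) latent into dilation**2 strided sub-latents."""
    return [latent[:, i::dilation, j::dilation]
            for i in range(dilation) for j in range(dilation)]

def reassemble(samples, dilation, shape):
    """Inverse of dilated_samples: write each sub-latent back to its strided grid."""
    out = np.empty(shape, dtype=samples[0].dtype)
    k = 0
    for i in range(dilation):
        for j in range(dilation):
            out[:, i::dilation, j::dilation] = samples[k]
            k += 1
    return out

def fuse(local_latent, global_latent, c2):
    # Weighted sum of the local (crop) path and the global (dilated) path.
    return (1.0 - c2) * local_latent + c2 * global_latent

latent = np.arange(4 * 8 * 8, dtype=np.float64).reshape(4, 8, 8)
samples = dilated_samples(latent, dilation=2)
restored = reassemble(samples, dilation=2, shape=latent.shape)
print(len(samples), samples[0].shape, np.array_equal(restored, latent))
```

Because neighbouring pixels end up in different samples and are denoised independently, the reassembled result can look grainy, which is what the Gaussian filter is meant to counter.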

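The standard deviation of the filter decreases from σ1 to σ2. A scaling factor controls this decay so that the smoothing fades as the denoising directions become consistent, which keeps the final image from turning blurry.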
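
One schedule consistent with this description is sketched below. The exact form is an assumption: it reuses the cosine decay from Skip Residual as `c3` and interpolates the standard deviation as `c3 * (sigma_1 - sigma_2) + sigma_2`. The values of `sigma_1`, `sigma_2`, and `alpha_3` are illustrative, and `gaussian_kernel_1d` is a hypothetical helper.

```python
import math

def filter_std(t, T, sigma_1, sigma_2, alpha_3):
    """Standard deviation of the Gaussian filter at step t (sigma_1 at t = T, sigma_2 at t = 0)."""
    c3 = (0.5 * (1.0 + math.cos(math.pi * (T - t) / T))) ** alpha_3
    return c3 * (sigma_1 - sigma_2) + sigma_2

def gaussian_kernel_1d(sigma, radius):
    """Normalised 1-D Gaussian weights on the integer grid [-radius, radius]."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

T = 1000
for t in (1000, 500, 0):
    sigma = filter_std(t, T, sigma_1=1.0, sigma_2=0.01, alpha_3=1.0)
    kernel = gaussian_kernel_1d(sigma, radius=2)
    print(t, round(sigma, 3), [round(w, 3) for w in kernel])
```

As the standard deviation shrinks, the kernel collapses towards a single central weight, so the filter stops smoothing by the time the final details are being denoised.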
