[Boostcamp AI Tech DL Basic] week04 (2022.02.09)

redgreen · February 10, 2022

Boostcamp AI Tech, 3rd cohort


07 Sequential Models - RNN

- Vanilla RNN (Recurrent Neural Network)

Weak at capturing long-term dependencies (gradients vanish or explode over many time steps)

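- LSTM (Long Short-Term Memory)

forget gate: decides which information to discard from the cell state

input gate: decides which information to store in the cell state

output gate: produces the output (hidden state) from the updated cell state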
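The three gates can be made concrete with a single LSTM step. This is a minimal sketch that assumes a hidden size of 1 so plain Python floats suffice; the weight dictionary `W` and its values are hypothetical, and real implementations use weight matrices and vectors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step with scalar input, hidden state and cell state.

    W maps each of "f", "i", "g", "o" to hypothetical (w_x, w_h, b) weights.
    """
    def pre(gate):
        w_x, w_h, b = W[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))      # forget gate: fraction of c_prev to keep
    i = sigmoid(pre("i"))      # input gate: fraction of new content to write
    g = math.tanh(pre("g"))    # candidate content for the cell state
    o = sigmoid(pre("o"))      # output gate
    c = f * c_prev + i * g     # updated cell state
    h = o * math.tanh(c)       # output built from the updated cell state
    return h, c

# Hypothetical weights, run over a short input sequence
W = {"f": (0.5, 0.1, 1.0), "i": (1.0, 0.2, 0.0),
     "g": (1.0, 0.3, 0.0), "o": (0.8, 0.1, 0.0)}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.3]:
    h, c = lstm_step(x, h, c, W)
print(h, c)
```

Because `c` is updated additively (`f * c_prev + i * g`) rather than squashed through a nonlinearity at every step, information can persist across many steps, which is the usual explanation for why LSTMs handle longer dependencies than a vanilla RNN.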

- GRU (Gated Recurrent Unit)

Two gates (reset gate, update gate)

No cell state, just a hidden state
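For comparison, here is one GRU step under the same simplifying assumption (hidden size 1, hypothetical weights in `W`). Only `h` is carried between steps. Note that the interpolation convention varies: this sketch uses `h = (1 - z) * h_prev + z * h_tilde`, while some papers and libraries swap the roles of `z` and `1 - z`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, W):
    """One GRU step with scalar input and hidden state.

    W maps "r", "z", "h" to hypothetical (w_x, w_h, b) weights.
    There is no separate cell state.
    """
    w_x, w_h, b = W["r"]
    r = sigmoid(w_x * x + w_h * h_prev + b)                 # reset gate
    w_x, w_h, b = W["z"]
    z = sigmoid(w_x * x + w_h * h_prev + b)                 # update gate
    w_x, w_h, b = W["h"]
    h_tilde = math.tanh(w_x * x + w_h * (r * h_prev) + b)   # candidate state
    # Blend the old state and the candidate using the update gate
    return (1.0 - z) * h_prev + z * h_tilde

W = {"r": (0.6, 0.2, 0.0), "z": (1.0, -0.3, 0.0), "h": (1.2, 0.5, 0.0)}
h = 0.0
for x in [1.0, -0.5, 0.3]:
    h = gru_step(x, h, W)
print(h)
```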

(07 - Practice) LSTM (Colab)

08 Sequential Models - Transformer

09 Generative Models 1

explicit model: a model that can return a probability value for a given input

implicit model: a model that can only generate samples

unsupervised representation learning

Basic Discrete Distributions

Bernoulli dist.
: $X \sim \text{Ber}(p)$

Categorical dist.
: $Y \sim \text{Cat}(p_1, \ldots, p_m)$

Example

$(r, g, b) \sim p(R, G, B)$

$256 \times 256 \times 256$ possible cases
--> $256^3 - 1$ parameters are needed
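The count follows from the fact that a categorical distribution over $K$ outcomes has $K - 1$ free parameters, since the probabilities must sum to 1. A small sketch of the arithmetic (the helper names are hypothetical); the second function shows, for contrast, how few parameters are needed if each channel were modeled independently:

```python
from math import prod

def joint_categorical_params(cardinalities):
    """Free parameters of a full joint categorical over the given variables."""
    return prod(cardinalities) - 1

def independent_params(cardinalities):
    """Free parameters if every variable gets its own categorical."""
    return sum(k - 1 for k in cardinalities)

rgb = (256, 256, 256)
print(joint_categorical_params(rgb))  # 16777215
print(independent_params(rgb))        # 765
```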

Structure Through Independence

Assume that the $n$ binary pixels are all independent
: $p(x_1, \ldots, x_n) = p(x_1)\,p(x_2)\cdots p(x_n)$

Number of possible states?
: $2^n$

Number of parameters needed for $p(x_1, \ldots, x_n)$?
: $n$
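A quick check of both counts for binary pixels: the joint is a product of $n$ Bernoulli factors, so $n$ numbers describe it, yet it still assigns probability to all $2^n$ states. The per-pixel probabilities in `p` are made-up values for illustration.

```python
from itertools import product

def independent_joint(x, p):
    """p(x_1, ..., x_n) = p(x_1) ... p(x_n) for binary pixels.

    p[i] is the (hypothetical) probability that pixel i equals 1.
    """
    prob = 1.0
    for xi, pi in zip(x, p):
        prob *= pi if xi == 1 else 1.0 - pi
    return prob

p = [0.9, 0.2, 0.5, 0.7]                       # n = 4 parameters
states = list(product([0, 1], repeat=len(p)))
print(len(states))                             # 16
print(sum(independent_joint(s, p) for s in states))  # 1.0 up to rounding
```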

Conditional Independence

Three important rules

Chain rule:
$p(x_1, \ldots, x_n) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1, x_2)\cdots p(x_n \mid x_1, \ldots, x_{n-1})$
Number of parameters (binary $x_i$):

$p(x_1)$: 1 parameter

$p(x_2 \mid x_1)$: 2 parameters
: one for $p(x_2 \mid x_1 = 0)$ and one for $p(x_2 \mid x_1 = 1)$

$p(x_3 \mid x_1, x_2)$: 4 parameters
Hence, $1 + 2 + 2^2 + \cdots + 2^{n-1} = 2^n - 1$, the same as the full joint, so the chain rule by itself does not reduce the number of parameters.
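The geometric-series count can be checked directly: the $i$-th factor conditions on $i - 1$ binary variables and needs one Bernoulli parameter per parent configuration, i.e. $2^{i-1}$. A minimal sketch (the function name is hypothetical):

```python
def chain_rule_params(n):
    """Total parameters of p(x_1) p(x_2 | x_1) ... p(x_n | x_1..x_{n-1}), binary x."""
    return sum(2 ** (i - 1) for i in range(1, n + 1))

for n in (1, 2, 3, 10):
    assert chain_rule_params(n) == 2 ** n - 1
print(chain_rule_params(3))  # 7
```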

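Bayes' rule:
$p(x \mid y) = \frac{p(x, y)}{p(y)} = \frac{p(y \mid x)\,p(x)}{p(y)}$

Conditional independence:
- If $x$ and $y$ are independent given $z$ ($x \perp y \mid z$), then $p(x \mid y, z) = p(x \mid z)$.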
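Both rules can be verified numerically on a small joint table. This sketch builds a hypothetical binary model as $p(x, y, z) = p(z)\,p(x \mid z)\,p(y \mid z)$, which makes $x \perp y \mid z$ true by construction; all probability values are made up.

```python
from itertools import product

p_z = {0: 0.3, 1: 0.7}
p_x1_given_z = {0: 0.2, 1: 0.9}   # p(x = 1 | z)
p_y1_given_z = {0: 0.6, 1: 0.1}   # p(y = 1 | z)

def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1

joint = {
    (x, y, z): p_z[z] * bern(p_x1_given_z[z], x) * bern(p_y1_given_z[z], y)
    for x, y, z in product([0, 1], repeat=3)
}

def marginal(**fixed):
    """Sum the joint over every variable not fixed by keyword (x, y or z)."""
    total = 0.0
    for (x, y, z), p in joint.items():
        vals = {"x": x, "y": y, "z": z}
        if all(vals[k] == v for k, v in fixed.items()):
            total += p
    return total

# Bayes' rule: p(x | y) = p(y | x) p(x) / p(y)
lhs = marginal(x=1, y=1) / marginal(y=1)
p_y_given_x = marginal(x=1, y=1) / marginal(x=1)
rhs = p_y_given_x * marginal(x=1) / marginal(y=1)
print(abs(lhs - rhs) < 1e-12)  # True

# Conditional independence: p(x | y, z) = p(x | z) for every (y, z)
ci_holds = all(
    abs(marginal(x=1, y=y, z=z) / marginal(y=y, z=z)
        - marginal(x=1, z=z) / marginal(z=z)) < 1e-12
    for y, z in product([0, 1], repeat=2)
)
print(ci_holds)  # True
```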

Markov assumption:
The $(i+1)$-th variable depends on the $i$-th variable and, given $x_i$, is independent of $x_1, \ldots, x_{i-1}$: $p(x_{i+1} \mid x_1, \ldots, x_i) = p(x_{i+1} \mid x_i)$.
- By leveraging the Markov assumption, the chain rule can be rewritten as
$p(x_1, \ldots, x_n) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)\cdots p(x_n \mid x_{n-1})$
Number of parameters
: $1 + 2(n - 1) = 2n - 1$
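Under the Markov assumption the joint needs one number for $p(x_1)$ and two per transition, yet it is still a valid distribution over all $2^n$ states. A minimal sketch with hypothetical, deliberately distinct transition probabilities:

```python
from itertools import product

def markov_joint(x, p1, trans):
    """p(x_1) p(x_2 | x_1) ... p(x_n | x_{n-1}) for binary x.

    p1 = p(x[0] = 1). With 0-based indices,
    trans[k] = (p(x[k+1] = 1 | x[k] = 0), p(x[k+1] = 1 | x[k] = 1)).
    """
    prob = p1 if x[0] == 1 else 1.0 - p1
    for k in range(1, len(x)):
        q = trans[k - 1][x[k - 1]]           # p(x[k] = 1 | x[k-1])
        prob *= q if x[k] == 1 else 1.0 - q
    return prob

n = 5
p1 = 0.4
trans = [(0.3, 0.8), (0.5, 0.1), (0.9, 0.4), (0.2, 0.6)]
num_params = 1 + 2 * len(trans)
print(num_params)  # 9
total = sum(markov_joint(s, p1, trans) for s in product([0, 1], repeat=n))
print(total)       # 1.0 up to floating-point rounding
```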

Auto-regressive Model

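Each variable depends on the previous ones (either just the immediately preceding one, or all of them)

Let's use the chain rule to factor the joint dist. of a 28×28 binary image (784 pixels):
$p(x_{1:784}) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_{1:2})\cdots$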
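The factorization also tells us how to generate: draw $x_1$, then $x_2$ given $x_1$, and so on, and the log-likelihood is the sum of the log conditionals. In this sketch the conditional `conditional_p1` is a made-up toy rule (not PixelRNN, NADE, or any published model); any function of the prefix that returns a probability would work.

```python
import math
import random

def conditional_p1(prev):
    """Hypothetical p(x_i = 1 | x_1, ..., x_{i-1}): likelier 'on' if more of the prefix is on."""
    if not prev:
        return 0.5
    frac_on = sum(prev) / len(prev)
    return 1.0 / (1.0 + math.exp(-(4.0 * frac_on - 2.0)))

def sample(n, rng):
    """Draw x_1, ..., x_n one at a time, each conditioned on the prefix."""
    x = []
    for _ in range(n):
        x.append(1 if rng.random() < conditional_p1(x) else 0)
    return x

def log_prob(x):
    """log p(x_{1:n}) = sum_i log p(x_i | x_{1:i-1})."""
    total = 0.0
    for i, xi in enumerate(x):
        p1 = conditional_p1(x[:i])
        total += math.log(p1 if xi == 1 else 1.0 - p1)
    return total

rng = random.Random(0)
img = sample(784, rng)     # a flattened 28x28 binary "image"
print(len(img), log_prob(img))
```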

NADE: Neural Autoregressive Density Estimator

explicit model
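NADE is explicit because the exact probability of any input can be computed. A minimal pure-Python sketch of the standard NADE conditionals, $h_i = \sigma(c + W_{:,<i}\,x_{<i})$ and $p(x_i = 1 \mid x_{<i}) = \sigma(b_i + V_i \cdot h_i)$, assuming binary inputs; the random weights stand in for trained parameters. The running pre-activation `a` reuses the shared matrix `W`, so each new pixel adds only one column.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def nade_log_prob(x, W, V, b, c):
    """Exact log p(x) for a binary vector x under a NADE.

    W: H x D shared input-to-hidden weights, V: D x H, b: length D, c: length H.
    """
    H, D = len(c), len(x)
    a = list(c)                     # running c + W[:, :i] @ x[:i]
    total = 0.0
    for i in range(D):
        h = [sigmoid(a_k) for a_k in a]
        p1 = sigmoid(b[i] + sum(V[i][k] * h[k] for k in range(H)))
        total += math.log(p1 if x[i] == 1 else 1.0 - p1)
        for k in range(H):          # x_i is now known: add column i of W
            a[k] += W[k][i] * x[i]
    return total

rng = random.Random(0)
D, H = 4, 3
W = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(H)]
V = [[rng.gauss(0, 1) for _ in range(H)] for _ in range(D)]
b = [rng.gauss(0, 1) for _ in range(D)]
c = [rng.gauss(0, 1) for _ in range(H)]
print(math.exp(nade_log_prob([1, 0, 1, 1], W, V, b, c)))  # a probability in (0, 1)
```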

Pixel RNN

Row LSTM

Diagonal BiLSTM

10 Generative Models 2

Question

Is an AutoEncoder a generative model?

Strictly speaking, an explicit model

Variational Auto-encoder (paper)

Variational inference (VI)
: the goal is to optimize a variational distribution that approximates the posterior distribution

posterior distribution: $p_\theta(z \mid x)$

variational distribution: $q_\phi(z \mid x)$

In particular, the goal is to find the variational distribution that minimizes the KL divergence to the true posterior, $D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x))$

ELBO?
: the Evidence Lower Bound on the data log-likelihood $\log p_\theta(x)$
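The reason the ELBO is useful is the identity $\log p_\theta(x) = \text{ELBO} + D_{KL}(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x))$, with $\text{ELBO} = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x, z) - \log q_\phi(z \mid x)]$. Since $\log p_\theta(x)$ does not depend on $\phi$, maximizing the ELBO minimizes the KL to the intractable posterior. This sketch checks the identity on a toy model with a binary latent $z$; all numbers are hypothetical.

```python
import math

prior = [0.6, 0.4]     # p(z)
lik = [0.05, 0.7]      # p(x | z) for one fixed observation x
q = [0.3, 0.7]         # a variational distribution q(z | x)

joint = [prior[z] * lik[z] for z in range(2)]    # p(x, z)
log_evidence = math.log(sum(joint))              # log p(x)
posterior = [j / sum(joint) for j in joint]      # p(z | x)

elbo = sum(q[z] * (math.log(joint[z]) - math.log(q[z])) for z in range(2))
kl = sum(q[z] * (math.log(q[z]) - math.log(posterior[z])) for z in range(2))

print(elbo <= log_evidence)                      # True: a lower bound
print(abs(log_evidence - (elbo + kl)) < 1e-12)   # True: the gap is the KL
```

Setting `q = posterior` makes the KL term zero and the bound tight.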

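KL Divergence (wiki)?
: $D_{KL}(q \,\|\, p) = \mathbb{E}_{z \sim q}\left[\log \frac{q(z)}{p(z)}\right]$, which is always $\geq 0$ and is not symmetric in $q$ and $p$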
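Two common ways to compute the KL divergence: the discrete sum, and the closed form for a 1-D Gaussian against a standard normal, which is the usual VAE setup with a Gaussian $q$ and an $\mathcal{N}(0, 1)$ prior. The distributions `p` and `q` below are made-up examples.

```python
import math

def kl_discrete(p, q):
    """D_KL(p || q) = sum_i p_i log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_gauss_std_normal(mu, sigma):
    """Closed-form D_KL(N(mu, sigma^2) || N(0, 1)) for a 1-D Gaussian."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
print(kl_discrete(p, q), kl_discrete(q, p))  # both positive, but different
print(kl_gauss_std_normal(0.0, 1.0))         # 0.0
```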

Adversarial Auto-encoder (Colab)

GAN (Generative Adversarial Network) (reference: "Understanding GANs in One Hour")

GAN Objective

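A two-player minimax game between the generator $G$ and the discriminator $D$:
$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$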
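A small sketch of the GAN value function estimated from discriminator outputs, plus the known optimal discriminator for a fixed generator, $D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$. When the generator matches the data distribution, $D^* = 1/2$ everywhere and the value equals $-\log 4$. The function names and the density value 0.3 are hypothetical.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: D(x) on real samples; d_fake: D(G(z)) on generated samples.
    D tries to maximize this value, G tries to minimize it.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

def optimal_discriminator(p_data_x, p_g_x):
    """Best D(x) for a fixed generator, given the two densities at x."""
    return p_data_x / (p_data_x + p_g_x)

d = optimal_discriminator(0.3, 0.3)    # p_g == p_data at this point
print(d)                               # 0.5
print(gan_value([d] * 8, [d] * 8), -math.log(4))  # both approx. -1.3863
```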