AI Tech Day 15

Lee · February 5, 2021

Generative Models Part 1

Learning a Generative Model

  • Suppose we are given images of dogs.
  • We want to learn a probability distribution $p(x)$ such that
    • Generation: if we sample $x_{new} \sim p(x)$, then $x_{new}$ should look like a dog (sampling).
    • Density estimation: $p(x)$ should be high if $x$ looks like a dog, and low otherwise (anomaly detection).
      • Models with this property are also known as explicit models.
    • Unsupervised representation learning: we should be able to learn what these images have in common, e.g., ears, tail, etc. (feature learning).
  • Then, how can we represent $p(x)$?

Basic Discrete Distributions

  • Bernoulli distribution: (biased) coin flip
    • $D = \{Heads, Tails\}$
    • Specify $P(X = Heads) = p$. Then $P(X = Tails) = 1 - p$.
    • Write: $X \sim Ber(p)$.

  • Categorical distribution: (biased) m-sided die
    • $D = \{1, \cdots, m\}$
    • Specify $P(Y = i) = p_i$ such that $\sum_{i=1}^{m} p_i = 1$.
    • Write: $Y \sim Cat(p_1, \cdots, p_m)$.
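
To make the two distributions above concrete, here is a minimal sampling sketch in Python; the seed and the probability values are arbitrary illustrative choices, not from the lecture.

```python
# A minimal sketch: sampling from a Bernoulli and a categorical distribution.
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli: X ~ Ber(p), where p = P(X = Heads)
p = 0.7
x = "Heads" if rng.random() < p else "Tails"

# Categorical: Y ~ Cat(p_1, ..., p_m) for a biased m-sided die (m = 4 here)
probs = [0.1, 0.2, 0.3, 0.4]             # the p_i, summing to 1
y = rng.choice(len(probs), p=probs) + 1  # face in {1, ..., m}

print(x, y)
```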

  • Example
    • Modeling an RGB joint distribution (of a single pixel)
      • $(r, g, b) \sim p(R, G, B)$
      • Number of cases?
        $256 \times 256 \times 256$
      • How many parameters do we need to specify?
        $256 \times 256 \times 256 - 1$
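
A quick sanity check of the arithmetic above:

```python
# Counting cases and parameters for the joint distribution of one RGB pixel.
num_cases = 256 ** 3        # 16,777,216 possible (r, g, b) values
num_params = num_cases - 1  # one probability per case, minus the sum-to-1 constraint
print(num_cases, num_params)  # 16777216 16777215
```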

  • Example
    • Suppose we have $X_1, \dots, X_n$ of $n$ binary pixels (a binary image).
      • How many possible states?
        $2 \times 2 \times \cdots \times 2 = 2^n$
      • Sampling from $p(x_1, \dots, x_n)$ generates an image.
      • How many parameters do we need to specify $p(x_1, \dots, x_n)$?
        $2^n - 1$
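
The following sketch makes the counting concrete: it represents the full joint distribution as an explicit table over all $2^n$ states and samples a tiny binary "image" from it. The value of $n$ and the random table are illustrative; for real images ($n = 784$) this table would be astronomically large.

```python
# A minimal sketch: the full joint p(x_1, ..., x_n) as an explicit table.
import numpy as np

rng = np.random.default_rng(0)
n = 4                                # tiny, so the 2^n-entry table fits in memory

table = rng.random(2 ** n)           # one entry per state...
table /= table.sum()                 # ...normalized, leaving 2^n - 1 free parameters

state = rng.choice(2 ** n, p=table)  # sample a state index from the joint
pixels = [(state >> i) & 1 for i in range(n)]  # decode the index into n bits
print(pixels)
```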

Structure Through Independence

  • What if $X_1, \dots, X_n$ are all independent? Then
    $p(x_1, \dots, x_n) = p(x_1)p(x_2)\cdots p(x_n)$

  • How many possible states?
    $2^n$

  • How many parameters to specify $p(x_1, \dots, x_n)$?
    $n$

  • $2^n$ entries can be described by just $n$ numbers! But this independence assumption is too strong to model useful distributions.
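
Under full independence the model shrinks to one Bernoulli parameter per pixel, as in this sketch (the parameter values are arbitrary):

```python
# A minimal sketch: a fully independent model needs only n parameters.
import numpy as np

rng = np.random.default_rng(0)
n = 4
p = rng.random(n)                         # n parameters: p_i = P(X_i = 1)

pixels = (rng.random(n) < p).astype(int)  # sample every pixel independently
print(pixels)
```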

Conditional Independence

  • Three important rules

    • Chain rule:
      $p(x_1, \dots, x_n) = p(x_1)p(x_2|x_1)p(x_3|x_1, x_2)\cdots p(x_n|x_1, \dots, x_{n-1})$
      • How many parameters?
        • $p(x_1)$: 1 parameter
        • $p(x_2|x_1)$: 2 parameters ($p(x_2|x_1 = 0)$ and $p(x_2|x_1 = 1)$)
        • $p(x_3|x_1, x_2)$: 4 parameters
        • Hence, $1 + 2 + 2^2 + \cdots + 2^{n-1} = 2^n - 1$, which is the same as before.

    • Bayes' rule:
      $p(x|y) = \cfrac{p(x, y)}{p(y)} = \cfrac{p(y|x)p(x)}{p(y)}$

    • Conditional independence:
      If $x \perp y \mid z$, then $p(x|y, z) = p(x|z)$.

  • Now, suppose $X_{i+1} \perp X_1, \dots, X_{i-1} \mid X_i$ (Markov assumption). Then

    $p(x_1, \dots, x_n) = p(x_1)p(x_2|x_1)p(x_3|x_2)\cdots p(x_n|x_{n-1})$

  • How many parameters?

    $2n - 1$
  • Hence, by leveraging the Markov assumption, we get an exponential reduction in the number of parameters.

  • Auto-regressive models leverage this conditional independence, as sketched below.
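
As a concrete illustration of the Markov factorization and its $2n - 1$ parameters, here is a minimal sampling sketch (the parameter values are arbitrary):

```python
# A minimal sketch: sampling from the Markov factorization
# p(x_1) p(x_2|x_1) ... p(x_n|x_{n-1}) with 1 + 2(n - 1) = 2n - 1 parameters.
import numpy as np

rng = np.random.default_rng(0)
n = 8
p1 = 0.5                        # 1 parameter: p(x_1 = 1)
trans = rng.random((n - 1, 2))  # 2(n - 1) parameters: p(x_{i+1} = 1 | x_i = 0 or 1)

x = [int(rng.random() < p1)]
for i in range(n - 1):
    x.append(int(rng.random() < trans[i, x[-1]]))
print(x)  # one sample from p(x_1, ..., x_n)
```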



Auto-regressive Model

  • Suppose we have $28 \times 28$ binary pixels.
  • Our goal is to learn $p(x) = p(x_1, \dots, x_{784})$ over $x \in \{0, 1\}^{784}$.
  • How can we parametrize $p(x)$?
    • Let's use the chain rule to factor the joint distribution:
      $p(x_{1:784}) = p(x_1)p(x_2|x_1)p(x_3|x_{1:2})\cdots$
    • This is called an autoregressive model.
    • Note that we need an ordering of all random variables.
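
To see how such a factorization can be parametrized in practice, below is a minimal toy sketch in PyTorch where each conditional $p(x_i = 1 \mid x_{<i})$ is a logistic regression on the preceding pixels. This is an illustrative construction, not the lecture's specific model; the names `log_prob` and `sample` are my own.

```python
# A toy autoregressive Bernoulli model over D binary pixels: restricting
# row i of the weight matrix to its first i entries enforces the ordering,
# so pixel i depends only on x_1, ..., x_{i-1}.
import torch

D = 784  # 28 x 28 binary pixels

weights = torch.nn.Parameter(torch.zeros(D, D))
bias = torch.nn.Parameter(torch.zeros(D))

def log_prob(x):
    """log p(x) = sum_i log p(x_i | x_{<i}) for a batch x in {0,1}^(B x D)."""
    logits = torch.stack(
        [x[:, :i] @ weights[i, :i] + bias[i] for i in range(D)], dim=1
    )
    return torch.distributions.Bernoulli(logits=logits).log_prob(x).sum(dim=1)

@torch.no_grad()
def sample():
    """Generate one image pixel by pixel, following the chosen ordering."""
    x = torch.zeros(1, D)
    for i in range(D):
        p_i = torch.sigmoid(x[:, :i] @ weights[i, :i] + bias[i])
        x[:, i] = torch.bernoulli(p_i)
    return x.view(28, 28)

# Usage: train by maximizing log_prob on binarized images (e.g., MNIST),
# then call sample() to generate a new 28 x 28 binary image.
```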