Condition
- suppose we have a collection of m points {x(1),x(2),⋯,x(m)} in Rn
- suppose we would like to apply lossy compression to these points
- Lossy compression means storing the points in a way that requires less memory but may lose some precision
- we would like to lose as little precision as possible
Statement
- we can define an encoding function f such that each point x(i)∈Rn corresponds to a code vector c(i)∈Rl, where l<n
- f(x)=c
- we can define a decoding function g such that x≈g(f(x))
- For simplicity, suppose the decoder is defined by a matrix D∈Rn×l
- let the columns of D be orthogonal to each other
- to remove the scale ambiguity in D, we also constrain all of the columns of D to have unit norm
- Let g(c)=Dc
- we can find the optimal point c∗ by minimizing the distance between x and its reconstruction: c∗=argmin{c}∥x−g(c)∥₂
- we can switch to the squared L2 norm instead of the L2 norm itself, because both are minimized by the same value of c
- c∗=argmin{c}∥x−g(c)∥₂²
- (x−g(c))T(x−g(c))
- =xTx−xTg(c)−g(c)Tx+g(c)Tg(c)
- =xTx−2xTg(c)+g(c)Tg(c) (because xTg(c) is scalar)
- we can ignore the first term, since it does not depend on c
- c∗=argmin{c}−2xTg(c)+g(c)Tg(c)
- =argmin{c}−2xTDc+cTDTDc
- =argmin{c}−2xTDc+cTIlc
- (by the orthogonality and unit-norm constraints on D, i.e. DTD=Il)
- =argmin{c}−2xTDc+cTc
- ∇c(−2xTDc+cTc)=0
- −2DTx+2c=0
- c=DTx
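The closed-form encoder above can be checked numerically. A minimal NumPy sketch (the dimensions, the random data, and the QR construction of D are illustrative assumptions, not from the text) verifying that c = DTx reconstructs x at least as well as any perturbed code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 5, 2  # ambient dimension and code dimension (illustrative)

# Build a D with orthonormal columns via a QR decomposition,
# satisfying the constraint D^T D = I_l.
D, _ = np.linalg.qr(rng.standard_normal((n, l)))

x = rng.standard_normal(n)
c_star = D.T @ x  # closed-form optimum c* = D^T x

# ||x - Dc||_2^2 is strictly convex in c, so any other code does worse.
for _ in range(100):
    c_other = c_star + 0.1 * rng.standard_normal(l)
    assert np.linalg.norm(x - D @ c_star) <= np.linalg.norm(x - D @ c_other)
```

The QR step is just a convenient way to manufacture a D satisfying the orthogonality and unit-norm constraints; any such D gives the same closed-form encoder.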
Statement 2
- r(x)=g(f(x))=DDTx
- We will find the D minimizing the L2 distance between inputs and reconstructions
- D∗=argmin{D}∑i,j(xj(i)−r(x(i))j)² (subject to DTD=Il)
- To derive the algorithm for finding D∗, we will start by considering the case where l=1
- In this case, D is just a single vector, d
- d∗=argmin{d}∑i∥x(i)−ddTx(i)∥₂² subject to ∥d∥₂=1
- exploiting the fact that a scalar is its own transpose,
- d∗=argmin{d}∑i∥x(i)−dTx(i)d∥₂² subject to ∥d∥₂=1
- let X∈Rm×n be the matrix defined by stacking all of the vectors describing the points, such that Xi,:=x(i)T
- we can now rewrite the problem as
- d∗=argmin{d}∥X−XddT∥F² subject to dTd=1
- here the Frobenius norm is ∥X∥F²=Tr(XXT)=∑i∑j|xij|²
- argmin{d}∥X−XddT∥F²
- =argmin{d}Tr((X−XddT)T(X−XddT))
- =argmin{d}Tr(XTX−XTXddT−ddTXTX+ddTXTXddT)
- =argmin{d}−Tr(XTXddT)−Tr(ddTXTX)+Tr(ddTXTXddT)
- (terms not involving d are ignored, since they do not affect the arg min)
- using the cyclic property of the trace, Tr(AB)=Tr(BA):
- =argmin{d}−2Tr(XTXddT)+Tr(XTXddTddT)
- subject to dTd=1
- =argmin{d}−Tr(XTXddT) (substituting dTd=1 into the last term)
- subject to dTd=1
- =argmax{d}Tr(dTXTXd)
- subject to dTd=1
- the optimal d is given by the eigenvector of XTX corresponding to the largest eigenvalue
- In the general case, the matrix D is given by the l eigenvectors corresponding to the l largest eigenvalues. This may be shown using proof by induction.
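The eigenvector solution can be checked numerically. A minimal NumPy sketch (the data matrix, dimensions, and the helper `recon_error` are illustrative assumptions) that builds D from the top-l eigenvectors of XTX and confirms it reconstructs no worse than random orthonormal alternatives:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, l = 200, 6, 2  # number of points, ambient dim, code dim (illustrative)

# Correlated data so that some directions carry more variance than others.
X = rng.standard_normal((m, n)) @ rng.standard_normal((n, n))

# eigh handles the symmetric matrix X^T X and returns eigenvalues ascending,
# so the last l columns are the eigenvectors with the largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
D = eigvecs[:, -l:]

def recon_error(X, D):
    """Frobenius norm of X - X D D^T, i.e. the reconstruction residual."""
    return np.linalg.norm(X - X @ D @ D.T)

# The eigenvector choice should beat (or tie) random orthonormal D's.
for _ in range(20):
    Q, _ = np.linalg.qr(rng.standard_normal((n, l)))
    assert recon_error(X, D) <= recon_error(X, Q) + 1e-9
```

Equivalently, D could be taken from the top-l right singular vectors of X via `np.linalg.svd`, which avoids forming XTX explicitly.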