If a set of vectors $x_1, x_2, x_3, x_4, \dots, x_n$ is linearly independent, then the only set of coefficients $c_1, c_2, c_3, \dots, c_n$ satisfying
$c_1 x_1 + c_2 x_2 + c_3 x_3 + \dots + c_n x_n = 0$
is the trivial one: $c_k = 0$ for all $k = 1, \dots, n$.
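As a quick numerical sanity check (a minimal NumPy sketch, not part of the original notes), linear independence can be tested by comparing the rank of the stacked vectors with the number of vectors:

```python
import numpy as np

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = np.array([1.0, 1.0, 3.0])      # x3 = x1 + x2, so the set is dependent

X = np.stack([x1, x2, x3], axis=1)  # stack the vectors as columns
independent = np.linalg.matrix_rank(X) == X.shape[1]
print(independent)                  # False, because x3 is a combination of x1 and x2
```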
Generic matrix
Matrix Product: $C = AB$
The number of columns of $A$ must equal the number of rows of $B$.
$C$ then has as many rows as $A$ and as many columns as $B$: if $A$ is $m \times n$ and $B$ is $n \times p$, then $C$ is $m \times p$.
Property
$XY \neq YX$ in general (matrix multiplication is not commutative), but it is associative: $XYZ = (XY)Z = X(YZ)$
Hadamard Product: $C = A \odot B$
The two matrices must have the same shape; the product is element-wise, $C_{ij} = A_{ij} B_{ij}$.
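A small NumPy sketch (illustrative only) of the shape rules and properties of both products:

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.normal(size=(2, 3))                   # 2x3
B = rng.normal(size=(3, 4))                   # 3x4: cols(A) == rows(B)
C = A @ B                                     # matrix product
print(C.shape)                                # (2, 4): rows of A, columns of B

X, Y, Z = (rng.normal(size=(3, 3)) for _ in range(3))
print(np.allclose(X @ Y, Y @ X))              # usually False: not commutative
print(np.allclose((X @ Y) @ Z, X @ (Y @ Z)))  # True: associative

D = rng.normal(size=(2, 3))                   # same shape as A
H = A * D                                     # Hadamard (element-wise) product
print(H.shape)                                # (2, 3)
```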
Square Matrix
The determinant of a square matrix $A = (A_{ij})_{d \times d}$ is denoted by $|A|$, where $M_{ik}$ is the minor matrix formed by removing the $i$th row and the $k$th column:
$|A| = \sum_{k=1}^{d} (-1)^{i+k} A_{ik} |M_{ik}|$
$|A| = |A^T|$
The trace of a square matrix $A = (A_{ij})_{d \times d}$ is the sum of its diagonal elements:
$\mathrm{tr}(A) = \sum_{k=1}^{d} A_{kk}$
The rank of a square matrix $A = (A_{ij})_{d \times d}$ is the number of linearly independent rows/columns:
$\mathrm{rank}(A)$
This rank concept is also used in feature representations for neural networks.
A non-singular matrix $A = (A_{ij})_{d \times d}$ is a square matrix with $\mathrm{rank}(A) = d$; in this case, $|A| \neq 0$.
A singular matrix is a square matrix that has no inverse, which is equivalent to its determinant being zero.
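A short NumPy sketch (not from the original notes) checking determinant, trace, rank, and singularity:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.linalg.det(A))              # 5.0 != 0, so A is non-singular
print(np.trace(A))                   # 5.0, the sum of the diagonal elements
print(np.linalg.matrix_rank(A))      # 2 == d, full rank

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])           # second row is 2x the first row
print(np.linalg.det(S))              # ~0.0 -> singular, no inverse exists
print(np.linalg.matrix_rank(S))      # 1 < d
```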
An orthogonal (or orthonormal) matrix $A = (A_{ij})_{d \times d}$ is a square matrix with the property $A^T A = A A^T = I_d$,
where $I_d$ is the identity matrix.
This implies $A^T = A^{-1}$.
The inverse of $A = (A_{ij})_{d \times d}$ is the square matrix $A^{-1}$ with the property $A^{-1} A = A A^{-1} = I_d$.
The inverse of $A = (A_{ij})_{d \times d}$ exists iff $A$ is a non-singular square matrix
(rank = dimension)
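A minimal NumPy sketch of the orthogonality and inverse properties; the 2-D rotation matrix is just an example of an orthogonal matrix:

```python
import numpy as np

theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation -> orthogonal
print(np.allclose(Q.T @ Q, np.eye(2)))           # True: Q^T Q = I
print(np.allclose(Q.T, np.linalg.inv(Q)))        # True: Q^T = Q^{-1}

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                       # non-singular, so invertible
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))         # True: A A^{-1} = I
```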
A semi-definite matrix $A = (A_{ij})_{d \times d}$ is a square matrix with one of the following properties:
positive semi-definite:
$\forall x \in \mathbb{R}^d, \; x^T A x \geq 0$
negative semi-definite:
$\forall x \in \mathbb{R}^d, \; x^T A x \leq 0$
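A minimal sketch of a positive semi-definiteness check via eigenvalues; the helper name is_positive_semidefinite is made up for illustration:

```python
import numpy as np

def is_positive_semidefinite(A, tol=1e-10):
    """x^T A x >= 0 for all x iff all eigenvalues of the symmetric part are >= 0."""
    A_sym = (A + A.T) / 2
    return bool(np.all(np.linalg.eigvalsh(A_sym) >= -tol))

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])          # eigenvalues 1 and 3 -> positive semi-definite
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])            # eigenvalues 3 and -1 -> not semi-definite
print(is_positive_semidefinite(A))    # True
print(is_positive_semidefinite(B))    # False
```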
Eigen Analysis
Given a square matrix $A = (A_{ij})_{d \times d}$, eigen analysis finds its eigenvectors and their corresponding eigenvalues:
$A v = \lambda v$
where $v$ is an eigenvector and $\lambda$ is its eigenvalue.
Property
If $A = (A_{ij})_{d \times d}$ is non-singular ($\mathrm{rank} = d$),
all $d$ eigenvalues are non-zero, $\lambda_i \neq 0 \; (i = 1, \dots, d)$, and
$|A| = \prod_{i=1}^{d} \lambda_i = \lambda_1 \lambda_2 \lambda_3 \cdots \lambda_d$
If $A = (A_{ij})_{d \times d}$ is real and symmetric, $A = A^T$:
All eigenvalues are real
The eigenvectors associated with distinct eigenvalues are orthogonal
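A NumPy sketch (illustrative only) verifying $Av = \lambda v$, the determinant-eigenvalue product, and the orthogonality of eigenvectors of a real symmetric matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                # real, symmetric, non-singular

eigvals, eigvecs = np.linalg.eig(A)       # columns of eigvecs are the eigenvectors

for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))    # True: A v = lambda v

print(np.isclose(np.linalg.det(A), np.prod(eigvals)))  # True: |A| = product of eigenvalues

# Distinct eigenvalues of a symmetric matrix -> orthogonal eigenvectors
print(np.isclose(eigvecs[:, 0] @ eigvecs[:, 1], 0.0))  # True
```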
Linear Transformation
A linear transformation is a linear mapping, $P = (P_{ij})_{p \times d}$, from a vector space $x \in \mathbb{R}^d$ onto another vector space $y \in \mathbb{R}^p$, i.e., $y = Px$.
In the ML context, the dimensionalities of the two vector spaces are usually different.
When $p < d$: low-dimensional representation (via dimension reduction)
When $p > d$: over-complete (sparse) representation
All linear representation learning models aim at learning a projection matrix $P$ for feature extraction, which generates a new representation $y$ for a raw data point $x$, as sketched below.
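A minimal sketch of such a projection; the projection matrix here is random purely for illustration (a learned $P$ would come from a representation learning model):

```python
import numpy as np

rng = np.random.default_rng(0)

d, p, N = 10, 3, 5                   # raw dimension, target dimension, number of samples
X = rng.normal(size=(d, N))          # raw data points stored as columns
P = rng.normal(size=(p, d))          # projection matrix P_{p x d} (random here)

Y = P @ X                            # new representation y = P x for every column x
print(Y.shape)                       # (3, 5): low-dimensional since p < d
```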
Matrix Decomposition
Spectral decomposition is a powerful tool for matrix decomposition.
Given a real symmetric matrix $A = (A_{ij})_{d \times d}$ where $A_{ij} = A_{ji}$, it can be decomposed into a product of matrices consisting of its eigenvectors and eigenvalues:
$A_{d \times d} = V_{d \times d} \Sigma_{d \times d} V_{d \times d}^T$
$V$ is an orthogonal matrix, $V^T V = I$, whose column $i$ is the $i$th eigenvector of $A$.
$\Sigma$ is a diagonal matrix where $\Sigma_{ii} = \lambda_i$ and $\Sigma_{ij} = 0$ if $i \neq j$, where $\lambda_i$ is the $i$th eigenvalue of $A$.
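A NumPy sketch of spectral decomposition using np.linalg.eigh (intended for symmetric/Hermitian matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # real symmetric

eigvals, V = np.linalg.eigh(A)           # eigenvalues and orthonormal eigenvectors
Sigma = np.diag(eigvals)

print(np.allclose(V.T @ V, np.eye(3)))   # True: V is orthogonal
print(np.allclose(V @ Sigma @ V.T, A))   # True: A = V Sigma V^T
```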
Singular Value Decomposition (SVD) is yet another powerful tool for matrix decomposition.
Given a matrix of any form, $X_{d \times N}$, SVD decomposes it into the following form:
$X_{d \times N} = U_{d \times d} \Sigma_{d \times N} V_{N \times N}^T$
$U$ is an orthogonal matrix, $U^T U = I_{d \times d}$, whose column $i$ is the $i$th eigenvector of $X X^T$ (left singular vectors).
$V$ is an orthogonal matrix, $V^T V = I_{N \times N}$, whose column $i$ is the $i$th eigenvector of $X^T X$ (right singular vectors).
$\Sigma$ is a diagonal matrix whose entries are the square roots of the eigenvalues shared by $X X^T$ and $X^T X$ (the so-called singular values):
$\Sigma_{ii} = \sqrt{\lambda_i}$, $\Sigma_{ii} \geq \Sigma_{jj}$ if $i < j$, where $\lambda_i$ is the $i$th eigenvalue shared by $X X^T$ and $X^T X$.
This can also be written in vector-wise (outer-product) form: $X = \sum_{i} \Sigma_{ii} \, u_i v_i^T$, where $u_i$ and $v_i$ are the $i$th columns of $U$ and $V$.
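A NumPy sketch of SVD on a random matrix, checking that the singular values are the square roots of the eigenvalues of $X X^T$ and reconstructing $X$ from the outer-product form:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 4, 6
X = rng.normal(size=(d, N))

U, s, Vt = np.linalg.svd(X)              # s holds the singular values, in descending order

# Singular values are the square roots of the eigenvalues of X X^T (and X^T X)
lam = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]
print(np.allclose(s, np.sqrt(lam)))      # True

# Vector-wise form: X = sum_i sigma_i u_i v_i^T
X_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(X, X_rebuilt))         # True
```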
Eigen decomposition does not require symmetry;
it can decompose any diagonalizable square matrix (no symmetry or real-valuedness condition is needed).
If the matrix is real and symmetric, spectral decomposition (as above) is possible; in the general case the decomposition takes the form:
$A_{d \times d} = P_{d \times d} \Sigma_{d \times d} P_{d \times d}^{-1}$
$P$ is the matrix of eigenvectors, but it is not necessarily orthogonal.
$\Sigma$ is the diagonal matrix of eigenvalues.
Because $P$ is not orthogonal, in general $P^T \neq P^{-1}$.
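A sketch of eigen decomposition on a non-symmetric (but diagonalizable) square matrix, showing that $P$ is not orthogonal:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])               # square, not symmetric, diagonalizable

eigvals, P = np.linalg.eig(A)            # columns of P are the eigenvectors
Sigma = np.diag(eigvals)

print(np.allclose(P @ Sigma @ np.linalg.inv(P), A))   # True: A = P Sigma P^{-1}
print(np.allclose(P.T, np.linalg.inv(P)))             # False: P^T != P^{-1}
```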
Example 1
What is a positive-definite matrix? If $M$ is a positive-definite matrix over the real numbers, list at least four non-trivial properties of $M$.
A positive-definite matrix yields a strictly positive quadratic form:
$\forall x \in \mathbb{R}^d, \; x \neq 0: \; x^T M x > 0$
In addition, $M$ is symmetric and all the eigenvalues obtained from its spectral decomposition are positive.
Then,
M is a symmetric matrix
M is non-singular (non-singular means that the matrix is invertible, i.e., all of its eigenvalues are non-zero)
All the eigenvalues are real and positive, and the corresponding eigenvectors can be chosen to be mutually orthogonal
M can be decomposed as $P^T P$, where $P$ is an invertible matrix.
Property 4 holds by the Cholesky decomposition.
The Cholesky decomposition is a specific way of decomposing a symmetric positive-definite matrix into the product of a lower triangular matrix and its transpose.
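A short NumPy sketch of the Cholesky decomposition and of property 4 ($M = P^T P$ with $P$ invertible); the matrix used is just an example:

```python
import numpy as np

M = np.array([[4.0, 2.0],
              [2.0, 3.0]])               # symmetric positive-definite

L = np.linalg.cholesky(M)                # lower triangular factor, M = L L^T
print(np.allclose(L @ L.T, M))           # True

P = L.T                                  # take P = L^T (upper triangular, invertible)
print(np.allclose(P.T @ P, M))           # True: M = P^T P
print(np.linalg.det(P) != 0)             # True: non-zero determinant -> invertible
```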
Covariance matrix
A covariance matrix is a square matrix whose off-diagonal elements are the covariances between each pair of elements of a given random vector, and whose main diagonal contains the variances of those random variables. Any covariance matrix is symmetric and positive semi-definite.
Given a data matrix
$X_{d \times N} = \{x_1, \dots, x_N\}$
compute the mean vector over the $N$ samples (one entry per feature dimension, $d$ in total):
$m_{d \times 1} = [m_1, m_2, \dots, m_d]^T = \frac{1}{N} \sum_{n=1}^{N} x_n$
(a) element-wise form
$C_{ij} = \frac{1}{N-1} \sum_{n=1}^{N} (x_{in} - m_i)(x_{jn} - m_j)$
(b) vector form
$C_{d \times d} = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - m)(x_n - m)^T$
(c) matrix form, using the centralized matrix $\hat{X}$ whose columns are $\hat{x}_n = x_n - m$:
$C_{d \times d} = \frac{1}{N-1} \hat{X} \hat{X}^T$
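A NumPy sketch comparing the vector and matrix forms of the covariance computation on random data (illustration only), and checking against np.cov:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 100
X = rng.normal(size=(d, N))               # N samples stored as columns

m = X.mean(axis=1, keepdims=True)         # mean vector m_{d x 1}
X_hat = X - m                             # centralized data matrix

C_matrix = X_hat @ X_hat.T / (N - 1)      # (c) matrix form

# (b) vector form gives the same result
C_vector = sum(np.outer(X_hat[:, n], X_hat[:, n]) for n in range(N)) / (N - 1)

print(np.allclose(C_matrix, C_vector))    # True
print(np.allclose(C_matrix, np.cov(X)))   # True: matches NumPy's covariance
```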