Linear Algebra Basics

Rainy Night for Sapientia · November 1, 2023

Mathematics for AI


Basic Concepts and Their Notations

  • Inner (dot) product of two vectors

    $$\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T\mathbf{y} = \sum_{i=1}^d x_i y_i$$
  • Outer (tensor) product of two vectors

    $$\mathbf{x} \otimes \mathbf{y} = \mathbf{x}\mathbf{y}^T = (x_i y_j)_{d \times n}, \quad \text{where } \mathbf{x} = (x_i)_{d \times 1} \text{ and } \mathbf{y} = (y_j)_{n \times 1}$$
  • Magnitude of a vector

    $$||\mathbf{x}|| = \sqrt{\mathbf{x}^T\mathbf{x}} = \left( \sum_{i=1}^d x_i^2 \right)^{\frac{1}{2}}$$
  • Angle between two vectors

    $$\cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{||\mathbf{x}||\,||\mathbf{y}||}$$
  • Orthogonal projection of vector y onto x

    $$\mathbf{y} \mapsto \mathbf{x} : (\mathbf{y}^T\mathbf{u}_x)\mathbf{u}_x$$
    $$\mathbf{u}_x = \frac{\mathbf{x}}{||\mathbf{x}||}$$
  • Orthogonal and Orthonormal vectors

    • If $\mathbf{x}$ is orthogonal to $\mathbf{y}$, then $\mathbf{x}^T\mathbf{y} = 0$
    • If $\mathbf{x}$ and $\mathbf{y}$ are orthonormal, then $\mathbf{x}^T\mathbf{y} = 0$ and $||\mathbf{x}|| = ||\mathbf{y}|| = 1$
  • Linearly Dependent and Linearly Independent

    • If a set of vectors $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \dotsc, \mathbf{x}_n$ is linearly dependent, then there exists a set of coefficients $c_1, c_2, c_3, \dotsc, c_n$, not all zero, such that
      $$c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 + \dotsb + c_n\mathbf{x}_n = \mathbf{0}, \quad \exists\, c_k \neq 0, \; k \in \{1, \dotsc, n\}$$
    • If a set of vectors $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \dotsc, \mathbf{x}_n$ is linearly independent, then the only coefficients $c_1, c_2, c_3, \dotsc, c_n$ satisfying
      $$c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 + \dotsb + c_n\mathbf{x}_n = \mathbf{0}$$
      are $c_k = 0$ for all $k = 1, \dotsc, n$.
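
These definitions map directly onto NumPy primitives. The following is a minimal sketch (the vectors `x` and `y` are arbitrary examples, not taken from the text) checking the inner product, outer product, norm, angle, and projection formulas above.

```python
import numpy as np

# Arbitrary example vectors
x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

inner = x @ y                          # <x, y> = x^T y
outer = np.outer(x, y)                 # x y^T, here a 3 x 3 matrix
norm_x = np.linalg.norm(x)             # ||x|| = sqrt(x^T x)
cos_theta = inner / (norm_x * np.linalg.norm(y))

u_x = x / norm_x                       # unit vector along x
proj_y_onto_x = (y @ u_x) * u_x        # orthogonal projection of y onto x

print(inner, norm_x, cos_theta)
print(proj_y_onto_x)
```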

Generic matrix

  • Matrix Product: $C = AB$
    The number of columns of $A$ must equal the number of rows of $B$; if $A$ is $m \times n$ and $B$ is $n \times p$, then $C$ is $m \times p$ (rows of $A$ by columns of $B$).

  • Property

    In general $XY \neq YX$, but matrix multiplication is associative: $XYZ = (XY)Z = X(YZ)$
  • Hadamard Product: $C = A \odot B$
    The two matrices must have the same shape; the product is taken element-wise.
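
A short NumPy sketch of the shape rules above (the matrices are arbitrary random examples): `A @ B` requires the inner dimensions to match, while `A * D` (Hadamard) requires identical shapes.

```python
import numpy as np

A = np.random.randn(2, 3)       # 2 x 3
B = np.random.randn(3, 4)       # 3 x 4

C = A @ B                       # inner dimensions match (3 == 3); C is 2 x 4
print(C.shape)                  # (2, 4)

D = np.random.randn(2, 3)
H = A * D                       # Hadamard product: element-wise, same shape required
print(H.shape)                  # (2, 3)

# Associative but (in general) not commutative:
X, Y, Z = np.random.randn(3, 3), np.random.randn(3, 3), np.random.randn(3, 3)
print(np.allclose((X @ Y) @ Z, X @ (Y @ Z)))   # True
print(np.allclose(X @ Y, Y @ X))               # False for almost every random X, Y
```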

Square Matrix

  • Determinant of a square matrix $A = (A_{ij})_{d \times d}$ is denoted by $|A|$; $M_{ik}$ is the minor matrix formed by removing the $i$th row and the $k$th column.
    $$|A| = \sum_{k=1}^d A_{ik}|M_{ik}|(-1)^{k+i}$$
    $$|A| = |A^T|$$
  • Trace of a square matrix $A = (A_{ij})_{d \times d}$ is the sum of its diagonal elements
    $$\text{tr}(A) = \sum_{k=1}^{d} A_{kk}$$
  • Rank of a square matrix $A = (A_{ij})_{d \times d}$ is the number of linearly independent rows/columns.

    $$\text{rank}(A)$$

    This notion of rank is also used when analysing feature representations in neural networks (e.g., low-rank approximation).

  • Non-singular: $A = (A_{ij})_{d \times d}$ is a square matrix with $\text{rank}(A) = d$; in this case $|A| \neq 0$

    • A singular matrix is a square matrix that has no inverse, i.e., its determinant is zero.
  • Orthogonal (orthonormal): $A = (A_{ij})_{d \times d}$ is a square matrix with the property $A^TA = AA^T = I_d$,
    where $I_d$ is the identity matrix.
    Equivalently, $A^T = A^{-1}$

  • Inverse of $A = (A_{ij})_{d \times d}$ is the square matrix $A^{-1}$ with the property $A^{-1}A = AA^{-1} = I_d$
    The inverse of $A = (A_{ij})_{d \times d}$ exists iff $A$ is a non-singular square matrix
    (rank = dimension)

  • Semi-definite: $A = (A_{ij})_{d \times d}$ is a square matrix with one of the following properties:

    • positive semi-definite:
      $$\forall \mathbf{x} \in \mathbb{R}^d, \; \mathbf{x}^T A \mathbf{x} \geq 0$$
    • negative semi-definite:
      $$\forall \mathbf{x} \in \mathbb{R}^d, \; \mathbf{x}^T A \mathbf{x} \leq 0$$
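
The square-matrix quantities above all have direct NumPy counterparts; here is a minimal sketch on an arbitrary symmetric example matrix.

```python
import numpy as np

# Arbitrary symmetric, non-singular example
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

print(np.linalg.det(A))                    # determinant |A| = 5.0
print(np.trace(A))                         # trace = 5.0
print(np.linalg.matrix_rank(A))            # rank = 2, i.e. non-singular

A_inv = np.linalg.inv(A)                   # inverse exists since rank == dimension
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# Positive semi-definite check: all eigenvalues of the symmetric matrix are >= 0
print(np.all(np.linalg.eigvalsh(A) >= 0))  # True for this example
```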

Eigen Analysis

  • Given a square matrix $A = (A_{ij})_{d \times d}$, eigen analysis finds the eigenvectors and their corresponding eigenvalues.

    $$A\mathbf{v} = \lambda\mathbf{v}$$

    where $\mathbf{v}$ is an eigenvector and $\lambda$ is its corresponding eigenvalue.

  • Property

    • If $A = (A_{ij})_{d \times d}$ is non-singular (rank = $d$),
      all $d$ eigenvalues are non-zero, $\lambda_i \neq 0 \; (i = 1, \dotsc, d)$, and

      $$|A| = \prod_{i=1}^d \lambda_i = \lambda_1\lambda_2\lambda_3 \dotsm \lambda_d$$
    • If $A = (A_{ij})_{d \times d}$ is real and symmetric, $A = A^T$:

      • All eigenvalues are real
      • The eigenvectors associated with distinct eigenvalues are orthogonal
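
A small NumPy check of the properties above on an arbitrary real symmetric example: the eigenvalues are real, the determinant equals their product, and the eigenvectors are orthonormal.

```python
import numpy as np

# Arbitrary real symmetric example
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, V = np.linalg.eigh(A)      # eigh is specialised for symmetric matrices
print(eigvals)                      # real eigenvalues

v0 = V[:, 0]                        # first eigenvector (column of V)
print(np.allclose(A @ v0, eigvals[0] * v0))              # A v = lambda v
print(np.allclose(np.prod(eigvals), np.linalg.det(A)))   # |A| = product of eigenvalues
print(np.allclose(V.T @ V, np.eye(2)))                   # eigenvectors are orthonormal
```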

Linear Transformation

  • A linear transformation is a linear mapping, $P = (P_{ij})_{p \times d}$, from a vector space $\mathbf{x} \in \mathbb{R}^d$ to another vector space $\mathbf{y} \in \mathbb{R}^p$
  • In the ML context, the dimensionalities of the two vector spaces typically differ:
    • When $p < d$: a low-dimensional representation (via dimension reduction)
    • When $p > d$: an over-complete (sparse) representation
  • All linear representation-learning models aim at learning a projection matrix $P$ for feature extraction, generating a new representation $\mathbf{y} = P\mathbf{x}$ for a raw data point $\mathbf{x}$ (see the sketch below)
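
As a sketch of this idea, the hypothetical matrix `P` below (random numbers, purely illustrative) maps a raw 4-dimensional point to a 2-dimensional feature vector, i.e., the $p < d$ dimension-reduction case.

```python
import numpy as np

d, p = 4, 2                        # raw dimension d, target dimension p < d
P = np.random.randn(p, d)          # projection matrix P, shape p x d
x = np.random.randn(d)             # raw data point in R^d

y = P @ x                          # new (low-dimensional) representation in R^p
print(x.shape, y.shape)            # (4,) (2,)
```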

Matrix Decomposition

  • Spectral Decomposition is a powerful tool for matrix decomposition.

    • Given a real symmetric matrix $A = (A_{ij})_{d \times d}$, where $A_{ij} = A_{ji}$, it can be decomposed into a product of matrices consisting of its eigenvectors and eigenvalues:
      $$A_{d \times d} = V_{d \times d} \Sigma_{d \times d} V^T_{d \times d}$$
    • $V$ is an orthogonal matrix, $V^TV = I$, where column $i$ is the $i$th eigenvector of $A$.
    • $\Sigma$ is a diagonal matrix where $\Sigma_{ii} = \lambda_i$ and
      $\Sigma_{ij} = 0$ if $i \neq j$, where $\lambda_i$ is the $i$th eigenvalue of $A$
  • Singular Value Decomposition (SVD) is yet another powerful tool for matrix decomposition.

    • Given a matrix of any shape, $X_{d \times N}$, SVD decomposes it into the following form:

      $$X_{d \times N} = U_{d \times d} \Sigma_{d \times N} V^T_{N \times N}$$
    • $U$ is an orthogonal matrix, $U^TU = I_{d \times d}$, where column $i$ is the $i$th eigenvector of $XX^T$ (left singular vectors)

    • $V$ is an orthogonal matrix, $V^TV = I_{N \times N}$, where column $i$ is the $i$th eigenvector of $X^TX$ (right singular vectors)

    • $\Sigma$ is a diagonal matrix holding the square roots of the eigenvalues shared by $XX^T$ and $X^TX$ (the so-called singular values)

    • $\Sigma_{ii} = \sqrt{\lambda_i}$, with $\Sigma_{ii} \geq \Sigma_{jj}$ if $i < j$, where $\lambda_i$ is the $i$th eigenvalue shared by $XX^T$ and $X^TX$.

    • This can also be written in vector (outer-product) form:

      $$X_{d \times N} = \sum_{i=1}^{p} \sqrt{\lambda_i}\, \mathbf{u}_i \mathbf{v}_i^T = \sum_{i=1}^{p} \sqrt{\lambda_i}\, (\mathbf{u}_i)_{d \times 1} (\mathbf{v}_i^T)_{1 \times N}$$
      $$\text{where } p \le \min(d, N)$$
  • Eigen decomposition does not require the matrix to be symmetric.

    • It can decompose any diagonalisable square matrix (no symmetry or realness condition is required).
    • If the matrix is real and symmetric, it reduces to the spectral decomposition above.
    • In the general case,
      $$A_{d \times d} = P_{d \times d} \Sigma_{d \times d} P^{-1}_{d \times d}$$
    • $P$ is the matrix of eigenvectors, but not orthogonal in general.
    • $\Sigma$ is the diagonal matrix of eigenvalues.
    • Because $P$ is not orthogonal, $P^T \neq P^{-1}$
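
A minimal NumPy sketch of the three decompositions above, on arbitrary example matrices: `eigh` for the spectral decomposition of a real symmetric matrix, `svd` for a rectangular matrix, and `eig` for a general (non-symmetric) square matrix where `P` need not be orthogonal.

```python
import numpy as np

# Spectral decomposition of a real symmetric matrix: A = V diag(lambda) V^T
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, V = np.linalg.eigh(A)
print(np.allclose(A, V @ np.diag(lam) @ V.T))            # True

# SVD of a rectangular matrix: X = U Sigma V^T
X = np.random.randn(3, 5)
U, s, Vt = np.linalg.svd(X, full_matrices=False)         # s holds the singular values
print(np.allclose(X, U @ np.diag(s) @ Vt))               # True
print(np.allclose(s**2,                                  # sigma_i^2 = lambda_i of X X^T
                  np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]))

# General eigendecomposition: B = P diag(lambda) P^{-1}, P not orthogonal in general
B = np.random.randn(3, 3)
lam_b, P = np.linalg.eig(B)
print(np.allclose(B, P @ np.diag(lam_b) @ np.linalg.inv(P)))   # True
print(np.allclose(P.T @ P, np.eye(3)))                         # False: P^T != P^{-1}
```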

Example 1

What is a positive-definite matrix? If M is a positive-definite matrix in the domain of real numbers, list at least four non-trivial properties of M.

A positive-definite matrix is a matrix whose quadratic form is strictly positive:

$$\forall \mathbf{x} \in \mathbb{R}^d,\ \mathbf{x} \neq \mathbf{0}: \; \mathbf{x}^T M \mathbf{x} > 0$$

In addition (by the usual real-valued convention), $M$ is symmetric, and all eigenvalues obtained from its spectral decomposition are positive.

Then,

  1. M is a symmetric matrix
  2. M is non-singular (non-singular means the matrix is invertible; equivalently, all of its eigenvalues are non-zero)
  3. All eigenvalues of M are real and positive, and the eigenvectors associated with distinct eigenvalues are orthogonal
  4. M can be decomposed as $P^TP$, where $P$ is an invertible matrix.

Property 4 holds because of the Cholesky decomposition.

The Cholesky decomposition is a specific way of decomposing a symmetric positive-definite matrix into the product of a lower triangular matrix and its transpose.
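
A short NumPy illustration of property 4, on an arbitrary positive-definite example. Note that `np.linalg.cholesky` returns the lower-triangular factor $L$ with $M = LL^T$, so $P = L^T$ in the notation above.

```python
import numpy as np

# Arbitrary symmetric positive-definite example
M = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(M)          # lower-triangular L with M = L L^T
P = L.T                            # so M = P^T P with P invertible (property 4)
print(np.allclose(M, P.T @ P))     # True

print(np.all(np.linalg.eigvalsh(M) > 0))   # all eigenvalues positive
```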

Covariance matrix

A covariance matrix is a square matrix whose off-diagonal elements are the covariances between each pair of elements of a given random vector, and whose main diagonal contains the variances of those random variables. Any covariance matrix is symmetric and positive semi-definite.

  • Given a data matrix
    $$X_{d \times N} = [\mathbf{x}_1, \dotsc, \mathbf{x}_N]$$
  • Compute the mean vector across the $N$ samples (one entry per feature dimension, $d$ in total)

    $$\mathbf{m}_{d \times 1} = [m_1, m_2, \dotsc, m_d]^T = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n$$
  • (a) element-wise form

    $$(C_{ij})_{d \times d}: \quad C_{ij} = \frac{1}{N - 1} \sum_{n=1}^{N} (x_{in} - m_i)(x_{jn} - m_j)$$
  • (b) vector form

    $$C_{d \times d} = \frac{1}{N - 1} \sum_{n=1}^{N} (\mathbf{x}_{n} - \mathbf{m})(\mathbf{x}_{n} - \mathbf{m})^T$$
  • (c) matrix form, using the centred data matrix $\hat{X}$ whose columns are $\hat{\mathbf{x}}_n = \mathbf{x}_n - \mathbf{m}$

    $$C_{d \times d} = \frac{1}{N-1} \hat{X}\hat{X}^T$$
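
A minimal sketch checking that form (c) agrees with `np.cov` on an arbitrary data matrix `X` with `d` rows (features) and `N` columns (samples); `np.cov` uses the same $1/(N-1)$ normalisation by default.

```python
import numpy as np

d, N = 3, 100
X = np.random.randn(d, N)                 # data matrix: one sample per column

m = X.mean(axis=1, keepdims=True)         # mean vector m, shape (d, 1)
X_hat = X - m                             # centred data matrix

C = (X_hat @ X_hat.T) / (N - 1)           # matrix form (c)
print(np.allclose(C, np.cov(X)))          # True: np.cov treats rows as variables
print(np.allclose(C, C.T))                # symmetric
print(np.all(np.linalg.eigvalsh(C) >= -1e-10))   # PSD up to floating-point error
```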