Linear Algebra Basics

Rainy Night for Sapientia · November 1, 2023

Mathematics for AI


Basic Concepts and Their Notations

  • Inner (dot) product of two vectors

    $$\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T\mathbf{y} = \sum_{i=1}^d x_i y_i$$
  • Outer (tensor) product of two vectors

    $$\mathbf{x} \otimes \mathbf{y} = \mathbf{x}\mathbf{y}^T = (x_i y_j)_{d \times n}, \quad \text{where } \mathbf{x} = (x_i)_{d \times 1} \text{ and } \mathbf{y} = (y_j)_{n \times 1}$$
  • Magnitude of a vector

    $$||\mathbf{x}|| = \sqrt{\mathbf{x}^T\mathbf{x}} = \left( \sum_{i=1}^d x_i^2 \right)^{\frac{1}{2}}$$
  • Angle between two vectors

    $$\cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{||\mathbf{x}||\,||\mathbf{y}||}$$
  • Orthogonal projection of vector y onto x

    $$\mathbf{y} \mapsto \mathbf{x} : (\mathbf{y}^T\mathbf{u}_x)\mathbf{u}_x$$
    $$\mathbf{u}_x = \frac{\mathbf{x}}{||\mathbf{x}||}$$
  • Orthogonal and Orthonormal vectors

    • If $\mathbf{x}$ is orthogonal to $\mathbf{y}$, then $\mathbf{x}^T\mathbf{y} = 0$
    • If $\mathbf{x}$ and $\mathbf{y}$ are orthonormal, then $\mathbf{x}^T\mathbf{y} = 0$ and $||\mathbf{x}|| = ||\mathbf{y}|| = 1$
  • Linearly Dependent and Linearly Independent

    • If a set of vectors $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \dotsc, \mathbf{x}_n$ is linearly dependent, then there exists a set of coefficients $c_1, c_2, c_3, \dotsc, c_n$, not all zero, such that
      $$c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 + \dotsb + c_n\mathbf{x}_n = \mathbf{0}, \quad \exists\, c_k \neq 0, \; k \in \{1, \dotsc, n\}$$
    • If a set of vectors $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \dotsc, \mathbf{x}_n$ is linearly independent, then the only coefficients $c_1, c_2, c_3, \dotsc, c_n$ satisfying
      $$c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + c_3\mathbf{x}_3 + \dotsb + c_n\mathbf{x}_n = \mathbf{0}$$
      are $c_k = 0$ for all $k = 1, \dotsc, n$.
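
These definitions map directly onto NumPy primitives. The following is a minimal sketch (the vectors `x` and `y` are arbitrary examples, not taken from the text) checking the inner product, outer product, norm, angle, and projection formulas above.

```python
import numpy as np

# Arbitrary example vectors
x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

inner = x @ y                          # <x, y> = x^T y
outer = np.outer(x, y)                 # x y^T, here a 3 x 3 matrix
norm_x = np.linalg.norm(x)             # ||x|| = sqrt(x^T x)
cos_theta = inner / (norm_x * np.linalg.norm(y))

u_x = x / norm_x                       # unit vector along x
proj_y_onto_x = (y @ u_x) * u_x        # orthogonal projection of y onto x

print(inner, norm_x, cos_theta)
print(proj_y_onto_x)
```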

Generic matrix

  • Matrix Product: $C = AB$
    The number of columns of $A$ must equal the number of rows of $B$; if $A$ is $m \times n$ and $B$ is $n \times p$, then $C$ is $m \times p$ (rows of $A$ by columns of $B$).

  • Property

    In general $XY \neq YX$, but matrix multiplication is associative: $XYZ = (XY)Z = X(YZ)$
  • Hadamard Product: $C = A \odot B$
    The two matrices must have the same shape; the product is taken element-wise.
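
A short NumPy sketch of the shape rules above (the matrices are arbitrary random examples): `A @ B` requires the inner dimensions to match, while `A * D` (Hadamard) requires identical shapes.

```python
import numpy as np

A = np.random.randn(2, 3)       # 2 x 3
B = np.random.randn(3, 4)       # 3 x 4

C = A @ B                       # inner dimensions match (3 == 3); C is 2 x 4
print(C.shape)                  # (2, 4)

D = np.random.randn(2, 3)
H = A * D                       # Hadamard product: element-wise, same shape required
print(H.shape)                  # (2, 3)

# Associative but (in general) not commutative:
X, Y, Z = np.random.randn(3, 3), np.random.randn(3, 3), np.random.randn(3, 3)
print(np.allclose((X @ Y) @ Z, X @ (Y @ Z)))   # True
print(np.allclose(X @ Y, Y @ X))               # False for almost every random X, Y
```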

Square Matrix

  • Determinant of a square matrix $A = (A_{ij})_{d \times d}$ is denoted by $|A|$; $M_{ik}$ is the minor matrix formed by removing the $i$th row and the $k$th column.
    $$|A| = \sum_{k=1}^d A_{ik}|M_{ik}|(-1)^{k+i}$$
    $$|A| = |A^T|$$
  • Trace of a square matrix $A = (A_{ij})_{d \times d}$ is the sum of its diagonal elements
    $$\text{tr}(A) = \sum_{k=1}^{d} A_{kk}$$
  • Rank of a square matrix $A = (A_{ij})_{d \times d}$ is the number of linearly independent rows/columns.

    $$\text{rank}(A)$$

    This notion of rank is also used when analysing feature representations in neural networks (e.g., low-rank approximation).

  • Non-singular: $A = (A_{ij})_{d \times d}$ is a square matrix with $\text{rank}(A) = d$; in this case $|A| \neq 0$

    • A singular matrix is a square matrix that has no inverse, i.e., its determinant is zero.
  • Orthogonal (orthonormal): $A = (A_{ij})_{d \times d}$ is a square matrix with the property $A^TA = AA^T = I_d$,
    where $I_d$ is the identity matrix.
    Equivalently, $A^T = A^{-1}$

  • Inverse of $A = (A_{ij})_{d \times d}$ is the square matrix $A^{-1}$ with the property $A^{-1}A = AA^{-1} = I_d$
    The inverse of $A = (A_{ij})_{d \times d}$ exists iff $A$ is a non-singular square matrix
    (rank = dimension)

  • Semi-definite: $A = (A_{ij})_{d \times d}$ is a square matrix with one of the following properties:

    • positive semi-definite:
      $$\forall \mathbf{x} \in \mathbb{R}^d, \; \mathbf{x}^T A \mathbf{x} \geq 0$$
    • negative semi-definite:
      $$\forall \mathbf{x} \in \mathbb{R}^d, \; \mathbf{x}^T A \mathbf{x} \leq 0$$
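
The square-matrix quantities above all have direct NumPy counterparts; here is a minimal sketch on an arbitrary symmetric example matrix.

```python
import numpy as np

# Arbitrary symmetric, non-singular example
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

print(np.linalg.det(A))                    # determinant |A| = 5.0
print(np.trace(A))                         # trace = 5.0
print(np.linalg.matrix_rank(A))            # rank = 2, i.e. non-singular

A_inv = np.linalg.inv(A)                   # inverse exists since rank == dimension
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# Positive semi-definite check: all eigenvalues of the symmetric matrix are >= 0
print(np.all(np.linalg.eigvalsh(A) >= 0))  # True for this example
```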

Eigen Analysis

  • Given a square matrix $A = (A_{ij})_{d \times d}$, eigen analysis finds the eigenvectors and their corresponding eigenvalues.

    $$A\mathbf{v} = \lambda\mathbf{v}$$

    where $\mathbf{v}$ is an eigenvector and $\lambda$ is its corresponding eigenvalue.

  • Property

    • If $A = (A_{ij})_{d \times d}$ is non-singular (rank = $d$),
      all $d$ eigenvalues are non-zero, $\lambda_i \neq 0 \; (i = 1, \dotsc, d)$, and

      $$|A| = \prod_{i=1}^d \lambda_i = \lambda_1\lambda_2\lambda_3 \dotsm \lambda_d$$
    • If $A = (A_{ij})_{d \times d}$ is real and symmetric, $A = A^T$:

      • All eigenvalues are real
      • The eigenvectors associated with distinct eigenvalues are orthogonal
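
A small NumPy check of the properties above on an arbitrary real symmetric example: the eigenvalues are real, the determinant equals their product, and the eigenvectors are orthonormal.

```python
import numpy as np

# Arbitrary real symmetric example
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, V = np.linalg.eigh(A)      # eigh is specialised for symmetric matrices
print(eigvals)                      # real eigenvalues

v0 = V[:, 0]                        # first eigenvector (column of V)
print(np.allclose(A @ v0, eigvals[0] * v0))              # A v = lambda v
print(np.allclose(np.prod(eigvals), np.linalg.det(A)))   # |A| = product of eigenvalues
print(np.allclose(V.T @ V, np.eye(2)))                   # eigenvectors are orthonormal
```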

Linear Transformation

  • A linear transformation is a linear mapping, $P = (P_{ij})_{p \times d}$, from a vector space $\mathbf{x} \in \mathbb{R}^d$ to another vector space $\mathbf{y} \in \mathbb{R}^p$
  • In the ML context, the dimensionalities of the two vector spaces typically differ:
    • When $p < d$: a low-dimensional representation (via dimension reduction)
    • When $p > d$: an over-complete (sparse) representation
  • All linear representation-learning models aim at learning a projection matrix $P$ for feature extraction, generating a new representation $\mathbf{y} = P\mathbf{x}$ for a raw data point $\mathbf{x}$ (see the sketch below)
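
As a sketch of this idea, the hypothetical matrix `P` below (random numbers, purely illustrative) maps a raw 4-dimensional point to a 2-dimensional feature vector, i.e., the $p < d$ dimension-reduction case.

```python
import numpy as np

d, p = 4, 2                        # raw dimension d, target dimension p < d
P = np.random.randn(p, d)          # projection matrix P, shape p x d
x = np.random.randn(d)             # raw data point in R^d

y = P @ x                          # new (low-dimensional) representation in R^p
print(x.shape, y.shape)            # (4,) (2,)
```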

Matrix Decomposition

  • Spectral Decomposition is a powerful tool for matrix decomposition.

    • Given a real symmetric matrix $A = (A_{ij})_{d \times d}$, where $A_{ij} = A_{ji}$, it can be decomposed into a product of matrices consisting of its eigenvectors and eigenvalues:
      $$A_{d \times d} = V_{d \times d} \Sigma_{d \times d} V^T_{d \times d}$$
    • $V$ is an orthogonal matrix, $V^TV = I$, where column $i$ is the $i$th eigenvector of $A$.
    • $\Sigma$ is a diagonal matrix where $\Sigma_{ii} = \lambda_i$ and
      $\Sigma_{ij} = 0$ if $i \neq j$, where $\lambda_i$ is the $i$th eigenvalue of $A$
  • Singular Value Decomposition (SVD) is yet another powerful tool for matrix decomposition.

    • Given a matrix of any shape, $X_{d \times N}$, SVD decomposes it into the following form:

      $$X_{d \times N} = U_{d \times d} \Sigma_{d \times N} V^T_{N \times N}$$
    • $U$ is an orthogonal matrix, $U^TU = I_{d \times d}$, where column $i$ is the $i$th eigenvector of $XX^T$ (left singular vectors)

    • $V$ is an orthogonal matrix, $V^TV = I_{N \times N}$, where column $i$ is the $i$th eigenvector of $X^TX$ (right singular vectors)

    • $\Sigma$ is a diagonal matrix holding the square roots of the eigenvalues shared by $XX^T$ and $X^TX$ (the so-called singular values)

    • $\Sigma_{ii} = \sqrt{\lambda_i}$, with $\Sigma_{ii} \geq \Sigma_{jj}$ if $i < j$, where $\lambda_i$ is the $i$th eigenvalue shared by $XX^T$ and $X^TX$.

    • This can also be written in vector (outer-product) form:

      $$X_{d \times N} = \sum_{i=1}^{p} \sqrt{\lambda_i}\, \mathbf{u}_i \mathbf{v}_i^T = \sum_{i=1}^{p} \sqrt{\lambda_i}\, (\mathbf{u}_i)_{d \times 1} (\mathbf{v}_i^T)_{1 \times N}$$
      $$\text{where } p \le \min(d, N)$$
  • Eigen decomposition does not require the matrix to be symmetric.

    • It can decompose any diagonalisable square matrix (no symmetry or realness condition is required).
    • If the matrix is real and symmetric, it reduces to the spectral decomposition above.
    • In the general case,
      $$A_{d \times d} = P_{d \times d} \Sigma_{d \times d} P^{-1}_{d \times d}$$
    • $P$ is the matrix of eigenvectors, but not orthogonal in general.
    • $\Sigma$ is the diagonal matrix of eigenvalues.
    • Because $P$ is not orthogonal, $P^T \neq P^{-1}$
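
A minimal NumPy sketch of the three decompositions above, on arbitrary example matrices: `eigh` for the spectral decomposition of a real symmetric matrix, `svd` for a rectangular matrix, and `eig` for a general (non-symmetric) square matrix where `P` need not be orthogonal.

```python
import numpy as np

# Spectral decomposition of a real symmetric matrix: A = V diag(lambda) V^T
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, V = np.linalg.eigh(A)
print(np.allclose(A, V @ np.diag(lam) @ V.T))            # True

# SVD of a rectangular matrix: X = U Sigma V^T
X = np.random.randn(3, 5)
U, s, Vt = np.linalg.svd(X, full_matrices=False)         # s holds the singular values
print(np.allclose(X, U @ np.diag(s) @ Vt))               # True
print(np.allclose(s**2,                                  # sigma_i^2 = lambda_i of X X^T
                  np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]))

# General eigendecomposition: B = P diag(lambda) P^{-1}, P not orthogonal in general
B = np.random.randn(3, 3)
lam_b, P = np.linalg.eig(B)
print(np.allclose(B, P @ np.diag(lam_b) @ np.linalg.inv(P)))   # True
print(np.allclose(P.T @ P, np.eye(3)))                         # False: P^T != P^{-1}
```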

Example 1

What is a positive-definite matrix? If M is a positive-definite matrix in the domain of real numbers, list at least four non-trivial properties of M.

A positive-definite matrix is a matrix whose quadratic form is strictly positive:

$$\forall \mathbf{x} \in \mathbb{R}^d,\ \mathbf{x} \neq \mathbf{0}: \; \mathbf{x}^T M \mathbf{x} > 0$$

In addition (by the usual real-valued convention), $M$ is symmetric, and all eigenvalues obtained from its spectral decomposition are positive.

Then,

  1. M is a symmetric matrix
  2. M is non-singular (non-singular means the matrix is invertible; equivalently, all of its eigenvalues are non-zero)
  3. All eigenvalues of M are real and positive, and the eigenvectors associated with distinct eigenvalues are orthogonal
  4. M can be decomposed as $P^TP$, where $P$ is an invertible matrix.

Property 4 holds because of the Cholesky decomposition.

The Cholesky decomposition is a specific way of decomposing a symmetric positive-definite matrix into the product of a lower triangular matrix and its transpose.
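
A short NumPy illustration of property 4, on an arbitrary positive-definite example. Note that `np.linalg.cholesky` returns the lower-triangular factor $L$ with $M = LL^T$, so $P = L^T$ in the notation above.

```python
import numpy as np

# Arbitrary symmetric positive-definite example
M = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(M)          # lower-triangular L with M = L L^T
P = L.T                            # so M = P^T P with P invertible (property 4)
print(np.allclose(M, P.T @ P))     # True

print(np.all(np.linalg.eigvalsh(M) > 0))   # all eigenvalues positive
```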

Covariance matrix

A covariance matrix is a square matrix whose off-diagonal elements are the covariances between each pair of elements of a given random vector, and whose main diagonal contains the variances of those random variables. Any covariance matrix is symmetric and positive semi-definite.

  • Given a data matrix
    $$X_{d \times N} = [\mathbf{x}_1, \dotsc, \mathbf{x}_N]$$
  • Compute the mean vector across the $N$ samples (one entry per feature dimension, $d$ in total)

    $$\mathbf{m}_{d \times 1} = [m_1, m_2, \dotsc, m_d]^T = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n$$
  • (a) element-wise form

    $$(C_{ij})_{d \times d}: \quad C_{ij} = \frac{1}{N - 1} \sum_{n=1}^{N} (x_{in} - m_i)(x_{jn} - m_j)$$
  • (b) vector form

    $$C_{d \times d} = \frac{1}{N - 1} \sum_{n=1}^{N} (\mathbf{x}_{n} - \mathbf{m})(\mathbf{x}_{n} - \mathbf{m})^T$$
  • (c) matrix form, using the centred data matrix $\hat{X}$ whose columns are $\hat{\mathbf{x}}_n = \mathbf{x}_n - \mathbf{m}$

    $$C_{d \times d} = \frac{1}{N-1} \hat{X}\hat{X}^T$$
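
A minimal sketch checking that form (c) agrees with `np.cov` on an arbitrary data matrix `X` with `d` rows (features) and `N` columns (samples); `np.cov` uses the same $1/(N-1)$ normalisation by default.

```python
import numpy as np

d, N = 3, 100
X = np.random.randn(d, N)                 # data matrix: one sample per column

m = X.mean(axis=1, keepdims=True)         # mean vector m, shape (d, 1)
X_hat = X - m                             # centred data matrix

C = (X_hat @ X_hat.T) / (N - 1)           # matrix form (c)
print(np.allclose(C, np.cov(X)))          # True: np.cov treats rows as variables
print(np.allclose(C, C.T))                # symmetric
print(np.all(np.linalg.eigvalsh(C) >= -1e-10))   # PSD up to floating-point error
```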