maths / linear-algebra

  • Notes

  • Vector space: a set with special structure
    • can be finite- or infinite-dimensional; we limit ourselves to finite dimensions below
    • span, linear independence, subspace, direct sum, and finally basis
    • dimension is defined as the number of vectors in a basis
  • Linear maps from V -> W.

    • they form a vector space themselves when equipped with addition and scalar multiplication
    • fundamental theorem of linear algebra - relationship between fundamental subspaces
      • fundamental subspaces:
        • kernel, a subspace of V, so dim(ker) <= dim(V)
        • image/range, a subspace of W, so dim(img) <= dim(W)
      • dim(V) = dim(kernel) + dim(image)
        • Of the dim(V) input dimensions, dim(kernel(A)) are collapsed to zero; what remains is dim(V) - dim(kernel(A)) = dim(range(A)) <= dim(W) (see the numpy check below)
      • injectivity/surjectivity depend on which domain and co-domain we consider
      • trivial kernel {0} <=> injective
      • range(A) = W <=> surjective
    • Isomorphism:
      • "iso" means same, "morph" means shape.
      • any two finite-dimensional vector spaces of the same dimension (over the same field) are isomorphic
      • an isomorphism is a bijective linear map
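A quick numerical check of the rank-nullity relation above, a minimal numpy sketch on a made-up matrix:

```python
import numpy as np

# A made-up 3x4 matrix: a map from a 4-dim space V into a 3-dim space W.
A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],   # dependent row, so the rank drops to 2
              [0., 1., 0., 1.]])

n = A.shape[1]                              # dim(V)
U, s, Vh = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))               # dim(range(A))
kernel_basis = Vh[rank:]                    # remaining right singular vectors span ker(A)

print(rank + kernel_basis.shape[0] == n)    # True: dim(V) = dim(ker) + dim(range)
print(np.allclose(A @ kernel_basis.T, 0.0)) # True: these vectors really map to zero
```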
  • Matrix of linear map and matrix product

    • "rows of the left times columns of the right": each such operation is an inner product, producing one scalar of the result
    • columns of the left matrix linearly combined using a column of the right matrix:
      • can be considered a vectorized version of the rows-times-columns rule
      • note the number of columns equals the dimension of the input space
      • the k-th column holds the coordinates, in the output space's basis, of the image of the k-th standard basis vector of the input space
      • similarly, the k-th column of the matrix product $AB$ is $A \, B_{k\text{-th col}}$ (see the numpy sketch below)
    • leverage matrices to count dim(linear maps from a dim-$n$ space to a dim-$m$ space): $\text{dim} = mn$
      • choose one cell to be 1, leaving the others zero; such matrices form a basis, and there are $m \times n$ cells
    • the space of $m \times n$ matrices is isomorphic to the space of linear maps from a dim-$n$ space to a dim-$m$ space
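A small numpy sketch of the column view of matrix multiplication described above; the matrices are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # a map from a 4-dim space into a 3-dim space
B = rng.standard_normal((4, 2))

# k-th column of A = image of the k-th standard basis vector of the input space
e1 = np.zeros(4)
e1[1] = 1.0
print(np.allclose(A @ e1, A[:, 1]))              # True

# k-th column of AB = A applied to the k-th column of B
print(np.allclose((A @ B)[:, 0], A @ B[:, 0]))   # True

# the space of 3x4 matrices has dimension 3 * 4 = 12
print(A.size)                                    # 12
```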
  • Quotient space: TBD. might be related to machine-learning/dim-reduction
  • Dual map, linear functionals.
    • a linear functional maps a vector space to its scalar field
    • the dual map of $T: V \to W$ sends a functional on W to a functional on V by composition: $T^*(\varphi) = \varphi \circ T$ (see the sketch below)
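A tiny numpy illustration of the dual map: with matrices, the dual map acts as the transpose. The functional $\varphi(w) = c \cdot w$ and the matrices below are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # T: V (dim 4) -> W (dim 3)
c = rng.standard_normal(3)        # functional phi on W: phi(w) = c . w
v = rng.standard_normal(4)

# (T* phi)(v) = phi(T v); as a vector on V, T* phi is represented by A^T c
print(np.allclose(c @ (A @ v), (A.T @ c) @ v))   # True
```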
  • operators: linear maps whose domain and co-domain are the same
    • Projection operator: $P^2 = P$; unless $P = I$, $P$ is singular (see the sketch below)
      • two ways to read $Pv$: 1) use the entries of $v$ as coefficients to linearly combine the column vectors of $P$; 2) take the inner product of each row vector of $P$ with $v$
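A numpy sketch of a projection operator (orthogonal projection onto the column space of a made-up matrix), checking $P^2 = P$ and singularity:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 2))           # a 2-dim subspace of a 4-dim space
P = A @ np.linalg.inv(A.T @ A) @ A.T      # orthogonal projection onto col(A)

print(np.allclose(P @ P, P))              # True: P^2 = P
print(np.linalg.matrix_rank(P))           # 2 (< 4), so P is singular here (P != I)
print(abs(np.linalg.det(P)) < 1e-12)      # True
```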
  • Solving linear systems

    • fundamental theorem of linear maps
    • the determinant exists only for square matrices; for non-square matrices its role (detecting rank deficiency, measuring scaling) is generalized by the SVD (see the sketch below)
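Solving linear systems in practice, a minimal numpy sketch on made-up data: an exact solve for a square non-singular system, and a least-squares solve (numpy's lstsq is SVD-based under the hood) when the system is non-square:

```python
import numpy as np

rng = np.random.default_rng(3)

# Square, non-singular system: exact solve
A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))               # True

# Over-determined system: least-squares solution
M = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
x_ls, residual, rank, sigma = np.linalg.lstsq(M, y, rcond=None)
print(rank, sigma.shape)                   # 3 (3,)  -- rank and singular values of M
```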
  • Eigen space:

    • motivation: find invariant subspaces so that linear maps become easier to study
    • 1-D invariant subspace -> eigenvalue and eigenvector
    • every n-by-n matrix has n complex eigenvalues (counted with multiplicity)
      • For the diagonalizable ones:
        • Real eigenvalues correspond to (non-uniform) scaling (reflection as a special -1 scaling) in eigen directions;
        • (conjugate pairs of) complex eigenvalues correspond to scaling + rotation
          • scaling is the magnitude of the complex number; rotation is the argument of one of the pair: $e^{i\theta}$
            • Every rotation acts within a 2D plane. Note that $\mathbb{C}^1$ is isomorphic to $\mathbb{R}^2$ as real vector spaces:
              • > A single complex number can be visualized by a vector in the complex plane. Note that the set $\mathbb{C}$ of complex numbers is a one-dimensional vector space $\mathbb{C}^1$ over itself (i.e. over $\mathbb{C}$, as a complex vector space). So in that sense it's more like a "complex number line" — akin to the real number line $\mathbb{R}$ as a one-dimensional vector space $\mathbb{R}^1$ over itself (i.e. over $\mathbb{R}$, as a real vector space). We visualize the complex plane as a plane, however, because $\mathbb{C}$ is two-dimensional as a real vector space over $\mathbb{R}$. And we as humans (at least, the vast majority of us) are much better at visualizing real dimensions, and only up to three of them.
            • note Euler's formula $e^{i\theta} = \cos(\theta) + i\sin(\theta)$
      • For the non-diagonalizable ones:
        • $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ is a _shear_ to the right (see the eigen sketch below)
      • translation cannot be expressed as a linear map; the possible building blocks are just scaling and rotation:
        • a shear can be broken down into rotation + scaling + rotation (its SVD), though awkwardly
        • reflection is just scaling by -1
        • projection is scaling by zero
        • identity, ofc
    • The characteristic polynomial is computed from a matrix but is ultimately a property of the operator, so it does not depend on the choice of basis.
      • geometric multiplicity <= algebraic <= dimension
    • Diagonalizing a matrix $A$ makes it easy to compute polynomials of $A$
    • any rotation can be represented as a product of elementary rotations, each acting in a single plane
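A numpy sketch of the points above: a 2D rotation has complex eigenvalues $e^{\pm i\theta}$ (pure rotation, no scaling), and the shear matrix is not diagonalizable because its geometric multiplicity is smaller than its algebraic one. The angle and matrices are arbitrary examples:

```python
import numpy as np

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
vals = np.linalg.eigvals(R)
print(np.allclose(np.abs(vals), 1.0))                         # True: no scaling
print(np.allclose(np.sort(np.angle(vals)), [-theta, theta]))  # True: rotation by theta

# Shear: eigenvalue 1 with algebraic multiplicity 2, but ker(S - I) is only
# 1-dimensional (geometric multiplicity 1), so S is not diagonalizable.
S = np.array([[1., 1.],
              [0., 1.]])
print(np.linalg.eigvals(S))                          # [1. 1.]
print(2 - np.linalg.matrix_rank(S - np.eye(2)))      # 1 = geometric multiplicity
```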
  • Inner-product space:

    • to talk about distance and angles
    • minimization via projection
    • Riesz Representation Theorem:
      • every bounded linear functional on a Hilbert (complete inner-product) space is an inner product with a fixed vector in that space.
        • i.e. $f(v) = \langle v, u \rangle$, where $u$ exists and is unique for each $f$.
      • related application: Gaussian Process Regression
    • Adjoint and conjugate transpose
      • $\langle Tu, v \rangle = \langle u, T^*v \rangle$
      • for a fixed $v$, $u \mapsto \langle Tu, v \rangle$ is a linear functional, so by the Riesz representation theorem there is a unique vector, which we call $T^*v$, satisfying the equality; this defines $T^*$.
      • self-adjoint i.e. Hermitian
        • can be diagonalized with all eigenvalues real and eigenvectors orthogonal (spectral theorem)
    • Polar decomposition $A = U|A|$:
      • $U$: unitary, an isometry, non-singular; the rotation part.
      • $|A| = \sqrt{A^* A}$: positive semi-definite; the modulus of $A$; the scaling part.
    • SVD $A = U S V^* = \sum_i \sigma_i u_i v_i^*$ (obtained from the polar decomposition by diagonalizing $|A|$)
      • it further diagonalizes the positive semi-definite factor from the polar decomposition
      • a diagonalization with respect to two different orthonormal bases
      • singular values are the square roots of the eigenvalues of $A A^*$ (equivalently of $A^* A$)
      • full SVD vs. reduced/economy SVD (keep only singular values above a threshold)
      • geometric interpretation: unit ball to ellipsoid
      • implementation (see the sketch below):
        • first eigendecompose $A^* A$: the square roots of its eigenvalues are the singular values; its eigenvectors are $V$
        • then calculate $U$ column by column: $u_i = \frac{1}{\sigma_i} A v_i$
      • linear systems: the relative error in the solution can be up to the condition number times the relative error in the input
        • this amplification factor, the condition number, is the ratio between the largest and smallest singular values
      • OLS linear regression can be done via SVD
        • replacing $X$ in $\hat\beta = (X^TX)^{-1}X^TY$ with $X = U S V^T$ gives $\hat\beta = V S^{-1} U^T Y$
        • with the reduced SVD / pseudo-inverse (inverting only the nonzero singular values) this is the minimum-norm least-squares solution (checked in the sketch below)
      • PCA (see the sketch below):
        • first de-mean the data
        • either eigendecompose $\text{Cov}(X)$
        • or SVD the centered $X$ directly
      • the absolute value of the determinant is the product of the singular values
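A sketch of the "implementation" bullets and the SVD-based least squares above, assuming a real matrix with full column rank; the data is random. Production SVD routines generally avoid forming $A^T A$ explicitly, this is just the textbook route from the notes:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))   # made-up tall matrix, full column rank w.p. 1

# "Textbook" SVD via the eigendecomposition of A^T A
evals, V = np.linalg.eigh(A.T @ A)              # ascending eigenvalues, orthonormal V
order = np.argsort(evals)[::-1]                 # re-order to descending
sigma = np.sqrt(np.maximum(evals[order], 0.0))  # singular values
V = V[:, order]
U = (A @ V) / sigma                             # u_i = A v_i / sigma_i
print(np.allclose((U * sigma) @ V.T, A))        # True: A = U S V^T

# OLS via the SVD / pseudo-inverse: beta = V S^{-1} U^T y
y = rng.standard_normal(5)
beta = V @ np.diag(1.0 / sigma) @ U.T @ y
print(np.allclose(beta, np.linalg.lstsq(A, y, rcond=None)[0]))  # True

# Condition number = sigma_max / sigma_min: worst-case error amplification
print(np.isclose(sigma[0] / sigma[-1], np.linalg.cond(A)))      # True
```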
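And a matching PCA sketch on made-up data, showing that eigendecomposing the covariance and SVD-ing the centered data give the same components (up to sign):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 3)) * np.array([3.0, 1.0, 0.1])   # fake data, 3 features

Xc = X - X.mean(axis=0)                         # 1) de-mean

# 2a) eigendecompose the covariance matrix ...
cov = Xc.T @ Xc / (len(Xc) - 1)
evals, evecs = np.linalg.eigh(cov)
evals, evecs = evals[::-1], evecs[:, ::-1]      # descending order

# 2b) ... or SVD the centered data directly
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(evals, s**2 / (len(Xc) - 1)))                  # True: same variances
print(np.allclose(np.abs(evecs.T @ Vt.T), np.eye(3), atol=1e-8)) # True: same directions up to sign
```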
  • Change of basis

    • Represent a vector under another basis:
      • If $B$ is the matrix whose columns are the new basis vectors, then given $v_E$ how do we get $v_B$? Here $v_E$ holds the vector's coordinates under the standard basis
        • Say $v_B = [v_1, v_2, \dots, v_n]^T$, then $v_E = \sum_i v_i b_i = B v_B$, where $b_i$ are the columns of $B$. Therefore $v_B = B^{-1} v_E$
    • Represent a linear map under different source and destination bases:
      • $[T]_{PQ} = [I]_{PM} [T]_{MN} [I]_{NQ}$
        • we get a _similar matrix_ if we set $M=N$ and $P=Q$
          • i.e. making the input/output bases the same, then changing basis
          • we get $[T]_{PP} = B [T]_{MM} B^{-1}$ where $B = [I]_{PM}$
      • $[I]_{PM}$ is easier to calculate if we notice $[I]_{PM} = [I]_{PE} [I]_{EE} [I]_{EM} = [I]_{PE} [I]_{EM}$ where $E$ is the standard basis (see the sketch below)
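A numpy sketch of both change-of-basis formulas above, with a random invertible $B$ standing in for the new basis:

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((3, 3))        # columns = new basis vectors (invertible w.p. 1)
v_E = rng.standard_normal(3)           # coordinates under the standard basis

v_B = np.linalg.solve(B, v_E)          # v_B = B^{-1} v_E
print(np.allclose(B @ v_B, v_E))       # True: v_E = sum_i v_B[i] * b_i

# Same operator under the new basis: [T]_BB = B^{-1} [T]_EE B (a similar matrix)
T_E = rng.standard_normal((3, 3))
T_B = np.linalg.solve(B, T_E @ B)
# applying T in either coordinate system describes the same output vector
print(np.allclose(B @ (T_B @ v_B), T_E @ v_E))   # True
```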
  • Quadratic form

    • bilinear form: linear in each of its two arguments
    • quadratic form expressed by matrix:
      • pick a symmetric matrix
      • $\langle Ax, x \rangle$ is the same as $x^T A x$
    • Checking positive definiteness (see the sketch below):
      • $\det(A_i) > 0$ for all $i$ from 1 to $n$, where $A_i$ is the $i \times i$ top-left submatrix of $A$ (Sylvester's criterion)
      • all eigenvalues > 0
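A quick numpy check of the two positive-definiteness criteria above, on a matrix constructed to be positive definite:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = M @ M.T + 0.1 * np.eye(4)          # symmetric positive definite by construction

# Sylvester's criterion: all leading principal minors are positive
minors_ok = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, 5))
# Eigenvalue test: all eigenvalues are positive
eigs_ok = bool(np.all(np.linalg.eigvalsh(A) > 0))
# The quadratic form itself is positive on a random nonzero vector
x = rng.standard_normal(4)
print(minors_ok, eigs_ok, x @ A @ x > 0)   # True True True
```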
  • Determinant $\textbf{Det}(A)$ and Trace $\textbf{Tr}(A)$ where $A$ is square

    • Det
      • Zero: some input dimension is collapsed; rank < n, the kernel is non-trivial.
      • Non-zero: the magnitude is the overall volume scaling; the sign tells whether orientation is preserved (+) or flipped (-)
      • always equal to the product of eigenvalues (over the complex numbers)
      • equal to $(-1)^n$ times the constant term of the characteristic polynomial
    • Trace
      • equal to the sum of eigenvalues; minus the coefficient of the $z^{n-1}$ term in the characteristic polynomial (see the check below)
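A short numpy check of the determinant/trace facts above on a random matrix (np.poly returns the characteristic polynomial coefficients, highest degree first):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n))
lam = np.linalg.eigvals(A)                                 # complex in general

print(np.isclose(np.linalg.det(A), np.prod(lam).real))     # True: det = product of eigenvalues
print(np.isclose(np.trace(A), np.sum(lam).real))           # True: trace = sum of eigenvalues

coeffs = np.poly(A)                                        # det(zI - A) = z^n + c1 z^{n-1} + ... + cn
print(np.isclose(coeffs[1], -np.trace(A)))                 # True: z^{n-1} coefficient is -trace
print(np.isclose(coeffs[-1] * (-1) ** n, np.linalg.det(A)))# True: det = (-1)^n * constant term
```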
  • Numerical methods

    • Jordan Canonical Form: not numerically stable; intuitively, diagonalizable matrices are dense, so arbitrarily small perturbations can change the Jordan structure
    • eigendecomposition, inversion and LU decomposition are not stable when the matrix is ill-conditioned.
    • SVD is stable
    • QR is stable
    • Find the largest eigenvalue (power iteration, see the sketch below): the normalized iterate $A^n v / \|A^n v\|$ aligns with the dominant eigenvector as $n \rightarrow +\infty$
      • to find the smallest eigenvalue, apply the same iteration to $A^{-1}$
      • the full eigendecomposition: the QR algorithm (repeated QR decompositions) under the hood
      • finding the roots of the characteristic polynomial directly: too slow, and there is no general closed-form solution
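A minimal power-iteration sketch for the largest (and, via the inverse, smallest) eigenvalue; the symmetric test matrix is made up:

```python
import numpy as np

def power_iteration(A, iters=200, seed=9):
    """Estimate the dominant eigenvalue/eigenvector of A by repeated multiplication."""
    v = np.random.default_rng(seed).standard_normal(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)           # re-normalize so the iterate doesn't blow up
    return v @ A @ v, v                  # Rayleigh quotient, eigenvector estimate

A = np.array([[2., 1.],
              [1., 3.]])
lam_max, _ = power_iteration(A)
print(np.isclose(lam_max, np.max(np.linalg.eigvalsh(A))))       # True

# Smallest eigenvalue: run the same iteration on A^{-1} (inverse iteration)
mu, _ = power_iteration(np.linalg.inv(A))
print(np.isclose(1.0 / mu, np.min(np.linalg.eigvalsh(A))))      # True
```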
  • Books

  • Axler, Sheldon - Linear Algebra Done Right (Undergraduate Texts in Mathematics), Springer, 2015
  • LADW-2014-09.pdf
