maths / linear-algebra

  • Notes

  • Vector space: a set with special structure
    • can be finite- or infinite-dimensional; we limit ourselves to finite dimensions below
    • span, linear independence, subspace, direct sum, and finally basis
    • dimension is defined as the number of vectors in a basis
  • Linear maps from V -> W.

    • they form a vector space themselves when equipped with addition and scalar multiplication
    • fundamental theorem of linear algebra - relationship between fundamental subspaces
      • fundamental subspaces:
        • kernel, a subspace of V, so dim(ker) <= dim(V)
        • image/range, a subspace of W, so dim(img) <= dim(W)
      • dim(V) = dim(kernel) + dim(image)
        • Of the dim(V) input dimensions, dim(kernel(A)) are collapsed to zero; what remains is dim(V) - dim(kernel(A)) = dim(range(A)) <= dim(W) (see the numpy check below)
      • injectivity/surjectivity depend on which domain and co-domain we consider
      • trivial kernel {0} <=> injective
      • range(A) = W <=> surjective
    • Isomorphism:
      • "iso" means same, "morph" means shape.
      • any two finite-dimensional vector spaces of the same dimension (over the same field) are isomorphic
      • an isomorphism is a bijective linear map
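A quick numerical check of the rank-nullity relation above, a minimal numpy sketch on a made-up matrix:

```python
import numpy as np

# A made-up 3x4 matrix: a map from a 4-dim space V into a 3-dim space W.
A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],   # dependent row, so the rank drops to 2
              [0., 1., 0., 1.]])

n = A.shape[1]                              # dim(V)
U, s, Vh = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))               # dim(range(A))
kernel_basis = Vh[rank:]                    # remaining right singular vectors span ker(A)

print(rank + kernel_basis.shape[0] == n)    # True: dim(V) = dim(ker) + dim(range)
print(np.allclose(A @ kernel_basis.T, 0.0)) # True: these vectors really map to zero
```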
  • Matrix of linear map and matrix product

    • "rows of the left times columns of the right": each such operation is an inner product, producing one scalar of the result
    • columns of the left matrix linearly combined using a column of the right matrix:
      • can be considered a vectorized version of the rows-times-columns rule
      • note the number of columns equals the dimension of the input space
      • the k-th column holds the coordinates, in the output space's basis, of the image of the k-th standard basis vector of the input space
      • similarly, the k-th column of the matrix product $AB$ is $A \, B_{k\text{-th col}}$ (see the numpy sketch below)
    • leverage matrices to count dim(linear maps from a dim-$n$ space to a dim-$m$ space): $\text{dim} = mn$
      • choose one cell to be 1, leaving the others zero; such matrices form a basis, and there are $m \times n$ cells
    • the space of $m \times n$ matrices is isomorphic to the space of linear maps from a dim-$n$ space to a dim-$m$ space
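A small numpy sketch of the column view of matrix multiplication described above; the matrices are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # a map from a 4-dim space into a 3-dim space
B = rng.standard_normal((4, 2))

# k-th column of A = image of the k-th standard basis vector of the input space
e1 = np.zeros(4)
e1[1] = 1.0
print(np.allclose(A @ e1, A[:, 1]))              # True

# k-th column of AB = A applied to the k-th column of B
print(np.allclose((A @ B)[:, 0], A @ B[:, 0]))   # True

# the space of 3x4 matrices has dimension 3 * 4 = 12
print(A.size)                                    # 12
```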
  • Quotient space: TBD. might be related to machine-learning/dim-reduction
  • Dual map, linear functionals.
    • a linear functional maps a vector space to its scalar field
    • the dual map of $T: V \to W$ sends a functional on W to a functional on V by composition: $T^*(\varphi) = \varphi \circ T$ (see the sketch below)
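A tiny numpy illustration of the dual map: with matrices, the dual map acts as the transpose. The functional $\varphi(w) = c \cdot w$ and the matrices below are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # T: V (dim 4) -> W (dim 3)
c = rng.standard_normal(3)        # functional phi on W: phi(w) = c . w
v = rng.standard_normal(4)

# (T* phi)(v) = phi(T v); as a vector on V, T* phi is represented by A^T c
print(np.allclose(c @ (A @ v), (A.T @ c) @ v))   # True
```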
  • operators: linear maps whose domain and co-domain are the same
    • Projection operator: $P^2 = P$; unless $P = I$, $P$ is singular (see the sketch below)
      • two ways to read $Pv$: 1) use the entries of $v$ as coefficients to linearly combine the column vectors of $P$; 2) take the inner product of each row vector of $P$ with $v$
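A numpy sketch of a projection operator (orthogonal projection onto the column space of a made-up matrix), checking $P^2 = P$ and singularity:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 2))           # a 2-dim subspace of a 4-dim space
P = A @ np.linalg.inv(A.T @ A) @ A.T      # orthogonal projection onto col(A)

print(np.allclose(P @ P, P))              # True: P^2 = P
print(np.linalg.matrix_rank(P))           # 2 (< 4), so P is singular here (P != I)
print(abs(np.linalg.det(P)) < 1e-12)      # True
```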
  • Solving linear systems

    • fundamental theorem of linear maps
    • the determinant exists only for square matrices; for non-square matrices its role (detecting rank deficiency, measuring scaling) is generalized by the SVD (see the sketch below)
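Solving linear systems in practice, a minimal numpy sketch on made-up data: an exact solve for a square non-singular system, and a least-squares solve (numpy's lstsq is SVD-based under the hood) when the system is non-square:

```python
import numpy as np

rng = np.random.default_rng(3)

# Square, non-singular system: exact solve
A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))               # True

# Over-determined system: least-squares solution
M = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
x_ls, residual, rank, sigma = np.linalg.lstsq(M, y, rcond=None)
print(rank, sigma.shape)                   # 3 (3,)  -- rank and singular values of M
```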
  • Eigen space:

    • motivation: find invariant subspaces so that linear maps become easier to study
    • 1-D invariant subspace -> eigenvalue and eigenvector
    • every n-by-n matrix has n complex eigenvalues (counted with multiplicity)
      • For the diagonalizable ones:
        • Real eigenvalues correspond to (non-uniform) scaling (reflection as a special -1 scaling) in eigen directions;
        • (conjugate pairs of) complex eigenvalues correspond to scaling + rotation
          • scaling is the magnitude of the complex number; rotation is the argument of one of the pair: $e^{i\theta}$
            • Every rotation acts within a 2D plane. Note that $\mathbb{C}^1$ is isomorphic to $\mathbb{R}^2$ as real vector spaces:
              • > A single complex number can be visualized by a vector in the complex plane. Note that the set $\mathbb{C}$ of complex numbers is a one-dimensional vector space $\mathbb{C}^1$ over itself (i.e. over $\mathbb{C}$, as a complex vector space). So in that sense it's more like a "complex number line" — akin to the real number line $\mathbb{R}$ as a one-dimensional vector space $\mathbb{R}^1$ over itself (i.e. over $\mathbb{R}$, as a real vector space). We visualize the complex plane as a plane, however, because $\mathbb{C}$ is two-dimensional as a real vector space over $\mathbb{R}$. And we as humans (at least, the vast majority of us) are much better at visualizing real dimensions, and only up to three of them.
            • note Euler's formula $e^{i\theta} = \cos(\theta) + i\sin(\theta)$
      • For the non-diagonalizable ones:
        • $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ is a _shear_ to the right (see the eigen sketch below)
      • translation cannot be expressed as a linear map; the possible building blocks are just scaling and rotation:
        • a shear can be broken down into rotation + scaling + rotation (its SVD), though awkwardly
        • reflection is just scaling by -1
        • projection is scaling by zero
        • identity, ofc
    • The characteristic polynomial is computed from a matrix but is ultimately a property of the operator, so it does not depend on the choice of basis.
      • geometric multiplicity <= algebraic <= dimension
    • Diagonalizing a matrix $A$ makes it easy to compute polynomials of $A$
    • any rotation can be represented as a product of elementary rotations, each acting in a single plane
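A numpy sketch of the points above: a 2D rotation has complex eigenvalues $e^{\pm i\theta}$ (pure rotation, no scaling), and the shear matrix is not diagonalizable because its geometric multiplicity is smaller than its algebraic one. The angle and matrices are arbitrary examples:

```python
import numpy as np

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
vals = np.linalg.eigvals(R)
print(np.allclose(np.abs(vals), 1.0))                         # True: no scaling
print(np.allclose(np.sort(np.angle(vals)), [-theta, theta]))  # True: rotation by theta

# Shear: eigenvalue 1 with algebraic multiplicity 2, but ker(S - I) is only
# 1-dimensional (geometric multiplicity 1), so S is not diagonalizable.
S = np.array([[1., 1.],
              [0., 1.]])
print(np.linalg.eigvals(S))                          # [1. 1.]
print(2 - np.linalg.matrix_rank(S - np.eye(2)))      # 1 = geometric multiplicity
```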
  • Inner-product space:

    • to talk about distance and angles
    • minimization via projection
    • Riesz Representation Theorem:
      • every bounded linear functional on a Hilbert (complete inner-product) space is an inner product with a fixed vector in that space.
        • i.e. $f(v) = \langle v, u \rangle$, where $u$ exists and is unique for each $f$.
      • related application: Gaussian Process Regression
    • Adjoint and conjugate transpose
      • $\langle Tu, v \rangle = \langle u, T^*v \rangle$
      • for a fixed $v$, $u \mapsto \langle Tu, v \rangle$ is a linear functional, so by the Riesz representation theorem there is a unique vector, which we call $T^*v$, satisfying the equality; this defines $T^*$.
      • self-adjoint i.e. Hermitian
        • can be diagonalized with all eigenvalues real and eigenvectors orthogonal (spectral theorem)
    • Polar decomposition $A = U|A|$:
      • $U$: unitary, an isometry, non-singular; the rotation part.
      • $|A| = \sqrt{A^* A}$: positive semi-definite; the modulus of $A$; the scaling part.
    • SVD $A = U S V^* = \sum_i \sigma_i u_i v_i^*$ (obtained from the polar decomposition by diagonalizing $|A|$)
      • it further diagonalizes the positive semi-definite factor from the polar decomposition
      • a diagonalization with respect to two different orthonormal bases
      • singular values are the square roots of the eigenvalues of $A A^*$ (equivalently of $A^* A$)
      • full SVD vs. reduced/economy SVD (keep only singular values above a threshold)
      • geometric interpretation: unit ball to ellipsoid
      • implementation (see the sketch below):
        • first eigendecompose $A^* A$: the square roots of its eigenvalues are the singular values; its eigenvectors are $V$
        • then calculate $U$ column by column: $u_i = \frac{1}{\sigma_i} A v_i$
      • linear systems: the relative error in the solution can be up to the condition number times the relative error in the input
        • this amplification factor, the condition number, is the ratio between the largest and smallest singular values
      • OLS linear regression can be done via SVD
        • replacing $X$ in $\hat\beta = (X^TX)^{-1}X^TY$ with $X = U S V^T$ gives $\hat\beta = V S^{-1} U^T Y$
        • with the reduced SVD / pseudo-inverse (inverting only the nonzero singular values) this is the minimum-norm least-squares solution (checked in the sketch below)
      • PCA (see the sketch below):
        • first de-mean the data
        • either eigendecompose $\text{Cov}(X)$
        • or SVD the centered $X$ directly
      • the absolute value of the determinant is the product of the singular values
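A sketch of the "implementation" bullets and the SVD-based least squares above, assuming a real matrix with full column rank; the data is random. Production SVD routines generally avoid forming $A^T A$ explicitly, this is just the textbook route from the notes:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))   # made-up tall matrix, full column rank w.p. 1

# "Textbook" SVD via the eigendecomposition of A^T A
evals, V = np.linalg.eigh(A.T @ A)              # ascending eigenvalues, orthonormal V
order = np.argsort(evals)[::-1]                 # re-order to descending
sigma = np.sqrt(np.maximum(evals[order], 0.0))  # singular values
V = V[:, order]
U = (A @ V) / sigma                             # u_i = A v_i / sigma_i
print(np.allclose((U * sigma) @ V.T, A))        # True: A = U S V^T

# OLS via the SVD / pseudo-inverse: beta = V S^{-1} U^T y
y = rng.standard_normal(5)
beta = V @ np.diag(1.0 / sigma) @ U.T @ y
print(np.allclose(beta, np.linalg.lstsq(A, y, rcond=None)[0]))  # True

# Condition number = sigma_max / sigma_min: worst-case error amplification
print(np.isclose(sigma[0] / sigma[-1], np.linalg.cond(A)))      # True
```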
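And a matching PCA sketch on made-up data, showing that eigendecomposing the covariance and SVD-ing the centered data give the same components (up to sign):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 3)) * np.array([3.0, 1.0, 0.1])   # fake data, 3 features

Xc = X - X.mean(axis=0)                         # 1) de-mean

# 2a) eigendecompose the covariance matrix ...
cov = Xc.T @ Xc / (len(Xc) - 1)
evals, evecs = np.linalg.eigh(cov)
evals, evecs = evals[::-1], evecs[:, ::-1]      # descending order

# 2b) ... or SVD the centered data directly
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(np.allclose(evals, s**2 / (len(Xc) - 1)))                  # True: same variances
print(np.allclose(np.abs(evecs.T @ Vt.T), np.eye(3), atol=1e-8)) # True: same directions up to sign
```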
  • Change of basis

    • Represent a vector under another basis:
      • If $B$ is the matrix whose columns are the new basis vectors, then given $v_E$ how do we get $v_B$? Here $v_E$ holds the vector's coordinates under the standard basis
        • Say $v_B = [v_1, v_2, \dots, v_n]^T$, then $v_E = \sum_i v_i b_i = B v_B$, where $b_i$ are the columns of $B$. Therefore $v_B = B^{-1} v_E$
    • Represent a linear map under different source and destination bases:
      • $[T]_{PQ} = [I]_{PM} [T]_{MN} [I]_{NQ}$
        • we get a _similar matrix_ if we set $M=N$ and $P=Q$
          • i.e. making the input/output bases the same, then changing basis
          • we get $[T]_{PP} = B [T]_{MM} B^{-1}$ where $B = [I]_{PM}$
      • $[I]_{PM}$ is easier to calculate if we notice $[I]_{PM} = [I]_{PE} [I]_{EE} [I]_{EM} = [I]_{PE} [I]_{EM}$ where $E$ is the standard basis (see the sketch below)
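A numpy sketch of both change-of-basis formulas above, with a random invertible $B$ standing in for the new basis:

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((3, 3))        # columns = new basis vectors (invertible w.p. 1)
v_E = rng.standard_normal(3)           # coordinates under the standard basis

v_B = np.linalg.solve(B, v_E)          # v_B = B^{-1} v_E
print(np.allclose(B @ v_B, v_E))       # True: v_E = sum_i v_B[i] * b_i

# Same operator under the new basis: [T]_BB = B^{-1} [T]_EE B (a similar matrix)
T_E = rng.standard_normal((3, 3))
T_B = np.linalg.solve(B, T_E @ B)
# applying T in either coordinate system describes the same output vector
print(np.allclose(B @ (T_B @ v_B), T_E @ v_E))   # True
```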
  • Quadratic form

    • bilinear form: linear in each of its two arguments
    • quadratic form expressed by matrix:
      • pick a symmetric matrix
      • $\langle Ax, x \rangle$ is the same as $x^T A x$
    • Checking positive definiteness (see the sketch below):
      • $\det(A_i) > 0$ for all $i$ from 1 to $n$, where $A_i$ is the $i \times i$ top-left submatrix of $A$ (Sylvester's criterion)
      • all eigenvalues > 0
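A quick numpy check of the two positive-definiteness criteria above, on a matrix constructed to be positive definite:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = M @ M.T + 0.1 * np.eye(4)          # symmetric positive definite by construction

# Sylvester's criterion: all leading principal minors are positive
minors_ok = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, 5))
# Eigenvalue test: all eigenvalues are positive
eigs_ok = bool(np.all(np.linalg.eigvalsh(A) > 0))
# The quadratic form itself is positive on a random nonzero vector
x = rng.standard_normal(4)
print(minors_ok, eigs_ok, x @ A @ x > 0)   # True True True
```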
  • Determinant $\textbf{Det}(A)$ and Trace $\textbf{Tr}(A)$ where $A$ is square

    • Det
      • Zero: some input dimension is collapsed; rank < n, the kernel is non-trivial.
      • Non-zero: the magnitude is the overall volume scaling; the sign tells whether orientation is preserved (+) or flipped (-)
      • always equal to the product of eigenvalues (over the complex numbers)
      • equal to $(-1)^n$ times the constant term of the characteristic polynomial
    • Trace
      • equal to the sum of eigenvalues; minus the coefficient of the $z^{n-1}$ term in the characteristic polynomial (see the check below)
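A short numpy check of the determinant/trace facts above on a random matrix (np.poly returns the characteristic polynomial coefficients, highest degree first):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n))
lam = np.linalg.eigvals(A)                                 # complex in general

print(np.isclose(np.linalg.det(A), np.prod(lam).real))     # True: det = product of eigenvalues
print(np.isclose(np.trace(A), np.sum(lam).real))           # True: trace = sum of eigenvalues

coeffs = np.poly(A)                                        # det(zI - A) = z^n + c1 z^{n-1} + ... + cn
print(np.isclose(coeffs[1], -np.trace(A)))                 # True: z^{n-1} coefficient is -trace
print(np.isclose(coeffs[-1] * (-1) ** n, np.linalg.det(A)))# True: det = (-1)^n * constant term
```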
  • Numerical methods

    • Jordan Canonical Form: not numerically stable; intuitively, diagonalizable matrices are dense, so arbitrarily small perturbations can change the Jordan structure
    • eigendecomposition, inversion and LU decomposition are not stable when the matrix is ill-conditioned.
    • SVD is stable
    • QR is stable
    • Find the largest eigenvalue (power iteration, see the sketch below): the normalized iterate $A^n v / \|A^n v\|$ aligns with the dominant eigenvector as $n \rightarrow +\infty$
      • to find the smallest eigenvalue, apply the same iteration to $A^{-1}$
      • the full eigendecomposition: the QR algorithm (repeated QR decompositions) under the hood
      • finding the roots of the characteristic polynomial directly: too slow, and there is no general closed-form solution
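A minimal power-iteration sketch for the largest (and, via the inverse, smallest) eigenvalue; the symmetric test matrix is made up:

```python
import numpy as np

def power_iteration(A, iters=200, seed=9):
    """Estimate the dominant eigenvalue/eigenvector of A by repeated multiplication."""
    v = np.random.default_rng(seed).standard_normal(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)           # re-normalize so the iterate doesn't blow up
    return v @ A @ v, v                  # Rayleigh quotient, eigenvector estimate

A = np.array([[2., 1.],
              [1., 3.]])
lam_max, _ = power_iteration(A)
print(np.isclose(lam_max, np.max(np.linalg.eigvalsh(A))))       # True

# Smallest eigenvalue: run the same iteration on A^{-1} (inverse iteration)
mu, _ = power_iteration(np.linalg.inv(A))
print(np.isclose(1.0 / mu, np.min(np.linalg.eigvalsh(A))))      # True
```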
  • Books

  • Axler, Sheldon - Linear Algebra Done Right (Undergraduate Texts in Mathematics), Springer, 2015
  • LADW-2014-09.pdf
