maths / linear-algebra
Notes
- Vector space: a set equipped with addition and scalar multiplication satisfying the vector-space axioms
- can be finite- or infinite-dimensional; limited to finite dimensions below
- span, linear independence, subspace, direct sum, and finally basis
- dimension is defined as the number of vectors in a basis
Linear maps from V -> W.
- linear maps from V to W themselves form a vector space when equipped with addition and scalar multiplication
- fundamental theorem of linear algebra: the relationship between the fundamental subspaces
- fundamental subspaces:
- kernel is a subspace of V, so dim(ker) <= dim(V)
- image/range is a subspace of W, so dim(img) <= dim(W)
- dim(V) = dim(kernel) + dim(image)
- of the dim(V) input dimensions, dim(kernel(A)) are collapsed to zero; what remains is dim(V) - dim(kernel(A)) = dim(range(A)) <= dim(W) (see the numpy sketch after this list)
- injectivity/surjectivity depend on which domain and codomain we consider
- trivial kernel (only the zero vector) <=> injective
- range(A) = W <=> surjective
- Isomorphism:
- "iso" means same, "morph" means shape.
- any two finite-dimensional vector spaces of the same dimension (over the same field) are isomorphic
- an isomorphism is a bijective linear map
- echo CategoryTheory
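A minimal numpy/scipy sketch (the example matrix is an assumption) checking rank-nullity and the injectivity/surjectivity criteria above:

```python
import numpy as np
from scipy.linalg import null_space  # scipy used to get a basis of the kernel

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])             # maps R^3 (V) -> R^2 (W)

dim_V = A.shape[1]
rank = np.linalg.matrix_rank(A)          # dim(range(A))
nullity = null_space(A).shape[1]         # dim(kernel(A))

assert rank + nullity == dim_V           # rank-nullity: dim(V) = dim(ker) + dim(range)
print("injective:", nullity == 0)        # trivial kernel <=> injective
print("surjective:", rank == A.shape[0]) # range(A) = W <=> surjective
```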
Matrix of linear map and matrix product
- "row of the left times column of the right": each operation is an inner product, producing one scalar of the result
- columns of the left matrix linearly combined using the entries of a column of the right matrix:
- can be considered a vectorized version of "row of the left times column of the right"
- note the number of columns equals the dimension of the input space
- the k-th column holds the coordinates of T(e_k) in the output space's basis, where e_k is the k-th standard basis vector of the input space
- similarly, the k-th column of the matrix product AB is A b_k, i.e. A applied to the k-th column of B (see the sketch after this list)
- use matrices to find the dimension of the space of linear maps from an m-dimensional space to an n-dimensional space:
- the matrices with a single cell set to 1 and all others 0 form a basis; there are m times n such cells
- so the space of n x m matrices is isomorphic to the space of linear maps from dim m to dim n; either way the dimension is m times n
- Quotient space: TBD. might be related to machine-learning/dim-reduction
- Dual map, linear functionals.
- a linear functional maps from a vector space to its scalar field
- the dual map T* maps a functional on the codomain to a functional on the domain, via T*(phi) = phi ∘ T
- operators: linear maps whose domain and codomain are identical
- Projection operator: P^2 = P, so a projection other than the identity is always singular
- understanding Px (and any matrix-vector product Ax): 1) use the entries of x as coefficients to linearly combine the column vectors of A; 2) take the inner product of each row vector of A with x (both views are in the sketch after this list)
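A minimal numpy sketch (the example matrices are assumptions) of the column views above: A e_k is the k-th column of A, the k-th column of AB is A applied to the k-th column of B, and Ax is both a combination of columns and a stack of row inner products.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])            # linear map R^2 -> R^3
B = np.array([[1., 0., 2.],
              [3., 1., 0.]])        # linear map R^3 -> R^2
x = np.array([2., -1.])

e1 = np.array([1., 0.])
assert np.allclose(A @ e1, A[:, 0])          # k-th column = image of the k-th basis vector

AB = A @ B
assert np.allclose(AB[:, 2], A @ B[:, 2])    # k-th column of AB = A @ (k-th column of B)

col_view = x[0] * A[:, 0] + x[1] * A[:, 1]   # linear combination of the columns of A
row_view = np.array([A[i, :] @ x for i in range(A.shape[0])])  # row-wise inner products
assert np.allclose(A @ x, col_view) and np.allclose(A @ x, row_view)
```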
Solving linear systems
- fundamental theorem of linear maps
- the determinant does not exist for non-square matrices; its role (invertibility, scaling) is generalized via the SVD and the pseudo-inverse (see the sketch below)
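A minimal numpy sketch (the example systems are assumptions): a square system solved directly, and an overdetermined system solved in the least-squares sense, where the SVD-based pseudo-inverse plays the role of the inverse.

```python
import numpy as np

A = np.array([[3., 1.], [1., 2.]])
b = np.array([9., 8.])
x = np.linalg.solve(A, b)                    # unique solution since det(A) != 0
assert np.allclose(A @ x, b)

C = np.array([[1., 1.], [1., 2.], [1., 3.]]) # overdetermined: 3 equations, 2 unknowns
d = np.array([1., 2., 2.])
x_ls, *_ = np.linalg.lstsq(C, d, rcond=None) # least-squares solution
assert np.allclose(x_ls, np.linalg.pinv(C) @ d)  # same as the SVD-based pseudo-inverse
```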
Eigen space:
- motivation: find invariant subspaces so we can study linear maps more easily
- 1-D invariant subspace -> eigenvalue and eigenvector
- every n x n square matrix has n complex eigenvalues (counted with multiplicity)
- For those that are diagonalizable:
- real eigenvalues correspond to (non-uniform) scaling along the eigen directions (reflection being the special case of scaling by -1);
- (conjugate pairs of) complex eigenvalues correspond to scaling + rotation
- the scaling factor is the magnitude of the complex eigenvalue; the rotation angle is the argument of one of the pair
- every rotation takes place within a 2D plane; note that C is isomorphic to R^2 as a real vector space:
- > A single complex number can be visualized by a vector in the complex plane. Note that the set of complex numbers is a one-dimensional vector space over itself (i.e. over C, as a complex vector space). So in that sense it's more like a "complex number line" — akin to the real number line as a one-dimensional vector space over itself (i.e. over R, as a real vector space). We visualize the complex plane as a plane, however, because C is two-dimensional as a real vector space over R. And we as humans (at least, the vast majority of us) are much better at visualizing real dimensions, and only up to three of them.
- note Euler's formula: e^(iθ) = cos θ + i sin θ
- For those that are not diagonalizable:
- e.g. a _shear_ (to the right) transformation
- translation is impossible for a linear map; the possible effects are only scaling and rotation:
- a shear can be broken down into rotations and scalings (e.g. via its SVD), but awkwardly
- reflection is just scaling by -1
- projection is scaling by zero
- identity, ofc
- The characteristic polynomial is computed from a matrix but is ultimately about the operator, so it does not depend on the choice of basis.
- geometric multiplicity <= algebraic multiplicity <= dimension
- Diagonalizing a matrix A makes it easy to compute polynomials of A, e.g. powers A^k (see the sketch after this list)
- any rotation can be represented as a product of elementary rotations, each acting in one 2D plane
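A minimal numpy sketch (the scaled-rotation matrix is an assumed example): its complex eigenvalue pair has magnitude equal to the scaling factor and argument equal to the rotation angle, and diagonalization makes matrix powers cheap.

```python
import numpy as np

r, theta = 2.0, np.pi / 6
A = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # scaling * rotation

lam, V = np.linalg.eig(A)                 # conjugate pair r * e^{+/- i theta}
assert np.allclose(np.abs(lam), r)        # magnitude = scaling factor
assert np.allclose(np.sort(np.angle(lam)), np.sort([-theta, theta]))  # argument = rotation angle

# polynomial of A via diagonalization: A^k = V diag(lam^k) V^{-1}
k = 5
A_pow = (V @ np.diag(lam**k) @ np.linalg.inv(V)).real
assert np.allclose(A_pow, np.linalg.matrix_power(A, k))
```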
Inner-product space:
- to talk about distance and angles
- minimization via projection
- OLS linear regression can be thought of as projecting y (in R^n) onto the column space of X (dimension k); this is also where "residual orthogonal to X" comes from (see the OLS sketch after this list)
- the formula beta_hat = (X^T X)^{-1} X^T y does not look like a projection at first sight
- but the hat matrix H = X (X^T X)^{-1} X^T naturally satisfies H^2 = H and H^T = H
- Riesz Representation Theorem:
- linear functionals on a Hilbert (complete inner product) space correspond bijectively to inner products with a fixed vector in that space.
- i.e. each functional f is represented as f(v) = <v, u>, where u exists uniquely for that f.
- related application: Gaussian Process Regression
- Adjoint and conjugate transpose
- <Tu, v> = <u, T*v>
- the existence of the right-hand side follows from the Riesz representation theorem: for fixed v, the map u -> <Tu, v> is a linear functional, so a unique representing vector exists; T* sends v to that vector (T*v).
- self-adjoint i.e. Hermitian
- can be diagonalized with all eigenvalues real and eigenvectors orthogonal (spectral theorem)
- Polar decomposition: A = U P
- U: unitary, an isometry, non-singular; the rotation part
- P = sqrt(A^* A): positive semi-definite, the "modulus" of A; the scaling part
- SVD
- it further diagonalizes the positive semi-definite factor of the polar decomposition
- a diagonalization w.r.t. two different orthonormal (unitary) bases
- singular values are just the square roots of the eigenvalues of A^T A (or A A^T)
- full SVD vs. reduced/economy SVD (keeping only the singular values above a threshold)
- geometric interpretation: unit ball to ellipsoid
- implementation (see the SVD sketch after this list):
- first eigendecompose A^T A: the square roots of its eigenvalues are the singular values, and its eigenvectors are the columns of V
- then calculate U via u_i = A v_i / σ_i
- linear system: the relative error in the solution can be up to cond(A) times the relative error in the input
- that amplification factor is the condition number, the ratio between the largest and smallest singular values
- replacing X in (X^T X)^{-1} X^T with its SVD U Σ V^T gives the pseudo-inverse X^+ = V Σ^{-1} U^T
- this yields the minimum-norm least-squares solution
- PCA:
- first demean
- can eigendecompose Cov(X)
- or apply SVD to the demeaned X directly
- the absolute value of the determinant is the product of the singular values
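OLS sketch: a minimal numpy example with synthetic data (the design matrix, coefficients, and noise model are assumptions). It shows that the hat matrix is an orthogonal projection and that the residual is orthogonal to the columns of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # design matrix with intercept
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: projects onto the column space of X
y_hat = H @ y
residual = y - y_hat

assert np.allclose(H @ H, H)           # idempotent: H^2 = H
assert np.allclose(H, H.T)             # symmetric: an orthogonal projection
assert np.allclose(X.T @ residual, 0, atol=1e-8)  # residual orthogonal to the columns of X
```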
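SVD sketch, following the implementation bullets above (assumes a small full-column-rank example matrix; library SVD routines avoid forming A^T A for numerical stability): eigendecompose A^T A, take square roots for the singular values, recover U column by column, then use the pseudo-inverse for the minimum-norm least-squares solution.

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 3.],
              [1., 1.]])

evals, V = np.linalg.eigh(A.T @ A)       # symmetric eigendecomposition of A^T A
order = np.argsort(evals)[::-1]          # sort eigenpairs in descending order
evals, V = evals[order], V[:, order]
sigma = np.sqrt(evals)                   # singular values = sqrt of eigenvalues of A^T A
U = (A @ V) / sigma                      # u_i = A v_i / sigma_i

assert np.allclose(A, U @ np.diag(sigma) @ V.T)                 # reduced SVD reconstructs A
assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))   # matches library singular values

# pseudo-inverse A^+ = V Σ^{-1} U^T gives the minimum-norm least-squares solution
b = np.array([1., 2., 3.])
x_pinv = V @ np.diag(1 / sigma) @ U.T @ b
assert np.allclose(x_pinv, np.linalg.pinv(A) @ b)
```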
Change of basis
- Represent a vector under another basis:
- If B is the matrix whose columns are the new basis vectors, then given x (the coordinates under the standard basis), how do we get x', the coordinates under B?
- Say x' = (c_1, ..., c_n); then x = c_1 b_1 + ... + c_n b_n = B x', where b_i are the columns of B. Therefore x' = B^{-1} x.
- Represent a linear map under different source and destination bases:
- we get a _similar matrix_ if we use the same basis B on both the input and the output side
- i.e. make the input/output basis the same, then change basis
- we get A' = B^{-1} A B, where B is the change-of-basis matrix
- this is easier to derive by noting that B maps coordinates under the new basis to coordinates under the standard basis: go to the standard basis (B), apply A, come back (B^{-1}) (see the sketch after this list)
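A minimal numpy sketch (basis B and map A are assumed examples) of both changes of basis: coordinates x' = B^{-1} x, and the similar matrix A' = B^{-1} A B.

```python
import numpy as np

B = np.array([[1., 1.],
              [0., 1.]])             # columns are the new basis vectors
A = np.array([[2., 0.],
              [0., 3.]])             # a linear map in the standard basis
x = np.array([3., 1.])               # coordinates under the standard basis

x_new = np.linalg.solve(B, x)        # x' = B^{-1} x
assert np.allclose(B @ x_new, x)     # B maps new coordinates back to standard ones

A_new = np.linalg.solve(B, A @ B)    # A' = B^{-1} A B (similar matrix)
# applying A' in the new coordinates agrees with applying A in standard coordinates
assert np.allclose(B @ (A_new @ x_new), A @ x)
```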
Quadratic form
- bilinear form: linear in each variable separately
- quadratic form expressed by a matrix: q(x) = x^T A x
- we can always pick A to be symmetric
- x^T A x is equivalent to x^T ((A + A^T)/2) x, so assuming symmetry loses nothing
- Checking positive definiteness (see the sketch after this list):
- det(A_i) > 0 for all i from 1 to n, where A_i is the top-left i x i submatrix of A (leading principal minors)
- all eigenvalues > 0
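A minimal numpy sketch (the matrix is an assumed example) of the positive-definiteness checks, with Cholesky as a practical third test:

```python
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])          # symmetric

minors = [np.linalg.det(A[:i, :i]) for i in range(1, A.shape[0] + 1)]
print("leading principal minors all > 0:", all(m > 0 for m in minors))

eigvals = np.linalg.eigvalsh(A)        # eigenvalues of a symmetric matrix
print("all eigenvalues > 0:", bool(np.all(eigvals > 0)))

try:                                   # Cholesky succeeds iff A is positive definite
    np.linalg.cholesky(A)
    print("Cholesky succeeded: positive definite")
except np.linalg.LinAlgError:
    print("Cholesky failed: not positive definite")
```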
Determinant and Trace where A is square
- Det
- Zero: some input dimension is collapsed; rank < n, the kernel is non-trivial.
- Non-zero: the magnitude is the overall volume scaling = |product of eigenvalues|; the sign tells whether orientation is preserved (+) or flipped (-)
- always equal to the product of the eigenvalues (over the complex numbers)
- equal to (-1)^n times the constant term of the characteristic polynomial det(λI - A)
- Trace
- equal to the sum of the eigenvalues; also minus the coefficient of the λ^(n-1) term in the characteristic polynomial (see the sketch after this list)
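A minimal numpy sketch (assumed 2x2 example) tying these together: det as the product of eigenvalues, trace as their sum, |det| as the product of singular values, and both appearing as coefficients of the characteristic polynomial.

```python
import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])

lam = np.linalg.eigvals(A)
assert np.isclose(np.linalg.det(A), np.prod(lam).real)          # det = product of eigenvalues
assert np.isclose(np.trace(A), np.sum(lam).real)                # trace = sum of eigenvalues
assert np.isclose(abs(np.linalg.det(A)),
                  np.prod(np.linalg.svd(A, compute_uv=False)))  # |det| = product of singular values

# characteristic polynomial of a 2x2 matrix: lambda^2 - trace(A) lambda + det(A)
coeffs = np.poly(A)                    # coefficients, highest degree first
assert np.allclose(coeffs, [1., -np.trace(A), np.linalg.det(A)])
```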
Numerical methods
- Jordan Canonical Form: not numerically stable; intuitively, diagonalizable matrices are dense, so tiny perturbations destroy the Jordan structure
- eigendecomposition, inversion and LU decomposition are not stable when the matrix is ill-conditioned
- SVD is stable
- QR is stable
- Find the largest eigenvalue (power iteration): A^k x gets closer to the direction of the dominant eigenvector as k -> infinity (see the sketch after this list)
- to find the smallest eigenvalue, apply the same iteration to A^{-1} (inverse iteration)
- find the full eigendecomposition: the QR algorithm (repeated QR decompositions) under the hood
- finding roots of the characteristic polynomial directly: too slow, and no general closed-form solution for degree >= 5
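A minimal sketch (assumed 2x2 example) of power iteration for the largest eigenvalue; the inverse-iteration variant for the smallest eigenvalue simply iterates with A^{-1} (explicit inversion here for brevity; in practice one solves linear systems instead).

```python
import numpy as np

def power_iteration(A, num_iters=1000):
    """Return the dominant eigenvalue and eigenvector of A via power iteration."""
    x = np.random.default_rng(0).normal(size=A.shape[0])
    for _ in range(num_iters):
        x = A @ x
        x /= np.linalg.norm(x)           # renormalize to avoid overflow/underflow
    eigval = x @ A @ x                    # Rayleigh quotient at the (unit) iterate
    return eigval, x

A = np.array([[4., 1.],
              [2., 3.]])                  # eigenvalues 5 and 2
eigval, eigvec = power_iteration(A)
print(eigval)                             # ~5.0, the largest eigenvalue
assert np.allclose(A @ eigvec, eigval * eigvec, atol=1e-6)

_, v = power_iteration(np.linalg.inv(A))  # inverse iteration
print(v @ A @ v)                          # ~2.0, the smallest eigenvalue of A
```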
Books
- Axler, Sheldon. Linear Algebra Done Right. Springer (2015).