
[Mathematical Basis of Machine Learning] (I) Linear Algebra (Part 1+)

2022-07-04 18:56:00 【Binary artificial intelligence】


2 Linear Algebra (Part 1+)

2.3 Solving Systems of Linear Equations

Earlier, we introduced the general form of a system of linear equations, namely:
$\begin{aligned}a_{11} x_{1}+\cdots+a_{1 n} x_{n} &=b_{1} \\& \vdots \\a_{m 1} x_{1}+\cdots+a_{m n} x_{n} &=b_{m}\end{aligned}$

where $a_{i j} \in \mathbb{R}$ and $b_{i} \in \mathbb{R}$ are known constants and $x_j$ are unknowns, $i=1, \ldots, m$, $j=1, \ldots, n$. So far, we have seen that matrices give a compact way of writing systems of linear equations, so that we can write $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$. We also defined basic matrix operations such as matrix addition and multiplication. In the following, we focus on solving systems of linear equations and provide an algorithm for finding the inverse of a matrix.

2.3.1 Particular and General Solution

Before discussing how to solve systems of linear equations in general, let us look at an example. Consider the system of equations

$\left[\begin{array}{cccc}1 & 0 & 8 & -4 \\0 & 1 & 2 & 12\end{array}\right]\left[\begin{array}{l}x_{1} \\x_{2} \\x_{3} \\x_{4}\end{array}\right]=\left[\begin{array}{c}42 \\8\end{array}\right]\qquad (2.38)$

This system has two equations and four unknowns. Therefore, in general, we would expect infinitely many solutions. The system is in a particularly easy form: the first two columns each consist of a 1 and a 0.

Our goal is to find scalars $x_{1}, \ldots, x_{4}$ such that $\sum_{i=1}^{4} x_{i} \boldsymbol{c}_{i}=\boldsymbol{b}$, where $\boldsymbol{c}_i$ is the $i$-th column of the matrix and $\boldsymbol{b}$ is the right-hand side of (2.38). For this system, a solution can be found immediately by taking 42 times the first column and 8 times the second column:

$\boldsymbol{b}=\left[\begin{array}{c}42 \\8\end{array}\right]=42\left[\begin{array}{l}1 \\0\end{array}\right]+8\left[\begin{array}{l}0 \\1\end{array}\right]$

Therefore, one solution of the system is $[42,8,0,0]^{\top}$. This solution is called a particular solution (or special solution).

However, this is not the only solution of this system of linear equations. To capture all the other solutions, we need to be creative in generating $\mathbf{0}$ in a non-trivial way using the columns of the matrix: adding to the particular solution any vector that the matrix maps to $\mathbf{0}$ leaves the right-hand side of the equation unchanged.

To do so, we express the third column using the first two columns (which have this very simple form):
$\left[\begin{array}{l}8 \\2\end{array}\right]=8\left[\begin{array}{l}1 \\0\end{array}\right]+2\left[\begin{array}{l}0 \\1\end{array}\right]$

so that $\mathbf{0}=8 \boldsymbol{c}_{1}+2 \boldsymbol{c}_{2}-1 \boldsymbol{c}_{3}+0 \boldsymbol{c}_{4}$, which gives $\left(x_{1}, x_{2}, x_{3}, x_{4}\right)=(8,2,-1,0)$. In fact, any scaling of this solution by $\lambda_{1} \in \mathbb{R}$ produces the $\mathbf{0}$ vector, i.e.,
$\left[\begin{array}{llll}1 & 0 & 8 & -4 \\0 & 1 & 2 & 12\end{array}\right]\left(\lambda_{1}\left[\begin{array}{c}8 \\2 \\-1 \\0\end{array}\right]\right)=\lambda_{1}\left(8 \boldsymbol{c}_{1}+2 \boldsymbol{c}_{2}-\boldsymbol{c}_{3}\right)=\mathbf{0}$

Similarly, we express the fourth column of the matrix in (2.38) using the first two columns and obtain another set of non-trivial versions of $\mathbf{0}$ for any $\lambda_{2} \in \mathbb{R}$:
$\left[\begin{array}{llll}1 & 0 & 8 & -4 \\0 & 1 & 2 & 12\end{array}\right]\left(\lambda_{2}\left[\begin{array}{c}-4 \\12 \\0 \\-1\end{array}\right]\right)=\lambda_{2}\left(-4 \boldsymbol{c}_{1}+12 \boldsymbol{c}_{2}-\boldsymbol{c}_{4}\right)=\mathbf{0}$

Putting everything together, we obtain all solutions of the system in (2.38), called the general solution, as the set
$\left\{x \in \mathbb{R}^{4}: x=\left[\begin{array}{c}42 \\8 \\0 \\0\end{array}\right]+\lambda_{1}\left[\begin{array}{c}8 \\2 \\-1 \\0\end{array}\right]+\lambda_{2}\left[\begin{array}{c}-4 \\12 \\0 \\-1\end{array}\right], \lambda_{1}, \lambda_{2} \in \mathbb{R}\right\}$
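The general solution above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matvec` and `general_solution` and the sample values of $\lambda_1, \lambda_2$ are illustrative choices, not part of the original text.

```python
from fractions import Fraction

# Coefficient matrix and right-hand side of (2.38).
A = [[1, 0, 8, -4],
     [0, 1, 2, 12]]
b = [42, 8]

# Particular solution and the two homogeneous directions from the text.
x_p = [42, 8, 0, 0]
v1 = [8, 2, -1, 0]
v2 = [-4, 12, 0, -1]


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def general_solution(lam1, lam2):
    """Return x_p + lam1 * v1 + lam2 * v2."""
    return [p + lam1 * a + lam2 * c for p, a, c in zip(x_p, v1, v2)]


# Arbitrary (hypothetical) choices of lambda_1 and lambda_2.
for lam1, lam2 in [(0, 0), (1, -2), (Fraction(1, 3), 5)]:
    x = general_solution(lam1, lam2)
    assert matvec(A, x) == b
```

Every choice of the two parameters passes the check because both direction vectors are mapped to $\mathbf{0}$ by the matrix.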

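Remark:
The general approach to solving a system of linear equations consists of the following three steps:

(1) Find a particular solution of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$.
(2) Find all solutions of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{0}$.
(3) Combine the solutions from steps (1) and (2) to obtain the general solution.
Note that neither the general nor the particular solution is unique: the same solution set can be written with a different particular solution and different direction vectors.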

The system of linear equations in the preceding example was easy to solve because the matrix in (2.38) has a particularly convenient form, which allowed us to find the particular and the general solution by inspection. However, general systems of equations are not of this simple form.

Fortunately, there is a constructive algorithm that transforms any system of linear equations into this particularly simple form: Gaussian elimination. Key to Gaussian elimination are elementary transformations of systems of linear equations, which bring the system into a simple form. We can then apply the three steps above to the simple form.

2.3.2 Elementary Transformations

Key to solving a system of linear equations are elementary transformations, which keep the solution set unchanged but transform the system into a simpler form:

（1） Exchange of two equations (rows of the matrix representing the system of equations)
（2） Multiplication of an equation (row) by a constant $\lambda \in \mathbb{R} \backslash\{0\}$
（3） Addition of two equations (rows)

Remark: (1), (2), and (3) can be combined, for example by adding a multiple of one row to another row.
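The three elementary transformations can be sketched as small functions acting on an augmented matrix stored as a list of rows. This is only an illustration: the function names and the tiny $2 \times 2$ system are assumptions made for the example and do not come from the text.

```python
from fractions import Fraction


def swap_rows(M, i, j):
    """(1) Exchange rows i and j (0-based); returns a new matrix."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M


def scale_row(M, i, lam):
    """(2) Multiply row i by a nonzero constant lam."""
    if lam == 0:
        raise ValueError("lam must be nonzero")
    M = [row[:] for row in M]
    M[i] = [lam * v for v in M[i]]
    return M


def add_multiple(M, target, source, lam):
    """(2) and (3) combined: add lam times row `source` to row `target`."""
    M = [row[:] for row in M]
    M[target] = [t + lam * s for t, s in zip(M[target], M[source])]
    return M


# Illustrative augmented matrix [A | b] for
#   x1 + 2 x2 = 5
# 3 x1 + 4 x2 = 11
M = [[Fraction(1), Fraction(2), Fraction(5)],
     [Fraction(3), Fraction(4), Fraction(11)]]

M = add_multiple(M, 1, 0, -3)          # R2 <- R2 - 3 R1
M = scale_row(M, 1, Fraction(-1, 2))   # R2 <- (-1/2) R2
M = add_multiple(M, 0, 1, -2)          # R1 <- R1 - 2 R2
print(M)
```

After these three steps the left block is the identity, so the last column holds the solution $x_1 = 1$, $x_2 = 2$ of the illustrative system.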

Example 2.6
For $a \in \mathbb{R}$, find all solutions of the following system of equations:
$\begin{array}{rrrrrrrrrrr}-2 x_{1} & + & 4 x_{2} & - & 2 x_{3} & - & x_{4}&+ & 4 x_{5} & = & -3 \\4 x_{1} & - & 8 x_{2} & + & 3 x_{3} & - & 3 x_{4}&+ & x_{5} & = & 2 \\x_{1} & - & 2 x_{2} & + & x_{3} & - & x_{4}&+ & x_{5} & = & 0 \\x_{1} & - & 2 x_{2} & & & - &3 x_{4} & + & 4 x_{5} & = & a\end{array}$

We start by converting this system of equations into the compact matrix notation $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$. We no longer mention the variables $\boldsymbol{x}$ explicitly and instead build the augmented matrix (in the form $[\boldsymbol{A} \mid \boldsymbol{b}]$), where a vertical line separates the left-hand side from the right-hand side of the system:
$\left[\begin{array}{rrrrr|r}-2 & 4 & -2 & -1 & 4 & -3 \\4 & -8 & 3 & -3 & 1 & 2 \\1 & -2 & 1 & -1 & 1 & 0 \\1 & -2 & 0 & -3 & 4 & a\end{array}\right]\begin{array}{l}\text{Swap with } R_3\\ \\\text{Swap with }R_1 \\\\\end{array}$

Swapping rows $R_1$ and $R_3$ gives:

$\left[\begin{array}{rrrrr|r}1 & -2 & 1 & -1 & 1 & 0 \\4 & -8 & 3 & -3 & 1 & 2 \\-2 & 4 & -2 & -1 & 4 & -3 \\1 & -2 & 0 & -3 & 4 & a\end{array}\right] \begin{array}{l} \\ -4 R_{1} \\+2 R_{1} \\-R_{1}\end{array}$

Applying the indicated transformations (for example, subtracting four times row 1 from row 2), we obtain
$\left[\begin{array}{rrrrr|r}1 & -2 & 1 & -1 & 1 & 0 \\0 & 0 & -1 & 1 & -3 & 2 \\0 & 0 & 0 & -3 & 6 & -3 \\0 & 0 & -1 & -2 & 3 & a\end{array}\right]$
We use $\rightsquigarrow$ to indicate elementary transformations of the augmented matrix.

$\qquad\left[\begin{array}{rrrrr|r}1 & -2 & 1 & -1 & 1 & 0 \\0 & 0 & -1 & 1 & -3 & 2 \\0 & 0 & 0 & -3 & 6 & -3 \\0 & 0 & -1 & -2 & 3 & a\end{array}\right]\begin{array}{l} \\\\\\-R_{2}-R_{3}\end{array}$

$\rightsquigarrow\left[\begin{array}{rrrrr|r}1 & -2 & 1 & -1 & 1 & 0 \\0 & 0 & -1 & 1 & -3 & 2 \\0 & 0 & 0 & -3 & 6 & -3 \\0 & 0 & 0 & 0 & 0 & a+1\end{array}\right] \begin{array}{l} \\ \cdot (-1)\\\cdot (-\frac{1}{3})\\ \phantom{0}\end{array}$
$\rightsquigarrow \quad\left[\begin{array}{rrrrr|r}1 & -2 & 1 & -1 & 1 & 0 \\0 & 0 & 1 & -1 & 3 & -2 \\0 & 0 & 0 & 1 & -2 & 1 \\0 & 0 & 0 & 0 & 0 & a+1\end{array}\right]\qquad\qquad$

This (augmented) matrix is now in a convenient form, the row-echelon form (REF). Reverting this compact notation to the explicit notation with the variables we seek, we obtain
$\begin{array}{rrrrrr} x_{1} & -&2x_{2}&+&x_{3} & -&x_{4}&+ &x_{5} & = & 0 \\&&&&x_{3}&-& x_{4}&+&3 x_{5} & = & -2 \\&&&&& & x_{4}&-&2 x_{5} & = & 1 \\& &&&&&& & 0 & = & a+1\end{array}$

The system can be solved only for $a = -1$. A particular solution is:
$\left[\begin{array}{l}x_{1} \\x_{2} \\x_{3} \\x_{4} \\x_{5}\end{array}\right]=\left[\begin{array}{c}2 \\0 \\-1 \\1 \\0\end{array}\right]$

The general solution, which captures the set of all possible solutions, is:
$\left\{x \in \mathbb{R}^{5}: x=\left[\begin{array}{c}2 \\0 \\-1 \\1 \\0\end{array}\right]+\lambda_{1}\left[\begin{array}{l}2 \\1 \\0 \\0 \\0\end{array}\right]+\lambda_{2}\left[\begin{array}{c}2 \\0 \\-1 \\2 \\1\end{array}\right], \quad \lambda_{1}, \lambda_{2} \in \mathbb{R}\right\}$
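The row operations of Example 2.6 can be replayed step by step for a concrete value of $a$. The sketch below does this with exact fractions; the function name `reduce_example` and the inner helper `add` are hypothetical names chosen for illustration.

```python
from fractions import Fraction


def reduce_example(a):
    """Replay the row operations of Example 2.6 for a concrete value of a."""
    M = [[-2, 4, -2, -1, 4, -3],
         [4, -8, 3, -3, 1, 2],
         [1, -2, 1, -1, 1, 0],
         [1, -2, 0, -3, 4, a]]
    M = [[Fraction(v) for v in row] for row in M]

    def add(target, source, lam):
        # R_target <- R_target + lam * R_source (0-based row indices)
        M[target] = [t + lam * s for t, s in zip(M[target], M[source])]

    M[0], M[2] = M[2], M[0]                      # swap R1 and R3
    add(1, 0, -4)                                # R2 - 4 R1
    add(2, 0, 2)                                 # R3 + 2 R1
    add(3, 0, -1)                                # R4 - R1
    add(3, 1, -1)                                # R4 - R2
    add(3, 2, -1)                                # R4 - R3
    M[1] = [-v for v in M[1]]                    # R2 * (-1)
    M[2] = [v * Fraction(-1, 3) for v in M[2]]   # R3 * (-1/3)
    return M


ref = reduce_example(a=-1)
# The last row encodes 0 = a + 1, so it vanishes exactly when a = -1.
print(ref[3])
```

Calling `reduce_example` with any other value of `a` leaves a nonzero entry in the last column of the last row, which is the inconsistency $0 = a + 1$ from the text.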

In the following, we detail a constructive way to obtain a particular solution and the general solution of a system of linear equations.

Remark: Pivots and staircase structure

The leading coefficient of a row (the first nonzero number from the left) is called the pivot and is always strictly to the right of the pivot of the row above it. Therefore, any system of equations in row-echelon form has a “staircase” structure.

Definition 2.6 (Row-echelon form)

A matrix is in row-echelon form if:

All rows that contain only zeros are at the bottom of the matrix; correspondingly, all rows that contain at least one nonzero element are above the rows that contain only zeros.
Looking at the nonzero rows only, the first nonzero number from the left (also called the pivot or the leading coefficient) is always strictly to the right of the pivot of the row above it.

Remark: Basic and free variables

The variables corresponding to the pivots in the row-echelon form are called basic variables; the other variables are called free variables.

For example, in
$\begin{array}{rrrrrr} x_{1} & -&2x_{2}&+&x_{3} & -&x_{4}&+ &x_{5} & = & 0 \\&&&&x_{3}&-& x_{4}&+&3 x_{5} & = & -2 \\&&&&& & x_{4}&-&2 x_{5} & = & 1 \\& &&&&&& & 0 & = & a+1\end{array}$
$x_1,x_3,x_4$ are basic variables, whereas $x_2,x_5$ are free variables.

Remark: Obtaining a particular solution

The row-echelon form makes it easier to determine a particular solution. To do so, we express the right-hand side of the system using the pivot columns, such that $\boldsymbol{b}=\sum_{i=1}^{P} \lambda_{i} \boldsymbol{p}_{i}$, where $\boldsymbol{p}_{i}$, $i=1, \ldots, P$, are the pivot columns.

The $\lambda_i$ are determined most easily if we start with the rightmost pivot column and work our way to the left.

In the previous example (with $a=-1$), we try to find $\lambda_1,\lambda_2,\lambda_3$ such that
$\lambda_{1}\left[\begin{array}{l}1 \\0 \\0 \\0\end{array}\right]+\lambda_{2}\left[\begin{array}{l}1 \\1 \\0 \\0\end{array}\right]+\lambda_{3}\left[\begin{array}{c}-1 \\-1 \\1 \\0\end{array}\right]=\left[\begin{array}{c}0 \\-2 \\1 \\0\end{array}\right]$

From here, we find relatively directly that $\lambda_{3}=1, \lambda_{2}=-1, \lambda_{1}=2$. When putting everything together, we must not forget the non-pivot columns, whose coefficients we implicitly set to 0. Therefore, we obtain the particular solution $[2,0,-1,1,0]^{\top}$.
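The right-to-left procedure described above is ordinary back substitution with all free variables set to 0. A minimal sketch, assuming the augmented matrix is already in row-echelon form (the function name `particular_solution` is an illustrative choice):

```python
from fractions import Fraction


def particular_solution(ref):
    """Back-substitute on an augmented matrix [A | b] in row-echelon form.

    Free variables are set to 0. Returns None if the system is inconsistent.
    """
    n = len(ref[0]) - 1                      # number of unknowns
    x = [Fraction(0)] * n
    for row in reversed(ref):                # rightmost pivot first
        coeffs, rhs = row[:n], row[n]
        pivot = next((j for j, v in enumerate(coeffs) if v != 0), None)
        if pivot is None:
            if rhs != 0:                     # row reads 0 = nonzero
                return None
            continue
        # Contribution of the variables to the right of the pivot.
        known = sum(coeffs[j] * x[j] for j in range(pivot + 1, n))
        x[pivot] = Fraction(rhs - known) / coeffs[pivot]
    return x


# Row-echelon form from Example 2.6 with a = -1.
ref = [[1, -2, 1, -1, 1, 0],
       [0, 0, 1, -1, 3, -2],
       [0, 0, 0, 1, -2, 1],
       [0, 0, 0, 0, 0, 0]]
x_p = particular_solution(ref)
print(x_p)
```

For this input the result agrees with the particular solution $[2,0,-1,1,0]^{\top}$ found by hand.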

Remark: Reduced row-echelon form

A system of equations is in reduced row-echelon form (also called row-reduced echelon form or row canonical form) if:

It is in row-echelon form.
Every pivot is 1.
The pivot is the only nonzero entry in its column.

The reduced row-echelon form will play an important role in Section 2.3.3 because it allows us to determine the general solution of a system of linear equations in a straightforward way.

Remark: Gaussian elimination

Gaussian elimination is an algorithm that performs elementary transformations to bring a system of linear equations into reduced row-echelon form.
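One possible way to write this algorithm down is sketched below. It uses exact fractions to avoid rounding issues; the function name `rref` and the choice of the first nonzero entry as pivot are assumptions of this sketch, not something prescribed by the text.

```python
from fractions import Fraction


def rref(M):
    """Return (R, pivots): the reduced row-echelon form of M and its pivot columns."""
    R = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots = []
    r = 0                                    # row where the next pivot goes
    for c in range(cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue                         # no pivot in this column
        R[r], R[p] = R[p], R[r]              # (1) exchange two rows
        pivot = R[r][c]
        R[r] = [v / pivot for v in R[r]]     # (2) scale so the pivot is 1
        for i in range(rows):                # (3) clear the rest of the column
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots


# Augmented matrix of Example 2.6 with a = -1.
M = [[-2, 4, -2, -1, 4, -3],
     [4, -8, 3, -3, 1, 2],
     [1, -2, 1, -1, 1, 0],
     [1, -2, 0, -3, 4, -1]]
R, pivots = rref(M)
print(pivots)
```

The pivot columns returned for this matrix correspond to the basic variables $x_1, x_3, x_4$, and the last column of `R` contains the entries of the particular solution for those variables.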

Example 2.7 (Reduced row-echelon form)

The following matrix is in reduced row-echelon form (the pivots are the bold 1s):
$\boldsymbol{A}=\left[\begin{array}{ccccc}\mathbf{1} & 3 & 0 & 0 & 3 \\0 & 0 & \mathbf{1} & 0 & 9 \\0 & 0 & 0 & \mathbf{1} & -4\end{array}\right]\qquad (2.49)$

The key idea for finding the solutions of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{0}$ is to look at the non-pivot columns, which we need to express as a (linear) combination of the pivot columns. The reduced row-echelon form makes this relatively straightforward: we express each non-pivot column as a sum of multiples of the pivot columns to its left.

The second column is 3 times the first column (we can ignore the pivot columns to the right of the second column). Therefore, to obtain $\mathbf{0}$, we need to subtract the second column from three times the first column.

Now we look at the fifth column, which is the second non-pivot column.

The fifth column can be expressed as 3 times the first pivot column, 9 times the second pivot column, and −4 times the third pivot column. We need to keep track of the indices of the pivot columns and translate this into 3 times the first column, 0 times the second column (a non-pivot column), 9 times the third column (the second pivot column), and −4 times the fourth column (the third pivot column). Then we subtract the fifth column to obtain $\mathbf{0}$. In the end, we are still solving a homogeneous system of equations.

To summarize, all solutions of $\boldsymbol{A} \boldsymbol{x}=\boldsymbol{0}, \boldsymbol{x} \in \mathbb{R}^{5}$ are given by
$\left\{\boldsymbol{x} \in \mathbb{R}^{5}: \boldsymbol{x}=\lambda_{1}\left[\begin{array}{c}3 \\-1 \\0 \\0 \\0\end{array}\right]+\lambda_{2}\left[\begin{array}{c}3 \\0 \\9 \\-4 \\-1\end{array}\right], \quad \lambda_{1}, \lambda_{2} \in \mathbb{R}\right\}\qquad(2.50)$

2.3.3 The Minus-1 Trick

In the following, we introduce a practical trick for reading out the solutions $\boldsymbol{x}$ of a homogeneous system of linear equations $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{0}$, where $\boldsymbol{A} \in \mathbb{R}^{k \times n}, \boldsymbol{x} \in \mathbb{R}^{n}$.

To start, we assume that $\boldsymbol{A}$ is in reduced row-echelon form without any rows that contain only zeros, i.e.,
$\boldsymbol{A}=\left[\begin{array}{ccccccccccccccc}0 & \cdots & 0 & \mathbf{1} & * & \cdots & * & 0 & * & \cdots & * & 0 & * & \cdots & * \\\vdots & & \vdots & 0 & 0 & \cdots & 0 & \mathbf{1} & * & \cdots & * & \vdots & \vdots & & \vdots \\\vdots & & \vdots & \vdots & \vdots & & \vdots & 0 & \vdots & & \vdots & \vdots & \vdots & & \vdots \\\vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & 0 & \vdots & & \vdots \\0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & * & \cdots & *\end{array}\right]$

where $*$ can be an arbitrary real number, with the constraints that the first nonzero entry of each row of $\boldsymbol{A}$ must be 1 and all other entries in the corresponding column must be 0.

The columns $j_{1}, \ldots, j_{k}$ with the pivots (marked in bold) are the standard unit vectors $e_{1}, \ldots, e_{k} \in \mathbb{R}^{k}$. We extend this matrix to an $n \times n$ matrix $\tilde{\boldsymbol{A}}$ by adding $n - k$ rows of the form

$\left[\begin{array}{lllllll}0 & \cdots & 0 & -1 & 0 & \cdots & 0\end{array}\right]\qquad (2.52)$

so that the diagonal of the augmented matrix $\tilde{\boldsymbol{A}}$ contains either 1 or −1. Then the columns of $\tilde{\boldsymbol{A}}$ that contain −1 as pivots are solutions of the homogeneous system $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{0}$. More precisely, these columns form a basis of the solution space of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{0}$, which is also called the kernel or null space (see Section 2.7.3).

Example 2.8 (Minus-1 Trick)
Let us revisit the matrix in (2.49), which is already in reduced row-echelon form:
$\boldsymbol{A}=\left[\begin{array}{ccccc}1 & 3 & 0 & 0 & 3 \\0 & 0 & 1 & 0 & 9 \\0 & 0 & 0 & 1 & -4\end{array}\right]$

We now augment this matrix to a 5 × 5 matrix by adding rows of the form (2.52) at the places where the pivots on the diagonal are missing, and obtain
$\tilde{\boldsymbol{A}}=\left[\begin{array}{ccccc}1 & 3 & 0 & 0 & 3 \\\textcolor{blue}{0} & \textcolor{blue}{-1} & \textcolor{blue}{0} & \textcolor{blue}{0} & \textcolor{blue}{0} \\0 & 0 & 1 & 0 & 9 \\0 & 0 & 0 & 1 & -4 \\\textcolor{blue}{0} & \textcolor{blue}{0} & \textcolor{blue}{0} & \textcolor{blue}{0} & \textcolor{blue}{-1}\end{array}\right]$

From this form, we can immediately read out the solutions of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{0}$ by taking the columns of $\tilde{\boldsymbol{A}}$ that contain −1 on the diagonal:
$\left\{\boldsymbol{x} \in \mathbb{R}^{5}: \boldsymbol{x}=\lambda_{1}\left[\begin{array}{c}3 \\-1 \\0 \\0 \\0\end{array}\right]+\lambda_{2}\left[\begin{array}{c}3 \\0 \\9 \\-4 \\-1\end{array}\right], \quad \lambda_{1}, \lambda_{2} \in \mathbb{R}\right\}$

This is identical to the solution (2.50) obtained in Example 2.7.
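The Minus-1 trick is mechanical enough to express in a few lines. The sketch below assumes its input is already in reduced row-echelon form with no zero rows, as required above; the function name `minus_one_trick` is made up for this illustration.

```python
def minus_one_trick(A):
    """Null-space basis of a matrix in reduced row-echelon form without zero rows."""
    n = len(A[0])
    # Pivot column of each row: index of its first nonzero entry.
    pivots = [next(j for j, v in enumerate(row) if v != 0) for row in A]
    pivot_row = dict(zip(pivots, A))
    # Build the n x n matrix: row j is the pivot row if column j has a pivot,
    # otherwise a row with -1 on the diagonal and 0 elsewhere.
    A_tilde = [list(pivot_row[j]) if j in pivot_row
               else [-1 if c == j else 0 for c in range(n)]
               for j in range(n)]
    # The columns with -1 on the diagonal span the solution space of Ax = 0.
    return [[A_tilde[i][j] for i in range(n)]
            for j in range(n) if j not in pivot_row]


# The matrix from (2.49).
A = [[1, 3, 0, 0, 3],
     [0, 0, 1, 0, 9],
     [0, 0, 0, 1, -4]]
basis = minus_one_trick(A)
print(basis)
```

Each returned vector can be multiplied with `A` to confirm that it is mapped to the zero vector.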

Calculating the Inverse

To compute the inverse $\boldsymbol{A}^{-1}$ of $\boldsymbol{A} \in \mathbb{R}^{n \times n}$, we need to find a matrix $\boldsymbol{X}$ that satisfies $\boldsymbol{A X}=\boldsymbol{I}_{n}$. Then $\boldsymbol{X}=\boldsymbol{A}^{-1}$. We can write this as a set of simultaneous linear equations $\boldsymbol{A X}=\boldsymbol{I}_{n}$, which we solve for $\boldsymbol{X}=\left[\boldsymbol{x}_{1}|\cdots| \boldsymbol{x}_{n}\right]$. We use the augmented matrix notation for a compact representation of this set of linear systems and obtain
$\left[\boldsymbol{A} \mid \boldsymbol{I}_{n}\right] \quad \rightsquigarrow \cdots \rightsquigarrow \quad\left[\boldsymbol{I}_{n} \mid \boldsymbol{A}^{-1}\right]$

This means that if we bring the augmented system into reduced row-echelon form, we can read out the inverse on the right-hand side of the system. Hence, determining the inverse of a matrix is equivalent to solving systems of linear equations.

Example 2.9 (Calculating an inverse matrix by Gaussian elimination)

To determine the inverse of
$\boldsymbol{A}=\left[\begin{array}{llll}1 & 0 & 2 & 0 \\1 & 1 & 0 & 0 \\1 & 2 & 0 & 1 \\1 & 1 & 1 & 1\end{array}\right]$

we write down the augmented matrix

$\left[\begin{array}{cccc|cccc}1 & 0 & 2 & 0 & 1 & 0 & 0 & 0 \\1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\1 & 2 & 0 & 1 & 0 & 0 & 1 & 0 \\1 & 1 & 1 & 1 & 0 & 0 & 0 & 1\end{array}\right]$
and use Gaussian elimination to bring it into reduced row-echelon form:
$\left[\begin{array}{cccc|cccc}1 & 0 & 0 & 0 & -1 & 2 & -2 & 2 \\0 & 1 & 0 & 0 & 1 & -1 & 2 & -2 \\0 & 0 & 1 & 0 & 1 & -1 & 1 & -1 \\0 & 0 & 0 & 1 & -1 & 0 & -1 & 2\end{array}\right]$

The desired inverse is then given by the right-hand side:
$\boldsymbol{A}^{-1}=\left[\begin{array}{cccc}-1 & 2 & -2 & 2 \\1 & -1 & 2 & -2 \\1 & -1 & 1 & -1 \\-1 & 0 & -1 & 2\end{array}\right]$

We can verify that this is indeed the inverse by performing the multiplication $\boldsymbol{A} \boldsymbol{A}^{-1}$ and checking that the result is $\boldsymbol{I}_4$.
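The reduction of $[\boldsymbol{A} \mid \boldsymbol{I}_n]$ to $[\boldsymbol{I}_n \mid \boldsymbol{A}^{-1}]$ and the final multiplication check can be sketched as follows. The function name `inverse` is a hypothetical helper for this illustration, and exact fractions are used so that the comparison with the identity is exact.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by reducing [A | I] to [I | A^-1]."""
    n = len(A)
    # Augment A with the identity matrix.
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # Find a row at or below c with a nonzero entry in column c.
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is not invertible")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(n):
            if i != c:
                factor = M[i][c]
                M[i] = [a - factor * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]


# The matrix from Example 2.9.
A = [[1, 0, 2, 0],
     [1, 1, 0, 0],
     [1, 2, 0, 1],
     [1, 1, 1, 1]]
A_inv = inverse(A)

# Check that A times A_inv is the 4 x 4 identity matrix.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
assert product == [[int(i == j) for j in range(4)] for i in range(4)]
```

If no pivot can be found in some column, the matrix is singular and the sketch raises a `ValueError` instead of returning a result.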

2.3.4 Algorithms for Solving a System of Linear Equations

In the following, we briefly discuss approaches to solving a system of linear equations of the form $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$. We assume that a solution exists. If there is no solution, we need to resort to approximate solutions, such as linear regression in Chapter 9, which is not covered here.

In special cases, we may be able to determine the inverse $\boldsymbol{A}^{-1}$, so that the solution of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$ is given by $\boldsymbol{x}=\boldsymbol{A}^{-1} \boldsymbol{b}$. However, this is possible only if $\boldsymbol{A}$ is a square matrix and invertible, which is often not the case. Otherwise, under mild assumptions (i.e., $\boldsymbol{A}$ needs to have linearly independent columns), we can use the transformation
$\boldsymbol{A} \boldsymbol{x}=\boldsymbol{b} \Longleftrightarrow \boldsymbol{A}^{\top} \boldsymbol{A} \boldsymbol{x}=\boldsymbol{A}^{\top} \boldsymbol{b} \Longleftrightarrow \boldsymbol{x}=\left(\boldsymbol{A}^{\top} \boldsymbol{A}\right)^{-1} \boldsymbol{A}^{\top} \boldsymbol{b}$

That is, we use the Moore-Penrose pseudo-inverse $\left(\boldsymbol{A}^{\top} \boldsymbol{A}\right)^{-1} \boldsymbol{A}^{\top}$ to determine the solution $\left(\boldsymbol{A}^{\top} \boldsymbol{A}\right)^{-1} \boldsymbol{A}^{\top}\boldsymbol{b}$ of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$, which also corresponds to the minimum-norm least-squares solution.
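As a small worked example (the matrix and right-hand side are made up for illustration and do not come from the text), take
$\boldsymbol{A}=\left[\begin{array}{ll}1 & 0 \\0 & 1 \\1 & 1\end{array}\right], \quad \boldsymbol{b}=\left[\begin{array}{l}1 \\2 \\3\end{array}\right]$

The columns of $\boldsymbol{A}$ are linearly independent, and
$\boldsymbol{A}^{\top} \boldsymbol{A}=\left[\begin{array}{ll}2 & 1 \\1 & 2\end{array}\right], \quad \boldsymbol{A}^{\top} \boldsymbol{b}=\left[\begin{array}{l}4 \\5\end{array}\right], \quad\left(\boldsymbol{A}^{\top} \boldsymbol{A}\right)^{-1}=\frac{1}{3}\left[\begin{array}{rr}2 & -1 \\-1 & 2\end{array}\right]$

so that
$\boldsymbol{x}=\left(\boldsymbol{A}^{\top} \boldsymbol{A}\right)^{-1} \boldsymbol{A}^{\top} \boldsymbol{b}=\frac{1}{3}\left[\begin{array}{rr}2 & -1 \\-1 & 2\end{array}\right]\left[\begin{array}{l}4 \\5\end{array}\right]=\left[\begin{array}{l}1 \\2\end{array}\right]$

Substituting back gives $\boldsymbol{A}\boldsymbol{x}=[1,2,3]^{\top}=\boldsymbol{b}$, so in this case the pseudo-inverse solution solves the system exactly.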

A disadvantage of this approach is that it requires many computations for the matrix-matrix product and for computing the inverse of $\boldsymbol{A}^{\top} \boldsymbol{A}$. Moreover, for reasons of numerical precision, it is generally not recommended to compute the inverse or pseudo-inverse. In the following, we therefore briefly discuss alternative approaches to solving systems of linear equations.

Gaussian elimination plays an important role when computing determinants, checking whether a set of vectors is linearly independent, computing the inverse of a matrix, computing the rank of a matrix, and determining a basis of a vector space. It is an intuitive and constructive way to solve a system of linear equations with thousands of variables. However, for systems with millions of variables it is impractical, because the required number of arithmetic operations scales cubically in the number of simultaneous equations.

In practice, many systems of linear equations are solved indirectly, either by stationary iterative methods, such as the Richardson method, the Jacobi method, the Gauß-Seidel method, and the successive over-relaxation method, or by Krylov subspace methods, such as conjugate gradients, generalized minimal residual, or biconjugate gradients.

Let $\boldsymbol{x}_{*}$ be a solution of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$. The key idea of these iterative methods is to set up an iteration of the form
$\boldsymbol{x}^{(k+1)}=\boldsymbol{C} \boldsymbol{x}^{(k)}+\boldsymbol{d}$

for suitable $\boldsymbol{C}$ and $\boldsymbol{d}$ that reduces the residual error $\left\|\boldsymbol{x}^{(k+1)}-\boldsymbol{x}_{*}\right\|$ in every iteration and converges to $\boldsymbol{x}_{*}$. We will introduce norms $\|\cdot\|$ in Section 3.1, which allow us to compute similarities between vectors.
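As one concrete instance of such an iteration, the Jacobi method can be written with $\boldsymbol{C}=\boldsymbol{I}-\boldsymbol{D}^{-1} \boldsymbol{A}$ and $\boldsymbol{d}=\boldsymbol{D}^{-1} \boldsymbol{b}$, where $\boldsymbol{D}$ is the diagonal part of $\boldsymbol{A}$. The sketch below is a minimal illustration under stated assumptions: the function name `jacobi`, the fixed iteration count, and the $2 \times 2$ example system are choices made here, and the example matrix is strictly diagonally dominant, a condition under which the Jacobi method is known to converge.

```python
def jacobi(A, b, iterations=50):
    """Jacobi iteration x_{k+1} = C x_k + d with C = I - D^-1 A and d = D^-1 b."""
    n = len(A)
    # Build C and d once; D is the diagonal part of A (assumed nonzero).
    C = [[0.0 if i == j else -A[i][j] / A[i][i] for j in range(n)]
         for i in range(n)]
    d = [b[i] / A[i][i] for i in range(n)]
    x = [0.0] * n                            # starting guess x^(0) = 0
    for _ in range(iterations):
        x = [sum(C[i][j] * x[j] for j in range(n)) + d[i] for i in range(n)]
    return x


# Illustrative system with exact solution x = (1, 2):
#   4 x1 +   x2 = 6
#     x1 + 3 x2 = 7
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [6.0, 7.0]
x = jacobi(A, b)
print(x)
```

A production solver would stop based on the size of the update or the residual rather than a fixed number of iterations; the fixed count is used here only to keep the sketch short.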