Deep understanding of Kalman filter (1): background knowledge
2022-07-27 08:51:00 【DeepDriving】

Disclaimer: The text and images in this article are from https://www.kalmanfilter.net/. If there is any infringement, please contact us for removal.
This article was compiled by the WeChat official account 【DeepDriving】 and is split into 3 parts. Follow 【DeepDriving】 and reply with the keyword 【Kalman filter】 to receive the full text as a PDF.
Background knowledge
Before introducing the Kalman filter, let's first go over some basic mathematical background.
Mean and expectation
The mean (Mean) and the expected value (Expected Value) are two similar but distinct concepts. If we have two 5-cent coins and three 10-cent coins, it is easy to calculate their mean:
$$V_{mean}= \frac{1}{N} \sum_{n=1}^{N}V_{n}= \frac{1}{5}\left(5+5+10+10+10\right) = 8 \text{ cents}$$
This result cannot be called an expected value, because the state of the system is not hidden: we used all 5 coins to compute the average.
Now suppose a person weighs himself 5 times in a row, with the following readings: 79.8 kg, 80 kg, 80.1 kg, 79.8 kg, 80.2 kg. The scale's random measurement error makes each reading different. We do not know the true weight, since it is a hidden variable, but we can average the 5 measurements to estimate a fairly accurate weight:
$$W= \frac{1}{N} \sum_{n=1}^{N}W_{n}= \frac{1}{5}\left(79.8+80+80.1+79.8+80.2\right) = 79.98 \text{ kg}$$
This average can be called the expected value of the hidden variable, the true weight.
The mean is usually denoted by the Greek letter $\mu$, and the expected value by the letter $E$.
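As a minimal illustration (a Python sketch added here, not part of the original article), the two computations differ only in interpretation: the coin example averages an entire known population, while the weight example averages noisy measurements of a hidden value:

```python
import numpy as np

# Mean of the full population of coins: nothing is hidden.
coins = np.array([5, 5, 10, 10, 10])                 # cent values
print(coins.mean())                                  # 8.0 cents

# Five noisy measurements of one hidden true weight.
weights = np.array([79.8, 80.0, 80.1, 79.8, 80.2])   # kg
print(weights.mean())                                # 79.98 kg, estimate of E(W)
```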
Variance and standard deviation
The variance (Variance) measures the dispersion of a set of data, i.e., how far the samples deviate from their mean; the standard deviation (Standard Deviation) is the square root of the variance. The standard deviation is usually denoted by the Greek letter $\sigma$, and the variance by $\sigma^{2}$.
Suppose the players of two high school basketball teams have the heights listed in the following table:
| | Player 1 | Player 2 | Player 3 | Player 4 | Player 5 | Mean |
|---|---|---|---|---|---|---|
| Team A | 1.89 m | 2.1 m | 1.75 m | 1.98 m | 1.85 m | 1.914 m |
| Team B | 1.94 m | 1.9 m | 1.97 m | 1.89 m | 1.87 m | 1.914 m |
We want to compare the height data of the two teams. From the table we see that the two teams have the same average height, so we go one step further and compare their variances and standard deviations. Let $x$ denote height and $\mu$ the average height; the variance and standard deviation are given by
$$\sigma^{2}= \frac{1}{N} \sum_{n=1}^{N}\left(x_{n}-\mu\right)^{2}$$

$$\sigma =\sqrt{\frac{1}{N} \sum_{n=1}^{N}\left(x_{n}-\mu\right)^{2}}$$
This gives Team A a height variance of $\sigma^{2}_{A}=0.014\,m^{2}$ and standard deviation $\sigma_{A}=0.12\,m$, and Team B a height variance of $\sigma^{2}_{B}=0.0013\,m^{2}$ and standard deviation $\sigma_{B}=0.036\,m$. From the variances we can tell that the heights on Team A differ from each other much more.
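These numbers are easy to reproduce. Here is a small Python sketch (assuming NumPy) that applies the population formulas above:

```python
import numpy as np

team_a = np.array([1.89, 2.10, 1.75, 1.98, 1.85])
team_b = np.array([1.94, 1.90, 1.97, 1.89, 1.87])

for name, team in [("A", team_a), ("B", team_b)]:
    # ddof=0 divides by N (population variance), matching the formulas above
    print(name, team.mean(), team.var(ddof=0), team.std(ddof=0))
# A 1.914 0.0141... 0.118...
# B 1.914 0.0013... 0.036...
```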
Now suppose we want the mean and variance of the heights of all high school basketball players. That would be a daunting task, since we would have to survey every player in every school. Instead, we can collect a reasonably large data set and use it to estimate the mean and variance of the whole population. For example, we could randomly sample the heights of 100 players; such a data set is large enough to estimate the population mean and variance quite accurately. Note that in this case the variance formula is slightly different from the one above: the divisor is $N-1$ instead of $N$:
$$\sigma^{2}= \frac{1}{N-1} \sum_{n=1}^{N}\left(x_{n}-\mu\right)^{2}$$
The divisor $N-1$ is known as Bessel's correction; refer to the article linked in the original post for a detailed mathematical proof.
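In NumPy the divisor is controlled by the `ddof` parameter, so both variants are one call away; a short sketch (with a made-up random sample) of the difference:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=1.90, scale=0.10, size=100)  # 100 sampled heights

print(sample.var(ddof=0))  # divides by N   (population formula)
print(sample.var(ddof=1))  # divides by N-1 (Bessel-corrected sample variance)
```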
Normal distribution
Many natural phenomena follow the normal distribution (Normal Distribution). Taking basketball players' heights as an example again: if we randomly sample player heights to build a large data set and plot each height value against how often it occurs, we get a bell-shaped curve like the one below:

As you can see, the curve is symmetric about the mean of 1.9 m, and values near the mean occur far more often than values far away from it. The standard deviation of this data set is 0.2 m; as shown in the figure below, 68.26% of the values lie within one standard deviation of the mean (1.7 m to 2.1 m):

The normal distribution is also called the Gaussian distribution; its formula is as follows:
$$f\left(x; \mu, \sigma^{2}\right) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-\frac{\left(x-\mu\right)^{2}}{2\sigma^{2}}}$$
The curve above is called the probability density function (Probability Density Function, PDF) of the normal distribution.
Measurement errors usually follow a normal distribution, so when designing a Kalman filter we assume that the measurement errors are normally distributed.
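As a sanity check of the PDF formula and the 68.26% figure quoted above, the following sketch (assuming NumPy and SciPy are available) evaluates the density and the probability mass within one standard deviation:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.9, 0.2   # mean and standard deviation of the height data

def gaussian_pdf(x, mu, sigma):
    """Normal PDF, written directly from the formula above."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

x = 1.9
print(gaussian_pdf(x, mu, sigma), norm.pdf(x, mu, sigma))  # both ~1.995

# Probability mass within one standard deviation of the mean (1.7 m .. 2.1 m)
print(norm.cdf(mu + sigma, mu, sigma) - norm.cdf(mu - sigma, mu, sigma))  # ~0.6827
```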
Random variables
If we use a radar speed gun to measure the speed of a passing vehicle, the gun's reading is a random variable, and the measurements are normally distributed. Random variables can be continuous or discrete; all physical measurements are continuous random variables.
Estimates, accuracy and precision
An estimate (Estimate) is an evaluation of the hidden state of a system. For example, the true position of an aircraft is a hidden state for the observer; we can measure it with sensors such as radar, and improve the estimate with multi-sensor fusion and tracking algorithms. Any measured or computed parameter is an estimate.
Accuracy (Accuracy) describes how close a measurement is to the true value.
Precision (Precision) describes the reproducibility of repeated measurements.
Estimation must take both the accuracy and the precision of the system into account. The following figure illustrates the relationship between them:

The measurements of a high-precision system have a small variance (low uncertainty), while the measurements of a low-precision system have a large variance (high uncertainty). This variance is caused by random measurement error.
A system with low accuracy is called a biased system, because its measurements always contain a built-in systematic error (bias).
Averaging or smoothing the measurements can significantly reduce the effect of the variance. For example, if we measure temperature with a thermometer that has random measurement error, the error makes individual readings higher or lower than the true value. By taking many measurements and averaging them we obtain an estimate close to the true value, and the more measurements we take, the closer the estimate gets. If the thermometer itself is biased, however, the estimate will contain a fixed systematic error.
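A simulation makes this concrete. The sketch below (hypothetical temperature, noise, and bias values, assuming NumPy) models a thermometer with random noise and an optional fixed bias:

```python
import numpy as np

rng = np.random.default_rng(1)
true_temp = 25.0       # hidden true temperature (hypothetical)
noise_std = 0.5        # random measurement error of the thermometer
bias = 0.8             # fixed systematic error of a biased thermometer

for n in (10, 100, 10000):
    unbiased = true_temp + rng.normal(0, noise_std, size=n)
    biased = unbiased + bias
    # Averaging shrinks the random error, but never removes the bias.
    print(n, unbiased.mean(), biased.mean())
```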
The following figure describes a measurement from a statistical point of view:

- The measurement is a random variable described by a probability density function;
- The mean of the measurements is the expected value of the random variable;
- The offset between the mean of the measurements and the true value is called the bias or systematic measurement error; it reflects the accuracy of the measurement;
- The dispersion of the measurement distribution reflects the precision of the measurement, also known as measurement noise, random measurement error (random measurement error), or measurement uncertainty (measurement uncertainty).
Covariance and covariance matrix
The covariance measures how much two random variables $x$ and $y$ vary together: it describes their joint error, whereas the variance describes the error of a single variable. The variance can be viewed as a special case of the covariance in which the two variables are identical. If the two variables tend to move in the same direction, i.e., when one is above its expectation the other tends to be above its expectation as well, their covariance is positive. If they tend to move in opposite directions, i.e., when one is above its expectation the other is below its own, their covariance is negative. If $x$ and $y$ are statistically independent, their covariance is 0. The covariance of two random variables $x$ and $y$ is computed as follows:
$$\sigma(x,y)= \frac{1}{N-1} \sum_{n=1}^{N}\left(x_{n}-\mu_{x}\right)\left(y_{n}-\mu_{y}\right)$$
where $\mu_{x}$ and $\mu_{y}$ are the means of the random variables $x$ and $y$, respectively.
For a vector $\boldsymbol{x}$ containing $k$ elements,

$$\boldsymbol{x} = \begin{bmatrix} x_{1} & x_{2} & x_{3} & \dots & x_{k} \end{bmatrix}^{T}$$
The covariance matrix is
$$\begin{aligned} COV(\boldsymbol{x}) &= E\left(\begin{bmatrix} (x_{1}-\mu_{x_{1}})^{2} & (x_{1}-\mu_{x_{1}})(x_{2}-\mu_{x_{2}}) & \cdots & (x_{1}-\mu_{x_{1}})(x_{k}-\mu_{x_{k}}) \\ (x_{2}-\mu_{x_{2}})(x_{1}-\mu_{x_{1}}) & (x_{2}-\mu_{x_{2}})^{2} & \cdots & (x_{2}-\mu_{x_{2}})(x_{k}-\mu_{x_{k}}) \\ \vdots & \vdots & \ddots & \vdots \\ (x_{k}-\mu_{x_{k}})(x_{1}-\mu_{x_{1}}) & (x_{k}-\mu_{x_{k}})(x_{2}-\mu_{x_{2}}) & \cdots & (x_{k}-\mu_{x_{k}})^{2} \end{bmatrix}\right) \\ &= E\left(\begin{bmatrix} x_{1}-\mu_{x_{1}} \\ x_{2}-\mu_{x_{2}} \\ \vdots \\ x_{k}-\mu_{x_{k}} \end{bmatrix} \begin{bmatrix} x_{1}-\mu_{x_{1}} & x_{2}-\mu_{x_{2}} & \cdots & x_{k}-\mu_{x_{k}} \end{bmatrix}\right) \\ &= E\left(\left(\boldsymbol{x}-\boldsymbol{\mu_{x}}\right)\left(\boldsymbol{x}-\boldsymbol{\mu_{x}}\right)^{T}\right) \end{aligned}$$
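As a quick numerical sketch (assuming NumPy; the distribution parameters are made up for illustration), the outer-product form above can be compared against `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(2)
# 500 samples of a k=3 dimensional random vector (hypothetical data)
x = rng.multivariate_normal([0, 0, 0],
                            [[2.0, 0.5, 0.0],
                             [0.5, 1.0, 0.3],
                             [0.0, 0.3, 0.5]], size=500)

mu = x.mean(axis=0)
d = x - mu
# Sample estimate of E((x - mu)(x - mu)^T), with Bessel's correction
cov_manual = d.T @ d / (len(x) - 1)

print(np.allclose(cov_manual, np.cov(x, rowvar=False)))  # True
```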
Basic expectation operation rules
The expectation $E(X)$ of a random variable $X$ equals its mean:

$$E(X) = \mu_{X}$$

Some basic rules for manipulating expectations are as follows:
| Rule | Notes |
|---|---|
| $E(X) = \mu_{X}=\sum{xp(x)}$ | $p(x)$ is the probability of $x$ |
| $E(a) = a$ | $a$ is a constant |
| $E(aX) = aE(X)$ | $a$ is a constant |
| $E(a\pm{X}) = a\pm{E(X)}$ | $a$ is a constant |
| $E(a\pm{bX}) = a\pm{bE(X)}$ | $a,b$ are constants |
| $E(X\pm{Y}) = E(X)\pm{E(Y)}$ | $Y$ is another random variable |
| $E(XY) = E(X)E(Y)$ | if $X$ and $Y$ are independent |
Let the variances of the random variables $X$ and $Y$ be denoted $V(X)$ and $V(Y)$, and their covariance $COV(X,Y)$. Some basic rules are listed below:
| Rule | Notes |
|---|---|
| $V(a)=0$ | $a$ is a constant |
| $V(a\pm{X})=V(X)$ | $a$ is a constant |
| $V(X)=E(X^{2})-\mu^{2}_{X}$ | |
| $COV(X,Y)=E(XY)-\mu_{X}\mu_{Y}$ | |
| $COV(X,Y)=0$ | if $X$ and $Y$ are independent |
| $V(aX) = a^{2}V(X)$ | $a$ is a constant |
| $V(X\pm{Y}) = V(X)+V(Y)\pm{2COV(X,Y)}$ | |
| $V(XY) \ne V(X)V(Y)$ | |
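These identities are easy to check empirically. Below is a minimal Python sketch (assuming NumPy; the coefficients and sample size are arbitrary) that verifies $V(X+Y)=V(X)+V(Y)+2COV(X,Y)$ on correlated samples:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 100000)
y = 0.6 * x + rng.normal(0, 1, 100000)   # correlated with x

lhs = np.var(x + y, ddof=1)
rhs = np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * np.cov(x, y)[0, 1]
print(lhs, rhs)   # identical up to floating-point rounding
```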
Proofs of several of these formulas follow.
(1).
$$\begin{aligned} V(X) &= E\left((X-\mu_{X})^{2}\right) \\ &= E\left(X^{2}-2X\mu_{X}+\mu^{2}_{X}\right) \\ &= E(X^{2})-E(2X\mu_{X})+E(\mu^{2}_{X}) \\ &= E(X^{2})-2\mu_{X}E(X)+\mu^{2}_{X} \\ &= E(X^{2})-2\mu_{X}\mu_{X}+\mu^{2}_{X} \\ &= E(X^{2})-\mu^{2}_{X} \end{aligned}$$
(2).
$$\begin{aligned} COV(X,Y) &= E\left((X-\mu_{X})(Y-\mu_{Y})\right) \\ &= E\left(XY - X\mu_{Y} - Y\mu_{X} + \mu_{X}\mu_{Y}\right) \\ &= E(XY) - E(X\mu_{Y}) - E(Y\mu_{X}) + E(\mu_{X}\mu_{Y}) \\ &= E(XY) - \mu_{Y}E(X) - \mu_{X}E(Y) + \mu_{X}\mu_{Y} \\ &= E(XY) - \mu_{Y}\mu_{X} - \mu_{X}\mu_{Y} + \mu_{X}\mu_{Y} \\ &= E(XY) - \mu_{X}\mu_{Y} \end{aligned}$$
(3).
$$\begin{aligned} V(aX) &= E\left((aX)^{2}\right)-(a\mu_{X})^{2} \\ &= E(a^{2}X^{2})-a^{2}\mu_{X}^{2} \\ &= a^{2}E(X^{2})-a^{2}\mu_{X}^{2} \\ &= a^{2}\left(E(X^{2})-\mu_{X}^{2}\right) \\ &= a^{2}V(X) \end{aligned}$$
(4).
$$\begin{aligned} V(X\pm{Y}) &= E\left((X \pm Y)^{2}\right) - (\mu_{X} \pm \mu_{Y})^{2} \\ &= E(X^{2} \pm 2XY + Y^{2}) - (\mu_{X}^{2} \pm 2\mu_{X}\mu_{Y} + \mu_{Y}^{2}) \\ &= \left(E(X^{2}) - \mu_{X}^{2}\right) + \left(E(Y^{2}) - \mu_{Y}^{2}\right) \pm 2\left(E(XY) - \mu_{X}\mu_{Y}\right) \\ &= V(X) + V(Y) \pm 2\left(E(XY) - \mu_{X}\mu_{Y}\right) \\ &= V(X) + V(Y) \pm 2COV(X,Y) \end{aligned}$$
Derivatives of the trace of a matrix product
Here we prove two formulas.
(1).
$$\frac{d}{d\boldsymbol{A}}\left(tr\left(\boldsymbol{AB}\right)\right) = \boldsymbol{B}^{T}$$
Proof:
Given two matrices $\boldsymbol{A}$ ($m\times n$) and $\boldsymbol{B}$ ($n\times m$), their product is
$$\boldsymbol{AB}= \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nm} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n}a_{1i}b_{i1} & \cdots & \sum_{i=1}^{n}a_{1i}b_{im} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}a_{mi}b_{i1} & \cdots & \sum_{i=1}^{n}a_{mi}b_{im} \end{bmatrix}$$
The trace $tr(\boldsymbol{AB})$ of the matrix $\boldsymbol{AB}$ is the sum of its main diagonal elements:
$$tr(\boldsymbol{AB}) = \sum_{i=1}^{n}a_{1i}b_{i1} + \cdots + \sum_{i=1}^{n}a_{mi}b_{im} = \sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij}$$
Differentiating the trace $tr(\boldsymbol{AB})$ with respect to $\boldsymbol{A}$ gives
$$\frac{\partial\, tr(\boldsymbol{AB})}{\partial\boldsymbol{A}} = \begin{bmatrix} \frac{\partial(\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij})}{\partial a_{11}} & \cdots & \frac{\partial(\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij})}{\partial a_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial(\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij})}{\partial a_{m1}} & \cdots & \frac{\partial(\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij})}{\partial a_{mn}} \end{bmatrix} = \begin{bmatrix} b_{11} & \cdots & b_{n1} \\ \vdots & \ddots & \vdots \\ b_{1m} & \cdots & b_{nm} \end{bmatrix} = \boldsymbol{B}^{T}$$
(2).
$$\frac{d}{d\boldsymbol{A}}\left(tr\left(\boldsymbol{ABA}^{T}\right)\right) = 2\boldsymbol{AB}$$
where $\boldsymbol{B}$ is a symmetric matrix.
Proof:
Given a matrix $\boldsymbol{A}$ ($m\times n$) and a symmetric matrix $\boldsymbol{B}$ ($n\times n$),
$$\begin{aligned} \boldsymbol{ABA}^{T} &= \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nn} \end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{m1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{mn} \end{bmatrix} \\ &= \begin{bmatrix} \sum_{i=1}^{n}a_{1i}b_{i1} & \cdots & \sum_{i=1}^{n}a_{1i}b_{in} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}a_{mi}b_{i1} & \cdots & \sum_{i=1}^{n}a_{mi}b_{in} \end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{m1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{mn} \end{bmatrix} \\ &= \begin{bmatrix} \sum_{j=1}^{n}\sum_{i=1}^{n}a_{1i}b_{ij}a_{1j} & \cdots & \sum_{j=1}^{n}\sum_{i=1}^{n}a_{1i}b_{ij}a_{mj} \\ \vdots & \ddots & \vdots \\ \sum_{j=1}^{n}\sum_{i=1}^{n}a_{mi}b_{ij}a_{1j} & \cdots & \sum_{j=1}^{n}\sum_{i=1}^{n}a_{mi}b_{ij}a_{mj} \end{bmatrix} \end{aligned}$$
The trace $tr(\boldsymbol{ABA}^{T})$ of the matrix $\boldsymbol{ABA}^{T}$ is the sum of its main diagonal elements:
$$tr(\boldsymbol{ABA}^{T}) = \sum_{j=1}^{n}\sum_{i=1}^{n}a_{1i}b_{ij}a_{1j} + \cdots + \sum_{j=1}^{n}\sum_{i=1}^{n}a_{mi}b_{ij}a_{mj} = \sum_{k=1}^{m}\sum_{j=1}^{n}\sum_{i=1}^{n}a_{ki}b_{ij}a_{kj}$$
Differentiating this triple sum with respect to an element $a_{pq}$ picks out the terms with $k=p$ and either $i=q$ or $j=q$, which sum to $(\boldsymbol{AB}^{T})_{pq} + (\boldsymbol{AB})_{pq}$. Since $\boldsymbol{B}$ is symmetric, $\boldsymbol{B}=\boldsymbol{B}^{T}$, and we obtain
$$\frac{\partial\, tr(\boldsymbol{ABA}^{T})}{\partial\boldsymbol{A}} = \boldsymbol{AB}^{T} + \boldsymbol{AB} = \boldsymbol{AB} + \boldsymbol{AB} = 2\boldsymbol{AB}$$
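Both trace identities can be verified numerically with finite differences. The following is a small Python sketch (assuming NumPy; the matrix sizes and random seed are arbitrary), not part of the original derivation:

```python
import numpy as np

def numeric_grad(f, A, eps=1e-6):
    """Central finite-difference gradient of a scalar function f w.r.t. matrix A."""
    g = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        Ap, Am = A.copy(), A.copy()
        Ap[idx] += eps
        Am[idx] -= eps
        g[idx] = (f(Ap) - f(Am)) / (2 * eps)
    return g

rng = np.random.default_rng(4)
m, n = 3, 4
A = rng.standard_normal((m, n))
B1 = rng.standard_normal((n, m))   # for tr(AB)
C = rng.standard_normal((n, n))
B2 = (C + C.T) / 2                 # symmetric matrix, for tr(ABA^T)

# d tr(AB)/dA = B^T
print(np.allclose(numeric_grad(lambda M: np.trace(M @ B1), A), B1.T))          # True
# d tr(ABA^T)/dA = 2AB for symmetric B
print(np.allclose(numeric_grad(lambda M: np.trace(M @ B2 @ M.T), A), 2 * A @ B2))  # True
```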