SLAM Kalman Filtering & Nonlinear Optimization
The Gaussian (normal) distribution is a common continuous probability distribution. Its mathematical expectation (mean) $\mu$ is the location parameter and determines where the distribution is centered; the standard deviation $\sigma$, the square root of the variance $\sigma^2$, is the scale parameter and determines the spread of the distribution.
1. Gaussian distribution
Suppose a random variable $x$ follows a Gaussian distribution $N(\mu,\sigma^2)$. Its probability density function is:
$$p(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Its high-dimensional (multivariate) form is:
$$P(x)=\frac{1}{\sqrt{(2\pi)^N \det(\Sigma)}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$
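As a quick sanity check of the $N$-dimensional density (a sketch with illustrative numbers), the formula can be compared with `scipy.stats.multivariate_normal`:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Evaluate the multivariate Gaussian density both by the formula above and via scipy.
mu = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
x = np.array([0.2, 0.1])

N = len(mu)
d = x - mu
p_formula = np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / np.sqrt((2 * np.pi) ** N * np.linalg.det(Sigma))
p_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
assert np.isclose(p_formula, p_scipy)   # the two agree
```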
2. Operations on Gaussian distributions
2.1 Linear operations
Let two independent Gaussian random variables be:
$$x\sim N(\mu_x,\Sigma_{xx}),\qquad y\sim N(\mu_y,\Sigma_{yy})$$
Then their sum is still Gaussian:
$$x+y\sim N(\mu_x+\mu_y,\ \Sigma_{xx}+\Sigma_{yy})$$
If $x$ is multiplied by a constant $a$, then $ax$ satisfies:
$$ax\sim N(a\mu_x,\ a^2\Sigma_{xx})$$
If $y=Ax$ for a matrix $A$, then $y$ satisfies:
$$y\sim N(A\mu_x,\ A\Sigma_{xx}A^T)$$
2.2 Product
Suppose the product of the two Gaussian densities is (up to normalization) a Gaussian $N(\mu,\Sigma)$; then:
$$\Sigma^{-1}=\Sigma^{-1}_{xx}+\Sigma^{-1}_{yy},\qquad \Sigma^{-1}\mu=\Sigma^{-1}_{xx}\mu_x+\Sigma^{-1}_{yy}\mu_y$$
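This is the information-form fusion of two Gaussians. A minimal numeric sketch (illustrative values, assuming both factors describe the same variable):

```python
import numpy as np

# Fuse two Gaussian estimates of the same quantity using the information form above.
mu_x, S_xx = np.array([1.0, 0.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
mu_y, S_yy = np.array([0.5, 1.0]), np.array([[1.0, -0.2], [-0.2, 0.5]])

info = np.linalg.inv(S_xx) + np.linalg.inv(S_yy)                         # Sigma^{-1}
Sigma = np.linalg.inv(info)                                              # fused covariance
mu = Sigma @ (np.linalg.inv(S_xx) @ mu_x + np.linalg.inv(S_yy) @ mu_y)   # fused mean

print(mu, Sigma)
```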
2.3 Joint (compound) distribution
Now consider $x$ and $y$ together when they are not independent; the joint distribution is:
$$p(x,y)=N\left(\begin{bmatrix}\mu_x\\\mu_y\end{bmatrix},\ \begin{bmatrix}\Sigma_{xx}&\Sigma_{xy}\\\Sigma_{yx}&\Sigma_{yy}\end{bmatrix}\right)$$
where $\Sigma_{xy}$ and $\Sigma_{yx}$ are the covariances between the two variables, satisfying:
$$\mathrm{Cov}(X,Y)=E[(X-E(X))(Y-E(Y))]=E[XY]-E[X]E[Y]$$
A positive covariance means the two variables are positively correlated, zero means they are uncorrelated, and a negative value means they are negatively correlated.
Expanding via the conditional decomposition (Bayes' rule) $p(x,y)=p(x|y)p(y)$, the conditional distribution $p(x|y)$ satisfies:
$$p(x|y)=N\left(\mu_x+\Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y),\ \Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right)$$
Schur complement
Let us now derive this result.
Schur complement: a block matrix can be decomposed into the product of an upper-triangular, a block-diagonal, and a lower-triangular matrix, which simplifies computation:
$$\begin{bmatrix}A&B\\C&D\end{bmatrix}=\begin{bmatrix}I&BD^{-1}\\0&I\end{bmatrix}\begin{bmatrix}\Delta D&0\\0&D\end{bmatrix}\begin{bmatrix}I&0\\D^{-1}C&I\end{bmatrix}$$
where $\Delta D=A-BD^{-1}C$ is called the Schur complement of $D$. Similarly, for the inverse:
$$\begin{bmatrix}A&B\\C&D\end{bmatrix}^{-1}=\begin{bmatrix}I&0\\-D^{-1}C&I\end{bmatrix}\begin{bmatrix}\Delta D^{-1}&0\\0&D^{-1}\end{bmatrix}\begin{bmatrix}I&-BD^{-1}\\0&I\end{bmatrix}$$
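A small numeric check of the block factorization (random blocks with $D$ kept invertible; shapes are illustrative):

```python
import numpy as np

# Verify M = U * Lambda * L for the Schur-complement factorization above.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 3))
C, D = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)) + 3 * np.eye(3)   # keep D well conditioned

M = np.block([[A, B], [C, D]])
Dinv = np.linalg.inv(D)
delta_D = A - B @ Dinv @ C                                  # Schur complement of D

U = np.block([[np.eye(2), B @ Dinv], [np.zeros((3, 2)), np.eye(3)]])
Lam = np.block([[delta_D, np.zeros((2, 3))], [np.zeros((3, 2)), D]])
L = np.block([[np.eye(2), np.zeros((2, 3))], [Dinv @ C, np.eye(3)]])

assert np.allclose(M, U @ Lam @ L)                          # factorization holds
```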
Detailed derivation (for understanding):
Since $p(x|y)=p(x,y)/p(y)$, expanding the exponent of the joint density with the factorization above gives:
$$\begin{aligned}&\left(\begin{bmatrix}x\\y\end{bmatrix}-\begin{bmatrix}\mu_x\\\mu_y\end{bmatrix}\right)^T\begin{bmatrix}\Sigma_{xx}&\Sigma_{xy}\\\Sigma_{yx}&\Sigma_{yy}\end{bmatrix}^{-1}\left(\begin{bmatrix}x\\y\end{bmatrix}-\begin{bmatrix}\mu_x\\\mu_y\end{bmatrix}\right)\\&=\left(\begin{bmatrix}x\\y\end{bmatrix}-\begin{bmatrix}\mu_x\\\mu_y\end{bmatrix}\right)^T\begin{bmatrix}I&0\\-\Sigma_{yy}^{-1}\Sigma_{yx}&I\end{bmatrix}\begin{bmatrix}(\Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx})^{-1}&0\\0&\Sigma_{yy}^{-1}\end{bmatrix}\begin{bmatrix}I&-\Sigma_{xy}\Sigma_{yy}^{-1}\\0&I\end{bmatrix}\left(\begin{bmatrix}x\\y\end{bmatrix}-\begin{bmatrix}\mu_x\\\mu_y\end{bmatrix}\right)\\&=\left(x-\mu_x-\Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y)\right)^{T}(\Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx})^{-1}\left(x-\mu_x-\Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y)\right)+(y-\mu_y)^T\Sigma_{yy}^{-1}(y-\mu_y)\end{aligned}$$
Since the exponent of $p(y)$ is:
$$(y-\mu_y)^T\Sigma_{yy}^{-1}(y-\mu_y)$$
Therefore,
$$p(x|y)=p(x,y)/p(y)=N\left(\mu_x+\Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y),\ \Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right)$$
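A quick numeric check of this result (a sketch with illustrative numbers): the conditional covariance can also be obtained as the inverse of the $xx$ block of the joint precision matrix, which gives an independent way to verify the formula.

```python
import numpy as np

# Conditional Gaussian p(x|y) from the joint blocks, cross-checked via the precision matrix.
mu_x, mu_y = np.array([0.0]), np.array([1.0])
S_xx, S_xy = np.array([[2.0]]), np.array([[0.6]])
S_yx, S_yy = S_xy.T, np.array([[1.5]])

y = np.array([2.0])
cond_mean = mu_x + S_xy @ np.linalg.inv(S_yy) @ (y - mu_y)
cond_cov = S_xx - S_xy @ np.linalg.inv(S_yy) @ S_yx

joint_prec = np.linalg.inv(np.block([[S_xx, S_xy], [S_yx, S_yy]]))
assert np.allclose(cond_cov, np.linalg.inv(joint_prec[:1, :1]))   # same conditional covariance
print(cond_mean, cond_cov)
```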
3. Examples of compound operations
Consider a random variable $x\sim N(\mu_x,\Sigma_{xx})$ and another variable $y$ satisfying:
$$y=Ax+b+w$$
where $A$ and $b$ are the coefficient matrix and offset of the linear map, and $w$ is a zero-mean Gaussian noise term, $w\sim N(0,R)$. Therefore:
$$E(y)=E(Ax+b+w)=AE(x)+b+E(w)=A\mu_x+b$$
$$\mathrm{Cov}(y)=\mathrm{Cov}(Ax)+\mathrm{Cov}(w)=AE[(x-\mu_x)(x-\mu_x)^T]A^T+R=A\Sigma_{xx}A^T+R$$
$$p(y)=N(A\mu_x+b,\ A\Sigma_{xx}A^T+R)$$
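This propagation rule can be checked by Monte Carlo simulation; the matrices below are illustrative.

```python
import numpy as np

# Sample y = Ax + b + w and compare the empirical mean/covariance with the closed form.
rng = np.random.default_rng(1)
A, b = np.array([[1.0, 0.5], [0.0, 1.0]]), np.array([0.2, -0.1])
mu_x, S_xx = np.array([1.0, 2.0]), np.array([[0.5, 0.1], [0.1, 0.3]])
R = np.diag([0.05, 0.02])                     # covariance of the noise w

x = rng.multivariate_normal(mu_x, S_xx, size=200_000)
w = rng.multivariate_normal(np.zeros(2), R, size=200_000)
y = x @ A.T + b + w

print(y.mean(axis=0), A @ mu_x + b)           # empirical vs. analytic mean
print(np.cov(y.T), A @ S_xx @ A.T + R)        # empirical vs. analytic covariance
```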
4. Mathematical formulation of the SLAM problem
Usually a robot or autonomous vehicle carries sensors that measure its own motion (IMU, wheel odometry, etc.). Through integration, this proprioceptive information can be abstracted into a mathematical model:
$$x_k=f(x_{k-1},u_k,w_k)$$
Here $u_k$ is the motion-sensor reading (sometimes called the input) and $w_k$ is noise. This equation is called the **motion equation**. It has no specific physical form yet, because the motion sensor is a matter of choice: different motion sensors, or combinations of them (multi-sensor fusion), lead to different concrete equations.
Likewise, SLAM has an observation part coming from exteroceptive sensors (lidar, cameras, infrared devices). The corresponding model is the **observation equation**: when the robot or vehicle, at pose $x_k$, observes a landmark $y_j$, it produces an observation $z_{k,j}$. Abstractly:
$$z_{k,j}=h(y_j,x_k,v_{k,j})$$
where $v_{k,j}$ is the observation noise. As with the motion equation, the observation equation takes different forms for different exteroceptive sensors.
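As a purely hypothetical illustration of these two equations, the sketch below writes $f$ and $h$ for a planar robot with state $(p_x,p_y,\theta)$, odometry increments as input, and range-bearing observations of point landmarks; all of these modeling choices are assumptions made only for the example, not part of the general formulation.

```python
import numpy as np

def f(x_prev, u, w):
    """Motion equation x_k = f(x_{k-1}, u_k, w_k): odometry-style increment (hypothetical model)."""
    px, py, th = x_prev
    dv, dth = u                                    # forward and angular increments
    return np.array([px + dv * np.cos(th),
                     py + dv * np.sin(th),
                     th + dth]) + w

def h(y_j, x_k, v):
    """Observation equation z_{k,j} = h(y_j, x_k, v_{k,j}): range and bearing to landmark j."""
    dx, dy = y_j[0] - x_k[0], y_j[1] - x_k[1]
    rng = np.hypot(dx, dy)
    bearing = np.arctan2(dy, dx) - x_k[2]
    return np.array([rng, bearing]) + v
```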
The SLAM problem can therefore be summarized by two basic equations:
$$\begin{cases} x_k=f(x_{k-1},u_k,w_k) \\ z_{k,j}=h(y_j,x_k,v_{k,j}) \end{cases}$$
From these equations we can see:
1. In the observation equation, observation data are generated only when landmark $y_j$ is actually seen from pose $x_k$; otherwise there is nothing. In practice only a limited number of landmarks is visible from any single location, but since cameras and lidars extract many feature points, the number of observation equations is usually much larger than the number of motion equations.
2. In practice, for pure visual SLAM or pure lidar SLAM the motion equation is often missing; in that case one usually assumes the vehicle moves at constant velocity or constant acceleration.
3. The pose $x$ (localization) and the landmarks $y$ (mapping) are treated as random variables that follow certain probability distributions, not as single numbers. Since both are quantities to be estimated, we change notation and let $x_k$ denote all unknowns at time $k$: the current vehicle pose (assuming sensor-to-body calibration is already done) together with the $m$ landmarks:
$$x_k\triangleq \{x_k,\ y_1,\dots,y_m\}$$
Similarly, all observations at time $k$ are written as $z_k$, so the system becomes:
$$\begin{cases} x_k=f(x_{k-1},u_k)+w_k \\ z_{k}=h(x_k)+v_{k} \end{cases}$$
The posterior probability density to be estimated is then:
$$P(x_k|x_0,u_{1:k},z_{1:k})$$
where $x_0$ is the initial state, $u_{1:k}$ are the motion-sensor inputs from time $1$ to $k$, and $z_{1:k}$ are all observations from time $1$ to $k$. The problem is therefore: **based on all historical data (inputs, observations, initial state), obtain the final fused estimate; this is the filtering problem.** Shown as a graph:

According to Bayes' rule, swapping $z_k$ and $x_k$:
$$P(x_k|x_0,u_{1:k},z_{1:k})\propto P(z_k|x_k)\,P(x_k|x_0,u_{1:k},z_{1:k-1})$$
Note: the first term is the likelihood and the second is the prior. The prior is in fact obtained from motion prediction, so it is also called the prediction term; it may depend on all past states. Expanding it by conditioning on the state at time $k-1$:
$$P(x_k|x_0,u_{1:k},z_{1:k-1})=\int P(x_k|x_{k-1},x_0,u_{1:k},z_{1:k-1})\,P(x_{k-1}|x_0,u_{1:k},z_{1:k-1})\,dx_{k-1}$$
Call this formula (A). From it we can draw the following conclusions and make the following assumptions:
1. If we want to account for states further in the past, we can keep expanding this formula.
2. If we only care about times $k$ and $k-1$, we can assume the first-order Markov property: the state at time $k$ depends only on the state at time $k-1$ and not on earlier states. This leads to filtering methods such as the extended Kalman filter (EKF), which propagates the estimate from one time step to the next.
3. If we consider the relation between the state at time $k$ and **all** earlier states, we obtain a framework based on nonlinear optimization.
5. Kalman filtering (KF)
Making the Markov assumption in formula (A), i.e. the current time depends only on the previous time, the prediction term becomes:
$$P(x_k|x_{k-1},x_0,u_{1:k},z_{1:k-1})=P(x_k|x_{k-1},u_k)$$
and the prior part simplifies to:
$$P(x_{k-1}|x_0,u_{1:k},z_{1:k-1})=P(x_{k-1}|x_0,u_{1:k-1},z_{1:k-1})$$
As the graph shows, the input $u_k$ is independent of the state at time $k-1$. The two equations above describe how the state estimate at time $k-1$ is propagated to time $k$.
Assumptions:
1. The state follows a Gaussian distribution, so we only need to maintain its mean and covariance.
2. All states and noises are Gaussian.
A linear Gaussian system can be written in terms of its motion and observation equations:
$$\begin{cases} x_k=A_kx_{k-1}+u_k+w_k \\ z_{k}=C_kx_k+v_k \end{cases}$$
The noises are zero-mean Gaussian:
$$w_k\sim N(0,R),\quad v_k\sim N(0,Q)$$
Let $\hat x$ denote the posterior and $\overline x$ the prior distribution. Suppose we have the posterior state estimate at time $k-1$, $\hat x_{k-1}$, with covariance $\hat P_{k-1}$. The goal is now clear: from the posterior at time $k-1$, together with the input and observation data at time $k$, determine the posterior distribution at time $k$.
1. Prediction
From the motion equation and the linear transformation rule for Gaussians:
$$P(x_k|x_{k-1},x_0,u_{1:k},z_{1:k-1})=N(A_k\hat x_{k-1}+u_k,\ A_k\hat P_{k-1}A_k^T+R)$$
Denote:
$$\overline x_k=A_k\hat x_{k-1}+u_k,\qquad \overline P_k=A_k\hat P_{k-1}A_k^T+R$$
2. The observation equation
Given a particular state, the observation equation tells us what data should be observed; from the observation equation and the Gaussian assumption:
$$P(z_k|x_k)=N(C_kx_k,Q)$$
Let the result (the posterior) be $x_k\sim N(\hat x_k,\hat P_k)$. According to
$$P(x_k|x_0,u_{1:k},z_{1:k})\propto P(z_k|x_k)\,P(x_k|x_0,u_{1:k},z_{1:k-1}),$$
we obtain a form familiar from the earlier derivation:
$$N(\hat x_k,\hat P_k)=N(\overline x_k,\overline P_k)\cdot N(C_kx_k,Q)=N\left(\begin{bmatrix}\mu_x\\\mu_y\end{bmatrix},\ \begin{bmatrix}\Sigma_{xx}&\Sigma_{xy}\\\Sigma_{yx}&\Sigma_{yy}\end{bmatrix}\right)$$
Using the earlier conclusion directly:
$$N(\hat x_k,\hat P_k)=N\left(\mu_x+\Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y),\ \Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right)$$
Here we can read off directly:
$$\overline P_k=\Sigma_{xx},\qquad \Sigma_{yy}=\mathrm{Cov}(C_kx_k)+\mathrm{Cov}(v_k)=C_kE[(x_k-\mu_x)(x_k-\mu_x)^T]C_k^T+Q=C_k\overline P_kC_k^T+Q$$
and we can derive:
$$\begin{aligned}\Sigma_{xy}&=E[(x_k-\mu_x)(z_k-\mu_z)^T]=E[(x_k-\mu_x)(C_kx_k-C_k\mu_x+v_k)^T]\\&=E[(x_k-\mu_x)(C_kx_k-C_k\mu_x)^T]+E[(x_k-\mu_x)v_k^T]=\Sigma_{xx}C_k^T=\overline P_kC_k^T\\ \Sigma_{yx}&=\Sigma_{xy}^T=C_k\overline P_k\end{aligned}$$
Define:
$$K=\Sigma_{xy}\Sigma_{yy}^{-1}=\overline P_kC_k^T(C_k\overline P_kC_k^T+Q)^{-1}$$
Then:
$$\hat P_k=\overline P_k-KC_k\overline P_k=(I-KC_k)\overline P_k,\qquad \hat x_k=\overline x_k+K(z_k-C_k\overline x_k)$$
Putting it in order, the five classic equations are:
1. Prediction:
$$\overline x_k=A_k\hat x_{k-1}+u_k,\qquad \overline P_k=A_k\hat P_{k-1}A_k^T+R$$
2. Update: first compute $K$, the Kalman gain:
$$K=\overline P_kC_k^T(C_k\overline P_kC_k^T+Q)^{-1}$$
Then compute the posterior distribution:
$$\hat P_k=(I-KC_k)\overline P_k,\qquad \hat x_k=\overline x_k+K(z_k-C_k\overline x_k)$$
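These five equations translate almost line by line into code. Below is a minimal sketch in NumPy following the notation of this post ($R$ is the process-noise covariance and $Q$ the measurement-noise covariance, which is swapped relative to the more common convention); the function names and the toy model at the end are illustrative.

```python
import numpy as np

def kf_predict(x_hat, P_hat, A, u, R):
    """Prediction: x_bar = A x_hat + u, P_bar = A P_hat A^T + R."""
    x_bar = A @ x_hat + u
    P_bar = A @ P_hat @ A.T + R
    return x_bar, P_bar

def kf_update(x_bar, P_bar, C, z, Q):
    """Update: Kalman gain K, then posterior mean and covariance."""
    K = P_bar @ C.T @ np.linalg.inv(C @ P_bar @ C.T + Q)
    x_hat = x_bar + K @ (z - C @ x_bar)
    P_hat = (np.eye(len(x_bar)) - K @ C) @ P_bar
    return x_hat, P_hat

# Toy example: 1D constant-position model with noisy position measurements.
A, C = np.eye(1), np.eye(1)
R, Q = np.eye(1) * 1e-3, np.eye(1) * 1e-1
x, P = np.zeros(1), np.eye(1)
for z in [0.9, 1.1, 1.0]:
    x, P = kf_predict(x, P, A, np.zeros(1), R)
    x, P = kf_update(x, P, C, np.array([z]), Q)
print(x, P)
```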
This completes the derivation of the classical Kalman filter. There are in fact several ways to derive it; here we took the probabilistic route of maximum a posteriori estimation. We see that in a linear Gaussian system the Kalman filter gives the maximum a posteriori estimate. Moreover, since a Gaussian distribution remains Gaussian under linear transformations, no approximation was made anywhere in the process; the Kalman filter can be said to be the optimal unbiased estimator for a linear system.
6. Extended Kalman filter (EKF)
The Kalman filter applies to linear systems, but in SLAM the motion and observation equations are usually nonlinear. Whether the model is visual SLAM or lidar SLAM, once the pose is represented with Lie algebra the system is certainly not linear. A Gaussian distribution is in general no longer Gaussian after a nonlinear transformation, so nonlinear systems require an approximation: the non-Gaussian distribution is approximated by a Gaussian.
The Kalman filter is usually extended to nonlinear systems as the extended Kalman filter (EKF). The usual approach is to take the first-order Taylor expansion of the motion and observation equations around a working point, keep only the first-order (linear) part, and then proceed as for a linear system. The nonlinear motion and observation equations are:
Motion equation:
$$x_k=f(x_{k-1},u_k,w_k)$$
Observation equation:
$$z_k=h(x_k,n_k)$$
Let the mean and covariance at time $k-1$ be $\hat x_{k-1}$ and $\hat P_{k-1}$. At time $k$, we linearize the motion and observation equations around $\hat x_{k-1}$, $\hat P_{k-1}$ (i.e. take the first-order Taylor expansion):
$$x_k\approx f(\hat x_{k-1},u_k,0)+\left.\frac{\partial f}{\partial x_{k-1}}\right|_{\hat x_{k-1}}(x_{k-1}-\hat x_{k-1})+\left.\frac{\partial f}{\partial w_{k}}\right|_{\hat x_{k-1}}w_k$$
Denote the partial derivatives (Jacobians):
$$F=\left.\frac{\partial f}{\partial x_{k-1}}\right|_{\hat x_{k-1}},\qquad B_{k-1}=\left.\frac{\partial f}{\partial w_{k}}\right|_{\hat x_{k-1}}$$
Similarly, for the observation equation:
$$z_k\approx h(\overline x_k)+\left.\frac{\partial h}{\partial x_{k}}\right|_{\overline x_{k}}(x_{k}-\overline x_k)+\left.\frac{\partial h}{\partial n_{k}}\right|_{\overline x_{k}}n_k$$
Denote:
$$H=\left.\frac{\partial h}{\partial x_{k}}\right|_{\overline x_{k}},\qquad C_k=\left.\frac{\partial h}{\partial n_{k}}\right|_{\overline x_{k}}$$
Taking the expectation of the linearized motion equation:
$$E[x_k]=f(\hat x_{k-1},u_k,0)+F\,(E[x_{k-1}]-\hat x_{k-1})+E[B_{k-1}w_k]$$
Here $w_k$ and $n_k$ are zero-mean Gaussian:
$$w_k\sim N(0,R),\quad n_k\sim N(0,Q)$$
so $E[B_{k-1}w_k]=0$. Also $E[x_{k-1}]=\hat x_{k-1}$ (and $E[\hat x_{k-1}]=\hat x_{k-1}$), so the result is:
$$E[x_k]=f(\hat x_{k-1},u_k,0)$$
The covariance matrix is:
$$\mathrm{Cov}[x_k]=F\,E[(x_{k-1}-E[x_{k-1}])(x_{k-1}-E[x_{k-1}])^T]\,F^T+\mathrm{Cov}[B_{k-1}w_k]=F\hat P_{k-1}F^T+B_{k-1}RB_{k-1}^T$$
*The derivation here uses how a Gaussian's mean and covariance propagate through a linear map, as derived in Section 3:
$$E(y)=E(Ax+b+w)=AE(x)+b+E(w)=A\mu_x+b,\qquad \mathrm{Cov}(y)=A\Sigma_{xx}A^T+R$$
$$\mathrm{Cov}[B_{k-1}w_k]=B_{k-1}\,\mathrm{Cov}[w_k]\,B_{k-1}^T$$
Combining these results gives:
$$p(x_k|x_0,u_{1:k},z_{1:k-1})=N\left(f(\hat x_{k-1},u_k,0),\ F\hat P_{k-1}F^T+B_{k-1}RB_{k-1}^T\right)$$
As in the Kalman filter, denote:
$$\overline x_k=f(\hat x_{k-1},u_k,0),\qquad \overline P_{k}=F\hat P_{k-1}F^T+B_{k-1}RB_{k-1}^T$$
Similarly, for the observation equation:
$$E[z_k|x_k]=E[h(\overline x_k)+H(x_k-\overline x_{k})+C_kn_k]=h(\overline x_k)+H(x_k-\overline x_{k}),\qquad \mathrm{Cov}[z_k|x_k]=\mathrm{Cov}[C_kn_k]=C_kQC_k^T$$
so the observation distribution is:
$$p(z_k|x_k)=N\left(h(\overline x_k)+H(x_k-\overline x_{k}),\ C_kQC_k^T\right)$$
The EKF derivation then proceeds in the same way:
$$N(\hat x_k,\hat P_k)=N(\overline x_k,\overline P_k)\cdot N\left(h(\overline x_k)+H(x_k-\overline x_k),\ C_kQC_k^T\right)=N\left(\begin{bmatrix}\mu_x\\\mu_y\end{bmatrix},\ \begin{bmatrix}\Sigma_{xx}&\Sigma_{xy}\\\Sigma_{yx}&\Sigma_{yy}\end{bmatrix}\right)$$
Using the earlier conclusion directly:
$$N(\hat x_k,\hat P_k)=N\left(\mu_x+\Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu_y),\ \Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right)$$
Here we can read off directly:
$$\overline P_k=\Sigma_{xx},\qquad \Sigma_{yy}=\mathrm{Cov}(h(\overline x_k))+\mathrm{Cov}(H(x_k-\overline x_k))+\mathrm{Cov}(C_kn_k)=0+H\overline P_kH^T+C_kQC_k^T$$
and we can derive:
$$\begin{aligned}\Sigma_{xy}&=E[(x_k-\mu_x)(z_k-\mu_z)^T]=E\left[(x_k-\mu_x)\left(H(x_k-\overline x_k)+C_kn_k\right)^T\right]\\&=E[(x_k-\mu_x)(Hx_k-H\mu_x)^T]+E[(x_k-\mu_x)(C_kn_k)^T]=\Sigma_{xx}H^T=\overline P_kH^T\\ \Sigma_{yx}&=\Sigma_{xy}^T=H\overline P_k\end{aligned}$$
Define the Kalman gain:
$$K_k=\Sigma_{xy}\Sigma_{yy}^{-1}=\overline P_kH^T(H\overline P_kH^T+C_kQC_k^T)^{-1}$$
Then, using the Kalman gain:
$$\hat x_k=\overline x_k+K_k(z_k-h(\overline x_k)),\qquad \hat P_k=\overline P_k-K_kH\overline P_k=(I-K_kH)\overline P_k$$
Summing up, the five classical EKF equations are:
1. Prediction:
$$\overline x_k=f(\hat x_{k-1},u_k,0),\qquad \overline P_{k}=F\hat P_{k-1}F^T+B_{k-1}RB_{k-1}^T$$
2. Update: first compute the Kalman gain $K_k$:
$$K_k=\overline P_kH^T(H\overline P_kH^T+C_kQC_k^T)^{-1}$$
Then compute the posterior distribution:
$$\hat P_k=(I-K_kH)\overline P_k,\qquad \hat x_k=\overline x_k+K_k(z_k-h(\overline x_k))$$
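As with the KF, these equations map directly to code. Below is a minimal sketch that assumes the user supplies $f$, $h$ and callables for the Jacobians $F$, $B$, $H$, $C$ of the concrete model; all names are illustrative, and $R$/$Q$ follow this post's convention (process/measurement noise).

```python
import numpy as np

def ekf_step(x_hat, P_hat, u, z, f, h, jac_F, jac_B, jac_H, jac_C, R, Q):
    """One EKF prediction + update cycle in the notation of this post (a sketch)."""
    # Prediction: linearize the motion model at the previous posterior mean.
    F = jac_F(x_hat, u)                           # df/dx at x_hat
    B = jac_B(x_hat, u)                           # df/dw at x_hat
    x_bar = f(x_hat, u)                           # motion model evaluated with zero noise
    P_bar = F @ P_hat @ F.T + B @ R @ B.T
    # Update: linearize the observation model at the predicted mean.
    H = jac_H(x_bar)                              # dh/dx at x_bar
    C = jac_C(x_bar)                              # dh/dn at x_bar
    K = P_bar @ H.T @ np.linalg.inv(H @ P_bar @ H.T + C @ Q @ C.T)
    x_new = x_bar + K @ (z - h(x_bar))
    P_new = (np.eye(len(x_bar)) - K @ H) @ P_bar
    return x_new, P_new
```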
This completes the derivation of the extended Kalman filter (EKF). The EKF is concise in form and widely used, and it dominated early SLAM for a long time, but it has some limitations:
1. The filter assumes the Markov property to some extent: the state at time $k$ depends only on time $k-1$ (or on a few recent states), not on earlier states and observations. With this assumption a laser/visual odometer only considers the relation between adjacent frames, so loop closures are hard to handle. Nonlinear optimization instead optimizes over all historical data together, at the cost of more computation and computing resources.
2. The EKF linearizes only once, at $\hat x_{k-1}$, and then computes the posterior directly from that linearization. This treats the linear approximation at that point as valid for the posterior as well. In reality, far from the working point the first-order Taylor expansion may no longer approximate the whole function; how far depends on the nonlinearity of the motion and observation models. If they are strongly nonlinear, the linear approximation holds only in a very small neighborhood and cannot be assumed to hold farther away. This linearization error is the EKF's main problem.
3. In terms of implementation, the EKF must store the mean and covariance of the state and keep maintaining and updating them. If landmarks are included in the state, then because visual SLAM involves a large number of landmarks the storage is considerable and grows quadratically with the state dimension (the covariance matrix must be stored). EKF-SLAM is therefore generally considered unsuitable for large-scale scenes.
7. Iterative extended Kalman filter (IEKF)
Because the nonlinear model is approximated by linearization, the stronger the nonlinearity, the larger the error. However, the closer the linearization point is to the true value, the smaller the linearization error. One way to address this is therefore to find a better linearization point gradually, by iteration, and thereby improve accuracy.
In the EKF derivation, only the linearization point of the observation equation is changed:
$$z_k\approx h(x_{op,k},0)+H(x_{k}-x_{op,k})+C_kn_k$$
Define the Kalman gain (with $H$ and $C_k$ now evaluated at $x_{op,k}$):
$$K_k=\Sigma_{xy}\Sigma_{yy}^{-1}=\overline P_kH^T(H\overline P_kH^T+C_kQC_k^T)^{-1}$$
Then, using the Kalman gain:
$$\hat x_k=\overline x_k+K_k\left(z_k-h(x_{op,k},0)-H(\overline x_k-x_{op,k})\right)$$
During filtering these two formulas are iterated: the posterior mean from the previous iteration is used as the linearization point $x_{op,k}$ of the current one, which reduces the nonlinearity error. Note that in this scheme the posterior covariance is computed only in the last iteration. A minimal sketch follows.
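The sketch below shows the iterated update under the same assumptions as the EKF sketch above (user-supplied $h$ and Jacobian callables; names and the fixed iteration count are illustrative):

```python
import numpy as np

def iekf_update(x_bar, P_bar, z, h, jac_H, jac_C, Q, n_iter=5):
    """Iterated EKF update: relinearize h around the latest estimate x_op;
    the posterior covariance is formed only after the last iteration."""
    x_op = x_bar.copy()
    for _ in range(n_iter):
        H = jac_H(x_op)                           # dh/dx at the current operating point
        C = jac_C(x_op)                           # dh/dn at the current operating point
        K = P_bar @ H.T @ np.linalg.inv(H @ P_bar @ H.T + C @ Q @ C.T)
        x_op = x_bar + K @ (z - h(x_op) - H @ (x_bar - x_op))
    P_hat = (np.eye(len(x_bar)) - K @ H) @ P_bar
    return x_op, P_hat
```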
8. Advantages and disadvantages of other filters
8.1 MSCKF
The goal of MSCKF is to solve the EKF's dimension-explosion problem, i.e. the third limitation discussed above. Traditional EKF-SLAM adds feature points to the state vector and estimates them together with the IMU state; when the environment is large there are many features and the state vector becomes very high dimensional. MSCKF instead adds the poses at different times to the state vector. A feature point is seen from multiple camera states (Multi-State), which creates geometric constraints between them (Constraint); these constraints are used to build the observation model for the EKF update. Because the number of camera poses is much smaller than the number of feature points, the state dimension in MSCKF is greatly reduced compared with EKF-SLAM. Old camera states are continually removed so that only a fixed number of poses is kept (a sliding window), which bounds the back-end computation of MSCKF.
Reference links and papers:
1.A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation
2.Improving the Accuracy of EKF-Based Visual-Inertial Odometry
3. MSCKF Those Things (Part 1): an introduction to the MSCKF algorithm
4. MSCKF Those Things (Part 2): trying out S-MSCKF and reading its source code
9. Kalman filtering - Discretization of system equations
Derivation of the discretization:
Given a continuous-time linear stochastic system (a linear stochastic differential equation):
$$\dot X(t)=F(t)X(t)+G(t)w(t)$$
According to linear system theory, the equivalent discrete form is:
$$X_k=\Phi_{k/k-1}X_{k-1}+\eta_{k-1}$$
where
$$X_k=X(t_k),\qquad \Phi_{k/k-1}=\Phi(t_k,t_{k-1})\approx e^{\int_{t_{k-1}}^{t_k}F(\tau)d\tau},\qquad \eta_{k-1}=\int_{t_{k-1}}^{t_k}\Phi(t_k,\tau)G(\tau)w(\tau)\,d\tau$$
Let the discretization interval be $T_s=t_k-t_{k-1}$. When $F(t)$ does not change too drastically over the short interval $[t_{k-1},t_k]$ and $F(t_{k-1})T_s\ll I$, the state transition matrix can be further approximated as:
$$\Phi_{k/k-1}=\Phi(t_k,t_{k-1})\approx e^{F(t_{k-1})T_s}=I+F(t_{k-1})T_s+F^2(t_{k-1})\frac{T^2_s}{2!}+F^3(t_{k-1})\frac{T^3_s}{3!}+\cdots\approx I+F(t_{k-1})T_s$$
$\eta_{k-1}$ is a linear transformation of Gaussian white noise and is therefore still a normally distributed random vector, so it can be described and replaced by an equivalent noise with the same second-order statistics (mean and covariance).
Finally, the discretized equation is:
$$X_k=\Phi_{k/k-1}X_{k-1}+\Gamma_{k-1}w_{k-1}$$
where:
$$\Gamma_{k-1}\approx [I+F(t_{k-1})T_s]\,G(t_{k-1})\approx G(t_{k-1})$$
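A quick check of the first-order approximation $\Phi\approx I+F(t_{k-1})T_s$ against the exact matrix exponential (an illustrative constant $F$ over one step):

```python
import numpy as np
from scipy.linalg import expm

# Compare the exact matrix exponential with the first-order truncation used above.
F = np.array([[0.0, 1.0], [-0.5, -0.1]])
Ts = 0.01

Phi_exact = expm(F * Ts)
Phi_approx = np.eye(2) + F * Ts
print(np.max(np.abs(Phi_exact - Phi_approx)))   # error is on the order of Ts^2
```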
Reference:
*14 Lectures on Visual SLAM* (视觉SLAM十四讲).