
Derivation of Kalman filter (KF) and extended Kalman filter (EKF)

2022-06-11 01:48:00 Captain xiaoyifeng

Background knowledge

The Kalman filter builds on Bayesian filtering and the Gaussian distribution, so let's go over these two topics first.

Bayesian filtering

First, some basic concepts and formulas are given

  • Joint distribution
    $p(x,y)=p(X=x,Y=y)$
    The probability that $x$ and $y$ occur simultaneously.
  • Conditional probability
    $p(x|y)=p(X=x|Y=y)$
    The probability that $x$ occurs given that $y$ has already occurred. Whenever $p(y)>0$ (the same applies below), we have
    $p(x|y)=\frac{p(x,y)}{p(y)}$
  • Prior probability
    Can be thought of as experience: the probability distribution over the state that the system estimates before the current data is read.
  • Posterior probability
    After reading the data (after the observation), the probability distribution over the state obtained by combining the prior probability with the observation probability.
  • Law of total probability
    $p(x)=\sum_{y}p(x|y)p(y)$ (discrete case)
    $p(x)=\int p(x|y)p(y)\,dy$ (continuous case)
  • Bayes rule
    $p(x|y)=\frac{p(y|x)p(x)}{p(y)}=\frac{p(y|x)p(x)}{\sum_{x'}p(y|x')p(x')}$ (discrete case)
    $p(x|y)=\frac{p(y|x)p(x)}{p(y)}=\frac{p(y|x)p(x)}{\int p(y|x')p(x')\,dx'}$ (continuous case)
    Here $x$ is called the state and $y$ the data, $p(x)$ is called the prior probability, $p(y|x)$ is called the "inverse" conditional probability, and $p(y)^{-1}$ does not depend on $x$, so it is often treated as a normalization coefficient $\eta$.
    Note that, in general, $x$ acts as the independent variable throughout the computation: $p$ is the probability distribution function of $x$ (a Gaussian in the Kalman filter, possibly a piecewise function in other cases), not the probability of one specific $x$. During the state transition, the predicted distribution is obtained as a weighted integral/sum over all possible values of $x$.
  • Completeness / Markov property
    Suppose the state $x_t$ is the best predictor of the future, i.e., all past control information is contained in it and nothing else from the past can influence the future (in other words, any past information the future depends on must enter through $x_t$). Then $x_t$ is said to be complete. Bayesian filtering rests on this assumption.

Bayesian filtering is based on Bayes' rule. The basic approach is to combine the prior probability with an observation to obtain the posterior probability, i.e., a more accurate probability distribution.
The Bayes filter algorithm can be summarized in two (or three) lines:

$$
\begin{array}{l}
1:\ \overline{bel}(x_t)=\int p(x_t|u_t,x_{t-1})\,bel(x_{t-1})\,dx_{t-1} \quad (\text{continuous})\\
2:\ \overline{bel}(x_t)=\sum p(x_t|u_t,x_{t-1})\,bel(x_{t-1}) \quad (\text{discrete})\\
3:\ bel(x_t)=\eta\, p(z_t|x_t)\,\overline{bel}(x_t)
\end{array}
$$

where:

  • $\eta$ is the normalization factor; various constants are absorbed into it during the computation, so it keeps changing.
  • $x$ denotes the state.
  • $z$ denotes the observation data.
  • $u$ denotes the motion data recorded by the odometer.
  • $bel$, $\overline{bel}$ denote belief values: $bel(x_t)=p(x_t|z_{1:t},u_{1:t})$, i.e., the probability distribution of $x_t$ obtained after the observation, while $\overline{bel}$ denotes the belief predicted internally (before the observation).
  • $p(x_t|u_t,x_{t-1})$ denotes the state transition probability, i.e., the probability distribution function of $x_t$ given $x_{t-1}$ and $u_t$.
  • $p(z_t|x_t)$ nominally denotes the probability distribution of $z_t$ given $x_t$ (a function of $z$ parameterized by $x$); in practice, since $x$ is unknown and $z$ is known, $x$ is treated as the independent variable and $z$ as a constant during the computation.
  • Lines 1 and 2 are the prediction, also called the control update.
  • Line 3 is the observation update.

The algorithm rests on the completeness (Markov) assumption: it is what allows $p(x_t|u_{1:t},x_{0:t-1},z_{1:t-1})$ to be replaced by $p(x_t|u_t,x_{t-1})$. A minimal code sketch of these steps is given below.
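To make lines 1-3 concrete, here is a minimal sketch of a discrete Bayes filter in Python (NumPy). The one-dimensional grid, the transition matrix `p_trans`, and the sensor likelihood `p_obs` are made-up illustrations, not part of the original derivation.

```python
import numpy as np

def predict(bel, u, p_trans):
    """Control update: bel_bar(x_t) = sum_{x_{t-1}} p(x_t | u_t, x_{t-1}) * bel(x_{t-1})."""
    # p_trans[u] is an (N, N) matrix whose entry [i, j] = p(x_t = i | u, x_{t-1} = j)
    return p_trans[u] @ bel

def update(bel_bar, z, p_obs):
    """Observation update: bel(x_t) = eta * p(z_t | x_t) * bel_bar(x_t)."""
    unnormalized = p_obs[z] * bel_bar           # p_obs[z][i] = p(z | x = i)
    return unnormalized / unnormalized.sum()    # eta is just the normalizer

# Hypothetical example: 3 grid cells, the robot tries to move one cell to the right.
p_trans = {"right": np.array([[0.1, 0.0, 0.0],
                              [0.9, 0.1, 0.0],
                              [0.0, 0.9, 1.0]])}
p_obs = {"door": np.array([0.6, 0.3, 0.1])}     # sensor likelihood per cell

bel = np.array([1/3, 1/3, 1/3])                 # uniform prior bel(x_0)
bel = update(predict(bel, "right", p_trans), "door", p_obs)
print(bel)                                      # posterior after one predict/update cycle
```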

A brief derivation

Prediction

$$
\begin{aligned}
\overline{bel}(x_t)&=p(x_t|z_{1:t-1},u_{1:t})\\
&=\int p(x_t|x_{t-1},z_{1:t-1},u_{1:t})\,p(x_{t-1}|z_{1:t-1},u_{1:t})\,dx_{t-1}\\
&=\int p(x_t|x_{t-1},u_{t})\,bel(x_{t-1})\,dx_{t-1}
\end{aligned}
$$

Observation update

$$
\begin{aligned}
bel(x_t)=p(x_t|z_{1:t},u_{1:t})&=\frac{p(z_t|x_t,z_{1:t-1},u_{1:t})\,p(x_t|z_{1:t-1},u_{1:t})}{p(z_t|z_{1:t-1},u_{1:t})}\\
&=\eta\, p(z_t|x_t,z_{1:t-1},u_{1:t})\,p(x_t|z_{1:t-1},u_{1:t})\\
&=\eta\, p(z_t|x_t)\,\overline{bel}(x_t)
\end{aligned}
$$

Gaussian distribution (normal distribution)

In Bayesian filtering the state $x$ takes values from a discrete set, so Bayesian filtering by itself is not a directly usable algorithm. Gaussian filters, which build the Bayes filter on the Gaussian distribution, allow $x$ to take continuous values; the Kalman filter is one of them.

The Gaussian distribution is given by the following density:
$$p(x)=(2\pi\sigma^2)^{-\frac12}e^{-\frac12\frac{(x-\mu)^2}{\sigma^2}} \quad (x\ \text{a scalar})$$
where $\sigma^2$ is the variance and $\mu$ is the mean.
$$p(\underline{x})=\det(2\pi\Sigma)^{-\frac12}e^{-\frac12(\underline{x}-\underline{\mu})^T\Sigma^{-1}(\underline{x}-\underline{\mu})} \quad (\underline{x}\ \text{a vector})$$
where $\Sigma$ is the variance (covariance matrix) and $\underline{\mu}$ is the mean.
The Gaussian distribution is used because it occurs widely in nature and has good properties, although it also has shortcomings (it is unimodal). In the Kalman filter there is no single definite state: the measurement, the prediction, and the final corrected estimate are all represented as Gaussian distributions.

The integral (or sum) of a Gaussian density $p$ is 1; of course, this holds for every probability distribution function. A small sketch that evaluates the multivariate density above follows.
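As a small illustration, the sketch below evaluates the multivariate density above; the mean and covariance used here are arbitrary example values, not taken from the post.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate det(2*pi*Sigma)^(-1/2) * exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu))."""
    d = x - mu
    norm = np.linalg.det(2 * np.pi * sigma) ** -0.5
    return norm * np.exp(-0.5 * d @ np.linalg.solve(sigma, d))

# Arbitrary example values (assumed for illustration only).
mu = np.array([0.0, 1.0])
sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
print(gaussian_pdf(np.array([0.5, 0.5]), mu, sigma))
```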

Fusion of Gaussian distributions

The derivation below was first done without using the following formulas, yet once they are used the derivation below almost seems redundant... I'll leave that as a gap to fill later.

  • The product of a Gaussian distribution with a Gaussian distribution is still (proportional to) a Gaussian distribution (obvious from the form of the density).

  • For the product of Gaussians $X=X_1X_2$, where
    $X_1\sim\mathcal{N}(\mu_1,\Sigma_1)$, $X_2\sim\mathcal{N}(\mu_2,\Sigma_2)$,
    the following result holds:

$$\left\{\begin{array}{l}K=\Sigma_1(\Sigma_1+\Sigma_2)^{-1}\\\mu=\mu_1+K(\mu_2-\mu_1)\\\Sigma=\Sigma_1-K\Sigma_1\end{array}\right.$$
where $K$ is the Kalman gain; the proof is omitted. A short code sketch of this fusion rule follows.
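A minimal sketch of this fusion rule, assuming both Gaussians are estimates of the same quantity; the one-dimensional numbers at the bottom are arbitrary example values.

```python
import numpy as np

def fuse(mu1, sigma1, mu2, sigma2):
    """Fuse two Gaussian estimates (mu1, Sigma1) and (mu2, Sigma2) using the rule above."""
    K = sigma1 @ np.linalg.inv(sigma1 + sigma2)   # the "Kalman gain"
    mu = mu1 + K @ (mu2 - mu1)
    sigma = sigma1 - K @ sigma1
    return mu, sigma

# Example: two noisy 1-D estimates of the same position (values assumed for illustration).
mu, sigma = fuse(np.array([1.0]), np.array([[4.0]]),
                 np.array([2.0]), np.array([[1.0]]))
print(mu, sigma)   # the fused estimate is pulled toward the more certain input
```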

Linear algebra background

Please review / preview this on your own.

Kalman filtering (KF)

The Kalman filter is based on a linearity assumption. We assume:
$$x_t=A_tx_{t-1}+B_tu_t+\varepsilon_t \quad (\text{state transition function})$$
where $\varepsilon_t$ is a Gaussian random vector with mean $0$ and covariance $R_t$.

$$z_t=C_tx_t+\delta_t \quad (\text{measurement function})$$
where $\delta_t$ is a Gaussian random vector with mean $0$ and covariance $Q_t$.

$$bel(x_0)=p(x_0) \quad (\text{initial belief})$$
where $p$ is Gaussian with mean $\mu_0$ and covariance $\Sigma_0$.

The Kalman filter algorithm can be written in the following few lines:

$$
\begin{array}{l}
1:\ \overline{\mu}_t=A_t\mu_{t-1}+B_tu_t \\
2:\ \overline{\Sigma}_t=A_t\Sigma_{t-1}A_t^{T}+R_t\\
3:\ K_t=\overline{\Sigma}_tC_t^T(C_t\overline{\Sigma}_tC_t^T+Q_t)^{-1}\\
4:\ \mu_t=\overline{\mu}_t+K_t(z_t-C_t\overline{\mu}_t)\\
5:\ \Sigma_t=(I-K_tC_t)\overline{\Sigma}_t
\end{array}
$$
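Before the derivation, here is a direct transcription of the five lines into NumPy as a minimal sketch; the constant-velocity toy model at the bottom (the particular `A`, `B`, `C`, `R`, `Q`) is an assumption for illustration, not something from the original post.

```python
import numpy as np

def kf_step(mu, sigma, u, z, A, B, C, R, Q):
    """One Kalman filter cycle, following lines 1-5 above."""
    # Lines 1-2: prediction (control update)
    mu_bar = A @ mu + B @ u
    sigma_bar = A @ sigma @ A.T + R
    # Line 3: Kalman gain
    K = sigma_bar @ C.T @ np.linalg.inv(C @ sigma_bar @ C.T + Q)
    # Lines 4-5: measurement update
    mu_new = mu_bar + K @ (z - C @ mu_bar)
    sigma_new = (np.eye(len(mu)) - K @ C) @ sigma_bar
    return mu_new, sigma_new

# Toy example (assumed): state = [position, velocity], control = acceleration, observe position.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])
R = 0.01 * np.eye(2)            # process noise covariance R_t
Q = np.array([[0.25]])          # measurement noise covariance Q_t
mu, sigma = np.zeros(2), np.eye(2)
mu, sigma = kf_step(mu, sigma, np.array([1.0]), np.array([0.02]), A, B, C, R, Q)
print(mu, sigma)
```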

The derivation of these equations follows.

Mathematical derivation of the KF

Prediction

From the Bayes filter we have
$$\overline{bel}(x_t)=\int p(x_t|u_t,x_{t-1})\,bel(x_{t-1})\,dx_{t-1}$$
Substituting the Gaussian densities gives
$$\overline{bel}(x_t)=\eta\int e^{-L_t}\,dx_{t-1}$$
where
$$L_t=\frac12(x_t-A_tx_{t-1}-B_tu_t)^TR_t^{-1}(x_t-A_tx_{t-1}-B_tu_t)+\frac12(x_{t-1}-\mu_{t-1})^T\Sigma_{t-1}^{-1}(x_{t-1}-\mu_{t-1})$$
$L_t$ is a quadratic function of $x_t$ and also of $x_{t-1}$.
To avoid the integral, we split
$$L_t=L_t(x_{t-1},x_t)+L_t(x_t)$$
pulling out a term $L_t(x_t)$ that does not contain $x_{t-1}$:
$$\overline{bel}(x_t)=\eta\, e^{-L_t(x_t)}\int e^{-L_t(x_{t-1},x_t)}\,dx_{t-1}$$
Below, $L_t(x_{t-1},x_t)$ is constructed as a quadratic form in $x_{t-1}$ (by completing the square); as I understand it, doing so does not introduce any additional terms in $x_t$.
Computing the first and second derivatives of $L_t$ with respect to $x_{t-1}$ gives
$$L_t(x_{t-1},x_t)=\frac12\left(x_{t-1}-\Psi[A_t^TR_t^{-1}(x_t-B_tu_t)+\Sigma_{t-1}^{-1}\mu_{t-1}]\right)^T\Psi^{-1}\left(x_{t-1}-\Psi[A_t^TR_t^{-1}(x_t-B_tu_t)+\Sigma_{t-1}^{-1}\mu_{t-1}]\right)$$
where $\Psi=(A_t^TR_t^{-1}A_t+\Sigma_{t-1}^{-1})^{-1}$.

Again, because
$$\int \det(2\pi\Psi)^{-\frac12}e^{-L_t(x_{t-1},x_t)}\,dx_{t-1}=1$$
(the integrand is a normalized Gaussian density in $x_{t-1}$), it follows that
$$\int e^{-L_t(x_{t-1},x_t)}\,dx_{t-1}=\det(2\pi\Psi)^{\frac12}$$

Therefore
$$\overline{bel}(x_t)=\eta\, e^{-L_t(x_t)}$$
Now compute $L_t(x_t)$:
$$
\begin{aligned}
L_t(x_t)&=L_t-L_t(x_{t-1},x_t)\\
&=\dots \quad (\text{all terms containing } x_{t-1} \text{ cancel})\\
&=\frac12(x_t-B_tu_t)^TR_t^{-1}(x_t-B_tu_t)+\frac12\mu_{t-1}^T\Sigma_{t-1}^{-1}\mu_{t-1}\\
&\quad-\frac12[A_t^TR_t^{-1}(x_t-B_tu_t)+\Sigma_{t-1}^{-1}\mu_{t-1}]^T(A_t^TR_t^{-1}A_t+\Sigma_{t-1}^{-1})^{-1}[A_t^TR_t^{-1}(x_t-B_tu_t)+\Sigma_{t-1}^{-1}\mu_{t-1}]
\end{aligned}
$$
Although this is not written as a completed-square quadratic form in $x_t$, it is still a quadratic function of $x_t$; the leftover constant terms only affect the coefficient in front.
Taking the first and second derivatives gives the mean and variance of $x_t$:
$$
\begin{aligned}
\frac{\partial L_t(x_t)}{\partial x_t}&=R_t^{-1}(x_t-B_tu_t)-R_t^{-1}A_t(A_t^TR_t^{-1}A_t+\Sigma_{t-1}^{-1})^{-1}[A_t^TR_t^{-1}(x_t-B_tu_t)+\Sigma_{t-1}^{-1}\mu_{t-1}]\\
&=[\underline{R_t^{-1}-R_t^{-1}A_t(A_t^TR_t^{-1}A_t+\Sigma_{t-1}^{-1})^{-1}A_t^TR_t^{-1}}](x_t-B_tu_t)-R_t^{-1}A_t(A_t^TR_t^{-1}A_t+\Sigma_{t-1}^{-1})^{-1}\Sigma_{t-1}^{-1}\mu_{t-1}
\end{aligned}
$$

By the Sherman-Morrison (matrix inversion) formula (the proof is omitted because typing it out would take too long; hint: starting from $\begin{bmatrix}A&B\\C&D\end{bmatrix}\begin{bmatrix}x_A\\x_B\end{bmatrix}=\begin{bmatrix}y_A\\y_B\end{bmatrix}$, derive two different expressions for the inverse of the block matrix $\begin{bmatrix}A&B\\C&D\end{bmatrix}$ and equate them),
$$R_t^{-1}-R_t^{-1}A_t(A_t^TR_t^{-1}A_t+\Sigma_{t-1}^{-1})^{-1}A_t^TR_t^{-1}=(R_t+A_t\Sigma_{t-1}A_t^T)^{-1}$$
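A quick numerical sanity check of this identity, using random positive-definite matrices in place of $R_t$ and $\Sigma_{t-1}$ (the sizes and values are arbitrary):

```python
import numpy as np

# Verify the matrix identity above numerically with random test matrices.
# R and Sigma are made symmetric positive definite so all inverses exist.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
M1, M2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
R = M1 @ M1.T + np.eye(3)
Sigma = M2 @ M2.T + np.eye(3)

Ri, Si = np.linalg.inv(R), np.linalg.inv(Sigma)
lhs = Ri - Ri @ A @ np.linalg.inv(A.T @ Ri @ A + Si) @ A.T @ Ri
rhs = np.linalg.inv(R + A @ Sigma @ A.T)
print(np.allclose(lhs, rhs))   # expected: True
```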

Setting this derivative to zero,
$$
\begin{aligned}
\frac{\partial L_t(x_t)}{\partial x_t}&=(R_t+A_t\Sigma_{t-1}A_t^T)^{-1}(x_t-B_tu_t)-R_t^{-1}A_t(A_t^TR_t^{-1}A_t+\Sigma_{t-1}^{-1})^{-1}\Sigma_{t-1}^{-1}\mu_{t-1}\\
&=0
\end{aligned}
$$

we obtain
$$
\begin{aligned}
x_t&=B_tu_t+(R_t+A_t\Sigma_{t-1}A_t^T)R_t^{-1}A_t(A_t^TR_t^{-1}A_t+\Sigma_{t-1}^{-1})^{-1}\Sigma_{t-1}^{-1}\mu_{t-1}\\
&=B_tu_t+A_t(I+\Sigma_{t-1}A_t^TR_t^{-1}A_t)(I+\Sigma_{t-1}A_t^TR_t^{-1}A_t)^{-1}\mu_{t-1}\\
&=B_tu_t+A_t\mu_{t-1}
\end{aligned}
$$
Hence
$$\overline\mu_t=A_t\mu_{t-1}+B_tu_t$$
$$\overline\Sigma_{t}=\left[\frac{\partial^2 L_t(x_t)}{\partial x_t^2}\right]^{-1}=A_t\Sigma_{t-1}A_t^T+R_t$$

Measurement update

$$bel(x_t)=\eta\, p(z_t|x_t)\,\overline{bel}(x_t)=\eta\, e^{-J_t}$$
where
$$J_t=\frac12(z_t-C_tx_t)^TQ_t^{-1}(z_t-C_tx_t)+\frac12(x_{t}-\overline\mu_{t})^T\overline\Sigma_{t}^{-1}(x_{t}-\overline\mu_{t})$$
Taking the second derivative gives the covariance
$$\Sigma_t=(C_t^TQ_t^{-1}C_t+\overline{\Sigma}_t^{-1})^{-1}$$
Setting the first derivative to zero (the minimum of the quadratic) and substituting $\mu_t$ for $x_t$:
$$C_t^TQ_t^{-1}(z_t-C_t\mu_t)=\overline{\Sigma}_t^{-1}(\mu_t-\overline{\mu}_t)$$
The left-hand side is
$$
\begin{aligned}
\text{LHS}&=C_t^TQ_t^{-1}(z_t-C_t\mu_t+C_t\overline{\mu}_t-C_t\overline{\mu}_t)\\
&=C_t^TQ_t^{-1}(z_t-C_t\overline{\mu}_t)-C_t^TQ_t^{-1}C_t(\mu_t-\overline{\mu}_t)
\end{aligned}
$$
Substituting back gives
$$C_t^TQ_t^{-1}(z_t-C_t\overline{\mu}_t)=\Sigma_t^{-1}(\mu_t-\overline{\mu}_t)$$
$$\Sigma_tC_t^TQ_t^{-1}(z_t-C_t\overline{\mu}_t)=\mu_t-\overline{\mu}_t$$
Let $K_t=\Sigma_tC_t^TQ_t^{-1}$; $K_t$ is called the Kalman gain. Then
$$\mu_t-\overline{\mu}_t=K_t(z_t-C_t\overline{\mu}_t)$$

$$
\begin{aligned}
K_t&=\Sigma_tC_t^TQ_t^{-1}\\
&=\Sigma_tC_t^TQ_t^{-1}(C_t\overline{\Sigma}_tC_t^T+Q_t)(C_t\overline{\Sigma}_tC_t^T+Q_t)^{-1}\\
&=\Sigma_t(C_t^TQ_t^{-1}C_t\overline{\Sigma}_tC_t^T+C_t^TQ_t^{-1}Q_t)(C_t\overline{\Sigma}_tC_t^T+Q_t)^{-1}\\
&=\Sigma_t(C_t^TQ_t^{-1}C_t\overline{\Sigma}_tC_t^T+C_t^T)(C_t\overline{\Sigma}_tC_t^T+Q_t)^{-1}\\
&=\Sigma_t(C_t^TQ_t^{-1}C_t\overline{\Sigma}_tC_t^T+\overline{\Sigma}_t^{-1}\overline{\Sigma}_tC_t^T)(C_t\overline{\Sigma}_tC_t^T+Q_t)^{-1}\\
&=\Sigma_t(C_t^TQ_t^{-1}C_t+\overline{\Sigma}_t^{-1})\overline{\Sigma}_tC_t^T(C_t\overline{\Sigma}_tC_t^T+Q_t)^{-1}\\
&=\overline{\Sigma}_tC_t^T(C_t\overline{\Sigma}_tC_t^T+Q_t)^{-1}
\end{aligned}
$$

At this point $\Sigma_t$ can be simplified further:
$$
\begin{aligned}
\Sigma_t&=(C_t^TQ_t^{-1}C_t+\overline{\Sigma}_t^{-1})^{-1}\\
&=\overline{\Sigma}_t-\overline{\Sigma}_tC_t^T(Q_t+C_t\overline{\Sigma}_tC_t^T)^{-1}C_t\overline{\Sigma}_t\\
&=(I-K_tC_t)\overline{\Sigma}_t
\end{aligned}
$$

Extended Kalman filter (EKF)

Consider the nonlinear case
$$x_t=g(u_t,x_{t-1})+\varepsilon_t$$
$$z_t=h(x_t)+\delta_t$$
For this, the EKF uses a Taylor expansion, keeping only the first-order (Jacobian) terms:
$$G_t=\left.\frac{\partial g(u_t,x_{t-1})}{\partial x_{t-1}}\right|_{x_{t-1}=\mu_{t-1}} \qquad H_t=\left.\frac{\partial h(x_{t})}{\partial x_{t}}\right|_{x_t=\overline{\mu}_t}$$
so that
$$g(u_t,x_{t-1})\approx g(u_t,\mu_{t-1})+G_t(x_{t-1}-\mu_{t-1})$$
$$h(x_t)\approx h(\overline{\mu}_t)+H_t(x_t-\overline{\mu}_t)$$
Analogously to the KF, the EKF can be shown to be:

$$
\begin{array}{l}
1:\ \overline{\mu}_t=g(u_t,\mu_{t-1}) \\
2:\ \overline{\Sigma}_t=G_t\Sigma_{t-1}G_t^{T}+R_t\\
3:\ K_t=\overline{\Sigma}_tH_t^T(H_t\overline{\Sigma}_tH_t^T+Q_t)^{-1}\\
4:\ \mu_t=\overline{\mu}_t+K_t(z_t-h(\overline{\mu}_t))\\
5:\ \Sigma_t=(I-K_tH_t)\overline{\Sigma}_t
\end{array}
$$
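As with the KF, these five lines translate directly into code. The sketch below assumes a toy two-dimensional model (velocity-driven motion and a range-to-origin measurement); the functions `g`, `h` and their Jacobians `G`, `H` are illustrative choices, not from the original post.

```python
import numpy as np

def ekf_step(mu, sigma, u, z, g, G, h, H, R, Q):
    """One EKF cycle, following lines 1-5 above."""
    # Lines 1-2: prediction with the nonlinear motion model, linearized covariance
    mu_bar = g(u, mu)
    Gt = G(u, mu)
    sigma_bar = Gt @ sigma @ Gt.T + R
    # Line 3: Kalman gain using the measurement Jacobian at mu_bar
    Ht = H(mu_bar)
    K = sigma_bar @ Ht.T @ np.linalg.inv(Ht @ sigma_bar @ Ht.T + Q)
    # Lines 4-5: measurement update with the nonlinear measurement model
    mu_new = mu_bar + K @ (z - h(mu_bar))
    sigma_new = (np.eye(len(mu)) - K @ Ht) @ sigma_bar
    return mu_new, sigma_new

# Toy example (assumed): 2-D position driven by a velocity command, range-to-origin sensor.
dt = 0.1
g = lambda u, x: x + dt * u                          # motion model
G = lambda u, x: np.eye(2)                           # its Jacobian w.r.t. x
h = lambda x: np.array([np.linalg.norm(x)])          # measurement model
H = lambda x: (x / np.linalg.norm(x)).reshape(1, 2)  # its Jacobian w.r.t. x
R, Q = 0.01 * np.eye(2), np.array([[0.1]])
mu, sigma = np.array([1.0, 1.0]), np.eye(2)
mu, sigma = ekf_step(mu, sigma, np.array([0.5, 0.0]), np.array([1.5]), g, G, h, H, R, Q)
print(mu, sigma)
```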


Copyright notice
This article was written by [Captain xiaoyifeng]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/162/202206110029367649.html