Kalman Filter Basics
The Information about Kalman Filters here can be found in Chapter 1 of Model Predictive Control: Theory, Computation, and Design
System Description
We assume that the system whose state we are trying to estimate can be modeled as follows
This is a linear time-invariant system that is affected by noise from random variables $w\in\mathbb{R}^g$ and $v\in \mathbb{R}^p$. The matrix $G\in\mathbb{R}^{n\times g}$ allows for additional tuning between the relationship between noise $w$ and system state.
Kalman Filter Derivation
Linear Optimal Derivation
Normal Distribution Identities
Before we get into our derivation of the Kalman filter, we are going to introduce a couple of identities about normal distributions that will be useful for our derivation.
Joint Independent Normals
Given $x\sim\mathcal{N}(\mu_x, C_{x})$ and $y\sim\mathcal{N}(\mu_y, C_{y})$ are statistically independent normal rendom variables, the joint density is given as follows.
Linear Transformation of a Normal
Given $x\sim\mathcal{N}(\mu_x, C_{x})$ and the value $y=Ax$ is also a random variable sampled from the following distribution \(y\sim\mathcal{N}(A\mu_x, AC_xA^T)\)
Conditional of a Joint Normal
Given jointly distributed $x,y$ the conditional density of $x\vert y$ is given as follows
Our estimation of $\hat{x}_t$ starts at time $t=0$ with an initial guess for the mean and variance of our state’s normal distribution.
Derivation
In practice, we often do not have access to information about $x_0$ so it is common to initialize $x_0$ to zero and $Q_0$ to some large matrix to represent our uncertainty about $x$.
Given our initial estimate, we can further refine our estimate with measurement information $y_0$
Here we are neglecting control input $u$ for simplicity however, all results derived here can still be applied to systems with nonzero $u$ provided one subtracts $Bu$ and $Du$ from the $x$ and $y$ respectively.
Given $x_0$ and $y_0$, we would like to estimate the normal distribution of $x_{0}^{+}$ given $x_0$ and $y_0$. To do this, we can start by defining the relationship between $x_{0}^{+}, y_0, x_0, v_0$.
The variable $\begin{bmatrix}x_0 \ v_0 \end{bmatrix}$ is itself a random variable which comes from the following distribution
Using the definition from Linear Transformation of a Normal, we can define the joint distribution for $\begin{bmatrix}x_{0}^{+} \ y_0 \end{bmatrix}$ as follows
We would like to go from this joint distribution to the conditional distribution for $x_{0}^{+}\vert y_0$ which we can do by applying the identity for the Conditional of a Joint Normal to get the following
The definitions for $\mu_{0}^{+}$ and $Q_{0}^{+}$ are defined below.
Once we have our refined distribution estimation for $x_{0}^{+}$, we can use it to make an estimate for $x_1$. We can relate $x_1$ to $x_0$ as follows
$x_1$ is a linear transformation of the normal variable $[x_0^+ w_0]^T$ meaning that we can use our Linear Transformation of a Normal identity to define the normal distribution for $x_1$ as follows
Now our updated mean and variance estimate for our state $x_1$ are as follows
Using this $\mu_1$ and $Q_1$, we can restart this process as we did with $\mu_0$ and $Q_0$ and and estimate $\mu_2, Q_2, \mu_3, Q_3, \ldots$
Kalman Filter Steps
Here we highlight the main steps of the Kalman Filter that can be repeated to estimate the normal distribution of $x_t$.
Initial estimate of $\mu_0, P_0$
Typically we have no prior information about $x_0$ so a common chosen initial distribution is a zero-mean normal distribution with a large covariance to indicate uncertainty as follows:
\[\begin{matrix} x_0\sim \mathcal{N}(\mu_0,P_0)\\ \mu_0=0\\ P_0=\alpha I\textbf{ for }\alpha\gg 0 \end{matrix}\]Calculation of $\mu_{k}^{+}, P_{k}^{+}$
Given a normal distribution estimate of $x_k\sim \mathcal{N}(\mu_k, P_k)$ we refine this estimated distribution by integrating measurement $y_k$.
The distribution is as follows
Further definitions are below
Larger variance in $R$ corresponds to a lack of confidence in our sensors and leads to smaller update step size. Smaller variance relates to greater confidence in sensors and a larger update step size. Larger variance in $P$ corresponds to a lack of confidence in our state estimation and leads to larger update step size. Smaller variance in $P$ corresponse to a smaller update in step size.
Propagation of State forward to Estimate $\mu_{k+1}, P_{k+1}$
Using our refined normal distribution, we can step our estimate forward to $k+1$ to get the following distributions
\[x_{k+1}\sim\mathcal{N}(\mu_{k+1}, P_{k+1})\]Further definitions are defined here
\[\begin{matrix} \mu_{k+1}=A\mu_{k}^{+} \\ P_{k+1}=AP_{k}A^{T}+GQG^T \end{matrix}\]The above formulation of the Kalman Filter has the control $u$ removed to slightly simplify the notation. To write a Kalman Filter that takes into account controls requires us to modify the update of $\mu_t$ in our Update Step
\[\mu_k^{+}=\mu_k + L_k[y_k^{+}-(C\mu_k - Du_k)]\]
As well as a modification in the belief propagation step
\[\mu_{k+1}=A\mu_k^{+}+Bu_k\]
Code Example
Below is a simple implementation of the Kalman Filter in Python for reference
class KalmanFilter:
def __init__(self, x_0, P_0, A, C, Q, R, G):
# Set Current State
self.curr_mu = x_0
self.curr_P = P_0
# Initialize State Transition and Probability Matrices
self.A = A
self.C = C
self.Q = Q
self.R = R
self.G = G
def integrate_observation(self, obs):
cpr_mat = self.curr_P @ self.C.T @
np.linalg.inv(self.C @
self.curr_P @ self.C.T + self.R)
L_k = cpr_mat
self.curr_P = self.curr_P - cpr_mat @ self.C @ self.curr_P
self.curr_mu = self.curr_mu + L_k @ (obs - self.C @ self.curr_mu)
def step_state_forward(self):
self.curr_mu = self.A @ self.curr_mu
self.curr_P = self.A @ self.curr_P @
self.A.T + self.G @ self.Q @ self.G.T
def get_state_estimate(self):
return self.curr_mu