Kalman Filter Explained Simply

[+] girzel|2 years ago|reply

No thread on Kalman Filters is complete without a link to this excellent learning resource, a book written as a set of Jupyter notebooks:

https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Pyt...

That book mentions alpha-beta filters as sort of a younger sibling to full-blown Kalman filters. I recently had need of something like this at work, and started doing a bunch of reading. Eventually I realized that alpha-beta filters (and the whole Kalman family) are very focused on predicting the near future, whereas what I really needed was just a way to smooth historical data.

So I started reading in that direction, came across "double exponential smoothing" which seemed perfect for my use-case, and as I went into it I realized... it's just the alpha-beta filter again, but now with different names for all the variables :(
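To make the "same thing, different names" observation concrete, here is a minimal sketch in Python. It assumes a unit time step and uses hypothetical function names (`alpha_beta`, `holt`) that do not come from the book or any library. Under those assumptions, Holt's double exponential smoothing with parameters (alpha, beta) should produce the same smoothed values as an alpha-beta filter whose velocity gain is alpha * beta.

```python
def alpha_beta(zs, alpha, beta, x0, v0, dt=1.0):
    """Alpha-beta filter: predict with a constant-velocity model, then correct."""
    x, v = x0, v0
    out = []
    for z in zs:
        x_pred = x + dt * v          # predict position one step ahead
        residual = z - x_pred        # how far off the prediction was
        x = x_pred + alpha * residual
        v = v + (beta / dt) * residual
        out.append(x)
    return out


def holt(zs, alpha, beta, level0, trend0):
    """Holt's linear (double exponential) smoothing with level and trend."""
    level, trend = level0, trend0
    out = []
    for z in zs:
        prev_level = level
        level = alpha * z + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        out.append(level)
    return out


# Illustrative data only; any sequence works.
zs = [1.0, 2.1, 2.9, 4.2, 5.0]
a, b = 0.5, 0.3
smoothed_ab = alpha_beta(zs, a, a * b, x0=0.0, v0=1.0)   # velocity gain = a * b
smoothed_holt = holt(zs, a, b, level0=0.0, trend0=1.0)
```

The only difference is bookkeeping: Holt updates the trend from the change in level, while the alpha-beta filter updates the velocity from the residual, and with dt = 1 the two expressions are algebraically equal.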

I can't help feeling like this entire neighborhood of math rests on a few common fundamental theories, but because different disciplines arrived at the same systems via different approaches, they end up sounding a little different and the commonality is obscured. Something about power series, Euler's number, gradient descent, filters, feedback systems, general system theory... it feels to me like there's a relatively small kernel of intuitive understanding at the heart of all that stuff, which could end up making glorious sense of a lot of mathematics if I could only grasp it.

Somebody help me out, here!

[+] ndriscoll|2 years ago|reply

Incidentally this is why people miss the mark when they get mad about mathematicians using single letter variable names. Short names let you focus on the structure of equations and relationships, which lets you more easily pattern match and say "wait, this is structurally the same as X other thing I already know but with different names". It's not about saving paper or making it easier to write (it is not easier to write Greek letters with super/subscripts in LaTeX using an English keyboard than it would be to use words). It is about transmitting a certain type of information to the reader that is otherwise very difficult to transmit.

While it uses letters so it looks vaguely like writing, math notation is very pictorial in nature. Long words would obscure the pictures.

[+] duped|2 years ago|reply

You're looking for the theory of linear (or nonlinear) dynamical systems. Unfortunately it's not one kernel of intuition backed by consistent notation, it's many with no consistency. A good course on controls and signals/systems will beat those intuitions into you and you learn the math/parlance without getting attached to any one notational convention.

The real intuition is "everything is a filter." Everything else is about analysis and synthesis of that idea.

[+] bonoboTP|2 years ago|reply

Maybe check out Probabilistic Robotics by Dieter Fox, Sebastian Thrun, and Wolfram Burgard. It has a coherent Bayesian formulation with consistent notation on many Kalman-related topics. Also with the rise of AI/ML, classic control theory ideas are being merged with reinforcement learning.

[+] thundercarrot|2 years ago|reply

If Q and R are constant (as is usually the case), the gain quickly converges, such that the Kalman filter is just an exponential filter with a prediction step. For many people this is a lot easier to understand, and even matches how it is typically used, where Q and R are manually tuned until it “looks good” and never changed again. Moreover, there is just one gain to manually tune instead of multiple quantities Q and R.
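A rough sketch of that convergence, assuming the simplest possible model (a scalar random walk observed directly, which is my assumption and not something stated above). The names `gain_sequence` and `steady_state_gain` are hypothetical. With constant Q and R the gain depends only on the covariance recursion, not on the data, so it can be computed ahead of time.

```python
import math


def gain_sequence(q, r, p0=1.0, steps=200):
    """Kalman gains for x_k = x_{k-1} + w (var q), z_k = x_k + v (var r)."""
    p = p0
    gains = []
    for _ in range(steps):
        p_pred = p + q                 # predict: uncertainty grows by q
        k = p_pred / (p_pred + r)      # gain for this step
        p = (1.0 - k) * p_pred         # update: uncertainty shrinks
        gains.append(k)
    return gains


def steady_state_gain(q, r):
    """Fixed point of the recursion above (root of p^2 - q*p - q*r = 0)."""
    p_pred = 0.5 * (q + math.sqrt(q * q + 4.0 * q * r))
    return p_pred / (p_pred + r)


gains = gain_sequence(q=0.01, r=1.0)
k_inf = steady_state_gain(q=0.01, r=1.0)
# Once the gain has settled, the update x += k_inf * (z - x) is an
# exponential filter with smoothing constant k_inf.
```

In this toy model the two tuning knobs Q and R collapse into one number, the steady-state gain, which is the point being made about tuning a single constant.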

[+] plasticchris|2 years ago|reply

Hey, I had very similar thoughts many years ago! The trick is yes, many filters boil down to alpha/beta, and the Kalman filter is (edit: can be) really a way to generate those constants given a (linear) model (a set of equations describing the dynamics, i.e. the future states) and good knowledge of the noise (variance) in the measurements. So if the measurements always have the same noise it will just reduce the constants over time, and it is only really useful when the measurement accuracy can be determined well and also changes a lot.
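A small sketch of both halves of that claim, assuming a scalar state that does not change (no process noise); `scalar_update` is a hypothetical helper, not an API from any package. With a fixed measurement variance the gain just shrinks step after step, while a per-measurement variance lets the filter trust a good reading far more than a bad one.

```python
def scalar_update(x, p, z, r):
    """One Kalman measurement update for a static scalar state.

    x, p: current estimate and its variance
    z, r: measurement and its variance (may differ per measurement)
    """
    k = p / (p + r)          # gain: how much to trust this measurement
    x = x + k * (z - x)
    p = (1.0 - k) * p
    return x, p, k


# Constant measurement noise: the gain decays like 1/2, 1/3, 1/4, ...
x, p = 0.0, 1.0
gains = []
for z in [1.0, 1.0, 1.0, 1.0]:
    x, p, k = scalar_update(x, p, z, r=1.0)
    gains.append(k)

# Varying measurement noise: same prior, very different gains.
_, _, k_accurate = scalar_update(0.0, 1.0, 1.0, r=0.01)
_, _, k_noisy = scalar_update(0.0, 1.0, 1.0, r=100.0)
```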

[+] ActorNightly|2 years ago|reply

When you start dealing with linear systems and disturbances, you end up with basically matrix math and covariance in some form and way.

The thing about the Kalman filter is that it's pretty well known and exists in many software packages (just like PID), so it's fairly easy to implement. But because noise is often not Gaussian, and systems are often not linear, it's more of a "works well enough" for most applications.

[+] bbstats|2 years ago|reply

There is no better smoother than a future predictor. I'm not entirely sure what the issue is here.

[+] chubs|2 years ago|reply

Recently tasked with implementing a Kalman filter, I found it very very difficult to find good resources that explained it in language that made sense to a developer like me. So after spending a month learning it, I wrote a couple posts on it, perhaps someone might find it helpful?

https://www.splinter.com.au/2023/12/14/the-kalman-filter-for...

https://www.splinter.com.au/2023/12/15/the-kalman-filter-wit...

As a developer I found the maths made sense only after implementing it, ironically. I guess we learn by building on top of what we already know? Is there a term for that?

[+] Subdivide8452|2 years ago|reply

Off topic, but why are you one of those people who do a full screen “subscribe to my mailing list” overlay? I never understand why you’d wanna do that.

[+] foobarbecue|2 years ago|reply

I've always thought math would be much easier to learn if they used descriptive variable names. Or, at least in an interactive medium like the web, add some tooltips at a bare minimum. Whenever I study math I spend 90% of the time looking up the symbols.

Also when this person says the subscript "denotes the order of the measurement" I'm trying to figure out what kind of order he's talking about. I guess that's the index? It's been a while since I did Kalman filters :-p

[+] shiandow|2 years ago|reply

People always seem to forget that mathematical notation is designed to make algebraic manipulations easier to follow. It's not really intended as something that makes sense on its own, it's mostly physicists who think something like E=mc^2 should have any meaning.

The more pure the mathematics the shorter the scope of most variables. Typically a variable is defined just before it's used, with a scope no longer than the proof or derivation it's in.

Also some of the choices in this article are just plain silly, such as using P as both variable and index, and then using it for the covariance matrix when the precision matrix is the exact inverse.

[+] gromneer|2 years ago|reply

> Also when this person says the subscript "denotes the order of the measurement" I'm trying to figure out what kind of order he's talking about.

I hate that the most when reading papers. Authors trying to sound abstract and academic, but only accomplishing being frustratingly vague. AUTHORS YOU STILL HAVE TO INSERT THE SUBJECT INTO YOUR SENTENCES FOR THEM TO MAKE SENSE.

I'm so frustrated at this aspect in research papers more than anything else. You must disambiguate. Use absolute descriptors and do not use relative descriptors. Don't tell me to look right, because I'll look left. Use absolute descriptors! "then after spinning the prism the light cone blah blah blah" SPIN!? SPIN IN WHAT DIRECTION????? LEFT?RIGHT?! LATERAL? UP? DOWN???? How fast? How slow? You imagine all of these CRITICAL ASPECTS in your head when writing such ambiguous sentences, but the reader cannot read your mind.

[+] max_|2 years ago|reply

I completely agree. My insight, similar to yours, is that the greatest math book no one has written is one where the meaning of the notation and variables, and a clear assortment of theorems across all topics, are well curated.

[+] pinkmuffinere|2 years ago|reply

> Also when this person says the subscript "denotes the order of the measurement" I'm trying to figure out what kind of order he's talking about. I guess that's the index? It's been a while since I did Kalman filters :-p

The order referred to is the index-in-time that a value corresponds to. E.g., x_3 would be the state at the third time step. I think their subscript “p” stands for prediction. x_p at time 3 is the state we expect at time 4. But then when time 4 comes around, we incorporate new measurements and calculate x_4 including that new information. Just to be explicit, this x_4 will be different from the x_p we calculated at time 3, as our prediction is always a bit incorrect.

[+] foofie|2 years ago|reply

> I've always thought math would be much easier to learn if they used descriptive variable names.

I think the variable names are already picked to be descriptive. No one is picking them to be more obscure or harder to track. The problem is that those who are starting out still haven't picked up concepts or learned the standard notations for each problem domain, thus we are left with the pain of ramping up on a topic.

[+] malkia|2 years ago|reply

Check https://mitpress.ublish.com/ebook/structure-and-interpretati...

[+] o11c|2 years ago|reply

> trying to figure out what kind of order

For reference, the Wikipedia page "Order (mathematics)" is a disambiguation page almost as long as the top-level disambiguation page "Order".

https://en.wikipedia.org/wiki/Order_(mathematics)

I generally don't have a problem with variable names, but creating syntax and terminology that conflicts with other mathematical use is a real problem.

[+] Waterluvian|2 years ago|reply

This was 90% of my problem with math.

Same when something is named descriptively: shield volcano, star dunes, vs. some person’s name like Rayleigh scattering.

It’s just an extra layer to memorize and parse.

[+] colechristensen|2 years ago|reply

I've always thought that code would be much easier to understand with shorter, less descriptive variable names. Whenever I look at new code most of the confusion involves searching through layers of abstraction for the part that actually does the thing as opposed to the layer upon layer of connections between abstractions which would be much less necessary if the entire behavior could be encoded in a single line. You can only have a small number of descriptive variables in an expression before it becomes entirely unreadable. That is opposed to single character with sub/superscripts where you can easily see what's happening with tens of variables in a single line of math.

https://wikimedia.org/api/rest_v1/media/math/render/svg/a7d2...

Here's a formula for calculating the downstream Mach number in a certain kind of supersonic flow. I cannot imagine any way to write this in "descriptive variables" which makes the formula understandable at all, you just could not see the structure. (from https://en.wikipedia.org/wiki/Oblique_shock )

[+] shiandow|2 years ago|reply

Kalman filters might be one of those weird cases in mathematics where the 'simple' version is simplified beyond all recognition.

I mean what you're really doing is take a measurement then simulate the possible future states and combine this information with the next measurement and repeat.

You can imagine e.g. taking multiple pictures of a tennis-ball, estimate its position and speed from the first picture, simulate where it's going to end up, and compare this with the next picture to see which estimate is closer to the truth. Or more old school, measure the inclination of the sun and compare the resulting line of possible locations on a map to the spot you thought you were.

Of course the exact calculations are beyond impractical. So you use sampling to simplify. However that still makes it difficult so you assume the distribution is somewhat close to a Gaussian distribution. And then you simplify even more by assuming the evolution of the system is just a linear transformation. And that's how you end up with the Kalman filter discussed here.

I'd be amazed if anyone could really understand what's going on just based on the linear algebra.
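For the tennis-ball picture, here is a hedged sketch of the last rung of that ladder: a linear Kalman filter for one-dimensional motion, written in plain Python so the predict ("simulate where it's going") and update ("compare with the next picture") steps stay visible. The constant-velocity model, the noise values `q` and `r`, and the names `kalman_cv_step` and `track` are all assumptions made for illustration.

```python
def kalman_cv_step(state, cov, z, dt=1.0, q=1e-4, r=0.25):
    """One predict/update cycle for a 1-D constant-velocity model.

    state = (position, velocity); cov = ((a, b), (c, d)) is its 2x2 covariance.
    Only position is measured. q and r are illustrative noise variances.
    """
    (pos, vel), ((a, b), (c, d)) = state, cov

    # Predict: simulate the motion forward by dt (x = F x, P = F P F^T + Q).
    pos_p = pos + dt * vel
    vel_p = vel
    a_p = a + dt * (b + c) + dt * dt * d + q
    b_p = b + dt * d
    c_p = c + dt * d
    d_p = d + q

    # Update: compare the predicted position with the new measurement z.
    s = a_p + r                        # innovation variance
    k_pos, k_vel = a_p / s, c_p / s    # Kalman gain for position and velocity
    y = z - pos_p                      # innovation: measurement minus prediction
    new_state = (pos_p + k_pos * y, vel_p + k_vel * y)
    new_cov = (((1 - k_pos) * a_p, (1 - k_pos) * b_p),
               (c_p - k_vel * a_p, d_p - k_vel * b_p))
    return new_state, new_cov


def track(measurements, dt=1.0):
    """Run the filter over position measurements from a vague initial guess."""
    state, cov = (0.0, 0.0), ((10.0, 0.0), (0.0, 10.0))
    for z in measurements:
        state, cov = kalman_cv_step(state, cov, z, dt)
    return state


# Noise-free positions of an object moving at 2 units per step, for illustration.
position, velocity = track([2.0 * k for k in range(1, 51)])
```

Velocity is never measured directly here; the filter infers it because the covariance links position errors to velocity errors, which is the part that is hard to see from the linear algebra alone.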

[+] foobarian|2 years ago|reply

I don't know what it is about the Kalman filter but so many explanations including the OP have this format: "It's very simple! <complicated obtuse explanation listing the computation steps>"

Your comment is the first I've seen actually providing intuition about what is happening. It doesn't help perhaps that the name itself is misleading as heck to computer people like me: it's not a filter as in stream processing or SQL.

[+] jvanderbot|2 years ago|reply

It's simpler than that. The linear algebra is actually easier.

The Kalman filter tries to guess the hidden input that produced the measurements. It does so by forming the minimization problem:

'minimize over x, the function [ actual_measurement - expected_measurement(x) ]^2/s^2', here 's' is sigma of noise.

This follows from the state estimation problem:

'maximize over x, the likelihood of seeing the actual_measurement', because the only term that matters in the likelihood function is -([actual_measurement - expected_measurement(x)]/s)^2 (look at the exponent in the Normal distribution, or any exponential distribution really).

'actual_measurement' is a constant, so if it happens that the function 'expected_measurement' is linear, this is trivially solved directly as a convex optimization, and if you take the derivative, equate to zero, and solve, you'll get the Kalman filter update step.

If it so happens that the function is non-linear, well, we just make a single Newton-Raphson step by linearizing the equation, minimizing, and returning the solution to the "pretend linearization".

This is basic calc + linear algebra at an undergrad level, but nobody bothered to tell you that.
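As a worked version of "take the derivative, equate to zero, and solve", here is the scalar case. One assumption is added that the text above leaves implicit: the previous estimate enters the cost as a Gaussian prior with mean \hat{x} and variance p. The symbols h, p, \hat{x} and K are my notation for this sketch, with s as the measurement noise sigma from above.

```latex
% Assumptions: scalar state x, linear measurement z = h x + noise (variance s^2),
% Gaussian prior on x with mean \hat{x} and variance p.
J(x) = \frac{(z - h x)^2}{s^2} + \frac{(x - \hat{x})^2}{p}

\frac{dJ}{dx} = -\frac{2h\,(z - h x)}{s^2} + \frac{2\,(x - \hat{x})}{p} = 0

x^\ast = \hat{x} + K\,(z - h\hat{x}), \qquad K = \frac{p\,h}{h^2 p + s^2}
```

The last line has the familiar update form: prior estimate plus a gain times the difference between the actual and expected measurement.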

---

It's also completely wrong. It's a hack from the 60s to maximize the likelihood function using a recursive, single-step linearization like this. A misreading of the Cramér-Rao lower bound has "proven" to generations of engineers that this is optimal. It's not, not really.[^1]

Nowadays we have 10,000x more compute, and any one of the following _will_ produce better performance:

* Forming and solving the non-linear equation using many Newton-Raphson steps

* Keeping a long history of measurements, and solving using many Newton-Raphson steps over this batch

* Using a sum-of-Gaussians representation to accommodate multi-modal measurement functions, especially when including the prior bullets
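A toy illustration of the first bullet, under heavy assumptions: a scalar state, a made-up measurement function h(x) = x^2, and a Gauss-Newton style iteration (the form used in iterated extended Kalman filters) standing in for "many Newton-Raphson steps". All names and numbers here are hypothetical. With one iteration this reduces to the usual single-linearization update; with more iterations it relinearizes around the improved estimate.

```python
def h(x):
    """Hypothetical nonlinear measurement function (an assumption of this sketch)."""
    return x * x


def h_jacobian(x):
    """Derivative of h, used to linearize around the current estimate."""
    return 2.0 * x


def cost(x, z, r, x_prior, p_prior):
    """Negative log-posterior up to constants: measurement term plus prior term."""
    return (z - h(x)) ** 2 / r + (x - x_prior) ** 2 / p_prior


def gauss_newton_update(z, r, x_prior, p_prior, iterations):
    """Relinearize h around the latest estimate and re-solve the linear update."""
    x = x_prior
    for _ in range(iterations):
        H = h_jacobian(x)
        K = p_prior * H / (H * H * p_prior + r)
        x = x_prior + K * (z - h(x) - H * (x_prior - x))
    return x


z, r, x_prior, p_prior = 4.2, 0.01, 1.0, 1.0
x_single = gauss_newton_update(z, r, x_prior, p_prior, iterations=1)
x_iterated = gauss_newton_update(z, r, x_prior, p_prior, iterations=20)
cost_single = cost(x_single, z, r, x_prior, p_prior)
cost_iterated = cost(x_iterated, z, r, x_prior, p_prior)
```

In this example the iterated estimate reaches a much lower cost than the single-step one, because the first linearization is taken far from where the measurement says the state is. That is one toy case, not evidence about any particular real system.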

All of these were well covered by state estimation research from 80s to now, but again, the textbooks seem to be written in stone in 1972.

[^1]: (the Cramér-Rao lower bound is only defined when all measurement likelihood functions are linearized at the true state - which is only possible asymptotically in a batch which preserves all the measurements - and not possible before time infinity and not possible with a recursive filter)

[+] nayuki|2 years ago|reply

I like this previous explanation: https://www.bzarg.com/p/how-a-kalman-filter-works-in-picture... , https://news.ycombinator.com/item?id=13449229

[+] Moduke|2 years ago|reply

I enjoyed the simplicity of this explanation as well:

https://praveshkoirala.com/2023/06/13/a-non-mathematical-int...

https://news.ycombinator.com/item?id=36971975

[+] 01HNNWZ0MV43FF|2 years ago|reply

If I really needed a Kalman filter I'm sure I could read this, or the Wikipedia page, or an implementation's source code (https://github.com/LdDl/kalman-rs/blob/master/src/kalman/kal...) and figure it out.

But IME everyone in the entire world is a "visual learner" who learns best by examples. So I'm surprised that the tutorial midway through the page doesn't put any example numbers into the formulas (maybe I glanced over it?) and the pictures only start after a page of "what is a Kalman filter" text, and the pictures are just of more formulas.

[+] the__alchemist|2 years ago|reply

Another comment pointed out variable naming conventions as an obstacle to learning and understanding mathematical topics. I am sympathetic to that perspective, but even more so to this one you post. I am astounded by how common this is. A weaker form of this exists in software libraries that don't include code examples.

[+] syntaxing|2 years ago|reply

One thing that clicked for me is that two uncertain measurements (each a distribution with high variance) combine into a “more certain” measurement (a narrower distribution). Use this more certain measurement and combine it with the next measurement, then rinse and repeat and boom, you have a Kalman filter.
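A minimal sketch of that fusion step, assuming both measurements are independent Gaussians of the same quantity; `fuse` is a hypothetical helper name. The combined variance is always smaller than either input variance, and the combined mean leans toward the more certain input.

```python
def fuse(mean1, var1, mean2, var2):
    """Combine two independent Gaussian estimates of the same quantity."""
    fused_var = 1.0 / (1.0 / var1 + 1.0 / var2)
    fused_mean = fused_var * (mean1 / var1 + mean2 / var2)
    return fused_mean, fused_var


# Two equally uncertain readings: the result sits halfway, with half the variance.
mean, var = fuse(10.0, 4.0, 12.0, 4.0)

# "Rinse and repeat": fold in a third, more precise reading.
mean, var = fuse(mean, var, 11.5, 1.0)
```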

[+] cocostation|2 years ago|reply

I really like this set of videos for explaining the KF. I 'got it' more than I did with the material on the original post.

https://www.youtube.com/watch?v=CaCcOwJPytQ

[+] pfdietz|2 years ago|reply

My late father used these all the time during his career, starting about the time they were invented. He worked on radar and missile guidance systems.

[+] lkdfjlkdfjlg|2 years ago|reply

You mean, actual engineering.

[+] ModernMech|2 years ago|reply

Close your eyes and walk around for a while. Imagine where you are. Now open your eyes. Is your actual location different from where you thought you were?

That last bit, using an observation to update a belief on a state variable, that's what the Kalman filter does.

[+] visarga|2 years ago|reply

Simply put, it is an "online" model, which means that it learns on the fly. Specifically, it is the optimal online learning algorithm for linear systems with Gaussian noise.

In a way it is like a primitive RNN, it has internal state, inputs and outputs.

[+] Symmetry|2 years ago|reply

I've used them in robotics and for tracking satellites going overhead via radar. Apparently they're also used by economists for guessing the state of the economy, along with other filters in the standard robotics toolkit.

[+] bafe|2 years ago|reply

Ensemble Kalman filters (and similar techniques like variational assimilation) are also used heavily in the geosciences to assimilate measurements and model data in order to obtain a "posterior observation", which can be understood intuitively as an interpolation between model and observation weighted by their relative uncertainty (and covariance).

[+] mesofile|2 years ago|reply

Purely idle curiosity – I've heard a lot about the Kalman filter over the years, it's a popular subject here, but what are the other filters in the standard robotics toolkit?

[+] eclectic29|2 years ago|reply

Genuine question: why does the Kalman filter come up so frequently on HN? Is this something I'm missing? I'm a machine learning engineer, not a data scientist.

[+] scarmig|2 years ago|reply

ML and Kalman filtering try to solve similar problems.

Some theories even suggest that Kalman filtering (or a similar algorithm) provides a basis for neurobiological learning. See predictive coding (e.g. https://arxiv.org/pdf/2102.10021.pdf)

(Why I'm interested in it, at least.)

[+] xchip|2 years ago|reply

It is considered a difficult topic and people want to show they understand it.

A similar thing happens with anything that has the word "quantum" or "relativistic" in it. I'm a physicist and we hardly talk about it, but here on HN we find people bringing it up every other day.

89 comments