brosco | 2 years ago
The statement that the Kalman filter is mean-square optimal because it generates a correct estimate in expectation is false. In fact, any gain L will generate an estimate whose expected value is x_k, as long as w_k and v_k are zero mean. The Kalman gain is the specific choice of L that minimizes the mean-square error among linear estimators; it is optimal over all estimators only when the disturbances are Gaussian. The Kalman gain is also time-varying and depends on the evolution of the estimate covariance, although it converges to a steady-state value.
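Both points are easy to check numerically. Below is a minimal sketch on a toy scalar system (parameters a, c, Q, R are made up for illustration): first the time-varying Kalman gain is iterated until it settles at its steady-state value, then a Monte Carlo run shows that an arbitrary fixed gain L still gives a zero-mean estimation error when the noises are zero mean.

```python
import numpy as np

# Toy scalar system (illustrative parameters, not tied to any paper):
#   x_{k+1} = a x_k + w_k,   y_k = c x_k + v_k,   w, v zero-mean Gaussian
a, c, Q, R = 0.9, 1.0, 0.1, 0.5

# 1) The time-varying Kalman gain converges to a steady-state value.
P = 1.0
gains = []
for _ in range(200):
    P_pred = a * a * P + Q                  # predicted covariance
    K = P_pred * c / (c * c * P_pred + R)   # Kalman gain at this step
    P = (1.0 - K * c) * P_pred              # updated covariance
    gains.append(K)
print("steady-state gain ~", gains[-1])

# 2) Any stabilizing fixed gain L gives an unbiased estimate
#    when w_k and v_k are zero mean (here |a (1 - L c)| < 1).
rng = np.random.default_rng(0)
L, T, trials = 0.3, 30, 20000
x = rng.normal(0.0, 1.0, trials)   # random initial state, zero mean
xhat = np.zeros(trials)            # initial estimate equals the prior mean
for _ in range(T):
    w = rng.normal(0.0, np.sqrt(Q), trials)
    v = rng.normal(0.0, np.sqrt(R), trials)
    x = a * x + w
    y = c * x + v
    xhat = a * xhat                    # predict
    xhat = xhat + L * (y - c * xhat)   # correct with the fixed gain L
bias = np.mean(xhat - x)
print("empirical bias of fixed-gain observer:", bias)  # ~ 0
```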
What's being described here is more properly called a Luenberger observer, but I guess that name doesn't get the same recognition outside the control community.
I'm also wondering why they chose to include H past estimates and measurements in the transformer. They're already embedding the Kalman gain into the weights of the transformer, so taking just one past estimate/measurement should exactly recover the Kalman filter. Going further into the past just makes the estimate worse, because of the softmax.
gautamcgoel | 2 years ago
Regarding your second point: yes, when H = 1 we just recover the standard Kalman Filter, and yes, when H grows large the estimate gets worse and worse, in the sense that the softmax nonlinearity includes more and more irrelevant data from the past in the estimate. The point is that in real-world problems, which are usually messy and nonlinear, we probably want H (the so-called context length) to be large, because then we can take advantage of information we collected in the past to help improve decisions in the present. It just so happens that in the special case when the system is linear, this is more harmful than helpful.

Here is one way to think about our result: imagine you have a Transformer which takes as input K-dimensional embeddings and has context length H. You want to use this Transformer for filtering in some dynamical system. The most basic question you could ask is: if the system is linear, can you do Kalman Filtering? In other words, in the easy, linear scenario, can you match the optimal algorithm? If the answer is no, I see no reason why you should expect it to work in harder, nonlinear settings. We show that the answer is yes, when the system you want to filter has roughly sqrt(K) states and you design the embeddings appropriately. Hopefully this preliminary result will lead to a better understanding of how deep learning can improve control in the hard, nonlinear scenario.
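The H = 1 reduction can be illustrated with a toy sketch (this is only the intuition, not the paper's actual construction): a filter whose correction term is a softmax-weighted mix of the last H innovations, with uniform attention scores for simplicity. With H = 1 the softmax weight is identically 1, so the update collapses to the standard fixed-gain correction; with H > 1, stale innovations leak into the estimate.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def attention_filter(ys, a, c, K, H):
    """Toy sketch: correction = K * softmax-weighted mix of last H innovations."""
    xhat = 0.0
    innovations = []
    for y in ys:
        xpred = a * xhat
        innovations.append(y - c * xpred)      # current innovation, last in list
        ctx = np.array(innovations[-H:])       # context of the last H innovations
        w = softmax(np.zeros(len(ctx)))        # uniform scores for this sketch
        xhat = xpred + K * float(w @ ctx)
    return xhat

rng = np.random.default_rng(1)
ys = rng.normal(size=50)
a, c, K = 0.9, 1.0, 0.4

h1 = attention_filter(ys, a, c, K, H=1)

# Reference: the classic fixed-gain filter update on the same data.
xhat = 0.0
for y in ys:
    xpred = a * xhat
    xhat = xpred + K * (y - c * xpred)

print("H=1 matches the fixed-gain filter:", np.isclose(h1, xhat))
```

Running `attention_filter` with larger H on the same data gives a different (worse, for a linear system) estimate, which is exactly the softmax effect described above.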
donquichotte | 2 years ago
I'm not so sure about this. Maybe this is where the ML approach could outperform the traditional EKF and UKF approaches (in terms of estimation accuracy, not compute time), by learning the nonlinear system dynamics?
This sounds very hand-wavy, and it is, because of my lack of understanding. It is just not immediately clear to me that if an optimal algorithm for the linear case cannot be matched or outperformed, the same necessarily holds for nonlinear dynamics.
EDIT: And as mentioned above, the KF is only optimal if certain conditions hold, e.g. additive, zero-mean, Gaussian noise on the state dynamics and observation. In reality, you may have a multiplicative noise component, a non-zero mean, or fancier noise distributions, and it would be interesting to see if these can be learned.
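The non-zero-mean case is easy to demonstrate. A minimal sketch, with made-up parameters: feed a standard Kalman filter measurement noise with mean mu and its estimate picks up a systematic offset, while the same filter with the offset modeled stays unbiased. A learned filter could in principle absorb such an offset from data.

```python
import numpy as np

# Toy scalar system with biased measurement noise (assumed parameters):
#   x_{k+1} = a x_k + w_k,   y_k = c x_k + v_k,   E[v_k] = mu != 0
a, c, Q, R, mu = 0.9, 1.0, 0.1, 0.5, 0.5
rng = np.random.default_rng(2)
trials, T = 20000, 60

x = rng.normal(0.0, 1.0, trials)
xhat = np.zeros(trials)        # standard KF, which assumes E[v] = 0
xhat_fix = np.zeros(trials)    # same filter but with the offset modeled
P = 1.0
for _ in range(T):
    w = rng.normal(0.0, np.sqrt(Q), trials)
    v = mu + rng.normal(0.0, np.sqrt(R), trials)   # biased measurement noise
    x = a * x + w
    y = c * x + v
    P_pred = a * a * P + Q
    K = P_pred * c / (c * c * P_pred + R)          # time-varying Kalman gain
    P = (1.0 - K * c) * P_pred
    xhat = a * xhat + K * (y - c * (a * xhat))                    # believes E[v] = 0
    xhat_fix = a * xhat_fix + K * (y - mu - c * (a * xhat_fix))   # knows mu

print("bias, standard KF:", np.mean(xhat - x))       # clearly nonzero (~ +0.4 here)
print("bias, offset-aware:", np.mean(xhat_fix - x))  # ~ 0
```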