top | item 38640470

donquichotte | 2 years ago

> I see no reason to see why you should expect it to work in harder, nonlinear settings

I'm not so sure about this; maybe this is where the ML approach could outperform (in terms of estimation accuracy, not compute time) the traditional EKF and UKF approaches, by learning the nonlinear system dynamics.

This sounds very hand-wavy, and it is, because of my lack of understanding. For me it is just not immediately clear that, if an optimal algorithm cannot be matched or outperformed in the linear case, the same necessarily holds for nonlinear dynamics.

EDIT: And as mentioned above, the KF is optimal only if certain conditions hold, e.g. additive, zero-mean, Gaussian noise on both the state dynamics and the observation. In reality, you may have a multiplicative noise component, a non-zero mean, or fancier noise distributions, and it would be interesting to see if these can be learned.
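To make those optimality conditions concrete, here's a minimal 1D Kalman filter sketch in Python: a random-walk state with additive, zero-mean Gaussian noise on both the dynamics and the observation, which is exactly the regime where the KF is the optimal estimator. The constants `q` and `r` are illustrative, not from any particular system:

```python
import random

def kalman_1d(zs, q=0.01, r=0.25):
    """Scalar Kalman filter for x_t = x_{t-1} + w, z_t = x_t + v,
    with w ~ N(0, q) and v ~ N(0, r): additive, zero-mean, Gaussian
    noise on both state dynamics and observation."""
    x, p = 0.0, 1.0          # state estimate and its variance
    out = []
    for z in zs:
        p += q               # predict: variance grows by process noise
        k = p / (p + r)      # Kalman gain
        x += k * (z - x)     # update: blend prediction with measurement
        p *= (1 - k)
        out.append(x)
    return out

random.seed(0)
true_x = 1.0
zs = [true_x + random.gauss(0, 0.5) for _ in range(200)]
est = kalman_1d(zs)
```

If the measurement noise had, say, a non-zero mean, this filter would simply converge to the biased value; that's the kind of mismatch a learned model could in principle pick up on.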

namibj | 2 years ago

Yeah, the real world is messy. Also, the contribution/influence of ancient state in softmax is something the controller can learn, especially with a task-suitable position encoding. Though I'd not be surprised if what's IIUC called polynomial attention (essentially a truncated Taylor series "FIR", just truncated later than the traditional linear convolutional time-series filter), where you use a bounded-exponent non-linear (but IIUC still FFT-based, or at least similar) response rather than the infinite-exponent softmax, turns out to be more suitable.
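As a rough illustration of that bounded-exponent vs. infinite-exponent distinction (real polynomial-attention formulations differ; the `(1 + s)**2` kernel here is just an example I picked), here is single-query attention where only the score function changes:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(q, ks, vs, score):
    """Attention for one query: weight each value v_i by
    score(q . k_i), normalised over all keys."""
    w = [score(dot(q, k)) for k in ks]
    z = sum(w)
    d = len(vs[0])
    return [sum(wi * v[j] for wi, v in zip(w, vs)) / z for j in range(d)]

q  = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[1.0, 0.0], [0.0, 1.0]]

softmax_out = attention(q, ks, vs, math.exp)                # infinite-exponent
poly_out    = attention(q, ks, vs, lambda s: (1 + s) ** 2)  # bounded-exponent
```

The bounded-exponent kernel is what makes the fast (convolution-style) evaluation tricks possible, since a fixed-degree polynomial of the score can be expanded and reordered in ways `exp` cannot.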

And beyond that, a hierarchical controller: exploit a tight feedback loop with a small controller that is supervised, controlled, and managed by the big one, which has some inference latency and would like to be batched somewhat (e.g., think a causal transformer trained to predict more than just one token into the future).
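A toy sketch of that hierarchy, where the names (`slow_planner`, `fast_controller`), the proportional inner loop, and the 10-step plan horizon are all my own illustrative choices: the slow, high-latency model emits a multi-step plan, and a cheap controller tracks it on every tick.

```python
def slow_planner(state):
    """Stand-in for the big, high-latency model: emits a plan covering
    several future steps (cf. a causal transformer trained to predict
    more than one token ahead). Here: just ramp the state toward 0."""
    return [state * (1 - (i + 1) / 10) for i in range(10)]

def fast_controller(state, target, gain=0.5):
    """Tight-loop proportional controller, run every step."""
    return gain * (target - state)

state, plan = 5.0, []
for t in range(100):
    if not plan:                    # consult the slow planner only when
        plan = slow_planner(state)  # the current plan is exhausted
    target = plan.pop(0)
    state += fast_controller(state, target)
```

The point of the split is that the expensive model is only queried once per plan horizon, so its inference latency (or batching delay) is hidden behind the cheap inner loop.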