irl_zebra | 1 month ago

SOTA typically refers to achieving the best performance, not using the trendiest thing regardless of performance. There is some subtlety here. At some point an LLM might give the best performance in this task, but that day is not today, so an LLM is not SOTA, just trendy. It's kinda like rewriting something in Rust and calling it SOTA because that's the trend right now. Hope that makes sense.

famouswaffles|1 month ago

>Using an LLM is the SOTA way to turn plain text instructions into embodied world behavior.

>SOTA typically refers to achieving the best performance

Multimodal Transformers are the best way to turn plain text instructions into embodied world behavior. Nothing to do with being 'trendy'. A Vision Language Action model would probably have done much better, but really the only difference between that and the models trialed above is training data. Same technology.

infecto|1 month ago

I don’t think trendy is really the right word, and maybe it’s not state of the art, but a lot of us in the industry are seeing emerging capabilities that might make it SOTA. Hope that makes sense.

irl_zebra|1 month ago

LLMs are indeed the definition of trendy (I've found using Google Trends to dive in is a good entry point to get a broad sense of whether something is "trendy")! Basically the right way to think about it is that something can be promising, and demonstrate emerging capabilities, but those things don't make something SOTA, nor do they make it trendy. They can be related though (I expect everything SOTA was once promising and emerging, but not everything promising or emerging became SOTA). It's a subtlety that isn't super easy to grasp, but (and here is one area I think an LLM can show promise) an LLM like ChatGPT can help unpick the distinctions here. Still, it's slightly nuanced and I understand the confusion.