top | item 47087325

Etheryte | 11 days ago

It is incredibly fast, on that I agree, but even simple queries I tried got very inaccurate answers. That makes sense: it's essentially a trade-off in how much time you give it to "think". But if it's so fast that it has no accuracy, I'm not sure I see the appeal.

andrewdea | 11 days ago

The hardwired model is Llama 3.1 8B, a lightweight model from two years ago. Unlike other models, it doesn't use "reasoning": the time between question and answer is spent predicting the next tokens. It doesn't run faster because it spends less time "thinking"; it runs faster because its weights are hardwired into the chip rather than loaded from memory. A larger model on a larger hardwired chip would run about as fast and get far more accurate results. That's what this proof of concept shows.
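The speed argument here can be sketched with back-of-envelope arithmetic: on conventional hardware, generating each token means streaming the full set of weights from memory, so throughput is capped by memory bandwidth. The numbers below are illustrative assumptions, not figures from the thread:

```python
def tokens_per_sec_memory_bound(params_billion, bytes_per_param, bandwidth_gb_s):
    """Ceiling on autoregressive decode speed when every generated token
    must stream all model weights from memory once."""
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb

# Llama 3.1 8B in fp16 (~16 GB) on a GPU with ~1 TB/s of memory bandwidth:
print(tokens_per_sec_memory_bound(8, 2, 1000))  # 62.5 tokens/s ceiling
```

Hardwiring the weights into the chip removes that memory-streaming step entirely, which is why the same small model can run so much faster.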

Etheryte | 11 days ago

I see, that's very cool, that's the context I was missing, thanks a lot for explaining.

kaashif | 11 days ago

If it's incredibly fast at a 2022 state-of-the-art level of accuracy, then surely it's only a matter of time until it's incredibly fast at a 2026 level of accuracy.

PrimaryExplorer | 11 days ago

Yeah, this is mind-blowing speed. Imagine this with Opus 4.6 or GPT 5.2. Probably coming soon.

Gud | 11 days ago

Why do you assume this?

I can produce total gibberish even faster; that doesn't mean I'd produce Einstein-level thought if I slowed down.

scotty79 | 11 days ago

I think it might be pretty good for translation. Especially when fed small chunks of the content at a time, so it doesn't lose track of longer texts.
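A minimal sketch of the chunking idea: split the text on paragraph boundaries so each translation request stays small. The size limit and splitting rule here are assumptions for illustration, not anything from the thread:

```python
def chunk_paragraphs(text, max_chars=1000):
    """Split text into chunks of at most max_chars, breaking only at
    paragraph boundaries so each translation request stays small.
    A single paragraph longer than max_chars becomes its own chunk."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent to the model as an independent translation request, keeping the context short.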