sireat | 6 months ago
Add in multimodality, 1M context and it is such a Swiss army knife.
It is cheap and performant enough to run 100k queries. (It took a bit over a day and cost around 30 euros for a major document classification task.) Yes, in theory this could have been done with a fine-tuned BERT, or maybe even with older methods, but this saved way too much time.
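As a rough illustration of what one request in such a bulk classification job might look like: the sketch below builds a single chat-completion payload in the OpenAI-compatible format that OpenRouter accepts. The label set, prompt wording, and model name are all illustrative assumptions, not details from the comment.

```python
# Sketch: build one classification request in OpenAI-compatible chat format.
# LABELS, the prompt wording, and the model name are hypothetical examples.
LABELS = ["invoice", "contract", "report", "other"]

def build_request(document: str,
                  model: str = "google/gemini-2.0-flash-001") -> dict:
    prompt = (
        "Classify the document into exactly one of: "
        + ", ".join(LABELS)
        + ". Reply with the label only.\n\n"
        + document
    )
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 10,   # hard cap on output, matching ~10 tokens out
        "temperature": 0,   # deterministic labels for classification
    }
```

Capping `max_tokens` and pinning `temperature` to 0 is one way to keep output short and stable when all you want back is a single label.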
There is another factor that may explain why Flash is #1 in most categories on OpenRouter - Flash has gotten reasonably decent at less common human languages.
Most cheap models (including Flash Lite) and local models have mostly English-focused training.
karmakaze | 6 months ago
> Grok I forgot about until it was too late.
I was surprised by how much I prefer Grok to the others. Even its persona is how I prefer it: detailed, without volunteering unwanted information or sycophancy. In general I'd use Grok 3 more than Grok 4, which is good enough for common uses.
I suspect that Claude would be best only if I gave it a long, complex task with enough instructions up front, so it could grind away on it while I was doing something else rather than waiting on it.
vjerancrnjak | 6 months ago
sireat | 6 months ago
The job was set on Friday and ready on Monday. On average it was about 5k tokens in (documents ranging from 1k to 200k tokens) and only about 10 tokens out.
Average response time was about 1.5 seconds, which works out to roughly 40 hours for the full set.
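The numbers in the comment check out with some quick arithmetic, assuming the 100k queries ran sequentially, one request at a time:

```python
# Back-of-envelope check of the figures reported in the comment.
queries = 100_000
avg_latency_s = 1.5                 # reported average response time
total_cost_eur = 30.0               # reported total cost
avg_in_tokens = 5_000               # reported average input size
avg_out_tokens = 10                 # reported average output size

wall_hours = queries * avg_latency_s / 3600   # sequential execution
cost_per_query_eur = total_cost_eur / queries
total_in_tokens = queries * avg_in_tokens

print(round(wall_hours, 1))        # ~41.7 hours, matching "a bit over a day"
print(cost_per_query_eur)          # 0.0003 EUR per query
print(total_in_tokens)             # 500M input tokens overall
```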
I really did some heavy prompt testing to limit output.
Even then, every few thousand queries you'd get a doubled response. That is, Gemini would respond in duplicate, e.g. "Daisy Daisy".
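One cheap way to handle the occasional doubled label, rather than retrying the query, is a post-processing check: if the response is its own first half repeated, collapse it. This is my own hedged sketch of that cleanup, not something described in the thread:

```python
def dedupe_response(text: str) -> str:
    """If the model's reply is an exact token-level duplicate
    (e.g. "Daisy Daisy"), return just the first half; otherwise
    return the reply unchanged."""
    tokens = text.split()
    n = len(tokens)
    if n >= 2 and n % 2 == 0 and tokens[: n // 2] == tokens[n // 2:]:
        return " ".join(tokens[: n // 2])
    return text
```

Usage: `dedupe_response("Daisy Daisy")` collapses to `"Daisy"`, while a normal single-label reply passes through untouched. This only catches exact whole-reply duplication; partial repeats would need a fuzzier check.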