scuol | 9 months ago
I asked it about a paper I was looking at (SLOG [0]) and it basically lost the context of what "slog" referred to after 3 prompts.
1. I asked for an example transaction illustrating the key advantages of the SLOG approach. It responded with some general DB transaction stuff.
2. I then said "no use slog like we were talking about" and it gave me a Go example using the log/slog package.
Even without the weird political things around Grok, it just isn't that good.
convivialdingo|9 months ago
I'll say that Grok is really excellent at helping me understand the codebase, but some misnamed functions or variables will trip it up.
pomtato|9 months ago
Is it even possible to selectively disregard generated tokens?
dahcryn|9 months ago
I think Gemini is just the only one that by default keeps the entire history verbatim.
aibrother|9 months ago
touristtam|9 months ago
With the recent article on how easily it was manipulated, I wouldn't be so confident it is uncensored, only that its bias leans toward its owner's beliefs, which isn't great.
Yes, you could argue all tools are likely to fall into the same trap, but I have yet to see another LLM product being promoted by such a brash and trash business owner.
voidspark|9 months ago
I tried your question with SuperGrok. Here's the result.
https://grok.com/share/bGVnYWN5_d298dd12-9942-411c-900c-2994...
I use Grok for similar tasks and usually prefer Grok's explanations. Easier to understand.
For some problems where I've asked Grok to use formal logical reasoning I have seen Grok outperform both Gemini 2.5 Pro and ChatGPT-o3. It is well trained on logic.
I've seen Grok generate more detailed and accurate descriptions of images that I uploaded. Grok is natively multimodal.
There is no single LLM that outperforms all of the others at all tasks. I've seen all of the frontier models strongly outperform each other at specific tasks. If I were forced to use only one, it would be Gemini 2.5 Pro (for now) because it can process a million tokens and generate much longer output than the others.
Gigachad|9 months ago
[deleted]
srmarm|9 months ago
bilbo0s|9 months ago
At this point, to use Grok, you'd be intentionally setting your startup to detonate itself at some random point in the future. That's just not how you make money.
HenryBemis|9 months ago
Then... do we want 'open' or 'curated' LLMs? And how far from reality are the curated LLMs? And how far can curated LLMs take us (black Nazis? female US founding fathers?)?
Pick your poison, I say, and be careful what you wish for. There is no "perfect" LLM because there is no "perfect" dataset, and Sam-Altman-types-of-humans are definitely deeply flawed. But life is flawed, so our tools are/will be flawed.