scuol | 9 months ago

It still seems to have the problems most other LLMs suffer with except Gemini: it loses context so quickly.

I asked it about a paper I was looking at (SLOG [0]) and it basically lost the context of what "slog" referred to after 3 prompts.

1. I asked for an example transaction illustrating the key advantages of the SLOG approach. It responded with some general DB transaction stuff.

2. I then said "no, use slog like we were talking about" and then it gave me a golang example using the log/slog package.

Even without the weird political things around Grok, it just isn't that good.

[0] https://www.vldb.org/pvldb/vol12/p1747-ren.pdf

convivialdingo|9 months ago

When I use the "think" mode it retains context for longer. I tested with 5k lines of C compiler code and I could get 6 prompts in before it started forgetting or generalizing.

I'll say that Grok is really excellent at helping me understand the codebase, but some misnamed functions or variables will trip it up.

pomtato|9 months ago

not from a tech field at all, but would it do the context window any good to use "think" mode but discard the reasoning tokens once the LLM gives the final answer/reply?

is it even possible to disregard generated tokens selectively?

dahcryn|9 months ago

it also doesn't help that many of these companies tend to either limit the context of the chat to the 10 most recent messages (5 back-and-forths), or rewrite the history as a few summarized sentences. Both ways lose a ton of information, but you can avoid that behaviour by going through the APIs. Azure OpenAI especially: on the web it's useless, but it's quite capable through the API.
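To make the difference concrete, here's a minimal sketch. The 10-message limit mirrors the behaviour described above; the message format follows the common chat-API `{"role": ..., "content": ...}` convention, and the conversation contents are made up for illustration:

```python
# Sketch: why a chat UI that silently trims to the last N messages "forgets",
# while calling the API yourself with the full history does not.

def trimmed_history(history, n=10):
    """What some web UIs reportedly send: only the most recent n messages."""
    return history[-n:]

def full_history(history):
    """What you can send when calling the API directly: everything, verbatim."""
    return list(history)

# Build a 12-message conversation; only the first turn defines what "SLOG" means.
history = [{"role": "user",
            "content": "We are discussing the SLOG paper (deterministic DB)."}]
for i in range(11):
    history.append({"role": "assistant" if i % 2 == 0 else "user",
                    "content": f"turn {i}"})

# The trimmed view has dropped the turn that defined "SLOG"; the full view hasn't.
print(any("SLOG" in m["content"] for m in trimmed_history(history)))  # False
print(any("SLOG" in m["content"] for m in full_history(history)))     # True
```

Once the defining turn falls off the end of the window, the model has literally never seen it in that request, which is consistent with the "lost what slog referred to after 3 prompts" behaviour in the top comment.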

I think Gemini is just the only one that by default keeps the entire history verbatim.

aibrother|9 months ago

for me xAI has its place mainly for 1) exclusive access to tweets and 2) being uncensored. and it's decent enough (even if it's not the best) in terms of other capabilities

touristtam|9 months ago

> being uncensored

With the recent article on how easily it was manipulated, I wouldn't be so confident it is uncensored, just that its bias leans toward its owner's beliefs, which isn't great.

Yes, you could argue all tools are likely to fall into the same trap, but I have yet to see another LLM product being promoted by such a brash and trashy business owner.

voidspark|9 months ago

The paid version "SuperGrok" has a larger context window, but nothing beats Gemini for that.

I tried your question with SuperGrok. Here's the result.

https://grok.com/share/bGVnYWN5_d298dd12-9942-411c-900c-2994...

I use Grok for similar tasks and usually prefer Grok's explanations. Easier to understand.

For some problems where I've asked Grok to use formal logical reasoning I have seen Grok outperform both Gemini 2.5 Pro and ChatGPT-o3. It is well trained on logic.

I've seen Grok generate more detailed and accurate descriptions of images that I uploaded. Grok is natively multimodal.

There is no single LLM that outperforms all of the others at all tasks. I've seen all of the frontier models strongly outperform each other at specific tasks. If I was forced to use only one, that would be Gemini 2.5 Pro (for now) because it can process a million tokens and generate much longer output than the others.

Gigachad|9 months ago

[deleted]

srmarm|9 months ago

Be careful saying things like that or you'll get [flagged] - discussion of what seemed an incredibly important subject is forbidden on here it seems.

bilbo0s|9 months ago

You never know when it will start spouting it either. That kind of uncertainty in the responses landing in your interface is just not sustainable. Your money is coming from the quality of the content your system is putting out. If it's being used for dentistry, and it randomly spits out white supremacist content, dentists will look for a system that won't do that. Because they asked about, say, intaglio surfaces for a wearable dental appliance. Not a treatise on white genocide.

At this point, to use Grok, you'd be intentionally setting your startup to detonate itself at some random point in the future. That's just not how you make money.

HenryBemis|9 months ago

So.. if the 'source' of data is 9gag or 4chan, you will get 'this' material. If you feed it Tumblr, you will get Harry Potter and rope-porn-thingies. If you feed it Hitler's speeches, you will get 'that' material. If you feed it algebra, you will get 'that' material.

Then.. Do we want 'open' or 'curated' LLMs? And how far from reality are the curated LLMs? And how far can curated LLMs take us (black Nazis? female US founding fathers?).

Pick your poison I say.. and be careful what you wish for. There is no "perfect" LLM because there is no "perfect" dataset, and Sam-Altman-types-of-humans are definitely deeply flawed. But life is flawed, so our tools are/will be flawed.