Last week a customer request landed in our support queue about a feature that I partially wrote and for which I wrote a pile of public documentation. A support engineer ran the customer's query through Claude (trained on our public and internal docs), and it very, very confidently made up a bunch of stuff in its response. It sounded quite plausible, and it would have been great if the feature worked that way, but it didn't.

While I was explaining why it was wrong in a Slack thread with the support engineer and another engineer who had also worked on that feature, the second engineer ran Augment (which has the full source code of the feature), and it promptly, and just as confidently, made up more stuff (but different stuff!). Some choice bleeding-eye emojis were exchanged. I'm going to continue to use my own intelligence, thank you.
pllbnk|2 months ago
Edit: I am saying this as a developer who uses LLMs for coding, so I feel I can criticize them constructively. Also, sometimes the code actually works when I put enough effort into describing what I expect; I suppose I could just write the code myself, but the problem is that I can't tell in advance which approach will result in a quicker delivery.
computerthings|2 months ago
[deleted]
oersted|2 months ago
However, using the model as a multi-hop search robot, leveraging its general background knowledge to guide the research flow and interpret findings, works exceedingly well.
Training with RL to optimize research tool use and reasoning is the way forward, at least until we have proper stateful LLMs that can effectively manage an internal memory (as in Neural Turing Machines and the like).
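A minimal sketch of that multi-hop loop, assuming two hypothetical stand-ins (call_llm for whatever model client you use, web_search for your retrieval tool; neither is a real API):

    def call_llm(prompt: str) -> str:
        """Hypothetical model call; swap in your LLM client of choice."""
        raise NotImplementedError

    def web_search(query: str) -> str:
        """Hypothetical search tool; swap in your retrieval backend."""
        raise NotImplementedError

    def multi_hop_answer(question: str, max_hops: int = 4) -> str:
        notes: list[str] = []
        for _ in range(max_hops):
            # Let the model pick the next hop from the evidence gathered so far.
            step = call_llm(
                f"Question: {question}\n"
                "Notes so far:\n" + "\n".join(notes) + "\n"
                "Reply 'SEARCH: <query>' to look something up, or "
                "'ANSWER: <answer>' if the notes already suffice."
            )
            if step.startswith("ANSWER:"):
                return step.removeprefix("ANSWER:").strip()
            query = step.removeprefix("SEARCH:").strip()
            # Ground the next hop in retrieved text, not parametric memory.
            notes.append(query + " -> " + web_search(query))
        # Out of hops: force a best-effort answer from collected evidence only.
        return call_llm("Answer using only these notes:\n" + "\n".join(notes))

The point being that every hop is grounded in retrieved text, so the model's background knowledge only steers the search rather than supplying the facts.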
ramraj07|2 months ago
Or did you just misuse basic terminology about LLMs and are now saying it misbehaved, likely because your org did something very bad with it?
pshirshov|2 months ago
Even with your intelligence you would need years to deliver something like this: https://github.com/7mind/jopa
The outcome will be better for sure, but you won't do anything like that in a couple of weeks. Even if you have a team of 10. Or 50.
And I'm not an LLM proponent. Just being an empirical realist.
tomp|2 months ago
My code runs in 0.11s.
Gemini's code runs in 0.5s.
Boss wants an explanation. ¯\_(ツ)_/¯
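If it helps with the explanation, here's one way to make a comparison like that reproducible (a sketch; mine and geminis are hypothetical stand-ins for the two implementations):

    import timeit

    def mine() -> None:
        ...  # the hand-written implementation (hypothetical stand-in)

    def geminis() -> None:
        ...  # the LLM-generated implementation (hypothetical stand-in)

    for name, fn in [("mine", mine), ("geminis", geminis)]:
        # Take the best of five runs to reduce scheduler and cache noise.
        best = min(timeit.repeat(fn, number=1, repeat=5))
        print(f"{name}: {best:.3f}s")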
scotty79|2 months ago
At some point you'll be better off implementing the features they hallucinate. Some people with public APIs have already taken this approach.
AdieuToLogic|2 months ago
> Yeah, LLMs are not really good about things that can't be done.
From the GP's description, this situation was not a case of "things that can't be done", but instead a statistically generated document doing exactly what should be expected: producing plausible text with no guarantee that any of it is true.
131hn|2 months ago
We humans honed our analysis and reasoning skills on the 99.9999% of attempts that failed: unsuccessful trials and errors, wasted time, and frustration.
So we know that behind every truth there's a bigger world of fantasy.
For an LLM, everything is just fantasy. Everything is as true as its opposite. It will take a lot more than the truth to build intelligence; it will require controlled malice and deception.