

shridharxp | 3 months ago

A few months ago, the founder was talking about "AGI" and a ridiculous "universal basic compute". At this point, I don't even know whom to believe. My first-hand experience tells me ChatGPT and even Claude Code are nowhere near the expertise they are touted to have. Yet the marketing by these companies is so immense that you get swept away; you can't tell who is an agent and who is giving their true opinion.


fragmede|3 months ago

> My first-hand experience tells me ChatGPT and even Claude Code are nowhere near the expertise they are touted to have

Not doubting you, but where specifically have the latest models fallen short for you?

shridharxp|2 months ago

Claude Code:

- Makes functions async needlessly; it doesn't know the difference between sync and async functions or when to use each.

- Consistently fails to make changes to the frontend if a project grows above 5000 LOC or a file goes near 1000 LOC.

- The worst part is it lies after making changes.
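
To illustrate the first point, here is a minimal TypeScript sketch of when async is and isn't warranted. All names are made up for illustration, and `readLength` is a hypothetical injected loader standing in for real I/O; none of this comes from an actual project.

```typescript
// Pure computation: nothing to wait for, so a plain synchronous function is enough.
function totalLines(fileLengths: number[]): number {
  return fileLengths.reduce((sum, n) => sum + n, 0);
}

// Needlessly async: identical work, but the result is now wrapped in a Promise
// and every caller has to await it.
async function totalLinesNeedlesslyAsync(fileLengths: number[]): Promise<number> {
  return fileLengths.reduce((sum, n) => sum + n, 0);
}

// Justifiably async: the lengths come from a loader that itself returns Promises
// (assumed here to wrap disk or network access), so there is something to await.
async function totalLinesFromSource(
  paths: string[],
  readLength: (path: string) => Promise<number>, // hypothetical loader
): Promise<number> {
  const lengths = await Promise.all(paths.map(readLength));
  return totalLines(lengths);
}
```

The rough rule this sketches: a function only needs to be async if it awaits something; otherwise the Promise wrapper just pushes complexity onto its callers.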

ChatGPT:

- Fails to implement moderately complex functionality, such as scrolling to the bottom when new logs come in but not scrolling while the user is reading historical logs.
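
For context, the log-following behaviour described above can be sketched roughly like this in TypeScript. It is one common approach, not the only one; `LogFollower`, `ScrollMetrics`, and the 24 px threshold are assumptions chosen for illustration.

```typescript
// Snapshot of a scrollable container, mirroring the DOM properties of the same names.
type ScrollMetrics = {
  scrollTop: number;    // pixels scrolled from the top
  clientHeight: number; // visible height of the container
  scrollHeight: number; // total height of the content
};

// True when the viewport is within thresholdPx of the end of the content.
function isNearBottom(m: ScrollMetrics, thresholdPx: number): boolean {
  return m.scrollHeight - m.scrollTop - m.clientHeight <= thresholdPx;
}

class LogFollower {
  private following: boolean = true; // start out pinned to the newest logs
  private thresholdPx: number;

  constructor(thresholdPx: number = 24) {
    this.thresholdPx = thresholdPx;
  }

  // Call from the container's scroll handler: scrolling away from the bottom
  // pauses following, and scrolling back down resumes it.
  onUserScroll(m: ScrollMetrics): void {
    this.following = isNearBottom(m, this.thresholdPx);
  }

  // Call after new log lines were appended. Returns the scrollTop to apply,
  // or null when the user is reading history and the view should stay put.
  onLogsAppended(m: ScrollMetrics): number | null {
    return this.following ? Math.max(0, m.scrollHeight - m.clientHeight) : null;
  }
}
```

In a browser, the metrics would be read from the log container element (`scrollTop`, `clientHeight`, `scrollHeight`), and a non-null result from `onLogsAppended` would be assigned back to the element's `scrollTop`.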

These models are good at mainstream tasks, for which plenty of snippets exist in public repositories. Try something offbeat, such as algorithmic trading, and they fail spectacularly.

rurp|2 months ago

I'm unsure how someone could use LLMs regularly and not encounter significant mistakes. I use them a lot less than some devs and still run into basic errors pretty often, to the point that I rarely bother using them for niche or complicated problems even though they are pretty helpful in other cases. Just in the past few days I've had Claude trip all over itself on multiple basic tasks.

One case was asking how to do a straightforward thing with a popular open source JavaScript library, right in the sweet spot of what models should excel at. Claude's whole approach was completely broken because it relied on a hallucinated library parameter that didn't exist and didn't have an equivalent. It invented a keyword that doesn't appear in the entire open source library repo, to control functionality the library doesn't have.