top | item 46769373


firasd | 1 month ago

When I see Dario say all this I wonder what he's seeing that seems so differentiated from my POV

I wonder how much of Anthropic's success is based on a hoard of prompt-to-code examples that they post-train on. We shouldn't assume Opus' web dev skills are the same thing as general intelligence

Here's something Sonnet 4 did (Jul 2025) that I don't think a human intern would do when I made a wikisource tool (transcript follows)

#-#-#

let’s see how the wikisource tool wrks.. let’s try to match “Mary hath chosen that good part” in one of the bible sources they should have

Claude: Great idea! Let me first search for some biblical content on WikiSource, then use the content match tool.

wiki_source_search

Perfect! Let me try the Douay-Rheims version of Luke, which should have the Mary and Martha story:

wiki_source_page_content_match

No matches! Let me try a slightly different phrasing – maybe it’s “hath chosen the good part”:

wiki_source_page_content_match

Still nothing. Let me try searching for just “Martha” to see if the story is there:

dummy .. why would use kjv wording in a different edition lol

Claude: Haha you’re absolutely right! I was being completely silly – of course the KJV wording “Mary hath chosen that good part” wouldn’t be in the Douay-Rheims translation. Different translations use different language entirely.

What I Did Wrong:
• Used King James Version phrasing in a Catholic translation
• The Douay-Rheims says “Mary hath chosen the best part”
• I was looking for “good part” (KJV wording)

Should Have Started With: Just searching for “Martha” or “Mary” to see what the actual text says in that translation, then used the exact wording I found.

#-#-#

It's almost comedic.. "Perfect! Let me try the Douay-Rheims version of Luke" is where if you were watching a horror movie you'd say "Don't go down to the basement fridge!"

And I think a human, on first seeing the text match fail, would go "hmm--ohh", but Claude starts trying permutations. This, I guess, is a side effect of Reinforcement Learning and system prompts that amount to saying: "Just do it. Don't ask questions. Just do it."


johnfn|1 month ago

I find one-off anecdotal examples like this to be a bit like discourse around global warming - "Look at that ridiculous polar vortex we had this week! Global warming can't possibly be a thing!" Of course, a trend line comprises many points, and not every point falls perfectly in the center of the line! I'm not necessarily saying you are right or wrong, but your argument should address the line (and ideally give some reason why it might falter) rather than just a single point on that line.

firasd|1 month ago

Ah, but I'm not arguing about the rate of change in the trend. I'm saying the signals are decoupled. That is to say, an LLM can be as good a programmer as Linus Torvalds without having even the basic knowledge-generalization abilities we assume the median human with no specialized skills would have (when given the same knowledge an LLM has)

l1n|1 month ago

> Here's something Sonnet 4 did last year

Hate to be that gal but a lot has changed in the past year

root_axis|1 month ago

Not with respect to this particular type of failure.

tines|1 month ago

Last year was a month ago.

strange_quark|1 month ago

> When I see Dario say all this I wonder what he's seeing that seems so differentiated from my POV

Billions of dollars

unknown|1 month ago

[deleted]

jonas21|1 month ago

I have no idea what you are even asking Claude to do here.

firasd|1 month ago

I was asking it to check whether the wikisource tools are working by looking up a Bible quote. There was no ambiguity about the task itself; what I'm saying is that Claude 'knows' a bunch of things (the Bible has different translations) that it doesn't operationalize when doing a task--issues that would be glaringly obvious to a human who knows the same things