top | item 46711759

pflenker | 1 month ago

For a game like Anchorhead, which is famous in its niche, shouldn't Claude already know it well enough to just solve it right away? I would expect its training data to contain multiple discussions and walkthroughs of the game.

zetalyrae|1 month ago

I expect it's somewhere in the training data, but it's very unlikely to be salient. A few text files here and there in the ocean of the Internet is nothing. If Claude had memorized the walkthrough, it would have performed better.

vunderba|1 month ago

I would think so. I'd be far more interested in a comparison of LLMs (no internet search allowed) playing IF games released in the past month.

ratg13|1 month ago

It's very likely the model didn't stop to ask whether the game it was playing was one it already knew, and just assumed it was a puzzle created for it.

sfjailbird|1 month ago

You can see Claude's responses in the repo. The first one is:

Ah, Anchorhead! One of the most celebrated pieces of interactive fiction ever written

brianjeong|1 month ago

You could say the same about Pokémon, and the models still struggle with it quite a bit.

Jweb_Guru|1 month ago

Yeah, I do not find performances like this very impressive.

IgorPartola|1 month ago

Honestly I am curious how it would do if it did have a walkthrough.