top | item 46767239

(no title)

thijser | 1 month ago

This reminds me of when I tried to let Claude port an Android libgdx-based game to a WASM-based libgdx version, so I can play the game in the browser.

No matter how much I tried to force it to stick to a mostly line-by-line port, it kept trying to "improve" the code. At some point it had to undo everything as it introduced a number of bugs. I asked it: "What should I add to your prompt so you won't do this again?" and it gave me this:

  ### CRITICAL LESSON: Don't "Improve" During Porting
  -  **BIGGEST MISTAKE: Reorganizing working code**
    - **What I did wrong:** Tried to "simplify" by splitting `createStartButton()` into separate creation and layout methods
    - **Why it failed:** Introduced THREE bugs:
      1. Layout overlap (getY() vs getY() - getHeight())
      2. Children not sized (Group.setSize() doesn't affect children)
      3. Origins not updated (scaling animations broken)
    - **The fix:** Deleted my "improvements" and copied the original Android pattern faithfully
    - **Root cause:** Arrogance - assuming I could improve production-tested code without understanding all the constraints
    - **Solution:** **FOLLOW THE PORTING PRINCIPLES ABOVE** - copy first, don't reorganize
    - **Time wasted:** ~1 hour debugging self-inflicted bugs that wouldn't exist if I'd just copied the original
    - **Key insight:** The original Android code is correct and battle-tested. Your "improvements" are bugs waiting to happen.

I like the self-reflection of Claude, unfortunately even adding this to CLAUDE.md didn't fix it and it kept taking wrong turns so I had to abandon the effort.

discuss

phpnode|1 month ago

Claude doesn't know why it acted the way it acted, it is only predicting why it acted. I see people falling for this trap all the time

LoganDark|1 month ago

It's not even predicting why it acted, it's predicting an explanation of why it acted, which is even worse since there's no consistent mental model.

GuB-42|1 month ago

It had been shown that LLMs don't know how they work. They asked a LLM to perform computations, and explain how they got to the result. The LLM explanation is typical of how we do it: add number digit by digit, with carry, etc... But by looking inside the neural network, it show that the reality is completely different and much messier. None of it is surprising.

Still, feeding it back its own completely made up self-reflection could be an effective strategy, reasoning models kind of work like this.

kaffekaka|1 month ago

Yes, this pitfall is a hard one. It is very easy to interpret the LLM in a way there is no real ground for.

nnevatie|1 month ago

That's because when the failure becomes the context, it can clearly express the intent of not falling for it again. However, when the original problem is the context, none of this obviousness applies.

Very typical, and gives LLMs the annoying Captain Hindsight -like behaviour.

nonethewiser|1 month ago

IDK how far AIs are from intelligence, but they are close enough that there is no room for anthropomorphizing them. When they are anthropomorphized its assumed to be a misunderstanding of how they work.

Whereas someone might say "geeze my computer really hates me today" if it's slow to start, and we wouldn't feel the need to explain the computer cannot actually feel hatred. We understand the analogy.

I mean your distinction is totally valid and I dont blame you for observing it because I think there is a huge misunderstanding. But when I have the same thought, it often occurs to me that people aren't necessarily speaking literally.

drob518|1 month ago

It’s not even doing that. It’s just an algorithm for predicting the next word. It doesn’t have emotions or actually think. So, I had to chuckle when it said it was arrogant. Basically, it’s training data contains a bunch of postmortem write ups and it’s using those as a template for what text to generate and telling us what we want to hear.

everfrustrated|1 month ago

Worth pointing out that your IDE/plugin usually adds a whole bunch of prompts before yours - let alone the prompts that the model hosting provider prepends as well.

This might be what is encouraging the agent to do best practices like improvements. Looking at mine:

>You are a highly sophisticated automated coding agent with expert-level knowledge across many different programming languages and frameworks and software engineering tasks - this encompasses debugging issues, implementing new features, restructuring code, and providing code explanations, among other engineering activities.

I could imagine that an LLM could well interpret that to mean improve things as it goes. Models (like humans) don't respond well to things in the negative (don't think about pink monkeys - Now we're both thinking about them).

hombre_fatal|1 month ago

It's also common for your own CLAUDE.md to have some generic line like "Always use best practices and good software design" that gets in the way of other prompts.

theLiminator|1 month ago

For anything large like this, I think it's critical that you port over the tests first, and then essentially force it to get the tests passing without mutating the tests. This works nicely for stuff that's very purely functional, a lot harder with a GUI app though.

pwdisswordfishy|1 month ago

The same insight can be applied to the codebase itself.

When you're porting the tests, you're not actually working on the app. You're getting it to work on some other adjacent, highly useful thing that supports app development, but nonetheless is not the app.

Rather than trying to get the language model to output constructs in the target PL/ecosystem that go against its training, get it to write a source code processor that you can then run on the original codebase to mechanically translate it into the target PL.

Not only does this work around the problem where you can't manage to convince the fuzzy machine to reliably follow a mechanical process, it sidesteps problems around the question of authorship. If a binary that has been mechanically translated from source into executable by a conventional compiler inherits the same rightsholder/IP status as the source code that it was mechanically translated from, then a mechanical translation by a source-to-source compiler shouldn't be any different, no matter what the model was trained on. Worst case scenario, you have to concede that your source processor belongs to the public domain (or unknowingly infringed someone else's IP), but you should still be able to keep both versions of your codebase, one in each language.

wyldfire|1 month ago

One thing that might be effective at limited-interaction recovery-from-ignoring-CLAUDE.md is the code-review plugin [1], which spawns agents who check that the changes conform to rules specified in CLAUDE.md.

[1] https://github.com/anthropics/claude-code/blob/main/plugins/...

surajrmal|1 month ago

I recently did a c++ to rust port with Gemini and it was basically a straight line port like I wanted. Nearly 10k lines of code too. It needed to change a bit of structure to get it compiling, but that's only because rust found bugs at compile time. I attribute this success to the fact my team writes c++ stylistically close to what is idiomatic rust, and that generally the languages are quite similar. I will likely do another pass in the future to turn the callback driven async into async await syntax, but off the bat it largely avoided doing so when it would change code structure.

arjie|1 month ago

It's not context-free (haha) but a trick you can try is to include negative examples into the prompt. It used to be an awful trick originally because of Waluigi Effect but then became a good trick, and lately with Opus 4.5 I haven't needed to do it that much. But it did work once. e.g. like take the original code and supply the correct answer and the wrong answers in the prompt as examples in Claude.MD and then redo.

If it works, do share.

pwdisswordfishy|1 month ago

Humans act the same way.

For all the (unfortunately necessary) conversations that have occurred over the years of the form, "JavaScript is not Java—they're two different languages," people sometimes go too far and tack on some remark like, "They're not even close to being alike." The reality, though, is that many times you can take some in-house package (though not the Enterprise-hardened™ ones with six different overloads for every constructor, and four for every method, and that buy hard into Java (or .NET) platform peculiarities—just the ones where someone wrote just enough code to make the thing work in that late-90's OOP style associated with Java), and more or less do a line-by-line port until you end up with a native JS version of the same program, which with a little more work will be able to run in browser/Node/GraalJS/GJS/QuickJS/etc. Generally, you can get halfway there by just erasing the types and changing the class/method declarations to conform to the different syntax.

Even so, there's something that happens in folks' brains that causes them to become deranged and stray far off-course. They never just take their program, where they've already decomposed the solution to a given problem into parts (that have already been written!), and then just write it out again—same components, same identifier names, same class structure. There's evidently some compulsion where, because they sense the absence of guardrails from the original language, they just go absolutely wild, turning out code that no one would or should want to read—especially not other programmers hailing from the same milieu who explicitly, avowedly, and loudly state their distaste for "JS" (whereby they mean "the kind of code that's pervasive on GitHub and NPM" and is so hated exactly because it's written in the style their coworker, who has otherwise outwardly appeared to be sane up to this point, just dropped on the team).

andai|1 month ago

Was this Claude Code? If you tried it with one file at a time in the chat UI I think you would get a straight-line port, no?

Edit: It could be because Rust works a little differently from other languages, a 1:1 port is not always possible or idiomatic. I haven't done much with Rust but whenever I try porting something to Rust with LLMs, it imports like 20 cargo crates first (even when there were no dependencies in the original language).

Also Rust for gamedev was a painful experience for me, because rust hates globals (and has nanny totalitarianism so there's no way to tell it "actually I am an adult, let me do the thing"), so you have to do weird workarounds for it. GPT started telling me some insane things like, oh it's simple you just need this rube goldberg of macro crates. I thought it was tripping balls until I joined a Rust discord and got the same advice. I just switched back to TS and redid the whole thing on the last day of the jam.

zozbot234|1 month ago

> rust hates globals

Rust has added OnceCell and OnceLock recently to make threadsafe globals a lot easier for some things. it's not "hate", it just wants you to be consistent about what you're doing.

antonvs|1 month ago

That’s a terrible prompt, more focused on flagellating itself for getting things wrong than actually documenting and instructing what’s needed in future sessions. Not surprising it doesn’t help.

abyesilyurt|1 month ago

Sonnet 4.5 had this problem. Opus 4.5 is much better at focusing on the task instead of getting sidetracked.

0x696C6961|1 month ago

I wish there was a feature to say "you must re-read X" after each compaction.

nickreese|1 month ago

Some people use hooks for that. I just avoid CC and use Codex.

esafak|1 month ago

https://github.com/anthropics/claude-code/issues/3537 https://github.com/anthropics/claude-code/issues/14258

taberiand|1 month ago

Getting the context full to the point of compaction probably means you're already dealing with a severely degraded model, the more effective approach is to work in chunks that don't come close to filling the context window

philipp-gayret|1 month ago

There's no PostCompact hook unfortunately. You could try with PreCompact and giving back a message saying it's super duper important to re-read X, and hope that survives the compacting.

root_axis|1 month ago

What would it even mean to "re-read after a compaction"?

andai|1 month ago

Tangential but doesn't libgdx have native web support?

esafak|1 month ago

It doesn't seem very bound by CLAUDE.md

badlogic|1 month ago

libGDX, now that's a name I haven't heard in a while.

Lionga|1 month ago

Well its close to AGI, can you really expect AGI to follow simple instructions from dumbos like you when it can do the work of god?

b00ty4breakfast|1 month ago

as an old coworker once said, when talking about a certain manager; That boy's just smart enough to be dumb as shit (The AI, not you; I don't know you well enough to call you dumb)