measurablefunc | 4 days ago

I don't know, that's why I asked. I always see a lot of empty platitudes when it comes to LLM praise, so I'm curious to see if people can actually back up their claims.

I haven't done any 3D modeling so I'll take your word for it, but I can tell you that I am working on a very simple interpreter & bytecode compiler for a subset of Erlang, & I have yet to see anything novel or even useful from any of the coding assistants. One might naively think that there is enough literature on interpreters & compilers for coding agents to pretty much accomplish the task in one go, but that's not what happens in practice.

brookst|4 days ago

It’s taken me a while to get good at using them.

My advice: ask for more than what you think it can do. The #1 mistake is failing to give enough context about goals, constraints, and priorities.

Don’t ask “complete this one small task”; ask “hey, I’m working on this big project, docs are here, source is there, I’m not sure how to do that, come up with a plan.”

measurablefunc|4 days ago

The specification is linked in another comment in this thread & you can decide whether it is ambitious enough or not, but what I can tell you is that none of the existing coding agents can complete the task even w/ all the details. If you do try it, you will eventually get something that mostly works on simple tests but fails miserably on slightly more complicated test cases.

pushedx|4 days ago

Which agents are you using, and are you using them in an agent mode (Codex, Claude Code etc.)?

The difference in quality of output between Claude Sonnet and Claude Opus is around an order of magnitude.

The results that you can get from agent mode vs. using a chat bot are better by around two orders of magnitude.

measurablefunc|4 days ago

The workflow is not the issue. You are welcome to try the same challenge yourself if you want. Extra test cases (https://drive.proton.me/urls/6Z6557R2WG#n83c6DP6mDfc) & specification (https://claude.ai/public/artifacts/5581b499-a471-4d58-8e05-1...). I know enough about compilers, bytecode VMs, parsers, & interpreters to know that this is well within the capabilities of any reasonably good software engineer, but the implementations from Gemini 3.1 Pro (high & low) & Claude Opus 4.6 (thinking) have been less than impressive.

kmaitreys|4 days ago

Can you clarify a bit more about these two orders of magnitude? In what context? Sure, they have "agency" and can do more than output text, but I would like to see a proper example of this claim.

joquarky|4 days ago

Most humans can't force themselves to come up with something novel immediately upon demand.

measurablefunc|4 days ago

That's completely unrelated to the topic or any of the points I was making. Did you get confused & respond to the wrong thread?