recipe19 | 7 months ago

I work on niche platforms where the amount of example code on Github is minimal, and this definitely aligns with my observations. The error rate is way too high to make "vibe coding" possible.

I think it's a good reality check for the claims of impending AGI. The models still depend heavily on being able to transform other people's work.

winrid|7 months ago

Even with typescript Claude will happily break basic business logic to make tests pass.

motorest|7 months ago

> Even with typescript Claude will happily break basic business logic to make tests pass.

It's my understanding that LLMs change the code to meet a goal, and if you prompt them with vague instructions such as "make tests pass" or "fix tests", LLMs in general apply the minimum changes necessary and sufficient for their goal to be met. If you don't explicitly instruct them, they can't and won't tell project code apart from test code. So they will change your project code to make tests work.

This is not a bug. Changing project code to make tests pass is a fundamental approach to refactoring projects, and the whole basis of TDD. If that's not what you want, you need to prompt them accordingly.
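To make the ambiguity concrete, here's a hypothetical sketch (the function, test, and names are invented for illustration): the same failing test can be "fixed" from either side, and a vague prompt doesn't say which side is authoritative.

```typescript
// Intended business rule: discounts are capped at 50%.
function applyDiscount(price: number, pct: number): number {
  // An LLM told only to "make tests pass" might delete this cap
  // instead of correcting a wrong expected value in the test.
  const capped = Math.min(pct, 50);
  return price * (1 - capped / 100);
}

// The test encodes the intent. If it wrongly asserted 20, an LLM asked to
// "fix tests" could change applyDiscount or the assertion; nothing in the
// prompt distinguishes them unless you say which file is the source of truth.
console.assert(applyDiscount(100, 80) === 50, "cap at 50%");
console.assert(applyDiscount(100, 10) === 90, "normal discount");
```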

bapak|7 months ago

Speaking of TypeScript, every time I feed a hard type problem to LLMs they just can't do it. Sometimes I find out it's a TS limitation or just not implemented yet, but that won't stop us from wasting 40 minutes together.
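For a sense of the kind of type problem involved (a made-up example, not from the thread): recursive conditional types with `infer` are a class of puzzle where models often either give up or produce types that don't compile.

```typescript
// Split a dotted path string into a tuple of segments, at the type level.
type Split<S extends string> =
  S extends `${infer Head}.${infer Rest}` ? [Head, ...Split<Rest>] : [S];

// Compile-time check: Split<"a.b.c"> resolves to ["a", "b", "c"],
// so this assignment only type-checks if the recursion works.
const parts: Split<"a.b.c"> = ["a", "b", "c"];
console.assert(parts.length === 3);
```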

rs186|7 months ago

When I vibe coded with GitHub Copilot in TypeScript, it kept using "any" even though those variables had clear interfaces already defined somewhere in the code. This drove me crazy, as I had to go in and manually fix all those things. The only thing that helps a bit is me screaming "DO NOT EVER USE 'any' TYPE". I can't understand why it does this.
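A hypothetical sketch of the pattern (the interface and function are invented): the model reaches for `any` even when an existing interface would give the compiler something to check.

```typescript
// Already defined elsewhere in the codebase:
interface User {
  id: number;
  name: string;
}

// What the assistant tends to emit:
//   function render(user: any) { ... }   // silently accepts anything
// What using the existing interface gets you:
function render(user: User): string {
  // Typos like user.nmae are now compile errors instead of runtime bugs.
  return `${user.id}: ${user.name}`;
}

console.assert(render({ id: 1, name: "Ada" }) === "1: Ada");
```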

CalRobert|7 months ago

That seems like the tests don’t work?

pygy_|7 months ago

I've had a similar problem with WebGPU and WGSL. LLMs create buffers with the wrong flags (and other API usage errors), don't clean up resources, mix up GLSL and WGSL, and write semi-less WGSL (in template strings) if you ask them to write semi-less [0] JS...

It's a big mess.

0. https://github.com/isaacs/semicolons/blob/main/semicolons.js

poniko|7 months ago

Yes, and if you work with a platform that has been around for a long time, like .NET, you will most definitely get a mix of really outdated, deprecated code mixed with the latest features.

remich|7 months ago

I recommend the context7 MCP tool for this exact purpose. I've been trying to really push agents lately at work to see where they fall down and whether better context can fix it.

As a test recently I instructed an agent using Claude to create a new MCP server in Elixir based on some code I provided that was written in Python. I know that, relatively speaking, Python is over-represented in training data and Elixir is under-represented. So, when I asked the agent to begin by creating its plan, I told it to reference current Elixir/Phoenix/etc documentation using context7 and to search the web using Kagi Search MCP for best practices on implementing MCP servers in Elixir.

It was very interesting to watch how the initially generated plan evolved after using these tools and how after using the tools the model identified an SDK I wasn't even aware of that perfectly fit the purpose (Hermes-mcp).

ragequittah|7 months ago

This is easily solved by feeding the LLM the correct documentation. I was having problems with tailwind because of this right up until I had ChatGPT deep research come up with a spec sheet on how to use the latest version of it. Fed it into the various AIs I've been using (worked for ChatGPT, Claude, and Cursor) and no problems since.

gompertz|7 months ago

Yep I program in some niche languages like Pike, Snobol4, Unicon. Vibe coding is out of the question for these languages. Forced to use my brain!

johnisgood|7 months ago

You could always feed it some documentation and example programs. I did it with a niche language and it worked out really well, with Claude. Around 8 months ago.

empressplay|7 months ago

I don't know if you're working with modern models. Grok 4 doesn't really know much about assembly language on the Apple II but I gave it all of the architectural information it needed in the first prompt of a conversation and it built compilable and executable code. Most of the issues I encountered were due to me asking for too much in a prompt. But it built a complete, albeit simple, assembly language game in a few hours of back and forth with it. Obviously I know enough about the Apple II to steer it when it goes awry, but it's definitely able to write 'original' code in a language / platform it doesn't inherently comprehend.

timschmidt|7 months ago

This matches my experience as well. Poor performance usually means I haven't provided enough context or have asked for too much in a single prompt. Modifying the prompt accordingly and iterating usually results in satisfactory output within the next few tries.

vineyardmike|7 months ago

Completely agree. I’m a professional engineer, but I like to get some ~vibe~ help on personal projects after work when I’m tired and just want my project to go faster. I’ve had a ton of success with Go, JavaScript, Python, etc. I had mixed success with writing idiomatic Elixir roughly a year ago, but I’ve largely assumed that this would be resolved today, since every model maker has started aggressively filling training data with code, since we found the PMF of LLM code-assistance.

Last night I tried to build a super basic “barely above hello world” project in Zig (a language where IDK the syntax), and it took me trying a few different LLMs to find one that could actually write anything that would compile (Gemini w/ search enabled). I really wasn’t expecting it considering how good my experience has been on mainstream languages.

Also, I think OP did rather well considering BASIC is hardly used anymore.

andsoitis|7 months ago

> The models

The models don’t have a model of the world. Hence they cannot reason about the world.

pygy_|7 months ago

I tried vibe coding WebGPU/WGSL, which is thoroughly documented, but has little actual code around, and LLMs are pretty bad at it right now.

They don't need a formal model, they need examples from which they can pilfer.

bawana|7 months ago

The theory is that language is an abstraction built on top of the world and therefore encompasses all human experience of the world. The problem arises, however, when the world (a.k.a. nature) acts in an unexpected way, outside human experience.

hammyhavoc|7 months ago

"reason" is doing some heavy-lifting in the context of LLMs.

jjmarr|7 months ago

I've noticed the error rate doesn't matter if you have good tooling feeding into the context. The AI hallucinates, sees the bug, and fixes it for you.

cmrdporcupine|7 months ago

I find for these kinds of systems, if I pre-seed Claude Code with a read of the language manual (even the BNF etc) and a TLDR of what it is, results are far better. Just part of the initial prompt: read this summary page, read this grammar, and look at this example code.

I have had it writing LambdaMOO code, with my own custom extensions (https://github.com/rdaum/moor) and it's ... not bad considering.