"it mostly worked" is just a more nuanced way of saying "it didn't work".
Apparently the author did eventually get something working, but it is false to say that the LLMs produced a working project.
Well, yeah. It’s a more nuanced way of saying it because “it didn’t work” isn’t very useful or descriptive.
What if it wrote all of the boilerplate and just let you focus on the important bit that deserves your scrutiny?
You could say I failed every single project I ever built because it took many iterations to get to the final deliverable stage with lots of errors along the way. So more nuance would be needed.
But when it comes to LLMs suddenly we get all gleeful about how negatively we can frame the experience. Even among HN tech scholars.
I dunno. Depending on the writer and their particular axe to grind, the definition can vary widely. I would like it to mean, "any fixes I needed to make were minimal and not time intensive."
What is your definition of "a working project"? It does what it says on the tin (actually it probably does more, because splint throws some warnings...)
I would never ever let an LLM anywhere near C code. If you need help from an LLM to write a NIF that performs basic C calls to the OS, you probably can’t check whether it’s safe. I mean, it needs at least to pass valgrind.
Security is a spectrum. If you totally control the input going into a program, it can be safe even if you didn't test it for memory leaks. The only errors that occur will be truly erroneous, not malicious and for many solutions that's fine.
At the very least, it's fine for personal projects which is something I'm getting into more and more: remembering that computers were meant to create convenience, so writing small programs to make life easier.
Built my startup in Elixir. Love it, but NIFs are one of the few ways you can crash the VM. I don't trust myself to write a NIF in production. No way I'd do it with AI in C. Thank god there's projects like Rustler which can catch panics before they crash the main VM.
I've done this. The NIF worked in the sense that it ran and was a correct enough NIF. It did not work in terms of solving what I needed it to do. Iteration was painful because it was tangled with a nasty library that needed to be cross-compiled. So when I made a change it segfaulted and I bailed.
I essentially ran out of patience and tried another approach. It involved an LLM running C code so I could check the library output against my implementation to make sure it matched byte-for-byte.
The C will never ship. I don't have practice writing C so I am very inefficient at it. I read it okay. LLMs are pretty decent help for this type of scrap code.
I once wrote a little generalized yaml templating processor in Python by using an LLM for assistance. It was working pretty well and passing a lot of the tests that I was throwing at it!
Then I noticed that some of the tests that failed were failing in really odd ways. Upon closer inspection, the generated processor had made lots of crazy assumptions about what it should be doing based upon specific values in yaml keys that were obviously unrelated to instructions.
Yeah, I agree with the author. This stuff can be incredibly useful, but it definitely isn't anything like an AGI in its current form.
I tried to do this a few weeks ago: I tried to build a NIF around an existing C lib. I was using Claude Opus and burned over $300 on tokens (I didn't have Pro) with no usable results.
Get Pro. Claude 4 is quite good at Elixir now, but you have to stay on it. 3.5 was not, so I imagine the next version of Claude will be able to handle the more esoteric things like NIFs, etc.
Why C instead of Rust or Zig? Rustler and Zigler exist.
I feel like a vibe-coded NIF in C is the absolute last thing I would want to expose the BEAM to.
Given the amount of issues the code had when I ran splint on the C file, I agree. The question was for me whether I can get something working to get over the "speed bump" of lacking such a function for the API client I'm writing.
I'm now re-vibe-coding it into Rust with the same process, but also using Grok 4 to get better results. It now builds and passes the tests on Elixir 1.14 to 1.18 on macOS and Ubuntu, but I'm still trying to get Grok 3 and 4 to fix the Windows-specific parts of the Rust code.
It's interesting that the author used weaker models (Grok 3 when 4 is available, and Gemini 2.5 Flash when Pro is), since the difference in coding quality between these models is significant and the results could have been much better.
Because what difference would it make, given the bad quality of code?
Also, is Claude Code free to use?
The manual process has the upside that you get to see how the sausage is (badly) made. Otherwise, just YOLO it and put your trust in GenAI completely.
Furthermore, if there is the interim step of pushing to GitHub to trigger the build & test workflow and see if it works on something other than Linux, is the choice of Vibe-Coding IDE really the limiting factor in the entire process?
So all this arose because you didn't read the docs and note that get_disk_info/1 immediately fetches the data when called? The every-30-minutes-by-default checks are for generating "disk usage is high" event conditions.
flax|6 months ago
hombre_fatal|6 months ago
jgalt212|6 months ago
overbring_labs|6 months ago
faangguyindia|6 months ago
vrighter|6 months ago
unknown|6 months ago
[deleted]
brokencode|6 months ago
As you said, the very title of the article acknowledged that it didn’t produce a working product.
This is just outrage for the sake of outrage.
drumnerd|6 months ago
simonw|6 months ago
overbring_labs|6 months ago
true_religion|6 months ago
simonw|6 months ago
I had a bunch of fun getting ChatGPT Code Interpreter to write (and compile and test) C extensions for SQLite last year: https://simonwillison.net/2024/Mar/23/building-c-extensions-...
victorbjorklund|6 months ago
unknown|6 months ago
[deleted]
cultofmetatron|6 months ago
worthless-trash|6 months ago
lawik|6 months ago
_ea1k|6 months ago
bcardarella|6 months ago
cpursley|6 months ago
weatherlight|6 months ago
overbring_labs|6 months ago
qualeed|6 months ago
leansensei|6 months ago
ch4s3|6 months ago
overbring_labs|6 months ago
SweetSoftPillow|6 months ago
overbring_labs|6 months ago
overbring_labs|6 months ago
wordofx|6 months ago
overbring_labs|6 months ago
unknown|6 months ago
[deleted]
juped|6 months ago
leansensei|6 months ago
However, this NIF also returns more fields than the disksup function.