item 47016103

lr4444lr | 15 days ago

At this point, I trust LLMs to come up with something more secure than the cheapest engineering firm for hire.


nozzlegear|15 days ago

"Anyone else out there vibe circuit-building?"

https://xcancel.com/beneater/status/2012988790709928305

godelski|15 days ago

Is there more context to this? I'm assuming Ben is experimenting and demonstrating the danger of vibe circuit designing? Mostly because I know he has a ton of experience and I'd expect him not to make this mistake (it also seems like he told the AI why it was wrong).

alexjplant|15 days ago

People make these mistakes too. Several times in my high school shop class, kids shorted out 9V batteries trying to build circuits because they didn't understand how electronics work. At no point did our teacher stop them; on at least one occasion I unplugged one from a breadboard before it got too toasty to handle (and I was/am an electronics nublet). Similarly, there was a lot of hand-wringing about the Gemini pizza glue in a world where people do wacky stuff like cook fish in a dishwasher, defrost chicken overnight on the counter, or put cooked steak on the same plate it sat on while raw just a few minutes prior.

LLMs are just surfacing the fact that assessing and managing risk is an acquired, difficult-to-learn skill. Most people don't know what they don't know and fail to think about what might happen if they do something (correctly or otherwise) before they do it, let alone what they'd do if it goes wrong.

JKCalhoun|14 days ago

Ha ha, I said this when Ben's post came up earlier, but yes, I am. And so far it has been a positive experience.

Aurornis|15 days ago

The cheapest engineering firms you hire are also using LLMs.

The operator is still a factor.

jama211|15 days ago

Yeah, but they’ll add another layer of complexity over doing it yourself

Kiro|15 days ago

LLMs definitely write more robust code than most. They don't take shortcuts or resort to ugly hacks. They have no problem writing tedious guards against edge cases that humans brush off. They also keep comments up to date and obsess over tests.

thayne|15 days ago

> They don't take shortcuts or resort to ugly hacks.

That hasn't universally been my experience. Sometimes the code is fine. Sometimes it is functional but poorly organized, or does things in a very unusual way that is hard to understand. And sometimes it produces code that might work occasionally but misses important edge cases and isn't robust at all, or does things in an incredibly slow way.

> They have no problem writing tedious guards against edge cases that humans brush off.

The flip side of that is that instead of coming up with a good design that doesn't have as many edge cases, it will write verbose code that handles many different cases in similar, but not quite the same ways.

> They also keep comments up to date and obsess over tests.

Sure but they will often make comments or tests that aren't actually useful, or modify tests to succeed instead of fixing the code.

One significant danger of LLMs is that the quality of the output is highly variable and unpredictable.

That's OK if you have someone knowledgeable reviewing and correcting it. But if you blindly trust it because it produced decent results a few times, you'll probably be sorry.

BoorishBears|15 days ago

I had 5.3-Codex take two tries to satisfy a linter on TypeScript type definitions.

It gave up, deleted the code it had written to access the correct property directly, and replaced it with a new function that did a BFS over every single field in the API response object, applying a "looksLikeHttpsUrl" regex and hoping the first https:// match would be the correct value to use.
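For illustration, the hack described above might look something like this. This is a hypothetical reconstruction based only on the comment; the function and field names are assumptions, not the actual generated code:

```typescript
// Hypothetical reconstruction of the hack described above: rather than
// reading the one property it needs, the function walks the entire
// response and returns the first string that looks like an https URL.
function looksLikeHttpsUrl(value: unknown): value is string {
  return typeof value === "string" && /^https:\/\/\S+$/.test(value);
}

function findFirstHttpsUrl(root: unknown): string | undefined {
  // Breadth-first search over every field of the response object.
  const queue: unknown[] = [root];
  while (queue.length > 0) {
    const current = queue.shift();
    if (looksLikeHttpsUrl(current)) return current;
    if (current !== null && typeof current === "object") {
      queue.push(...Object.values(current as Record<string, unknown>));
    }
  }
  return undefined; // no https URL anywhere in the response
}

// The fragility: whichever https URL is enqueued first "wins", so an
// unrelated field (hypothetical names below) can shadow the right one.
const response = {
  avatarUrl: "https://cdn.example.com/avatar.png", // wrong field, found first
  profileUrl: "https://example.com/users/42",      // the field actually wanted
};
console.log(findFirstHttpsUrl(response)); // https://cdn.example.com/avatar.png
```

The code type-checks and the linter is happy, which is exactly why this kind of workaround can slip through review: the result depends on field ordering in the response, not on which field was actually wanted.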

On the contrary, the shift from pretraining driving most gains to RL driving most gains is pressuring these models to resort to new hacks and shortcuts that are increasingly novel and disturbing!

godelski|14 days ago

> They don't take shortcuts or resort to ugly hacks.

My experience is quite different.

> They have no problem writing tedious guards against edge cases that humans brush off.

Ditto.

I have a hard time getting them to write small and flexible functions, even with explicit instructions about how a specific routine should be done. (This is really easy to reproduce in bash scripts, where they seem to avoid using functions; so do people, but most people suck at bash.) IME they fixate on the end goal and don't grasp the larger context, which is often implicit, though I still have difficulty even when I'm highly explicit; at that point it's usually faster to write it myself.

It also makes me question how much of this is learned context. Are humans not doing this because they don't think about it, or because we've trained people to ignore these things? How often do we hear "I just care that it works"? I've only heard that phrase from people who also love to talk about minimum viable products, because, frankly, who is not concerned with whether it works? The disagreement has always been about what is sufficient. Only very junior people believe in perfection. It's why we have sayings like "there's no solution more permanent than a temporary fix that works". It's the same people who believe tests are proof of correctness rather than a bound on correctness. The same people who read that last sentence and think I'm suggesting not to write tests, or that tests are useless.

I'd be quite concerned about the LLM operator because of this. Subtle things matter when instructing LLMs; subtle changes in the prompt can wildly change the output.

girvo|14 days ago

They absolutely take shortcuts and resort to ugly hacks.

My AGENTS.md is filled with specific lines to counter all of them that come up.

devmor|15 days ago

Interesting and completely wrong statement. What gave you this impression?

Hendrikto|14 days ago

> They don't take shortcuts or resort to ugly hacks.

In my experience that is all they do, and you constantly have to fight them to get the quality up, and then fight again to prevent regressions on every change.

kahnclusions|14 days ago

What? Yes, they do take shortcuts and hacks. They change test cases to make them pass. As the context gets longer, they become less reliable at following earlier instructions. I literally had Claude hallucinate nonexistent APIs and then admit "You caught me! I didn't actually know, let me do a web search," and after the web search it still mixed deprecated patterns and APIs against my instructions.

I’m much more worried about the reliability of software produced by LLMs.

Aurornis|14 days ago

> LLMs definitely write more robust code than most.

I’ve been using Opus 4.6 and GPT-Codex-5.3 daily and I see plenty of hacks and problems all day long.

I think this is missing the point. The code in this product might be robust in the sense that it follows documentation and does things without hacks, but the things it’s doing are a mismatch for what is needed in the situation.

It might be perfectly structured code, but it uses hardcoded shared credentials.

A skilled operator could have directed it to do the right things and implement something secure, but an unskilled operator doesn’t even know how to specify the right requirements.
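As a hedged sketch of that mismatch (all names below are hypothetical, not taken from the product in question): both halves of this example are tidy, well-structured code, but one bakes a shared secret into every copy of the client while the other makes the credential a per-deployment requirement.

```typescript
// Hypothetical sketch of "well-structured code, wrong requirement":
// a shared credential compiled into every copy of the client.
const SHARED_API_KEY = "sk-shared-key-baked-into-the-app"; // every user gets the same secret

// What the requirement should have specified: a per-deployment
// credential supplied from the environment, never committed to code.
function getApiKey(env: Record<string, string | undefined>): string {
  const key = env["API_KEY"]; // hypothetical variable name
  if (!key) {
    throw new Error("API_KEY is not set; refusing to fall back to a shared secret");
  }
  return key;
}
```

In Node this would typically be called as `getApiKey(process.env)`. The point is that nothing about code style distinguishes the two approaches; only the requirements do, which is why an operator who can't specify the requirements won't catch the difference.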

seanmcdirmid|14 days ago

I don’t know, you can get a lot of nice engineering done in a Shenzhen dark alley.

lukan|15 days ago

And won't the cheapest engineering firm also use LLMs wherever possible?

fc417fc802|15 days ago

The cheapest engineering firm will turn out to be headed up by an openclaw instance.

TheRealPomax|15 days ago

Fun fact: LLMs come in "cheapest and useless" and "expensive but actually does what's being asked" varieties, too.

So, will they? Probably. Can you trust the kind of LLM that you would use to do a better job than the cheapest firm? Absolutely.