top | item 47057262

whynotminot | 12 days ago

It’s also pretty wild to me how people still don’t really even know how to use it.

On hacker news, a very tech literate place, I see people thinking modern AI models can’t generate working code.

The other day in real life I was talking to a friend of mine about ChatGPT. They didn’t know you needed to turn on “thinking” to get higher quality results. This is a technical person who has worked at Amazon.

You can’t expect revolutionary impact while people are still learning how to even use the thing. We’re so early.

overgard|12 days ago

I don't think "results don't match promises" is the same as "not knowing how to use it". I've been using Claude and OpenAI's latest models for the past two weeks (probably moving at about 1,000 lines of code a day, which is what I can comfortably review), and they make subtle, hard-to-find mistakes all over the place. Or they misunderstand well-known design patterns, or do something boneheaded. I'm fine with this! But that's because I'm asking for code that I could write myself, and I'm actually reading it. This whole "it can build a whole company for me and I don't even look at it!" thing is overhype.

scoopdewoop|12 days ago

Prompting LLMs for code simply takes more than a couple of weeks to learn.

It takes time to build an intuition for the kinds of problems a model has seen in pre-training, what environments it faced in RL, and what bizarre biases and blind spots it has. Learning to google was hard, learning to use other people's libraries was hard, and this is on par with those skills at least.

If there is a well-known design pattern you want, that's a great thing to call out. Knowing what to add to the context takes time and taste. If you are asking for pieces so large that you can't trust them, ask for smaller pieces and their composition. It's a force multiplier, and your taste for abstractions as a programmer is one of the factors.

In early usenet/forum days, the XY problem described users asking for implementation details of their X solution rather than asking how to solve the underlying problem Y. In LLM prompting, people fall into the opposite trap: they have an X implementation they want to see, but instead of asking for it they describe the Y problem and expect the LLM to arrive at the same X solution. Just ask for the implementation you want.

Asking bots to ask bots seems to be another skill as well.

vidarh|11 days ago

Do you use an agent harness to have it review code for you before you do?

If not, you don't know how to use it efficiently.

A large part of using AI efficiently is to significantly lower that review burden by having it do far more of the verification and cleanup itself before you even look at it.

XenophileJKO|11 days ago

If you know good architecture and you are testing as you go, I would say it is probably pretty damn close to being able to build a company without looking at the code. Not without "risk", but definitely doable and plausible.

My current project, which I started this weekend, is a Rust client-server game with the client compiled to WebAssembly.

I do these projects without reading the code at all as a way to gauge what I can possibly do with AI without reading code, purely operating as a PM with technical intuition and architectural opinions.

So far Opus 4.6 has been capable of building it all out. I have to catch issues and I have asked it for refactoring analysis to see if it could optimize the file structure/components, but I haven't read the code at all.

At work I certainly read all the code. But I would recommend people try to build something non-trivial without looking at the code. It does take skill, though, so maybe start small and build up intuition for the ways these models fail. I think you'll be surprised how far your technical intuition can scale even when you are not looking at the code.

munksbeer|11 days ago

> and it makes subtle hard-to-find mistakes all over the place.

I agree. I'm constantly correcting the code it generates. But then, I do the same for humans when I review their PRs, and the LLM generated the code in a hundredth of the time (or whatever figure you prefer).

docmars|11 days ago

And yet, this is exactly what my last job's engineering & product leadership did with their CEO at the helm, before they laid me off.

They vibe-coded a complete rewrite of their products in a few months without any human review. Hundreds of thousands of LOC. I feel sorry for the remaining engineers who have to learn everything that was just generated, now that customers are using it.

politelemon|12 days ago

You are assuming that we all work on the same tasks and should have exactly the same experience with it, which is of course far from the truth. It's probably best to start from the assumption that experiences differ and work out the implications from there.

As for the last example: for all the money being spent on this area, if users are expected to know a particular workflow based on the kind of question they're supposed to ask, that's a failure in the packaging and discoverability of the product. The leaky abstraction only helps those of us who know why it's there.

harrall|12 days ago

I’ve been helping normal people at work use AI and there’s two groups that are really struggling:

1. People who only think of using AI in very specific scenarios. They don’t know when you use it outside of the obvious “to write code” situations and they don’t really use AI effectively and get deflated when AI outputs the occasional garbage. They think “isn’t AI supposed to be good at writing code?”

2. People who let AI do all the thinking. Sometimes they’ll use AI to do everything and you have to tell them to throw it all away because it makes no sense. These people also tend to dump analyses straight from AI into Slack because they lack the tools to verify if a given analysis is correct.

To be honest, I help them by teaching them fairly rigid workflows like “you can use AI if you are in this specific situation.” I think most people will only pick up tools effectively if there is a clear template. It’s basically on-the-job training.

illiac786|11 days ago

> On hacker news, a very tech literate place, I see people thinking modern AI models can’t generate working code.

I am completely flooded with comments and stories about how great LLMs are at coding. I am curious to see how you get a different picture than this? Can you point me to a thread or a story that supports your view? At the moment, individuals thinking AI cannot generate working code seem almost inexistent to me.

riskable|11 days ago

It's a real thing, but usually tied to IT folks who tried ChatGPT ~2 years ago (in a web browser) and had to "fix" whatever it output. That experience solidified their "understanding of AI" and they haven't updated their knowledge since (because... no pressing need).

Folks like this have never used AI inside of an IDE or one of the CLI AI tools. Without that perspective, AI seems mostly like a gimmick.

tstrimple|12 days ago

> On hacker news, a very tech literate place

I think this is the prior you should investigate. That may be what HN used to be. But it's been a long time since it has been an active reality. You can still see actual expert opinions on HN, but they are the minority more and more.

alephnerd|12 days ago

I think one longtime HN user (Karrot_Kream I think) pinpointed the change in HN discourse to sometime in mid 2022 to early 2023 when the rate of new users spiked to 40k per month and remained at that elevated rate.

From personal experience, I've also noticed that some of the most toxic discourse and responses I've received on this platform are overwhelmingly from post-2022 users.

3form|10 days ago

>You can’t expect revolutionary impact while people are still learning how to even use the thing. We’re so early.

What makes you think this will ever change? Have you seen how well people know the tools they already have?

mrtksn|12 days ago

In a WhatsApp group full of doctors, managers, journalists and engineers (including software) aged 30-60, I asked if anyone had heard of OpenClaw. Only 3 people had heard of it, from influencers; none had used it.

But from my social feed the impression was that it is taking over the world:)

I asked because I have been building something similar for some time, and I thought it was over, that they had been faster than me. But as it appears, there's no real adoption yet. Maybe there will be some once they release it as part of ChatGPT, but even then it looks too early, as few people actually use the more advanced tools.

It's definitely at a very early stage. It appears that so far the mainstream success of AI is limited to slop generation, and even that is actually a small number of people generating huge amounts of slop.

wiseowise|12 days ago

> I asked if anyone heard of twitter vaporware and only 3 people heard of it from influencers, none used it.

Shocking results, I say!

alephnerd|12 days ago

> I asked because I have been building something similar for some time, and I thought it was over, that they had been faster than me

If you have been working on a use case similar to OpenClaw for some time now, I'd actually say you are in a great position to start raising.

Being first to market is not a significant moat in most cases. Few people want to invest in the first company in a category - it's too risky. If there are a couple of other early players then the risk profile has been reduced.

That said, you NEED to concentrate on GTM - technology is commodified, distribution is not.

> It appears that so far the mainstream success in AI is limited to slop generation and even that is actually small number of people generating huge amounts of slop

The growth of AI slop has been exponential, but the application of agents for domain specific usecases has been decently successful.

The biggest reason you don't hear about it on HN is because domain-specific applications are not well known on HN, and most enterprises are not publicizing the fact that they are using these tools internally.

Furthermore, almost anyone who is shipping something with actual enterprise usage is under fairly onerous NDAs right now and every company has someone monitoring HN like a hawk.

bigbuppo|12 days ago

And it will get worse once the UX people get ahold of it.

scrubs|12 days ago

You got that right... imagine AI making more keyboard shortcuts, "helping" Wayland move off X even more, new window transitions, overhauling htmx... it'll be hell+ on earth.

KellyCriterion|12 days ago

A neighbour of mine has a PhD and works in research at a hospital. He is super smart.

Last time he said: "yes yes I know about ChatGPT, but I do not use it at work or home."

Therefore, most people won't even know about Gemini, Grok or even Claude.

Guillaume86|11 days ago

He said he knows about it, and your conclusion is that he doesn't know about the other ones...

slopinthebag|12 days ago

> I see people thinking modern AI models can’t generate working code.

Really? Can you show any examples of someone claiming AI models cannot generate working code? I haven't seen anyone make that claim in years, even from the most skeptical critics.

autoexec|12 days ago

I've seen it said plenty of times that the code might work eventually (after several cycles of prompting and testing), but even then the code you get might not be something you'd want to maintain, and it might contain bugs and security issues that don't (at least initially) seem to impact its ability to do whatever it was written to do, but which could cause problems later.

dangus|12 days ago

And really the problem isn’t that it can’t make working code, the problem is that it’ll never get the kind of context that is in your brain.

I started working today on a project I hadn't touched in a while, but now needed to because it was involved in an incident where I had to address some shortcomings. I knew the fix I needed to make, but I went about my usual AI-assisted workflow because, of course, I'm lazy; the last thing I want to do is interrupt my normal work to fix this stupid problem.

The AI doesn’t know anything about the full scope of all the things in my head about my company’s environment and the information I need to convey to it. I can give it a lot of instructions but it’s impossible to write out everything in my head across multiple systems.

The AI did write working code, but despite writing the code way faster than me, it made small but critical mistakes that I wouldn’t have made on my first draft.

For example, it added a command flag that I knew it didn't need, and it probably should have known that, too. Basically, it changed a line of code that it didn't need to touch.

It also didn't realize that the URL being curled was going to redirect, so we needed an -L flag. Maybe it should have, but my brain knew it already.
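For the curious: `-L` is what makes curl follow `Location` headers; without it, curl stops at the 3xx response itself. The same default-vs-follow distinction can be reproduced entirely locally. Here is a self-contained Python sketch (the `Redirector` server and the `/old` and `/new` paths are invented for illustration), where the client, like `curl -L`, follows a 302 to the real payload:

```python
import http.server
import threading
import urllib.request

class Redirector(http.server.BaseHTTPRequestHandler):
    """Tiny local server: /old answers with a 302 pointing at /new."""
    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"payload")

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind to port 0 so the OS picks a free port.
srv = http.server.HTTPServer(("127.0.0.1", 0), Redirector)
threading.Thread(target=srv.serve_forever, daemon=True).start()
port = srv.server_address[1]

# urllib follows redirects by default, the way `curl -L` does;
# plain `curl` would stop at the 302 and save its empty body instead.
body = urllib.request.urlopen(f"http://127.0.0.1:{port}/old").read()
srv.shutdown()
print(body)  # b'payload'
```

The point the comment makes still stands: a human who knows the URL redirects reaches for `-L` immediately; the model has to be told.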

It also misinterpreted some changes in direction that a human never would have. It confused my local repository for the remote one, because I originally thought I was going to set up a mirror, but I changed plans and used a manual package upload to curl from. So it put the remote URL in some places where the local one should have been.

Finally, it seems to have just created some strange text gore while editing the readme where it deleted existing content for seemingly no reason other than some kind of readline snafu.

So yes, it produced great code very fast, code that would have taken me way longer to write, but I had to go back and spend a similar amount of time fixing so many things that I might as well have just done it manually.

But hey I’m glad my company is paying $XX/month for my lazy workday machine.

zelphirkalt|11 days ago

Depends what they mean. Generate working code all the time, or after a few iterations of trying and prompting? It can very easily happen that an LLM generates something that is a straight error, because it hallucinates a keyword argument or something like that which doesn't actually exist. That happened to me only yesterday. So going from that: no, they are still not able to generate working code all the time. Especially when the basis is a shoddily made library that is simply missing something required.
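That hallucinated-argument failure mode is easy to picture. A minimal sketch, with a made-up function standing in for the library call (`fetch_rows` and its parameters are invented for illustration):

```python
def fetch_rows(table, limit=100):
    """Return up to `limit` fake row ids from `table` (illustration only)."""
    return [f"{table}:{i}" for i in range(limit)]

# A call that matches the real signature works fine:
rows = fetch_rows("users", limit=3)

# But if the model hallucinates a plausible-sounding kwarg such as
# `page_size`, the code is a straight error before it does anything useful:
try:
    fetch_rows("users", page_size=3)
    err = None
except TypeError as exc:
    err = str(exc)  # "...got an unexpected keyword argument 'page_size'"
```

The upside is that this class of hallucination fails loudly: one run, a type checker, or an IDE squiggle catches it immediately, unlike the subtle logic mistakes discussed elsewhere in this thread.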

IshKebab|11 days ago

I'll claim it. They can't generate working code for the things I am working on. They seem to be too complex or in languages that are too niche.

They can do a tolerable job with super popular/simple things like web dev and Python. It really depends on what you're doing.

KellyCriterion|12 days ago

Scroll up a few comments, where someone said Claude is generating errors over and over again and that Claude can't work according to code guidelines, etc. :-))