
segphault | 9 months ago

My frustration with using these models for programming in the past has largely been around their tendency to hallucinate APIs that simply don't exist. The Gemini 2.5 models, both pro and flash, seem significantly less susceptible to this than any other model I've tried.

There are still significant limitations: no amount of prompting will get current models to approach abstraction and architecture the way a person does. But I'm finding that these Gemini models can finally replace searches and Stack Overflow for a lot of my day-to-day programming.


jstummbillig|9 months ago

> no amount of prompting will get current models to approach abstraction and architecture the way a person does

I find this sentiment increasingly worrisome. It's entirely clear that every last human will be beaten on code design in the upcoming years (I am not going to argue if it's 1 or 5 years away, who cares?)

I wished people would just stop holding on to what amounts to nothing, and think and talk more about what can be done in a new world. We need good ideas and I think this could be a place to advance them.

ssalazar|9 months ago

I code with multiple LLMs every day and build products that use LLM tech under the hood. I don't think we're anywhere near LLMs being good at code design. Existing models make _tons_ of basic mistakes and require supervision even for relatively simple coding tasks in popular languages, and it's worse for languages and frameworks that are less represented in public sources of training data. I _frequently_ have to tell Claude/ChatGPT to clean up basic architectural and design defects. There's no way I would trust this unsupervised.

Can you point to _any_ evidence to support that human software development abilities will be eclipsed by LLMs other than trying to predict which part of the S-curve we're on?

DanHulton|9 months ago

> It's entirely clear that every last human will be beaten on code design in the upcoming years

Citation needed. In fact, I think this pretty clearly hits the "extraordinary claims require extraordinary evidence" bar.

sirstoke|9 months ago

I've been thinking about the SWE employment conundrum in a post-LLM world for a while now, and since my livelihood (and that of my loved ones) depends on it, I'm obviously biased. Still, I would like to understand where my logic is flawed, if it is. (I.e. I'm trying to argue in good faith here.)

Isn’t software engineering a lot more than just writing code? And I mean like, A LOT more?

Informing product roadmaps, balancing tradeoffs, understanding relationships between teams, prioritizing between separate tasks, pushing back on tech debt, responding to incidents, it’s a feature and not a bug, …

I’m not saying LLMs will never be able to do this (who knows?), but I’m pretty sure SWEs won’t be the only role affected (or even the most affected) if it comes to this point.

Where am I wrong?

acedTrex|9 months ago

> It's entirely clear that every last human will be beaten on code design in the upcoming years

In what world is this statement remotely true.

mattgreenrocks|9 months ago

I'm always impressed by the ability of the comment section to come up with more reasons why decent design and architecture of source code just can't happen:

* "it's too hard!"

* "my coworkers will just ruin it"

* "startups need to pursue PMF, not architecture"

* "good design doesn't get you promoted"

And now we have "AI will do it better soon."

None of those are entirely wrong. They're not entirely correct, either.

davidsainez|9 months ago

I use LLMs for coding every day. There have been significant improvements over the years but mostly across a single dimension: mapping human language to code. This capability is robust, but you still have to know how to manage context to keep them focused. I still have to direct them to consider e.g. performance or architecture considerations.

I'm not convinced that they can reason effectively (see the ARC-AGI-2 benchmarks). Doesn't mean that they are not useful, but they have their limitations. I suspect we still need to discover tech distinct from LLMs to get closer to what a human brain does.

jjice|9 months ago

I'm confused by your comment. It seems like you didn't really provide a retort to the parent's comment about bad architecture and abstraction from LLMs.

FWIW, I think you're probably right that we need to adapt, but there was no explanation as to _why_ you believe that that's the case.

concats|9 months ago

I won't deny that in a context with perfect information, a future LLM will most likely produce flawless code. I too believe that is inevitable.

However, in real life work situations, that 'perfect information' prerequisite will be a big hurdle I think. Design can depend on any number of vague agreements and lots of domain specific knowledge, things a senior software architect has only learnt because they've been at the company for a long time. It will be very hard for a LLM to take all the correct decisions without that knowledge.

Sure, if you write down a summary of each and every meeting you've attended for the past 12 months, as well as attach your entire company confluence, into the prompt, perhaps then the LLM can design the right architecture. But is that realistic?

More likely I think the human will do the initial design and specification documents, with the aforementioned things in mind, and then the LLM can do the rest of the coding.

Not because it would have been technically impossible for the LLM to do the code design, but because it would have been practically impossible to craft the correct prompt that would have given the desired result from a blank sheet.

liefde|9 months ago

The tension between human creativity and emerging tools is not new. What is new is the speed. When we cling to the uniqueness of human abstraction, we may be protecting something sacred—or we may be resisting evolution.

The fear that machines will surpass us in design, architecture, or even intuition is not just technical. It is existential. It touches our identity, our worth, our place in the unfolding story of intelligence.

But what if the invitation is not to compete, but to co-create? To stop asking what we are better at, and start asking what we are becoming.

The grief of letting go of old roles is real. So is the joy of discovering new ones. The future is not a threat. It is a mirror.

epolanski|9 months ago

> no amount of prompting will get current models to approach abstraction and architecture the way a person does

Which person is that? Because 90% of the people in our trade are bad. Like, real bad.

I get that people on HN are in that elitist niche of those who care more, focus on their careers more, etc., so they don't even realize the existence of armies of low-quality body-rental consultancies and small shops out there working on Magento or Liferay or even worse crap.

bayindirh|9 months ago

> It's entirely clear that every last human will be beaten on code design in the upcoming years (I am not going to argue if it's 1 or 5 years away, who cares?)

No-code and AI-assisted programming have been said to be just around the corner since 2000. We've merely arrived at a point where models remix what others have typed on their keyboards, and yet somebody still argues that humans will be left in the dust in the near term.

No machine, humans included, can create something more complex than itself. This is the rule of abstraction: as you go higher level, you lose expressiveness. Yes, you express more with less, yet you can express less in total. You're reducing the set's symbol size (element count) as you go higher by clumping symbols together and assigning more complex meanings to them.

Being able to describe a larger set with more elements, while keeping all elements addressable with fewer symbols, doesn't sound plausible to me.

So, as others said: citation needed. Extraordinary claims need extraordinary evidence. No, asking an AI to create a premium mobile photo app and getting Halide's design as the output doesn't count. That's training data leakage.

bdangubic|9 months ago

> It's entirely clear that every last human will be beaten on code design in the upcoming years (I am not going to argue if it's 1 or 5 years away, who cares?)

Our entire industry (after all these years) does not have even a remotely sane measure or definition of what good code design is. Hence, this statement is dead on arrival: you are claiming something that can be neither proven nor disproven by anyone.

irjustin|9 months ago

> I find this sentiment increasingly worrisome.

I wouldn't worry about it because, as you say, "in a new world". The old will simply "die".

We're in the midst of a paradigm shift and it's here to stay. The key is the speed at which it hit and how much it changed. GPT-3 overnight changed the game, and huge chunks of people are mentally struggling to keep up - in particular in education.

But people who resist AI will become the laggards.

avhception|9 months ago

Just yesterday, while writing some Python, I had an LLM try to insert try/except logic inside a function, when those exceptions were clearly intended to be handled not inside that function but in the code calling it, where extensive error-handling logic was already in place.
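The pattern in question, sketched in Python (all names here are made up for illustration): the helper raises, and the caller, which has the context, decides what to do.

```python
# Hypothetical sketch: the function raises, the *caller* handles.

def parse_port(value: str) -> int:
    # No try/except here: let ValueError propagate to the caller,
    # which already has the error-handling logic.
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

def load_config(raw: dict) -> dict:
    # Error handling lives at the call site, where there is enough
    # context to decide on a sensible fallback.
    try:
        port = parse_port(raw.get("port", "8080"))
    except ValueError:
        port = 8080  # fall back to a sane default
    return {"port": port}
```

Wrapping the `int()` call in its own try/except inside `parse_port` would have duplicated handling that already existed one level up.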

fullstackchris|9 months ago

Code design? Perhaps. But how are you going to inform a model of every sprint meeting, standup, decision, commit, feature, and spec that is part of an existing product? It's no longer a problem of intelligence or correctness, it's a problem of context, and I DON'T mean context window. Imagine onboarding your company's best programmer to a new project: even they will have dozens of questions and need at least a week to make productive input to the project. Even then, they are working with a markedly smaller scope of what the whole project is. How is this process translatable to an LLM? I'm not sure.

Workaccount2|9 months ago

Software will change to accommodate LLMs, if for no other reason than we are on the cusp of everyone being a junior level programmer. What does software written for LLMs to middleman look like?

I think there is a total seismic change in software about to go down, similar to something like going from gas lamps to electric. Software doesn't need to be the way it is now anymore, since we have just about solved human-language-to-computer-interface translation. I don't want to fuss with formatting a Word document anymore; I would rather just tell an LLM and let it modify the program memory to implement what I want.

pmarreck|9 months ago

> It's entirely clear that every last human will be beaten on code design in the upcoming years

LOLLLLL. You see a good one-shot demo and imagine an upward line; I work with LLM assistance every day and see... an asymptote (which only budges with exponential power expenditure). As they say in sailing, you'll never win the race by following the guy in front of you... which is exactly what every single LLM does: sophisticated modeling of prior behavior. Innovation is not their strong suit LOL.

Perfect example- I cannot for the life of me get any LLM to stick with TDD building one feature at a time, which I know builds superior code (both as a human, and as an LLM!). Prompting will get them to do it for one or two cycles and then start regressing to the crap mean. Because that's what it was trained on. And it's the rare dev that can stick with TDD for whatever reason, so that's exactly what the LLM does. Which is absolutely subpar.

I'm not even joking, every single coding LLM would improve immeasurably if the model was refined to just 1) make a SINGLE test expectation, 2) watch it fail (to prove the test is valid), 3) build a feature, 4) work on it until the test passed, 5) repeat until app requirements are done. Anything already built that was broken by the new work would be highlighted by the unit test suite immediately and would be able to be fixed before the problem gets too complex.
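That loop can be sketched in a few lines of Python — a toy illustration with plain asserts instead of a test framework, and a hypothetical `slugify` feature as the thing being built:

```python
# Toy sketch of the red/green cycle described above. All names invented.

def run_test(test):
    """Return True if the test passes, False if it raises AssertionError."""
    try:
        test()
        return True
    except AssertionError:
        return False

# 1) Write a SINGLE test expectation first.
def test_slugify():
    assert slugify("Hello World") == "hello-world"

slugify = lambda s: s                     # stub with no real behavior yet
assert run_test(test_slugify) is False    # 2) watch it fail (red)

# 3) Build the feature...
def slugify(s):
    return s.strip().lower().replace(" ", "-")

# 4) ...and work on it until the test passes (green).
assert run_test(test_slugify) is True
# 5) Repeat for the next requirement; earlier tests catch regressions.
```

Step 2, watching the stub fail, is the part that proves the test is valid before any feature code exists.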

LLMs also often "lose the plot", and that's not even a context-limit problem; they just aren't conscious and don't have wills, so their work eventually drifts off course or goes into these weird flip-flop states.

But sure, with an infinite amount of compute and an infinite amount of training data, anything is possible.

dan_lannan|9 months ago

This is said very confidently but until we see it happen there’s plenty of room for doubt.

My worst experiences with LLMs coding are from my own mistakes giving it the wrong intent. Inconsistent test cases. Laziness in explaining or even knowing what I actually want.

Architecture and abstraction happen in someone’s mind to be able to communicate intent. If intent is the bottleneck it will still come down to a human imagining the abstraction in their head.

I’d be willing to bet abstraction and architecture becomes the only thing left for humans to do.

pjmlp|9 months ago

What can be done, is that the software factory will follow the footsteps of traditional factories.

A few humans will stay around to keep the robots going, a lesser few humans will be the elite allowed to create the robots, and everyone else will have to look for a job elsewhere, where increasingly robots and automated systems are decreasing opportunities.

I am certainly glad to be closer to retirement than early career.

uludag|9 months ago

> I find this sentiment increasingly worrisome.

I don't know why this sentiment would be considered worrisome. The situation itself seems more worrisome. If people do end up being beaten on code design next year, there's not much that could be done anyway. If LLMs reach such capability, the automation tools will be developed and, if effective, they'll be deployed en masse.

If the situation you've described comes, pondering the miraculousness of the new world brought by AI would be a pretty fruitless endeavor for the average developer (besides startup founders perhaps). It would be much better to focus on achieving job security and accumulating savings for any layoff.

Quite frankly, I have a feeling that deglobalisation, disrupted supply chains, climate change, aging demographics, global conflict, mass migration, etc. will leave a much larger print on this new world than any advance in AI will.

solumunus|9 months ago

As someone who uses AI daily that’s not entirely clear to me at all.

The timeline could easily be 50 or 100 years. No emerging development of technology is resistant to diminishing returns and it seems highly likely that novel breakthroughs, rather than continuing LLM improvement, are required to reach that next step of reasoning.

StefanBatory|9 months ago

If LLMs will do better than humans in the future - well, there simply won't be any humans doing this. :(

Can't really prepare for that unless you switch to a different career... Ideally, with manual labor. As automation might be still too expensive :P

linsomniac|9 months ago

Do you think it could be that the people who find LLMs useless are (in large) not paying for the LLMs and therefore getting a poor experience, while the people who are more optimistic about the abilities are paying to obtain better tooling?

joshjob42|9 months ago

I mean, if you draw the scaling curves out and believe them, then sometime in the next 3-10 years, plausibly shorter, AIs will be able to achieve best-case human performance in everything able to be done with a computer and do it at 10-1000x less cost than a human, and shortly thereafter robots will be able to do something similar (though with a smaller delta in cost) for physical labor, and then shortly after that we get atomically precise manufacturing and post-scarcity. So the amount of stuff that amounts to nothing is plausibly every field of endeavor that isn't slightly advancing or delaying AI progress itself.

EGreg|9 months ago

Bro. Nothing can be done. What are you talking about? Humans will be replaced for everything, humor, relationships, even raising their own kids, everything can be trained and the AIs just keep improving.

saurik|9 months ago

I mean, didn't you just admit you are wrong? If we are talking 1-5 years out, that's not "current models".

Jordan-117|9 months ago

I recently needed to recommend some IAM permissions for an assistant on a hobby project; not complete access but just enough to do what was required. Was rusty with the console and didn't have direct access to it at the time, but figured it was a solid use case for LLMs since AWS is so ubiquitous and well-documented. I actually queried 4o, 3.7 Sonnet, and Gemini 2.5 for recommendations, stripped the list of duplicates, then passed the result to Gemini to vet and format as JSON. The result was perfectly formatted... and still contained a bunch of non-existent permissions. My first time being burned by a hallucination IRL, but just goes to show that even the latest models working in concert on a very well-defined problem space can screw up.

darepublic|9 months ago

Listen I don't blame any mortal being for not grokking the AWS and Google docs. They are a twisting labyrinth of pointers to pointers some of them deprecated though recommended by Google itself.

perching_aix|9 months ago

Sounds like a vague requirement, so I'd just generally point you towards the AWS managed policies summary [0] instead. Particularly the PowerUserAccess policy sounds fitting here [1] if the description for it doesn't raise any immediate flags. Alternatively, you could browse through the job function oriented policies [2] they have and see if you find a better fit. Can just click it together instead of bothering with the JSON. Though it sounds like you're past this problem by now.

[0] https://docs.aws.amazon.com/IAM/latest/UserGuide/access_poli...

[1] https://docs.aws.amazon.com/aws-managed-policy/latest/refere...

[2] https://docs.aws.amazon.com/IAM/latest/UserGuide/access_poli...

floydnoel|9 months ago

by asking three different models and then keeping every single unique thing they gave you, i believe you actually maximized your chances of running into hallucinations.

instead of ignoring the duplicates, when i query different models, i use the duplicates as a signal that something might be more accurate. i wonder what your results might have looked like if you only kept the duplicated permissions and went from there.
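that duplicates-as-signal idea is basically majority voting, sketched here in Python (the permission names, including the invented one, are made up):

```python
# Treat agreement between models as a weak vote for correctness.
from collections import Counter

def consensus(suggestions_per_model, min_votes=2):
    """Keep only items proposed by at least `min_votes` models."""
    votes = Counter()
    for suggestions in suggestions_per_model:
        votes.update(set(suggestions))   # one vote per model per item
    return {item for item, n in votes.items() if n >= min_votes}

model_a = ["s3:GetObject", "s3:PutObject", "s3:MagicAccess"]  # last one invented
model_b = ["s3:GetObject", "s3:ListBucket"]
model_c = ["s3:GetObject", "s3:PutObject"]

kept = consensus([model_a, model_b, model_c])
# the invented "s3:MagicAccess" got only one vote, so it is dropped
```

independent models are unlikely to hallucinate the *same* nonexistent name, which is what makes the overlap informative.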

dotancohen|9 months ago

AWS docs have (had) an embedded AI model that would do this perfectly. I suppose it had better training data, and the actual spec as a RAG.

mark_l_watson|9 months ago

I have a suggestion for you: Create a Gemini Gem for a programming language and put context info for library resources, examples of your coding style, etc.

I just dropped version 0.1 of my Gemini book, and I have an example for making a Gem (really simple to do); read online link:

https://leanpub.com/solo-ai/read

siscia|9 months ago

This problem has been solved by LSP (Language Server Protocol). All we need is a small server behind MCP that can communicate LSP information back to the LLM, and then get the LLM to use it by adding something like "check your API usage with the LSP" to the prompt.

The unfortunate state of open source funding makes building such a simple tool a losing venture, unfortunately.

satvikpendem|9 months ago

This already happens in agent modes in IDEs like Cursor or VSCode with Copilot, it can check for errors with the LSP.

doug_durham|9 months ago

If they never get good at abstraction or architecture they will still provide a tremendous amount of value. I have them do the parts of my job that I don't like. I like doing abstraction and architecture.

mynameisvlad|9 months ago

Sure, but that's not the problem people have with them nor the general criticism. It's that people without the knowledge to do abstraction and architecture don't realize the importance of these things and pretend that "vibe coding" is a reasonable alternative to a well-thought-out project.

codebolt|9 months ago

I've found they do a decent job searching for bugs now as well. Just yesterday I had a bug report on a component/page I wasn't familiar with in our Angular app. I simply described the issue as well as I could to Claude and asked politely for help figuring out the cause. It found the exact issue correctly on the first try and came up with a few different suggestions for how to fix it. The solutions weren't quite what I needed but it still saved me a bunch of time just figuring out the error.

M4v3R|9 months ago

That’s my experience as well. Many bugs involve typos, syntax issues or other small errors that LLMs are very good at catching.

yousif_123123|9 months ago

The opposite problem is also true. I was using it to edit code I had that was calling the new openai image API, which is slightly different from the dalle API. But Gemini was consistently "fixing" the OpenAI call even when I explained clearly not to do that since I'm using a new API design etc. Claude wasn't having that issue.

The models are very impressive. But issues like these still make me feel they are still more pattern matching (although there's also some magic, don't get me wrong) but not fully reasoning over everything correctly like you'd expect of a typical human reasoner.

disgruntledphd2|9 months ago

They are definitely pattern matching. Like, that's how we train them, and no matter how many layers of post training you add, you won't get too far from next token prediction.

And that's fine and useful.

toomuchtodo|9 months ago

It seems like the fix is straightforward (check the output against a machine-readable spec before providing it to the user), but perhaps I am a rube. This is no different from me clicking through a search result to the underlying page to verify the veracity of the search result surfaced.
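A minimal sketch of that check in Python — the "spec" here is a stand-in set of allowed values; a real one might be a JSON Schema or the service's published action list:

```python
# Post-hoc validation: filter LLM output against a machine-readable spec
# before surfacing it. ALLOWED is a toy stand-in for a real spec.

ALLOWED = {"read", "write", "list"}

def vet(llm_output):
    """Split suggested values into (valid, hallucinated)."""
    valid = [v for v in llm_output if v in ALLOWED]
    hallucinated = [v for v in llm_output if v not in ALLOWED]
    return valid, hallucinated

valid, bad = vet(["read", "teleport", "list"])
# "teleport" is not in the spec, so it gets flagged instead of shown as fact
```

The hard part in practice is having a complete machine-readable spec to check against, not the check itself.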

redox99|9 months ago

Making LLMs know what they don't know is a hard problem. Many attempts at making them refuse to answer what they don't know caused them to refuse to answer things they did in fact know.

Volundr|9 months ago

> Many attempts at making them refuse to answer what they don't know caused them to refuse to answer things they did in fact know.

Are we sure they know these things as opposed to being able to consistently guess correctly? With LLMs I'm not sure we even have a clear definition of what it means for it to "know" something.

rdtsc|9 months ago

> Making LLMs know what they don't know is a hard problem. Many attempts at making them refuse to answer what they don't know caused them to refuse to answer things they did in fact know.

They are the perfect "fake it till you make it" example cranked up to 11. They'll bullshit you, but will do it confidently and with proper grammar.

> Many attempts at making them refuse to answer what they don't know caused them to refuse to answer things they did in fact know.

I can see in some contexts that being desirable if it can be a parameter that can be tweaked. I guess it's not that easy, or we'd already have it.

bezier-curve|9 months ago

The best way around this is to dump documentation of the APIs you need them privy to into their context window.

mbesto|9 months ago

To date, LLMs can't replace the human element of:

- Determining what features to make for users

- Forecasting out a roadmap that are aligned to business goals

- Translating and prioritizing all of these to a developer (regardless of whether these developers are agentic or human)

Coincidentally these are the areas that are frequently the largest contributors to software businesses' successes... not whether you use NextJs with a Go and Elixir backend against a multi-geo redundant, multi-sharded CockroachDB database, or whether your code is clean/elegant.

dist-epoch|9 months ago

Maybe at elite companies.

At half of the companies you can randomly pick those three things and probably improve the situation. Using an AI would be a massive improvement.

nearbuy|9 months ago

What does it say when you ask it to?

jug|9 months ago

I've seen benchmarks on hallucinations, and OpenAI has typically performed worse than Google and Anthropic models, sometimes significantly so. But it doesn't seem like they have cared much. I've suspected that LLM performance is correlated with risking hallucinations? That is, if they're bolder, this can be beneficial, which helps in other performance benchmarks. But of course at the risk of hallucinating more...

mountainriver|9 months ago

The hallucinations are a result of RLVR. We reward the model for an answer and then force it to reason about how to get there when the base model may not have that information.

ChocolateGod|9 months ago

I asked both Claude and ChatGPT today to fix a Grafana Loki query I was trying to build; both hallucinated functions that didn't exist, even when told to use existing functions.

To my surprise, Gemini got it spot on first time.

fwip|9 months ago

Could be a bit of a "it's always in the last place you look" kind of thing - if Claude or CGPT had gotten it right, you wouldn't have tried Gemini.

Tainnor|9 months ago

I definitely get more use out of Gemini Pro than other models I've tried, but it's still very prone to bullshitting.

I asked it a complicated question about the Scala ZIO framework that involved subtyping, type inference, etc. - something that would definitely be hard to figure out just from reading the docs. The first answer it gave me was very detailed, very convincing and very wrong. Thankfully I noticed it myself and was able to re-prompt it and I got an answer that is probably right. So it was useful in the end, but only because I realised that the first answer was nonsense.

alex1138|9 months ago

The fact that SO much is only discovered after the fact by asking it "Are you sure?" is just insane

There has to be some kind of recursive error checking thing, or something

0x457|9 months ago

I've noticed that models that can search the internet do it a lot less, I guess because they can look up documentation? My annoyance now is that they don't take the version into consideration.

tastysandwich|9 months ago

Re hallucinating APIs that don't exist - I find this with Golang sometimes. I wonder if it's because the training data doesn't just consist of all the docs and source code, but potentially feature proposals that never made it into the language.

Regexes are another area where I can't get much help from LLMs. If it's something common like a phone number, that's fine. But anything novel it seems to have trouble. It will spit out junk very confidently.
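One cheap guard here: run any LLM-suggested pattern against your own positive and negative examples before trusting it. A sketch in Python (the candidate pattern below is illustrative, not actual model output):

```python
# Never adopt an LLM-produced regex until it passes your own examples.
import re

def regex_passes(pattern, should_match, should_not_match):
    """True iff `pattern` fully matches all positives and no negatives."""
    rx = re.compile(pattern)
    return (all(rx.fullmatch(s) for s in should_match)
            and not any(rx.fullmatch(s) for s in should_not_match))

# Suppose a model proposed this for a US-style phone number:
candidate = r"\d{3}-\d{3}-\d{4}"

ok = regex_passes(candidate,
                  should_match=["555-123-4567"],
                  should_not_match=["555-1234", "abc-def-ghij"])
# ok is True only because the pattern survives both lists
```

This turns "confident junk" into a cheap pass/fail instead of something you have to eyeball.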

robinei|9 months ago

Since it's trained on a vast amount of code (probably all publicly accessible Go code and more), it's seen a vast number of different bespoke APIs for doing all kinds of things. I'm sure some of that leaks into the output from time to time. And since it can generalize to some extent, it may just invent APIs.

jppittma|9 months ago

I've had great success by asking it to do project design first, compose the design into an artifact, and then asking it to consult the design artifact as it writes code.

epaga|9 months ago

This is a great idea - do you have a more detailed overview of this approach and/or an example? What types of things do you tell it to put into the "artefact"?

tough|9 months ago

You should give it docs for each of your base dependencies via an MCP/tool or whatever, so it can just consult them.

Internet access also helps.

Also, having markdown files with the stack etc. and any -rules- helps.

satvikpendem|9 months ago

If you use Cursor, you can use @Docs to let it index the documentation for the libraries and languages you use, so no hallucination happens.

Rudybega|9 months ago

The context7 mcp works similarly. It allows you to search a massive constantly updated database of relevant documentation for thousands of projects.

viraptor|9 months ago

> no amount of prompting will get current models to approach abstraction and architecture the way a person does.

What do you mean specifically? I found the "let's write a spec, let's make a plan, implement this step by step with testing" results in basically the same approach to design/architecture that I would take.

pzo|9 months ago

I feel your pain. Cursor has a docs feature, but many times when I pointed it at @docs and selected a recently indexed one, it still didn't get it. I still have to try the context7 MCP, which looks promising:

https://github.com/upstash/context7

onlyrealcuzzo|9 months ago

2.5 pro seems like a huge improvement.

One area where I've still noticed weakness: if you want to use a pretty popular library from one language in another language, it has a tendency to assume the function signatures in the popular language match those in the other.

Naively, this seems like a hard problem to solve.

I.e. ask it how to use torchlib in Ruby instead of Python.

froh|9 months ago

searching and ranking existing fragments and recombining them within well known paths is one thing, exploratively combining existing fragments to completely novel solutions quickly runs into combinatorial explosion.

so it's a great tool in the hands of a creative architect, but it is not one in and by itself and I don't see yet how it can be.

my pet theory is that the human brain can't understand and formalize its own creativity because you need a higher-order logic to fully capture some other logic. people have contested that the second Gödel incompleteness theorem "can't be applied like this to the brain", but I stubbornly insist: yes, the brain implements _some_ formal system, and it can't understand how that system works. tongue in cheek, somewhat, maybe.

but back to earth I agree llms are a great tool for a creative human mind.

breuleux|9 months ago

> I've been contested that the second Gödel incompleteness theorem "can't be applied like this to the brain" but I stubbornly insist yes, the brain implements _some_ formal system and it can't understand how that system works

I would argue that the second incompleteness theorem doesn't have much relevance to the human brain, because it is trying to prove a falsehood. The brain is blatantly not a consistent system. It is, however, paraconsistent: we are perfectly capable of managing a set of inconsistent premises and extracting useful insight from them. That's a good thing.

It's also true that we don't understand how our own brain works, of course.

ksec|9 months ago

I have been asking whether AI without hallucination, coding or not, is even possible, but so far there's no real concrete answer.

Foreignborn|9 months ago

Try dropping the entire API docs into the context. If they're verbose, I usually pull only a subset of pages.

Usually I'm using a minimum of 200k tokens to start with Gemini 2.5.

pizza|9 months ago

"if it were a fact, it wouldn't be called intelligence" - donald rumsfeld

mattlondon|9 months ago

It's already much improved on the early days.

But I wonder when we'll be happy? Do we expect colleagues, friends, and family to be 100% laser-accurate 100% of the time? I'd wager we don't. Should we expect that from an artificial intelligence too?

ookblah|9 months ago

the future is probably something that looks pretty "inefficient" to us but is a non-factor for a machine. i sometimes think a lot of our code structure is just for our own maintenance and conceptualization (DRY, SRP), but if you throw enough compute and context at a problem im sure none of this even matters (as much).

at least for 90% of the CRUD apps out there, you can def abstract away the entire base framework of getting, listing, and updating records. i guess the problem is validating that data for use in other more complex workflows.
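a toy sketch of what that abstracted-away base framework could look like in Python (purely illustrative, in-memory only):

```python
# One generic store covering create/get/list/update for any record type.
import itertools

class CrudStore:
    def __init__(self):
        self._rows = {}
        self._ids = itertools.count(1)  # auto-incrementing ids

    def create(self, **fields):
        rid = next(self._ids)
        self._rows[rid] = dict(fields, id=rid)
        return self._rows[rid]

    def get(self, rid):
        return self._rows.get(rid)

    def list(self):
        return list(self._rows.values())

    def update(self, rid, **fields):
        self._rows[rid].update(fields)
        return self._rows[rid]

users = CrudStore()
u = users.create(name="ada")
users.update(u["id"], name="grace")
```

the validation and cross-workflow logic mentioned above is exactly what this kind of generic layer can't capture, which is where the human work stays.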

bruce511|9 months ago

I've spent my career writing code in a language which already abstracts 90% of a CRUD-type app away. Indeed there are a whole subset of users who literally don't write a line of code. We've had this since the very early 90's for DOS.

Of course that last 10% does a lot of heavy lifting. Domain expertise, program and database design, sales, support, actually processing the data for more than just simple reports, and so on.

And sure, the code is not maximally efficient in all cases, but it is consistent, and deterministic. Which is all I need from my code generator.

I see a lot of panic from programmers (outside our space) who worry about their futures. As if programming is the ultimate career goal. When really, writing code is the least interesting, and least valuable part of developing software.

Maybe LLMs will code software for you. Maybe they already do. And, yes, despite their mistakes it's very impressive. And yes, it will get better.

But they are miles away from replacing developers- unless your skillset is limited to "coding" there's no need to worry.

johnisgood|9 months ago

> hallucinate APIs

Tell me about it. Thankfully I have not experienced it as much with Claude as I did with GPT. It can get quite annoying. GPT kept telling me to use this and that and none of them were real projects.

impulser_|9 months ago

Use few-shot learning. Build a simple prompt with basic examples of how to use the API and it will do significantly better.

LLMs just guess, so you have to give them a cheatsheet to help them guess closer to what you want.
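The cheatsheet idea can be sketched as a few-shot prompt in the common system/user/assistant chat format. The client library here (`fetch_user`, `update_user`) is entirely hypothetical; substitute real snippets from your own codebase:

```python
def build_few_shot_prompt(task: str) -> list[dict]:
    # System message: the cheatsheet listing the only calls that exist.
    cheatsheet = (
        "You write code against our client library. Only these calls exist:\n"
        "  client.fetch_user(user_id: str) -> User\n"
        "  client.update_user(user: User) -> None\n"
        "Do not invent other methods."
    )
    # One worked example per common pattern; more examples help,
    # but even one usually cuts hallucinated methods noticeably.
    examples = [
        ("Rename user 42 to Ada.",
         "user = client.fetch_user('42')\n"
         "user.name = 'Ada'\n"
         "client.update_user(user)"),
    ]
    messages = [{"role": "system", "content": cheatsheet}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": task})
    return messages
```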

M4v3R|9 months ago

At this point the time it takes to teach the model might be more than you save from using it for interacting with that API.

rcpt|9 months ago

I'm using repomix for this

abletonlive|9 months ago

I feel like there are two realities right now where half the people say LLM doesn't do anything well and there is another half that's just using LLM to the max. Can everybody preface what stack they are using or what exactly they are doing so we can better determine why it's not working for you? Maybe even include what your expectations are? Maybe even tell us what models you're using? How are you prompting the models exactly?

I find for 90% of the things I'm doing LLM removes 90% of the starting friction and let me get to the part that I'm actually interested in. Of course I also develop professionally in a python stack and LLMs are 1 shotting a ton of stuff. My work is standard data pipelines and web apps.

I'm a tech lead at faang adjacent w/ 11YOE and the systems I work with are responsible for about half a billion dollars a year in transactions directly and growing. You could argue maybe my standards are lower than yours but I think if I was making deadly mistakes the company would have been on my ass by now or my peers would have caught them.

Everybody that I work with is getting valuable output from LLMs. We are using all the latest openAI models and have a business relationship with openAI. I don't think I'm even that good at prompting and mostly rely on "vibes". Half of the time I'm pointing the model to an example and telling it "in the style of X do X for me".

I feel like comments like these almost seem gaslight-y, or maybe there's just a major expectation mismatch between people. Are you expecting LLMs to just do exactly what you say while your entire job is to sit back and prompt the LLM? Maybe I'm just used to shit code, but I've looked at many codebases and there is a huge variance in quality, and the average is pretty poor. The average code that AI pumps out is much better.

oparin10|9 months ago

I've had the opposite experience. Despite trying various prompts and models, I'm still searching for that mythical 10x productivity boost others claim.

I use it mostly for Golang and Rust, I work building cloud infrastructure automation tools.

I'll try to give some examples, they may seem overly specific but it's the first things that popped into my head when thinking about the subject.

Personally, I found that LLMs consistently struggle with dependency injection patterns. They'll generate tightly coupled services that directly instantiate dependencies rather than accepting interfaces, making testing nearly impossible.

If I ask them to generate code and also their respective unit tests, they'll often just create a bunch of mocks or start importing mock libraries to compensate for their faulty implementation, rather than fixing the underlying architectural issues.

They consistently fail to understand architecture patterns, generating code where infrastructure concerns bleed into domain logic. When corrected, they'll make surface level changes while missing the fundamental design principle of accepting interfaces rather than concrete implementations, even when explicitly instructed that it should move things like side-effects to the application edges.
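The pattern being asked for, accepting an interface instead of instantiating a concrete dependency, looks like this. A minimal sketch in Python (the commenter's examples are Go/Rust, but `typing.Protocol` plays the same structural-interface role; all names here are invented for illustration):

```python
from typing import Protocol

class UserStore(Protocol):
    def get_email(self, user_id: str) -> str: ...

class Notifier:
    # The dependency is injected via the constructor rather than
    # constructed inside the class, so the service never names a
    # concrete implementation.
    def __init__(self, store: UserStore) -> None:
        self.store = store

    def address_for(self, user_id: str) -> str:
        return self.store.get_email(user_id)

# In tests, a trivial in-memory fake satisfies the Protocol
# structurally; no mock library is needed.
class FakeStore:
    def get_email(self, user_id: str) -> str:
        return f"{user_id}@example.com"
```

The tightly coupled version LLMs tend to emit instead would call something like `self.store = PostgresStore()` inside `__init__`, which is exactly what forces the mock-library workarounds described above.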

Despite tailoring prompts for different models based on guides and personal experience, I often spend 10+ minutes correcting the LLM's output when I could have written the functionality myself in half the time.

No, I'm not expecting LLMs to replace my job. I'm expecting them to produce code that follows fundamental design principles without requiring extensive rewriting. There's a vast middle ground between "LLMs do nothing well" and the productivity revolution being claimed.

That being said, I'm glad it's working out so well for you, I really wish I had the same experience.

thewebguyd|9 months ago

I've found, like you mentioned, that the tech stack you work with matters a lot in terms of successful results from LLMs.

Python is generally fine, as you've experienced, as is JavaScript/TypeScript & React.

I've had mixed results with C# and PowerShell. With PowerShell, hallucinations are still a big problem. Not sure if it's the Noun-Verb naming scheme of cmdlets, but most models still make up cmdlets that don't exist on the fly (though they will correct themselves once you point out that the cmdlet doesn't exist, but at that point, why bother when I can just do it myself correctly the first time).

With C#, even with my existing code as context, it can't adhere to a consistent style and can't handle nullable reference types (albeit a relatively new feature in C#). It works, but I have to spend too much time correcting it.

Given my own experiences and the stacks I work with, I still won't trust an LLM in agent mode. I make heavy use of them as a better Google, especially since Google has gone to shit, and to bounce ideas off of, but I'll still write the code myself. I don't like reviewing code, and having LLMs write code for me just turns me into a full-time code reviewer, not something I'm terribly interested in becoming.

I still get a lot of value out of the tools, but for me I'm still hesitant to unleash them on my code directly. I'll stick with the chat interface for now.

edit Golang is another language I've had problems relying on LLMs for. On the flip side, LLMs have been great for me with SQL and I'm grateful for that.

codexon|9 months ago

> I feel like there are two realities right now where half the people say LLM doesn't do anything well and there is another half that's just using LLM to the max. Can everybody preface what stack they are using or what exactly they are doing so we can better determine why it's not working for you? Maybe even include what your expectations are? Maybe even tell us what models you're using? How are you prompting the models exactly?

Just now, I've been feeding o4-mini, with high reasoning effort, a C++ file with a deadlock in it.

It has failed to fix the problem in three attempts, and it introduced a double-free bug in one of them. It did not see the double-free problem until I pointed it out.
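For context, the standard fix for this class of deadlock is mechanical but easy to fumble: impose a single global lock-acquisition order. Sketched here in Python rather than C++ (in C++ the same idea is what `std::scoped_lock` over multiple mutexes implements):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def with_both(first: threading.Lock, second: threading.Lock) -> str:
    # Sort by object identity so every caller acquires the two locks in
    # the same global order regardless of argument order; two threads
    # can then never each hold the lock the other needs next.
    lo, hi = sorted((first, second), key=id)
    with lo:
        with hi:
            return "ok"
```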

thr0waway39290|9 months ago

Replacing stackoverflow is definitely helpful, but the best use case for me is how much it helps in high-level architecture and planning before starting a project.

thefourthchime|9 months ago

Ask the models that can search to double check their API usage. This can just be part of a pre-prompt.

bboygravity|9 months ago

This is hilarious to read if you have actually seen the average (embedded systems) production code written by humans.

Either you have no idea how terrible real world commercial software (architecture) is or you're vastly underestimating newer LLMs or both.

nurettin|9 months ago

Just tell it to cite docs when using functions, works wonders.

tiahura|9 months ago

Why not add the applicable api references as context?

pdntspa|9 months ago

I don't know about that; my own adventures with Gemini Pro 2.5 in Roo Code have it outputting code in a style that is very close to my own.

While far from perfect for large projects, controlling the scope of individual requests (with orchestrator/boomerang mode, for example) seems to do wonders

Given the sheer, uh, variety of code I see day to day in an enterprise setting, maybe the problem isn't with Gemini?

gxs|9 months ago

Huh? Have you ever just told it, "that API doesn't exist, find another solution"?

Never seen it fumble that around

Swear people act like humans themselves don’t ever need to be asked for clarification

mannycalavera42|9 months ago

same, I asked a simple question about the JavaScript fetch API and it started talking about the workspace API. When I asked about that workspace API, it replied it was the Google Workspace API ¯\_(ツ)_/¯

SafeDusk|9 months ago

I’m having reasonable success specifically with Gemini model using only 7 tools: read, write, diff, browse, command, ask, think.

This minimal template might be helpful to you: https://github.com/aperoc/toolkami