anxoo | 9 months ago

name 5 tasks which you think current AIs can't do. then go and spend 30 minutes seeing how current AIs do on them. write it on a sticky note and put it somewhere that you'll see it.

otherwise, yes, you'll continue to be irritated by AI hype, maybe up until the point where our civilization starts going off the rails

TheRoque|9 months ago

Well, I'll try to do a sticky note here:

- they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient

- they fail at doing clean DRY practices even though they are supposed to skim through the codebase much faster than me

- they bait me into inexisting apis, or hallucinate solutions or issues

- they cannot properly pick the context and the files to read in a mid-size app

- they suggest to download some random packages, sometimes low quality ones, or unmaintained ones

simonw|9 months ago

"they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient"

That's mostly solved by the most recent ones that can run searches. I've had great results from o4-mini for this, since it can search for the latest updates - example here: https://simonwillison.net/2025/Apr/21/ai-assisted-search/#la...

Or for a lot of libraries you can dump the ENTIRE latest version into the prompt - I do this a lot with the Google Gemini 2.5 models since those can handle up to 1m tokens of input.
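A minimal sketch of that "dump the whole library into the prompt" approach, assuming the library is checked out locally. The helper names, the file suffixes, and the 4-characters-per-token estimate are all assumptions for illustration; real tokenizers vary, and the 1m budget is just the figure mentioned above.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real tokenizers differ


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def build_library_prompt(root, question, suffixes=(".py", ".md"),
                         token_budget=1_000_000):
    """Concatenate a library's files into one prompt, skipping files
    that would push the estimated size over the token budget."""
    root = Path(root)
    parts = []
    skipped = []
    used = estimate_tokens(question)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        name = path.relative_to(root).as_posix()
        text = path.read_text(encoding="utf-8", errors="replace")
        chunk = f"--- {name} ---\n{text}\n"
        cost = estimate_tokens(chunk)
        if used + cost > token_budget:
            skipped.append(name)  # report it instead of silently dropping it
            continue
        parts.append(chunk)
        used += cost
    # Put the question last so it follows all of the source material.
    prompt = "".join(parts) + "\n" + question
    return prompt, used, skipped
```

Returning the skipped files makes it obvious when the library did not actually fit, which matters if the answer depends on a file that was left out.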

"they fail at doing clean DRY practices" - tell them to DRY in your prompt.

"they bait me into inexisting apis, or hallucinate solutions or issues" - really not an issue if you're actually testing your code! I wrote about that one here: https://simonwillison.net/2025/Mar/2/hallucinations-in-code/ - and if you're using one of the systems that runs your code for you (as promoted in tptacek's post) it will spot and fix these without you even needing to intervene.
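To illustrate why running the code surfaces a hallucinated API right away, here is a small sketch. The "generated" function is a made-up example of model output (not taken from any real session): it calls `json.parse`, which is a JavaScript idiom and does not exist in Python's `json` module. The helper names are hypothetical.

```python
import json


def parse_config_hallucinated(text):
    # Hypothetical LLM output: json.parse does not exist in Python.
    return json.parse(text)


def parse_config_fixed(text):
    # The real standard-library call.
    return json.loads(text)


def run_check(func):
    """Run one tiny test against func and report what happened."""
    try:
        result = func('{"debug": true}')
    except AttributeError as exc:
        # A nonexistent function fails the moment the code executes.
        return f"failed: {exc}"
    return "passed" if result == {"debug": True} else "wrong result"


if __name__ == "__main__":
    print(run_check(parse_config_hallucinated))
    print(run_check(parse_config_fixed))
```

The error message from the failed run is exactly what an agent that executes its own code would feed back into the next attempt.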

"they cannot properly pick the context and the files to read in a mid-size app" - try Claude Code. It has a whole mechanism dedicated to doing just that, I reverse-engineered it this morning: https://simonwillison.net/2025/Jun/2/claude-trace/

"they suggest to download some random packages, sometimes low quality ones, or unmaintained ones" - yes, they absolutely do that. You need to maintain editorial control over what dependencies you add.
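One way to keep that editorial control is a reviewed allowlist that suggested dependencies get checked against before anything is installed. This is only a sketch: the `APPROVED` set and the function names are hypothetical, and the name normalization follows the lowercase/hyphen rule from PEP 503.

```python
import re

# Hypothetical allowlist of packages a human has already reviewed.
APPROVED = {"requests", "numpy"}


def package_name(requirement):
    """Extract the normalized package name from a requirement line
    such as 'Foo_Bar>=1.0' or 'requests[socks]==2.31'."""
    name = re.split(r"[<>=!~\[;\s]", requirement.strip(), maxsplit=1)[0]
    return re.sub(r"[-_.]+", "-", name).lower()


def unreviewed(requirements, approved=APPROVED):
    """Return the sorted names of suggested packages not on the allowlist."""
    names = set()
    for line in requirements:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(package_name(line))
    return sorted(names - set(approved))


if __name__ == "__main__":
    suggested = ["requests>=2.0", "left-pad-py==0.1", "# a comment", "NumPy"]
    print(unreviewed(suggested))
```

Anything the check reports still needs a human look at maintenance status and quality; the script only flags what has not been reviewed yet.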

agotterer|9 months ago

> they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient

This is where collaboration comes into play. If you solely rely on the LLM to “vibe code” everything, then you’re right, you get whatever it thinks is best at the time of generation. That could be wrong or outdated.

My workflow is to first provide clear requirements, generally one objective at a time. Sometimes I use an LLM to format the requirements for the LLM to generate code from. It then writes some code, and I review it. If I notice something is outdated I give it a link to the docs and tell it to update it using X. A few seconds later it’s made the change. I did this just yesterday when building out an integration with an API. Claude wrote the code using a batch endpoint because the streaming endpoint was just released and I don’t think it was aware of it. My role in this collaboration is to be aware of what’s possible and how I want it to work (e.g., being aware of the latest features and updates of the frameworks and libraries). Then it’s just about prompting and directing the LLM until it works the way I want. When it’s really not working, then I jump in.

bdangubic|9 months ago

> they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient

of course they can, teach them / feed them latest changes or whatever you need (much like another developer unaware of the same thing)

> they fail at doing clean DRY practices even though they are supposed to skim through the codebase much faster than me

tell them it is not DRY until they make it DRY. for some (several projects I’ve been involved with) DRY is generally an anti-pattern when taken to extremes (abstraction gone awry etc…). instruct it on what you expect and watch it deliver (much like you would another developer…)

> they bait me into inexisting apis, or hallucinate solutions or issues

tell it when it hallucinates, it’ll correct itself

> they cannot properly pick the context and the files to read in a mid-size app

provide it with context (you should always do this anyways)

> they suggest to download some random packages, sometimes low quality ones, or unmaintained ones

tell it about it, it will correct itself

apwell23|9 months ago

> - they bait me into inexisting apis, or hallucinate solutions or issues

yes. this happens to me almost every time i use it. I feel like a crazy person reading all the AI hype.

motza|9 months ago

I have definitely noticed these as well. Have you ever tried prompting these issues away? I'm thinking this might be a good list to add to every coding prompt

bradfa|9 months ago

They also can’t hold copyright on their creations.

alisonatwork|9 months ago

The problem with AI hype is not really about whether a particular model can - in the abstract - solve a particular programming problem. The problem with AI hype is that it is selling a future where all software development companies become entirely dependent on closed systems.

All of the state-of-the-art models are online models - you have no choice, you have to pay for a black box subscription service controlled by one of a handful of third-party gatekeepers. What used to be a cost center that was inside your company is now a cost center outside your company, and thus it is a risk to become dependent on it. Perhaps the risk is worthwhile, perhaps not, but the hype is saying that real soon now it will be impossible to not become dependent on these closed systems and still exist as a viable company.

apwell23|9 months ago

> name 5 tasks which you think current AIs can't do.

For coding it seems to back itself into a corner and never recover from it until I "reset" it.

AI can't write software without an expert guiding it. I cannot open a non trivial PR to postgres tonight using AI.

simonw|9 months ago

"AI can't write software without an expert guiding it. I cannot open a non trivial PR to postgres tonight using AI."

100% true, but is that really what it would take for this to be useful today?

poincaredisk|9 months ago

1. create a working (moderately complex) ghidra script without hallucinating.

Granted, I was trying to do this 6 months ago, so maybe a miracle has happened since. But in the past I had a very bad experience using LLMs for niche things (i.e. things that were never mentioned on Stack Overflow).

simonw|9 months ago

I've never heard of Ghidra before but, in case you're interested, I ran that prompt through OpenAI's o3 and Anthropic's Claude Opus 4 for you just now (both of them the latest/greatest models from those vendors and new as of less than six months ago) - results here: https://chatgpt.com/share/683e3e38-cfd0-8006-9e49-2aa799dac4... and https://claude.ai/share/7a076ca1-0dee-4b32-9c82-8a5fd3beb967

I have no way of evaluating these myself so they might just be garbage slop.

AtlasBarfed|9 months ago

Everyone keeps thinking AI improvement is linear. I don't know if this is correct, but my basic impression is that the current AI boost came from adding the massive computing power of graphics cards instead of being limited to the CPU and its throughput.

But for each nine of reliability you want out of llms everyone's assuming it's just a linear growth. I don't think it is. I think it's polynomial at least.

As for your tasks, and maybe it's just because I'm using ChatGPT, I asked it to port sed, something with full open source code availability, tons of examples/test cases, and a fully documented user interface, and I wanted it moved to Java as a library.

And it failed pretty spectacularly. Yeah it got the very very very basic functionality of sed.

kaydub|9 months ago

Of course it didn't port sed like that. It doesn't matter that it's open source with tons of examples/test cases. It's not going to go read all the code and change it to a different language. It can pick out what sed's purpose is and it built it for you in the language you asked.

chinchilla2020|9 months ago

If AI can do anything, why can't I just prompt "Here is sudo access to my laptop, please do all my work for me, respond to emails, manage my household budget, and manage my meetings"?

I've tried everything. I have four AI agents. They still have an accuracy rate of about 50%.

ipaddr|9 months ago

Make me a million dollars

Tell me about this specific person who isn't famous

Create a facebook clone

Recreate Windows including drivers

Create a way to transport matter like in Star Trek.

I'll see you in 6 months.