(no title)
develoopest|11 months ago
All the incredible performance and success stories always come from these Twitter posts. I do find value in asking for simple but tedious tasks, like a small refactor or generating commands, but this "AI takes the wheel" level does not feel real.
abxyz|11 months ago
Coders used to be more productive by using libraries (e.g. don't write your own function for finding the intersection of arrays, use intersection from Lodash), whereas now libraries have been replaced by LLMs. Programmers laughed at the absurdity of left-pad[1] ("why use a dependency for 16 lines of code?") whereas coders thought left-pad was great ("why write 16 lines of code myself?").
If you think about code as a means to an end, and focus on the end, you'll get much closer to the magical experience you see spoken about on Twitter, because their acceptance criterion is "good enough", not "right". Of course, if you're a programmer who cares about the artistry of programming, that feels like a betrayal.
[1] https://en.wikipedia.org/wiki/Npm_left-pad_incident
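The function at the center of the left-pad incident really was tiny. A Python rendering of the same idea (hypothetical; the original package was JavaScript) is just a short loop:

```python
def left_pad(value, length, ch=" "):
    """Pad `value` on the left with `ch` until it is at least `length` characters."""
    s = str(value)
    while len(s) < length:
        s = ch + s
    return s

# left_pad("5", 3, "0") pads a digit out to "005"
```

Which side of the dependency debate you land on mostly depends on whether writing those few lines feels like waste or like ownership.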
miki123211|11 months ago
I've been using Claude Code a lot recently, and it's doing amazing work, but it's not exactly what I want it to do.
I had to push it hard to refactor and simplify, as the code it generated was often far more complicated than it needed to be.
To be honest though, most of the code it generated I would accept if I was reviewing another developer's work.
I think that's the way we need to look at it. It's a junior developer that will complete our tasks, not always in our preferred way, but at 10x the speed, and it will frequently make mistakes that we need to point out in CR. It's not a tool which will do exactly what we would.
jmull|11 months ago
They write as much code as you want, and it often sorta works, but it's a bug-filled mess. It's painstaking work to fix everything, on par with writing it yourself. Now, you can just leave it as-is, but what's the use of releasing software that crappy?
I suppose it's a revolution for the in-house crapware company IT groups create and foist on everyone who works there. But the software isn't better, it just takes a day rather than 6 months (or 2 years, or 5 years) to create. Come to think of it, it may not be useful for that either... I think the end purpose is probably some kind of brag for the IT manager/exec, and once people realize how little effort is involved it won't serve that purpose.
bikamonki|11 months ago
icedchai|11 months ago
beezlewax|11 months ago
someothherguyy|11 months ago
The problem with this is that you will never be able to modify the code in a meaningful way after it crosses a threshold, so either you'll have a prompt only modification ability, or you will just have to rewrite things from scratch.
I wrote my first application ever (equivalent to an education CMS today) in the very early 2000s with barely any notion of programming fundamentals. It was probably a couple hundred thousand lines of code by the time I abandoned it.
I wrote most of it in HTML, JS, ASP and SQL. I was in high school. I didn't know what common data structures were. I once asked a professor when I got into late high school "why arrays are necessary in loops".
We called this cookbook coding back in the day.
I was pretty much laughed at when I finally showed people my code, even though it was a completely functional application. I would say an LLM probably can do better, but it really doesn't seem like something we should be chasing.
oxag3n|11 months ago
And I won't believe it until I see an LLM use a real debugger to figure out the root cause of a sophisticated, cascading bug.
mrits|11 months ago
bodhi_mind|11 months ago
roflyear|11 months ago
Sure, but man are there bugs.
nbardy|11 months ago
You can over-specify in your prompts and say exactly what types and algorithms you want if you're opinionated.
I often write giant page long specs to get exactly the code I want.
It’s only 2x as fast as coding, but thinking in English is way better than coding.
throwaway2037|11 months ago
Also, if you cannot tell the difference between code written by an LLM and code written by a human, what is the difference? This whole post is starting to feel like people with very strong (gatekeeper-ish) views on hi-fi stereo equipment, coffee, wine, ... and programming. Or should I say "code-as-craft"? <cringe>
jv22222|11 months ago
gitgud|11 months ago
They’re synonymous words and mean the same thing right?
Person who writes logic for machines
BeetleB|11 months ago
Consider using Aider. It's a great tool and cheaper to use than Code.
Look at Aider's LLM leaderboard to figure out which LLMs to use.
Use its architect mode (although you can get quite fast without it - I personally haven't needed it).
Work incrementally.
I use at least 3 branches: my main one, a dev one, and a debug one. I develop on dev. When I encounter a bug I switch to debug. The reason is that fixing a bug can produce a lot of code: it will write some code to fix it, that won't work, it will try again and write even more code, and so on until the bug is fixed. But in the end only a small subset of the new code was needed. So you then revert all the changes and have it fix the bug again, this time telling it the correct fix.
Don't debug on your dev branch.
Aider's auto committing is scary but really handy.
Limit your context to 25k.
Only add files that you think are necessary.
Combining the two: Don't have large files.
Add a Readme.md file. It will then update the file as it makes code changes. This can give you a glimpse of what it's trying to do and if it writes something problematic you know it's not properly understanding your goal.
Accept that it is not you and will write code differently from you. Think of it as a moderately experienced coder who is modifying the codebase. It's not going to follow all your conventions.
https://aider.chat/
https://aider.chat/docs/leaderboards/
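A few of the tips above can be captured in Aider's config file rather than repeated on the command line. A minimal sketch of a `.aider.conf.yml` might look like this (the key names mirror Aider's command-line flags; treat the exact names and values here as assumptions and check Aider's docs):

```yaml
# Sketch of an aider config applying the tips above (key names are assumptions)
model: sonnet        # pick a model from the LLM leaderboard
auto-commits: true   # scary but really handy
map-tokens: 1024     # keep the repo map small to limit context
```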
majormajor|11 months ago
How big/complex does the codebase have to be for this to actually save you time compared to just using a debugger and fixing it yourself directly? (I'm assuming here that bugs in smaller codebases are that much easier for a human to identify quickly.)
geoka9|11 months ago
Can you provide a ballpark of what kind of $ costs we are talking here for using Aider with, say, Claude? (or any other provider that you think is better at the moment).
Say a run-of-the-mill bug-fixing session from your experience vs the most expensive one off the top of your head?
tptacek|11 months ago
ddanieltan|11 months ago
branko_d|11 months ago
Where AI shines for me is as a form of a semantic search engine or even a tutor of sorts. I can ask for the information that I need in a relatively complex way, and more often than not it will give me a decent summary and a list of "directions" to follow-up on. If anything, it'll give me proper technical terms, that I can feed into a traditional search engine for more info. But that's never the end of my investigation and I always try to confirm the information that it gives me by consulting other sources.
mentalgear|11 months ago
jofzar|11 months ago
Yoric|11 months ago
Code? Nope.
smallerfish|11 months ago
A few things to note:
a) Use the "Projects" feature in Claude web. The context makes a significant amount of difference in the output. Curate what it has in the context; prune out old versions of files and replace them. This is annoying UX, yes, but it'll give you results.
b) Use the project prompt to customize the response. E.g. I usually tell it not to give me redundant code that I already have. (Claude can otherwise be overly helpful and go on long riffs spitting out related code, quickly burning through your usage credits).
c) If the initial result doesn't work, give it feedback and tell it what's broken (build messages, descriptions of behavior, etc).
d) It's not perfect. Don't give up if you don't get perfection.
triyambakam|11 months ago
jacobedawson|11 months ago
cheema33|11 months ago
Same here. Most candidates I interviewed said they did not use AI for development work. And it showed. These guys were not well informed on modern tooling and frameworks. Many of them seemed stuck in/comfortable with their old way of doing things and resistant to learning anything new.
I even hired a couple of them, thinking that they could probably pick up these skills. That did not happen. I learned my lesson.
InvertedRhodium|11 months ago
1. My first prompt describes what I want to build, what I know I want, and any requirements or restrictions I'm aware of, then tells the agent: based on these requirements, ask a series of questions to produce a complete specification document.
2. Workshop the specification back and forth until I feel it's complete enough.
3. Ask the agent to implement the specification we came up with.
4. Tell the agent to implement Cursor Rules based on the specifications to ensure consistent implementation details in future LLM sessions.
I'd say it's pretty good 80% of the time. You definitely still need to understand the problem domain and be able to validate the work that's been produced but assuming you had some architectural guidelines you should be able to follow the code easily.
The Cursor Rules step makes all the difference in my experience. I picked most of this workflow up from here: https://ghuntley.com/stdlib/
Edit: A very helpful rule is to tell Cursor to always check out a new branch based on the latest HEAD of master/main for all of its work.
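A rule like the one in that edit can live in the project's rules file so every session picks it up. A sketch (the filename and wording here are assumptions; Cursor has used both `.cursorrules` and `.cursor/rules`, and the spec path is hypothetical):

```text
# .cursorrules (sketch)
Before making any changes, check out a new branch based on the
latest HEAD of master/main. Never commit directly to main.
Follow the specification document agreed with the user for all
implementation details, and ask before deviating from it.
```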
theshrike79|11 months ago
Cursor w/ Claude has a habit of running away on tangents instead of solving just the one problem, then I need to reject its changes and even roll back to a previous version.
With a proper specification as guideline it might stay on track a bit better.
slooonz|11 months ago
After interacting with this tool, I decided it would be nice if the tool could edit itself, so I asked (him ? it ?) to create its next version. It came up with a non-working version of this https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016.... I fixed the bug manually, but it started an interactive loop : I could now describe what I wanted, describe the bugs, and the tool will add the features/fix the bugs itself.
I decided to rewrite it in Typescript (by that I mean: can you rewrite yourself in typescript). And then add other tools (by that: create tools and unit tests for the tools). https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... and https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... have been created by the tool itself, without any manual fix from me. Setting up the testing/mock framework ? Done by the tool itself too.
In one day (and $20), I essentially had recreated claude-code. That I could improve just by asking "Please add feature XXX". $2 a feature, with unit tests, on average.
WD-42|11 months ago
This is why expectations are all out of whack.
Silhouette|11 months ago
That said - I'm wary of reading too much into results at this scale. There isn't enough code in such a simple application to need anything more sophisticated than churning out a few lines of boilerplate that produce the correct result.
It probably won't be practical for the current state of the art in code generators to write large-scale production applications for a while anyway just because of the amount of CPU time and RAM they'd need. But assuming we solve the performance issues one way or another eventually it will be interesting to see whether the same kind of code generators can cope with managing projects at larger scales where usually the hard problems have little to do with efficiently churning out boilerplate code.
matt_heimer|11 months ago
Google has gotten worse (or the internet has more garbage), so finding a code example is more difficult than it used to be. Now I ask an LLM for an example. Sometimes I have to ask for a refinement, and usually something is broken in the example, but it takes less time to get the LLM-produced example to work than it does to find a functional example using Google.
But the LLM has only replaced my previous Google usage, I didn't expect Google to develop my applications and I don't with LLMs.
ptmcc|11 months ago
deergomoo|11 months ago
The LLM was obviously much faster and the information was much higher density, but it had quite literally about a 20% rate of just making up APIs from my limited experiment. But I was very impressed with Kagi’s results and ended up signing up, now using it as my primary search engine.
layer8|11 months ago
jayd16|11 months ago
escapecharacter|11 months ago
But in my own work, those tasks are pretty rare, like 3 times a month? Often I start working on something, and the scope and definition of success changes while I'm in the midst of it. Or it turns out to be harder than expected and it makes sense to timebox it and do a quick search for workarounds.
As much as we joke about StackOverflow commenters sometimes telling a question-asker they shouldn't be doing what they're trying to do, you do actually want that (soft) pushback some of the time. Most modern LLMs will gleefully come up with a complete plan for how we're gonna get the square peg in the round hole.
philipswood|11 months ago
I once accidentally asked a local DeepSeek distilled model (deepseek-r1:7b) to do the wrong thing by copy-pasting the wrong variable name. It saw me trying to do something stupid (I was working with the wrong variable), told me how to do what I asked, and then asked:
> _Is this modification part of a larger change you're making to the code? I'd like to make sure we're not modifying something important that might have side effects._
Looking at its thought process:
> _The user wants to modify this string by replacing "-input" with "-kb". But looking at the ARN structure, I recall that AWS S3 ARNs are usually in the form arn:aws:s3:::bucket_name/path. The part after the bucket name is typically s3:// followed by the object key._
> _Wait, maybe the user has a specific reason to make this change. Perhaps they're formatting or structuring something for a different purpose. They might not realize that directly modifying ARNs can cause issues elsewhere if not done correctly._
unknown|11 months ago
[deleted]
clusterhacks|11 months ago
My recent usage is oriented towards using pseudocode descriptions that closely map to Python to produce Python functions. I am very impressed with Claude 3.7's syntactic correctness when given a chunk of pseudocode that looks "python-y" to begin with.
My one concern is that much of my recent code requirements lack novelty. So there is a somewhat reasonable chance that the tool is just spitting out code it slurped somewhere on GitHub or elsewhere on the larger Internet. Just this week, I gave Claude a relatively "anonymous" function in pseudocode, meaning variable names were not particularly descriptive, with one tiny exception. However, Claude generated a situationally appropriate comment as part of the function definition. This was... surprising to me, unless the model had some very close match to my pseudocode description in its training set, with enough context to add the comment.
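As an illustration of the kind of "python-y" pseudocode prompt being described (the function, names, and data here are hypothetical, not taken from the comment):

```python
# Pseudocode handed to the model might read:
#   keep every record whose score is at least the threshold,
#   then return the survivors sorted by score, highest first.
def filter_and_rank(records, threshold):
    """Keep records scoring at or above `threshold`, best first."""
    kept = [r for r in records if r["score"] >= threshold]
    return sorted(kept, key=lambda r: r["score"], reverse=True)
```

When the pseudocode already names its variables and control flow this explicitly, the model has very little left to invent, which is likely why the syntactic correctness is so high.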
doug_durham|11 months ago
csomar|11 months ago
I'm using Claude 3.7 now, and while it improved in certain areas, it degraded in others (i.e. it randomly removes/changes things more now).
namaria|11 months ago
LLMs are cool, machine learning is cooler. Still no 'AI' in sight.
julienmarie|11 months ago
The way I prompt it: first I write the documentation of the module I want, following the format I detailed in the master documents, and ask it to follow the documentation and specs.
I use cursor as well, but more as an assistant when I work on the architecture pieces.
But I would never give an AI the driver's seat for building the architecture and making tech decisions.
crabl|11 months ago
simonw|11 months ago
Yup, that's our job as software engineers.
cglace|11 months ago
TylerLives|11 months ago
Balgair|11 months ago
But I'm in the more 'bad programmer/hacker' camp and think that LLMs are amazing and really helpful.
I know that one can post a link to the chat history. Can you do that for an example that you are comfortable sharing? I know that it may not be possible though or very time consuming.
What I'm trying to get at is: I suck at programming, I know that. And you probably suck a lot less. And if you say that LLMs are garbage, and I say they are great, I want to know where I'm getting the disconnect.
I'm sincerely, not trying to be a troll here, and I really do want to learn more.
Others are welcome to post examples and walk through them too.
Thanks for any help here.
vlod|11 months ago
Respectfully, are you understanding what it produces, or do you think it's amazing because it produces something that 'maybe' works?
Here's an e.g. I was just futzing with. I did a refactor of my code (typescript) and my test code broke (vitest) and for some reason it said 'mockResolvedValue()' is not a function. I've done this a gazillion times.
I allowed it via 3-4 iterations to try and fix it (I was being lazy and wanted my error to go away) and the amount of crap (rewriting tests, referenced code) it was producing was beyond ridiculous. (I was using github co-pilot).
Eventually I said "f.that for a game of soldiers" and used my brain. I forgot to uncomment a vi.mock() during the refactor.
I DO use it to fix stupid typescript errors (the error blob it dumps on you can be a real pita to process) and appreciate it when gives me a simple solution.
So I agree with quite a few comments here. I'm not ready to bend the knee to our AI Overlords.
mns|11 months ago
Then I went on GitHub and found that it had used some code written by someone in JS 7 years ago and just converted and extended it for my language, but that code was wrong and simply useless. We'll end up with people publishing exploits and various other security flaws on GitHub, these LLMs will get trained on them, and people who have no clue what they are doing will push out code based on that. We're in for fun times ahead.
alexkwood|11 months ago
sovietmudkipz|11 months ago
Finding the right prompt to have current generation AI create the magic depicted in twitter posts may be a harder problem than most anticipate.
fullstackwife|11 months ago
belter|11 months ago
epolanski|11 months ago
For small stuff LLMs are actually great and often a lifesaver on legacy codebases, but that's more or less where it stops.
noufalibrahim|11 months ago
iambateman|11 months ago
I wrote a worksheet for Cursor and gave it specific notes on how to accomplish the task in a particular case. Then I let it run, and it's fairly successful.
Keep in mind…it’s never truly “hands off” for me. I still need to clean things up after it’s done. But it’s very good at figuring out how to filter the HTML down and parse out the data I need. Plus it writes good tests.
So my success story is that it takes 75% of the energy out of a task I find particularly tedious.
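The "filter the HTML down and parse out the data" step can be sketched with nothing but the Python standard library (a hypothetical example; the comment doesn't show its actual scraper or the data it targets):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href attributes from anchor tags while streaming the HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

parser = LinkExtractor()
parser.feed('<p><a href="/a">A</a> and <a href="/b">B</a></p>')
# parser.links is now ["/a", "/b"]
```

Writing this kind of boilerplate extractor, plus its tests, is exactly the tedious-but-mechanical work the comment describes handing off.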
WD-42|11 months ago
onion2k|11 months ago
One of the biggest difficulties AI will face is getting developers to unlearn the idea that there's a right answer, and that of the many thousands of possible right answers, 'the code I would have written myself' is just one (or a few if you're one of the few great devs who don't stop thinking about approaches after your first attempt.)
rhubarbtree|11 months ago
I tried to get it to build a very simple version of an app I’ve been working on. But the basics didn’t work, and as I got it to fix some functionality other stuff broke. It repeatedly nuked the entire web app, then rolled back again and again. It tried quick and dirty solutions that would lead to dead ends in just a few more features. No sense of elegance or foundational abstractions.
The code it produced was actually OK, and I could have fixed the bugs given enough time, but overall the results were far inferior to every programmer I’ve ever worked with.
On the design side, the app was ugly as hell and I couldn’t get it to fix that at all.
Autocomplete on a local level seems far more useful.
_steve_yegge_|11 months ago
It seems like you would be the perfect audience for it. We're hoping the book can teach you what you need in order to have all those success stories yourself.
Gunnerhead|11 months ago
moomin|11 months ago
Delomomonl|11 months ago
Like a single-page HTML/JS app which does a few things and saves its state in local storage, with a JSON backup feature (download the JSON).
I also enjoy it for things I don't care much about but that make a project more polished. Like I hate my basically empty readme with two commands. It looks ugly, and when I come back to stuff like this a few days/weeks later I always hate it.
Claude just generates really good readmes.
I'm trying out Claude code right now and like it so far.
Kiro|11 months ago
dilap|11 months ago
I still find lots of use for LLMs authoring stuff at more like the function level. "I know I need exactly this."
Edit: I did however find it amazing for asking questions about sections of the code I did not write.
babyent|11 months ago
Every single time they were doing something simple.
Just because someone has decades of experience or is a SME in some niche doesn’t mean they’re actually good… engineers.
yodsanklai|11 months ago
This is already a productivity boost. I'm more and more impressed about what I can get out of these tools (as you said, simple but tedious things). ChatGPT4o (provided by company) does pretty complex things for me, and I use it more and more.
Actually, I noticed that when I can't use it (e.g. internal tools/languages), I'm pretty frustrated.
cglace|11 months ago
kolbe|11 months ago
nsonha|11 months ago
collingreen|11 months ago
That being said I appreciate your suggestion and will consider giving that a shot.
egorfine|11 months ago
ido|11 months ago
jayd16|11 months ago
I'm curious when we'll start seeing verifiable results like live code streams with impressive results or companies dominating the competition with AI built products.
unknown|11 months ago
[deleted]
Ancalagon|11 months ago
razemio|11 months ago
unknown|11 months ago
[deleted]
develoopest|11 months ago
gloosx|11 months ago
huvin99|11 months ago
timewizard|11 months ago
habinero|11 months ago
A lot -- and I mean a lot -- of people who hype it up are hobby or aspirational coders.
If you drill down on what exactly they use it for, they invariably don't write code in professional settings that will be maintained and which other humans have to read.
Everyone who does goes "eh, it's good for throwaway code or one offs and it's decent at code completion".
Then there's the "AGI will doom us all" cult weirdos, but we don't talk about them.
dgellow|11 months ago
darepublic|11 months ago
EigenLord|11 months ago