Absolutely, LLMs are great for greenfield projects. They can get you to a prototype for a new idea faster than any tool yet invented. Where they start to break down, I find, is when you ask them to make changes or refactors to existing code and mature projects. They usually lack context, so they don't hesitate to introduce lots of extra complexity, add frameworks you don't need, and in general just make the situation worse. Or, if they do get you to a solution, it will have taken so long that you might as well have done the heavy lifting yourself. LLMs are still no substitute for actually understanding your code.
My experience to date across the major LLMs is that they are quick to leap to complex solutions, and I find that the code often is much harder to maintain than if I were to do it myself.
But complex code is only part of the problem. Another huge problem I see is the rapid accumulation of technical debt. LLMs will confidently generate massive amounts of code with abstractions and design patterns that may be a good fit in isolation, but are absolutely the wrong pattern for the problem you're trying to solve or the system you're trying to build. You run into the "existing code pattern" problem that Sandi Metz talked about in her fantastic 2014 RailsConf talk, "All the little things" [0]:
> "We have a bargain to follow the pattern, and if the pattern is a good one then the code gets better. If the pattern is a bad one, then we exacerbate the problem."
Rapidly generating massive amounts of code with the wrong abstractions and design patterns is insidious because it feels like incredible productivity. You see it all the time in posts on e.g. Twitter or LinkedIn. People gushing about how quickly they are shipping products with minimal to zero other humans involved. But there is no shortcut to understanding or maintainability if you care about building sustainable software for the medium to long-term.
[0] https://www.youtube.com/watch?v=8bZh5LMaSmE&t=8m11s
I would recommend that everyone reading this think of it as a skill issue. You can learn to use the LLM/agent to document your code base, test isolated components, and refactor your spaghetti into modular chunks easily understandable by the agent.
Greenfield projects turn into a mess very quickly because if you let it code without any guidance (wrt documentation, interactivity, testability, modularity) it generates crap until you can't modify it. The greenfield project turns into legacy as fast as the agent can spit out new code.
LLM coding is quite well-suited to projects that apply the Unix philosophy. Highly modular systems that can be broken into small components that do one thing well.
You are right, but that's also a good indication that the codebase itself is too complex: at a certain size/scale it is too much for a human to reason over, and even where a human could, it is not efficient to do so.
You should not be structuring the code for an LLM alone, but I have found that trying to be very modular has helped both my code and my ability to utilize an LLM on top of it.
I've found they will also introduce subtle changes. I just used o1 recently to pull code from a Python notebook and remove all the intermediate output. It basically got it right except for one string that was used to look up info from an external source. It just dropped 2 characters from the end of the string. That issue required a bit of time to track down because I thought it was an issue with the test environment!
Eventually I ended up looking at the notebook and the extracted code side-by-side and carefully checking every line. Despite being split across dozens of cells, it would have been faster if I had started out by just manually copying the code out of each meaningful cell and pasting it all together.
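A mechanical check would have caught it immediately. A minimal sketch of that kind of diff, assuming a standard .ipynb; the file names here are made up:

    import difflib
    import nbformat

    # Concatenate the notebook's code cells, then diff against the LLM's output.
    nb = nbformat.read("analysis.ipynb", as_version=4)
    notebook_code = "\n".join(c.source for c in nb.cells if c.cell_type == "code")
    extracted_code = open("extracted.py").read()

    for line in difflib.unified_diff(
        notebook_code.splitlines(), extracted_code.splitlines(),
        fromfile="notebook", tofile="extracted", lineterm="",
    ):
        print(line)

The diff won't be empty if the model reorganized things, but a silently dropped character shows up as a one-line hunk instead of a mystery in the test environment.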
The first part of this, where you told it to ask YOU questions, rather than laboriously building prompts and context yourself was the magic ticket for me. And I doubt I would have stumbled on that sorta inverse logic on my own. Really great write up!
This is the key to a lot of my workflows as well. I'll usually tack some form of "ask me up to 5 questions to improve your understanding of what I'm trying to do here" onto the end of my initial messages. Over time I've noticed patterns in information I tend to leave out which has helped me improve my initial prompts, plus it often gets me thinking about aspects I hadn't considered yet.
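The whole trick is a reusable suffix. A minimal sketch; the wording is just one variant, tune the question budget to taste:

    # Reusable suffix appended to initial prompts.
    CLARIFY_SUFFIX = (
        "\n\nBefore writing any code, ask me up to 5 questions that would most "
        "improve your understanding of what I'm trying to do here."
    )

    def initial_prompt(task: str) -> str:
        return task + CLARIFY_SUFFIX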
The example prompts are useful. They not only reduced the activation energy required for me to start installing this habit in my personal workflows, but also inspired the notion that I can build a library of good prompts and easily implement them by turning them into TextExpander snippets.
P.S.: Extra credit for the Insane Clown Posse reference!
That's one of the key wins with o1-pro's deep research feature. The first thing it tends to do after you send a new prompt is ask you several questions, and they tend to be good ones.
One idea I really like here is asking the model to generate a todo list.
I add something like “ask me any clarifying questions” to my initial prompts. For larger requests, it seems to start a dialogue of refinement before providing solutions.
That lonely/downtime section at the end is a giant red flag for me.
It looks like the sort of nonproductive yak-shaving you do when you're stuck or avoiding an unpleasant task--coasting, fooling around incrementally with your LLM because your project's fucked and you psychologically need some sense of progress.
The opposite of this is burnout--one of the things they don't tell you about successful projects with good tools is they induce much more burnout than doomed projects. There's a sort of Amdahl's Law in effect, where all the tooling just gives you more time to focus on the actual fundamentals of the product/project/problem you’re trying to address, which is stressful and mentally taxing even when it works.
Fucking around with LLM coding tools, otoh, is very fun, and like constantly clean-rebuilding your whole (doomed) project, gives you both some downtime and a sense of forward momentum--look how much the computer is chugging!
The reality testing to see if the tool is really helping is to sit down with a concrete goal and a (near) hard deadline. Every time I've tried to use an LLM under these conditions it just fails catastrophically--I don't just get stuck, I realize how basically every implicit decision embedded in the LLM output has an unacceptably high likelihood of being wrong, and I have an amount of debug cycles ahead of me exceeding the time to throw it all away and do it without the LLM by, like, an order of magnitude.
I'm not an LLM-coding hater and I've been doing AI stuff that's worked for decades, but current offerings I've tried aren't even close to productive compared to searching for code that already exists on the web.
Ouch. You've thought about this, haven't you? Your ideas are intriguing to me, and I wish to subscribe to your newsletter.
Seriously!! Coding with LLMs is marketed as a huge time saver, but every time I've tried, it hasn't been. I'm told I just need to put in the time (ironic, no?) to learn to use the LLM properly. Why don't I just use that time to learn to write code better myself?
I guess you're not a big fan of rubber duck debugging then? Whenever I get stuck I like to ask myself a bunch of questions and thought experiments to get a better understanding of the problem/project, and with LLMs I'm forced to spell out each one of these questions/experiments coherently, which ends up being great documentation later on. I think LLMs are great if you're actually interested in the fundamentals of your problem/project, otherwise it just turns into a sinkhole that sucks you in.
This is the first article I’ve come across that truly utilizes LLMs in a workflow the right way. I appreciate the time and effort the author put into breaking this down.
I believe most people who struggle to be productive with language models simply haven’t put in the necessary practice to communicate effectively with AI. The issue isn’t with the intelligence of the models—it’s that humans are still learning how to use this tool properly. It’s clear that the author has spent time mastering the art of communicating with LLMs. Many of the conclusions in this post feel obvious once you’ve developed an understanding of how these models "think" and how to work within their constraints.
I’m a huge fan of the workflow described here, and I’ll definitely be looking into Aider and repomix. I’ve had a lot of success using a similar approach with Cursor in Composer Agent mode, where Claude 3.5 Sonnet acts as my "code implementer." I strategize with larger reasoning models (like o1-pro, o3-mini-high, etc.) and delegate execution to Claude, which excels at making inline code edits. While it’s not perfect, the time savings far outweigh the effort required to review an "AI Pull Request."
Maximizing efficiency in this kind of workflow requires a few key things:
- High typing speed – Minimizing time spent writing prompts means maximizing time generating useful code.
- A strong intuition for "what’s right" vs. "what’s wrong" – This will probably become less relevant as models improve, but for now, good judgment is crucial.
- Familiarity with each model’s strengths and weaknesses – This only comes with hands-on experience.
Right now, LLMs don’t work flawlessly out of the box for everyone, and I think that’s where a lot of the complaints come from—the "AI haterade" crowd expects perfection without adaptation.
For what it’s worth, I’ve built large-scale production applications using these techniques while writing minimal human code myself.
Most of my experience using these workflows has been in the web dev domain, where there's an abundance of training data. That said, I’ve also worked in lower-level programming and language design, so I can understand why some people might not find models up to par in every scenario, particularly in niche domains.
Let’s be honest. The author was probably playing cookie clicker while this article was being written.
Has anyone who evolved from a baseline of just using Cursor chat and freestyling to a proper workflow like this got any anecdata to share on noticeable improvements?
Does the time invested into the planning benefit you? Have you noticed fewer hallucinations? Have you saved time overall?
I’d be curious to hear, because my current workflow is basically:
1. Have idea
2. create-next-app + ShadCN + TailwindUI boilerplate
3. Cursor Composer on agent mode with Superwispr voice transcription
I’m gonna try the author’s workflow regardless, but would love to hear others’ opinions.
Aider + AI generated maps and user guides for internal modules has worked well for me. Just today I did my own version of a script that uses Gemini 2 Flash (1M context window) to generate maps of each module in my codebase, i.e. a short one or two sentence description of what's in every file. Aider's repo maps don't work well for me, so I disable them, and I think this will work better.
I also have a scratchpad file that I tell the model it can update to reflect anything new it learns, so that gives it a crude form of memory as it works on the codebase. This does help it use internal utility APIs.
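A minimal sketch of what such a map-generating script might look like, assuming the google-generativeai SDK; the paths, model id, and output file name are placeholders:

    import os
    import pathlib
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")

    def describe(path: pathlib.Path) -> str:
        # Ask for a one-or-two-sentence description of a single file.
        prompt = (
            "In one or two sentences, describe what this file contains "
            f"and what it is for.\n\nFILE: {path}\n\n{path.read_text(errors='ignore')}"
        )
        return model.generate_content(prompt).text.strip()

    with open("CODEBASE_MAP.md", "w") as out:
        for path in sorted(pathlib.Path("src").rglob("*.py")):
            out.write(f"- `{path}`: {describe(path)}\n")

The generated map file can then be dropped into the chat as read-only context in place of the repo map.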
- small .cursorrules file explaining what I am trying to build and why at a very high level and my tech stack
- a DEVELOPMENT.md file which is just a to-do/issue list for me that I tell cursor to update before every commit
- a temp/ directory where I dump contextual md and txt files (chat logs discussing feature, more detailed issue specs, etc.)
- a separate snippet management app that has my commonly used request snippets (write commit message, ask me clarifying questions, update README, summarize chat for new session, etc.)
Otherwise it's pretty much what your workflow is.
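For illustration, a .cursorrules at that level of detail can be just a few lines; every specific below is invented, not from a real project:

    This project is a small CLI tool that syncs local markdown notes to S3.
    Goal: replace a brittle shell script; optimize for readability, not speed.
    Stack: Python 3.12, boto3, pytest. Do not add new frameworks without asking.
    Conventions: small modules, type hints everywhere, tests live next to code.
    Before every commit: update DEVELOPMENT.md with what changed and why.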
Most of these workflows are just context management workflows and in Cursor it's so simple to manage the context.
For large files I just highlight the code and cmd+L. For short files, I just add them all by using /+downarrow
I constantly feed context like this and then usually come to a good solution for both legacy and greenfield features/products.
If I don't come to a good solution it's almost always because I didn't think through my prompt well enough and/or I didn't provide the correct context.
Something I quickly learned while retooling this past week is that it’s preferable not to add opinionated frameworks to the project, as they increase the size of the context the model has to be aware of. That context is also not likely to be available in the training data.
For example, rather than using Plasmo for its browser extension boilerplate and packaging utilities, I’ve chosen to ask the LLM to set up all of that for me, so it won’t have any blind spots when tasked with debugging.
It's not just frameworks – I noticed this recently when starting a new project and utilizing EdgeDB. They have their own Typescript query builder, and [insert LLM] cannot write correct constructions with that query builder to save its life.
This is all fine for a solo dev, but how does this work with a team / squad, working on the same code base?
Having 7 different instances of an LLM analyzing the same code base and making suggestions would not just be economically wasteful, it would also be impractical or even dangerous?
Outside of RAG, which is a different thing, are there products that somehow "centralize" the context for a team, where all questions refer to the same codebase?
I've only recently switched to Cursor so am not clued up about everything, but they mention that the embedded indexing they do on your code is shared with others (others who have access to that repository? Unsure).
It did seem to take a while to index, even though my colleagues had been using Cursor for a while, so I'm likely misunderstanding something.
> if it doesn’t work, Q&A with aider to fix
I fix errors myself, because LLMs are capable of producing large chunks of really stupid/wrong code, which needs to be reverted, and that's why it makes sense to see the code at least once.
Also, I have found myself in situations where I tried to use an LLM just for the sake of using an LLM to write code (a waste of time).
Would be great if there were more details on the costs of doing this work - especially when loading lots of tokens of context via repomix and then generating code with that context (context-loaded inference API calls are more expensive, correct?). A dedicated post discussing this and related considerations would be even better. Are there cost estimations in tools like aider (vs just refreshing the LLM platform’s billing dashboard)?
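Back-of-envelope math gets close, though. A sketch with placeholder rates; look up your provider's actual per-million-token pricing:

    def estimate_cost(tokens_in: int, tokens_out: int,
                      in_rate: float = 3.00, out_rate: float = 15.00) -> float:
        """Rates are USD per million tokens; these defaults are placeholders."""
        return tokens_in / 1e6 * in_rate + tokens_out / 1e6 * out_rate

    # e.g. one 200k-token repo dump plus a 4k-token response:
    print(f"${estimate_cost(200_000, 4_000):.2f}")  # $0.66 at the placeholder rates

Input tokens dominate in context-heavy workflows, which is exactly why repo-dump-per-question gets expensive fast.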
I have been using LLMs for a long time, but these prompt ideas are fantastic; they really opened up a world for me.
A lot of the benefit of LLMs is bringing up ideas or questions I am not thinking of right now, and this really does that. Typically this would happen as I dig through a topic, not beforehand. So that's a net benefit.
I also tried it and it worked a charm: the LLM did respect the context and the step-by-step approach, poking holes in my ideas. Amazing work.
I still like writing code and solving puzzles in my mind, so I won't be doing the "execution" part. From there on, I mostly use LLMs as autocomplete, for "I'm stuck here" moments, or for obscure bug solving. Otherwise I don't get any satisfaction from programming, having learnt nothing.
It will work - you can see it well with a Chain of Thought (CoT) model: it will keep asking itself "am I hallucinating? let's double check" and then will self-reject thoughts if it can't find a proper grounding. In fact, this is the best part of a CoT model: you can see where it goes off the rails and can add a message to fix it in the prompt.
For example, there is the common challenge "count how many r letters are in strawberry", and you can see the issue is not the counting, but that the model does not know whether "rr" should be treated as a single "r": it is not sure if you are counting r letters or r sounds, and when you sound out the word, there is a single "r" sound where it is spelled with a double "r". So if you tell the model that a double "r" counts as 2 letters, it will get it right.
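Once the question is pinned down as "occurrences of the letter r in the spelling," the count itself is the easy, deterministic part - the kind of check you can always do outside the model:

    assert "strawberry".count("r") == 3  # letters, not sounds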
I don't know about this specific technique, but I have found it useful to add a line like 'it's OK if you don't know or this isn't possible' at the end of queries. Otherwise LLMs have a tendency to tilt at whatever windmill you give them. Managing tone and expectations with them is a subtle but important art.
I have no idea if it works or not!
Am I the only one that doesn’t see the hype with Claude? I recently tried it, hit the usage limit, read around, found tons of blogs and posts from devs saying Claude is the best code-assistant LLM… so I purchased Claude Pro… and I hate it. I have been asking it surface-level questions about Apache Spark (configuring the number of task retries, errors, error handling, etc.) and it hallucinated so much, so frequently. It reminds me of, like, ChatGPT 3…
What am I doing wrong or what am I missing? My experience has been so underwhelming I just don’t understand the hype for why people use Claude over something else.
Sorry, I know there are many models out there, and Claude is probably better than 99% of them. Can someone help me understand the value of it over o1/o3? I honestly feel like I like 4o better.
/frustration-rant
It seems like you might be trying to use it like a search engine, which is a common mistake people make when first trying LLMs. LLMs are not like Google.
The key is to give it context so it can help you. For example, if you want it to help you with Spark configuration, give it the Spark docs. If you want it to help you write code, give it your codebase.
Tools like Cursor make this process very easy. You can also set up a local MCP server so the LLM can get the context and tools it needs on its own.
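At its simplest, "give it the docs" just means prepending them to the question. A minimal sketch using the Anthropic Python SDK; the doc file name and the question are placeholders:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    docs = open("spark-configuration-docs.txt").read()

    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"<docs>\n{docs}\n</docs>\n\n"
                       "Using only the docs above: how do I configure task retries?",
        }],
    )
    print(message.content[0].text)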
Claude is exceptionally good at taking "here is two paragraphs of me rambling off about a feature I want in broken voice to text" and actually understanding what you want. It has really good prompt adherence but at the same time knows when to read between the lines.
I think LLM codegen still requires a mental model of the problem domain. I wonder how many upcoming devs will simply never develop one. Calculators are tools for engineers /and/ way too many people can't even do basic receipt math.
Calculations are for calculators. I was good at math in school, but now I struggle and take so much time doing receipt math - and for what? What's the purpose of the time you spend doing it? When do you actually need your brain trained for this specific task?
Something I’ve started to do recently is mob programming with LLM’s.
I act as the director, creativity and ideas person. I have one LLM that implements, and a second LLM that critiques and suggests improvements and alternatives.
I find making the LLMs think and plan the project a bit worrying. I understand this helps with procrastination, but when these systems eventually get better and more integrated, the most likely thing to happen to software devs is moving away from purely coding to more of a solution-architect role (aka planning stuff) - and that's not taking into account the negative impact of giving up critical thinking to LLMs.
https://news.ycombinator.com/item?id=43057907
Other than that, a great article! Very insightful.
I actually think it is going to be way worse than you are suggesting. I think that the LLM codegen is going to replace most if not all of software eng workflow and teams that we see today.
Software is going to be prompt wrangling with some acceptance testing. Then just prompt wrangling.
I don't have a lot of hope for the software profession to survive.
This is pretty much my flow that I landed on as well. Dump existing relevant files into context, explain what we are trying to achieve, and ask it to analyze various approaches, considerations, and ask clarifying questions. Once we both align on direction, I ask for a plan of all files to be created/modified in dependency order with descriptions of the required changes. Once we align on the plan I say lets proceed one file at a time, that way I can ensure each file builds on the previous one and I can adjust as needed.
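Spelled out as reusable templates, the stages of that flow might look roughly like this (a paraphrase, not anyone's canonical prompts):

    # The three stages, as prompt templates to fill in per task.
    STAGES = [
        # 1. align on direction
        "Here are the relevant files:\n{files}\n\nGoal: {goal}\n"
        "Analyze possible approaches, list considerations, and ask me clarifying questions.",
        # 2. align on a plan
        "We're aligned on the approach. List every file to create or modify, "
        "in dependency order, with a description of the required changes.",
        # 3. execute incrementally
        "Let's proceed one file at a time, starting with the first file in the plan.",
    ]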
> I really want someone to solve this problem in a way that makes coding with an LLM a multiplayer game. Not a solo hacker experience. There is so much opportunity to fix this and make it amazing.
This, I think, is the grand vision -- what could it look like?
In my mind programming should look like a map -- you can go anywhere, there'll be things happening, and multiple people.
If anyone wants to work on this (or has comments), hit me up!
This is a great article -- I really appreciate the author giving specific examples. I have never heard of mise (https://mise.jdx.dev/) before either, and the integration with the saved prompts is a nifty idea -- excited to try it out!
If I have to go to this much effort, what is AI buying us here? Why don't we just put the effort in to learn to write code ourselves? Instead of troubleshooting AI problems and coming up with clever workarounds for those problems, troubleshoot your code, solve those problems directly!
The speed is way faster. I am a good programmer with more than 25 years of professional experience. The AI is a better programmer in every way. Why do it myself when I can outsource it and play cookie clicker?
The real thing that sold me is the entire workflow takes 10 minutes to plan, and then 10-15 minutes to execute (let's say a Python script of medium complexity). After a solid ~20-30 min I am largely done. No debugging necessary.
It would have taken me an hour or two to do the same script.
This means I can spend a lot more time with the fam, hacking on more things, and messing about.
Great write up! Roughly how many Claude tokens are you using per month with this workflow? What’s your monthly API costs?
Also what do you mean by “I really want someone to solve this problem in a way that makes coding with an LLM a multiplayer game. Not a solo hacker experience.” ?
Total tokens in: 26,729,994
Total tokens out: 1,553,284
Last month anthropic bill was $89.30
--
I want to program with a team, together - not a group of people individually coding with an agent and then managing merges. I have been playing a lot with merging team context, but haven't gotten too far yet.
I don’t mind LLMs, but what irks me is the customer noncompete: you have these systems that can do almost anything, and the legal terms explicitly say you’re not allowed to use the thing for anything that competes with the thing. But if the thing can do almost anything, then you really can’t use it for anything. Making a game with Grok? No, that competes with the xAI game studio. Making an agents framework with ChatGPT? No, that competes with Swarm. Making legal AI with Claude? No, that competes with Claude. It seems like the only American companies making AI we can actually use for work are Hugging Face and Meta.
Nonprofit X publishes outputs from a competing AI, which are not copyrightable.
Corp Y ingests content published by Nonprofit X.
As opposed to Vintage Pioneer code?
In my experience, there is quite a spectrum of legacy code.
Legacy modern code would be anything from the last 5-10 years.
Vintage Pioneer code (which i have both initialized, and maintained) is more than 20 years old.
I am trying not to be a vintage pioneer these days.
About the first step: you probably also need some kind of context so that the LLM has enough information to iterate with you on the new feature idea.
So either you put the whole codebase into the context (which will mostly lead to problems, as tokens are limited) or you have some kind of summary of your current features, etc.
Or you do some kind of "black box" iteration, which I feel won’t be that useful for new features, as the model should know about current features, etc.?
What’s the way here?
Given a 3,648,318-token repository (number from Repomix), I'm not sure what the cost would be of using a leading LLM to analyse it and ask for improvements.
Isn't the input token limit way smaller than that?
This part is unclear to me in the "non-greenfield" part of the article.
Iterating with aider on very limited scopes is easy; I've used it often. But what about understanding a whole repository and acting on it? Following imports to understand a TypeScript codebase as a whole?
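For a sense of scale: against a 200k-token context window (typical of the larger hosted models), a repo that size needs splitting into roughly eighteen chunks before leaving any room for instructions or output:

    repo_tokens, context_window = 3_648_318, 200_000
    print(repo_tokens / context_window)  # ~18.2 chunks, best case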
Well, do you as a human have the whole codebase loaded into your memory with the ability to mentally reason over it? No, you work on a small scope at a time.
This is effing great...thanks for sharing your experience.
I was just wondering how to give my edits back to in-browser tools like Claude or ChatGPT, but the idea of repomix is great, will try!
Although I have been flying a bit with Copilot in VS Code, so right now I have essentially two AIs: one for larger changes (in the browser), and one for minor code fixes (in VS Code).
In our company we are only allowed to use GitHub Copilot with GPT or Claude, but not Claude directly. I'm struggling quite a bit to get good results from it, so I'll try to adapt your workflow to that setup. To the community: do you have any additional guidance for that setup?
I'm curious to see his mise tasks. He lists a few of them near the end, but I'm not sure what his LLM CLI is. Is that an actual tool or is he using it as a placeholder for "insert your LLM CLI tool here"?
It is linked to in the article - a brilliant utility from Simon.
Also, don't forget that your favorite AI tools can be of great help with the factors that cause us to make software: research, subject expertise, marketing, business planning, etc.
I really wanted to use DSPy to generate prompts, but it wasn't quite as compatible with my workflow as I wanted. I love the idea tho - code instead of strings.
I will dig in again. It is an exciting idea.
The prompts are generated from the planning steps. If you were to follow the prompts in the planning phase, you would get output that is clearly the "starting prompt" - that is the first thing you send to aider.
Also - there was a joke below, but you can do --yes-always and it will not ask for confirmation. I find it does a pretty good job.
Yeah, everyone is saying this article is great, but it's akin to "then draw the rest of the owl". It's very detailed in planning, and then for execution it's just "paste prompt into claude". What prompt?
“Over my skis” ~ in over my head.
“Over my skies” ~ very far overhead. In orbit maybe?
Is that correct? Never heard the expression before, but as a skier if you're over your skis you're in control of them, while if you're backseated the skis will control you.
I ended up finishing my side projects when I kept these in mind, rather than focusing on elegant code for elegant code's sake.
It seems the key to using LLMs successfully is to make them create a specification and execution plan, through making them ask /you/ questions.
If this skill--specification and execution planning--is passed onto LLMs, along with coding, then are we essentially souped-up tester-analysts?
Your workflow is much more polished, will definitely try it out for my next project
is more polished? What's your workflow, banging rocks together?