porise|1 month ago
I wish the people who wrote this let us know what kind of codebases they are working on. They seem mostly useless in a sufficiently large codebase, especially when they are messy and interactions aren't always obvious. I don't know how much better Claude is than ChatGPT, but I can't get ChatGPT to do much useful with an existing large codebase.
CameronBanga|1 month ago
I've been working in the mobile space since 2009, though primarily as a designer and then product manager. I work in kinda a hybrid engineering/PM job now, and have never been a particularly strong programmer. I definitely wouldn't have thought I could make something with that polish, let alone in 3 months.
That code base is ~98% Claude code.
bee_rider|1 month ago
TaupeRanger|1 month ago
unknown|1 month ago
[deleted]
danielvaughn|1 month ago
I never paid any attention to different models, because they all felt roughly equal to me. But Opus 4.5 is really and truly different. It's not a qualitative difference, it's more like it just finally hit that quantitative edge that allows me to lean much more heavily on it for routine work.
I highly suggest trying it out, alongside a well-built coding agent like the one offered by Claude Code, Cursor, or OpenCode. I'm using it on a fairly complex monorepo and my impressions are much the same as Karpathy's.
suddenlybananas|1 month ago
CSMastermind|1 month ago
I'm not sure how big your repos are but I've been effective working with repos that have thousands of files and tens of thousands of lines of code.
If you're just prototyping, it will hit a wall when things get unwieldy, but that's normally a sign that you need to refactor a bit.
Super strict compiler settings, static analysis, comprehensive tests, and documentation help a lot. As does basic technical design. After a big feature is shipped I do a refactor cycle with the LLM where we do a comprehensive code review and patch things up. This does require human oversight because the LLMs are still lacking judgement on what makes for good code design.
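For a TypeScript codebase (the stack mentioned elsewhere in this thread), "super strict compiler settings" might look like the following tsconfig fragment; the specific flags are my illustration, not something the commenter specified:

```json
{
  "compilerOptions": {
    "strict": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitOverride": true,
    "exactOptionalPropertyTypes": true,
    "noFallthroughCasesInSwitch": true
  }
}
```

The idea is that the stricter the automated feedback, the more of the LLM's mistakes get caught without human review.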
The places where I've seen them be useless is working across repositories or interfacing with things like infrastructure.
It's also very model-dependent. Opus is a good daily driver, but Codex is much better at writing tests for some reason. I'll often also switch to it for hard problems that Claude can't solve. Gemini is nice for 'I need a prototype in the next 10 minutes', especially for making quick and dirty bespoke front-ends where you don't care about the design, just the functionality.
madhadron|1 month ago
Perhaps this is part of it? Tens of thousands of lines of code seems like a very small repo to me.
keerthiko|1 month ago
Trying to incorporate it in existing codebases (esp when the end user is a support interaction or more away) is still folly, except for closely reviewed and/or non-business-logic modifications.
That said, it is quite impressive to set up a simple architecture, or just list the filenames, and tell some agents to go crazy to implement what you want the application to do. But once it crosses a certain complexity, I find you need to prompt closer and closer to the weeds to see real results. I imagine a non-technical prompter cannot proceed past a certain prototype fidelity threshold, let alone make meaningful contributions to a mature codebase via LLM without a human engineer to guide and review.
reubenmorais|1 month ago
jjfoooo4|1 month ago
It's been especially helpful in explaining and understanding arcane bits of legacy code behavior my users ask about. I trigger Claude to examine the code and figure out how the feature works, then tell it to update the documentation accordingly.
1123581321|1 month ago
gwd|1 month ago
I really enjoyed the process. As TFA says, you have to keep a close eye on it. But the whole process was a lot less effort, and I ended up doing more than I would otherwise have done.
ph4te|1 month ago
fy20|1 month ago
For this the LLM struggles a bit, but so does a human. The main issues are that it messes up some state it didn't realise was used elsewhere, and our test coverage is not great. We've seen humans make exactly the same kinds of mistakes. We use MCP for Figma, so most of the time it can get a UI 95% done, with just a few tweaks needed by the operator.
On the backend (Typescript + Node, good test coverage) it can pretty much one-shot - from a plan - whatever feature you give it.
We use Opus 4.5 mostly, and sometimes gpt-5.2-codex, through Cursor. You aren't going to get ChatGPT (the web interface) to do anything useful; switch to Cursor, Codex, or Claude Code. And right now it is worth paying for the subscription, as you don't get the same quality from cheaper or free models (although they are starting to catch up; I've had promising results from GLM-4.7).
yasoob|1 month ago
I had never used Swift before that and was able to use AI to whip up a fairly full-featured and complex application with a decent amount of code. I had to make some cross-cutting changes along the way that impacted quite a few files, and things mostly worked fine with me guiding the AI. Mind you, this was a year ago, so I can only imagine how much better I would fare now with even better AI models. That whole month was spent not only on coding but on learning enough Swift to fix problems when the AI started running in circles, and then learning the Xcode profiler to optimize the application for speed and improve performance.
BeetleB|1 month ago
What type of documents do you have explaining the codebase and its messy interactions, and have you provided that to the LLM?
Also, have you tried giving someone brand new to the team the exact same task and information you gave to the LLM, and how effective were they compared to the LLM?
> I don't know how much better Claude is than ChatGPT, but I can't get ChatGPT to do much useful with an existing large codebase.
As others have pointed out, from your comment, it doesn't sound like you've used a tool dedicated for AI coding.
(But even if you had, it would still fail if you expect LLMs to do stuff without sufficient context).
smusamashah|1 month ago
jumploops|1 month ago
Commercial codebases, especially private internal ones, are often messy. It seems this is mostly due to the iterative nature of development in response to customer demands.
As a product gets larger, and addresses a wider audience, there’s an ever increasing chance of divergence from the initial assumptions and the new requirements.
We call this tech debt.
Combine this with a revolving door of developers, and you start to see Conway’s law in action, where the system resembles the organization of the developers rather than the “pure” product spec.
With this in mind, I’ve found success in using LLMs to refactor existing codebases to better match the current requirements (i.e. splitting out helpers, modularizing, renaming, etc.).
Once the legacy codebase is “LLMified”, the coding agents seem to perform more predictably.
YMMV here, as it’s hard to do large refactors without tests for correctness.
(Note: I’ve dabbled with a test-first refactor approach; I haven’t gone far enough to claim it works, but I believe it could.)
tunesmith|1 month ago
Okkef|1 month ago
After you've tried it, come back.
Imustaskforhelp|1 month ago
I tried a website which offered the Opus model in their agentic workflow, and I felt something different too, I guess.
Currently trying out Kimi code (using their recent kimi 2.5), the first AI product I've ever bought, because I got it for like $1.49 per month. It does feel a bit less powerful than Claude Code, but I feel like, monetarily, it's worth it.
Y'know, you have to, like, bargain with an AI model to reduce its pricing, which I just found really curious. The psychology behind it feels fascinating, because I think even as a frugal person I already felt invested enough in the model, and that became my sunk cost fallacy.
Shame for me personally, because they use it as a hook to get people using their tool and then charge $19 the next month (I mean, really, still cheaper than Claude Code for the most part, but a big jump from $1.49).
epolanski|1 month ago
2. Put your important dependencies source code in the same directory. E.g. put a `_vendor` directory in the project, in it put the codebase at the same tag you're using or whatever: postgres, redis, vue, whatever.
3. Write good plans and requirements. Acceptance criteria, context, user stories, etc. Save them in markdown files. Review those multiple times with LLMs trying to find weaknesses. Then move to implementation files: make it write a detailed plan of what it's gonna change and why, and what it will produce.
4. Write very good prompts. LLMs follow instructions well if they are clear: "you should proactively do X" is a weak instruction if you mean "you must do X".
5. LLMs are far from perfect, and full of limits. Karpathy sums their cons very well in his long list. If you don't know their limits you'll mismanage the expectations and not use them when they are a huge boost and waste time on things they don't cope well with. On top of that: all LLMs are different in their "personality", how they adhere to instruction, how creative they are, etc.
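Tip 2 above (vendoring dependency sources) can be sketched like this; the repo URL and tag are illustrative, not from the comment:

```shell
# Vendor a dependency's source at the exact tag you depend on, so the
# agent can grep real implementation code instead of guessing at APIs.
mkdir -p _vendor
git clone --depth 1 --branch v3.4.21 \
    https://github.com/vuejs/core _vendor/vue \
  || echo "clone skipped (no network); unpack a source tarball here instead"
ls _vendor
```

A shallow clone at a pinned tag keeps the directory small while still matching the version your lockfile actually resolves.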
datsci_est_2015|1 month ago
I guess this is fine when you don’t have customers or stakeholders that give a shit lol.
bluGill|1 month ago
Which is to say you have to learn to use the tools. I've only just started, and cannot claim to be an expert. I'll keep using them - in part because everyone is demanding I do - but to use them you clearly need to know how to do it yourself.
simonw|1 month ago
I also find pointing it to an existing folder full of code that conforms to certain standards can work really well.
rob|1 month ago
There's basically a "brainstorm" /slash command that you go back and forth with, and it places what you came up with in docs/plans/YYYY-MM-DD-<topic>-design.md.
Then you can run a "write-plan" /slash command on the docs/plans/YYYY-MM-DD-<topic>-design.md file, and it'll give you a docs/plans/YYYY-MM-DD-<topic>-implementation.md file that you can then feed to the "execute-plan" /slash command, where it breaks everything down into batches, tasks, etc, and actually implements everything (so three /slash commands total.)
There's also "GET SHIT DONE" (GSD) [1] that I want to look at, but at first glance it seems to be a bit more involved than Superpowers with more commands. Maybe it'd be better for larger projects.
[0] https://github.com/obra/superpowers
[1] https://github.com/glittercowboy/get-shit-done
gverrilla|1 month ago
jwr|1 month ago
culi|1 month ago
languid-photic|1 month ago
Macha|1 month ago
vindex10|1 month ago
redox99|1 month ago
AI assisted coding has never been like that, which would be atrocious. The typical workflow was using Cursor with some model of your choice (almost always an Anthropic model like Sonnet, before Opus 4.5 was released). Nowadays (in addition to IDEs) it's often a CLI tool like Claude Code with Opus, or Codex CLI with GPT Codex 5.2 high/xhigh.
maxdo|1 month ago
spaceman_2020|1 month ago
If you're using plain vanilla chatgpt, you're woefully, woefully out of touch. Heck, even plain claude code is now outdated
shj2105|1 month ago