ragle | 1 year ago
What models are you using that you feel comfortable trusting it to understand and operate on 10-20k LOC?
Using the latest and greatest from OpenAI, I've seen output become unreliable with as little as ~300 LOC on a pretty simple personal project. It will drop features as new ones are added, make obvious mistakes, refuse to follow instructions no matter how many different ways I try to tell it to fix a bug, etc.
Tried taking those 300 LOC (generated by o3-mini-high) to cursor and didn't fare much better with the variety of models it offers.
I haven't tried OpenAI's APIs yet - I think I read that they accommodate quite a bit more context than the web interface.
I do find OpenAI's web-based offerings extremely useful for generating short 50-200 LOC support scripts, generating boilerplate, creating short single-purpose functions, etc.
Anything beyond this just hasn't worked all that well for me. Maybe I just need better or different tools though?
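For what it's worth, the API route mentioned above can be sketched roughly as follows — packaging all the source files into a single chat request instead of pasting into the web UI. This is a minimal sketch assuming the `openai` Python package; the system prompt, model name, and context budget are illustrative, not authoritative, and the token estimate is deliberately crude.

```python
# Sketch: feed a whole small project to the chat completions API in one
# request. The helper names here are hypothetical; the 100k-token budget
# is illustrative -- check the documented limit for whichever model you use.

def build_messages(files: dict[str, str], instruction: str) -> list[dict]:
    """Package every source file plus the instruction into one chat request."""
    source_blob = "\n\n".join(
        f"### {path}\n{code}" for path, code in sorted(files.items())
    )
    return [
        {"role": "system",
         "content": "You are a careful coding assistant. "
                    "Do not drop existing features when adding new ones."},
        {"role": "user", "content": f"{instruction}\n\n{source_blob}"},
    ]

def rough_token_count(messages: list[dict]) -> int:
    """Very rough estimate: ~4 characters per token for English/code."""
    return sum(len(m["content"]) for m in messages) // 4

messages = build_messages(
    {"main.py": "def run():\n    pass\n"},
    "Add a --verbose flag without removing any existing behavior.",
)

CONTEXT_BUDGET = 100_000  # illustrative; not a real documented limit
assert rough_token_count(messages) < CONTEXT_BUDGET

# The actual call (requires OPENAI_API_KEY in the environment):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(resp.choices[0].message.content)
```

Whether a model actually stays coherent across that much context is a separate question from whether the API accepts it, as the rest of this thread suggests.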
Jcampuzano2 | 1 year ago
When it comes to 10k LOC codebases, I still don't really trust it with anything. My best luck has been small personal projects, where I can sort of trust it to make larger-scale changes — though "larger scale" there is still small in absolute terms.
I've found it best for generating tests and for autocompletion. Especially if you give it context via function names and parameter names, it can oftentimes complete a whole function I was about to write, using the interfaces available to it in files I've visited recently.
But besides that, I don't really use it for much outside of starting a new feature from scratch, or helping me put a plan together before I start working on something I may be unfamiliar with.
We have access to all models available through Copilot, including o3 and o1, and access to ChatGPT Enterprise, and I do find using it via the chat interface nice just for architecting and planning. But I usually do the actual coding with help from autocompletion, since it honestly takes longer to wrangle it into doing the correct thing than to do it myself with a little bit of its help.
ragle | 1 year ago
It's when I try to give it a clear, logical specification for a full feature and expect it to write everything required to deliver that feature (or the entirety of a slightly-more-than-trivial personal project) that it falls over.
I've experimented with getting it to do this (for features or personal projects that require maybe 200-400 LOC), mostly just to see what the limitations of the tool are.
Interestingly, I hit a wall with GPT-4 on a ~300 LOC personal project that o3-mini-high was able to overcome. So, as you'd expect, the models are getting better. Pushing my use case only a little further with a few more enhancements, however, o3-mini-high fell over in precisely the same ways as GPT-4 had — only a bit worse in the volume and severity of its errors.
The improvement between GPT-4 and o3-mini-high felt nominally incremental (which I guess is what they're claiming it offers).
Just to say: having seen similar small bumps in capability over the last few years of model releases, I tend to agree with other posters that we'll need something revolutionary to deliver on a lot of the hype being sold at the moment. I don't think current LLM models and approaches are going to cut it.