CopyOnWrite | 10 months ago

Most comments here surprise me: I am using GitHub's Copilot / ChatGPT 4.0 at work with a code base which mostly implements a basic CRUD service... and outside of small/trivial examples (where the generated code is mostly okay), prompting is more often than not a total waste of time. Now, I wonder if I am just totally unable to write/refine good prompts for the LLM (as it works for smaller samples, I hope I am not too far off) or what could explain the huge discrepancy of experience. (Just for the record: I would totally not mind if the LLM writes the code for the stuff I have to do at work.)

To clarify my questions:

- Who here uses LLMs to generate code for bigger projects at work? (>= 20k lines of code)

- If you use LLMs for bigger projects: Do you need to change your prompting strategy to get good results?

- What programming languages are you using in your code bases?

- Are there other people here who experience that LLMs are no help for non trivial problems?

douglasisshiny|10 months ago

I'm in the same boat. I've largely stopped using these tools other than for asking questions about a language that I'm less familiar with, or about a complex type in TypeScript, where they can (sometimes) be helpful. Otherwise, I felt like I was just wasting my time and becoming lazier/worse as a developer. I do wonder whether LLMs have hit a wall and we're in a hype cycle.

CopyOnWrite|10 months ago

Yes, I have the same feeling about the wall/hype cycle. Most of my time is spent understanding code and formulating a plan to change code w/o breaking anything... even if LLMs generated 100% perfect code on the first try, it would not help in a big way.

One thing I forgot to mention is asking LLMs questions from within the IDE instead of doing a web search... this works quite nicely, but again, it is not a crazy productivity boost.

thi2|10 months ago

My employer gives me access to JetBrains AI. I work on a Vue frontend with a Kotlin Spring Boot backend.

The codebase is not too old and has grown without too much technical debt, yet I have never had decent success with complex prompts. It's useful for quick "what does this do" checks, but it seems to fall short for any real functionality.

Maybe I'm not refining my prompts well enough, but doing so would take longer than implementing it myself.

Recently I tried JetBrains Junie, which acts like Claude if I understand it correctly.

I had a really refined prompt and ran it three times with adjustments and fine-tuning, but the result was still lacking. So I tossed it and wrote it myself. But watching the machine nearly get it right was still impressive.

aitchnyu|10 months ago

JetBrains AI runs on a "discount LLM" and their ratings were below 2 stars. I tried two others, which played games with me to reduce context and use cheaper models. I then switched to Aider, which leads me to believe a moderate Claude user may need to spend $30 a month, but I use Gemini models and I didn't exceed $5.

CrimsonRain|10 months ago

You are just bad with prompting or working with very obscure language/framework or bad coding pattern or all of it. I had a talk with a seasoned engineer, who has been coding for 50 years and has created many amazing things over his lifetime, about the really bad results he was getting with the AI tools I suggested to him. When I use AI for the same purposes in the same repo he's working on, it works nicely. When he does it, the results are never what he wants. It comes down to a combination of him not understanding how to guide the LLM in the right direction and him using a language/framework he's not familiar with, so he can't judge the LLM's output.

It is really important to know what you want and to be able to describe it in short (but important) points: points that you know the AI will mess up if you don't specify them. You also need to be able to figure out which direction the AI is heading in with the solution and correct it EARLY rather than later. Other things matter too: not overloading the context/memory with unnecessary things, focusing on key areas to improve, and much more.

I'm using AI to get solutions done that I can definitely do myself, but where it would take a certain amount of time to hunt down all the documentation, API/lib calls etc. With AI, a tenth of the time is enough.

I've had massive success with java, js/TS, html css, go, rust, python, bitbucket pipelines/GitHub actions, cdk, docker compose, SQL, flutter/dart, swift etc.

douglasisshiny|10 months ago

I've had the same experience as the person to whom you're responding. After reading your post, I have to ask: if you're putting so much effort into prompting it with specific points, correcting it often, etc., why not just write the code yourself? It sounds like you're putting a good deal of effort into prompting it.

Aren't you worried that over time you'll rely on it too much and your offhand knowledge will get worse?

CopyOnWrite|10 months ago

I do not rule out that I am just very bad at prompting.

It just surprises me that you write you had massive success with "java, js/TS, html css, go, rust, python, bitbucket pipelines/GitHub actions, cdk, docker compose, SQL, flutter/dart, swift etc.". If you include the usual libraries/frameworks and the diverse application areas for these technologies, it seems crazy to me, even with LLM support, to be able to make meaningful contributions in non-trivial code bases.

Concerning SQL, I can report another failure with LLMs: in a trivial code base with a handful of entities, the LLM cannot come up with basic window functions.
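For context, a "basic window function" could be something as simple as a per-customer running total. Here is a minimal sketch, assuming a hypothetical `orders` table; the table, its columns, and the `running_totals` helper are made up for illustration and are not taken from the code base discussed. It runs through Python's built-in `sqlite3` module, which supports window functions when the bundled SQLite is 3.25 or newer:

```python
import sqlite3


def running_totals(rows):
    """Return (customer, order_day, amount, running_total) tuples.

    `rows` is an iterable of (customer, order_day, amount) tuples.
    The schema below is hypothetical and exists only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (customer TEXT, order_day INTEGER, amount INTEGER)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        # SUM(...) OVER (...) is the window function: it keeps one output row
        # per order and adds a running sum per customer, ordered by day.
        query = """
            SELECT customer,
                   order_day,
                   amount,
                   SUM(amount) OVER (
                       PARTITION BY customer
                       ORDER BY order_day
                   ) AS running_total
            FROM orders
            ORDER BY customer, order_day
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Unlike a `GROUP BY`, the query returns every order row and simply attaches the aggregate to it, which is the part that tends to matter in reporting-style queries.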

I would be very interested if you could write up a blog post or make a YouTube video demonstrating your prompting skills... Perhaps demonstrating, with a bigger open source project in any of the mentioned languages, how to add a non-trivial feature with your prompting skills?

thi2|10 months ago

> You are just bad with prompting or working with very obscure language/framework or bad coding pattern or all of it

You just described every existing legacy project^^

manojlds|10 months ago

Play with Cursor or Claude Code a bit and then make a decision. I am not in the "this is going to replace devs" boat, but this has changed the way I code and approach things.

CopyOnWrite|10 months ago

Could you perhaps point me to a YouTube video which demonstrates an experienced prompter sculpting code with Cursor/Claude Code?

In my search I just found trivial examples.

My criticism so far:

- Examples always seem to create a simple application from scratch

- Examples always use super common things (like create a blog / simple website for CRUD)

What I would love to see (as mentioned elsewhere): Adding a non-trivial feature to a bigger code base. Just a YouTube video/demonstration. I don't care about language/framework etc. ...

knlam|10 months ago

Copilot is just plain bad. The difference is night and day compared with Cursor + Gemini 2.5 (with good prompting, of course).

merb|10 months ago

Copilot can also use Gemini 2.5 and Sonnet 3.7.

7589447636|10 months ago

> Now, I wonder if I am just totally unable to write/refine good prompts for the LLM (as it works for smaller samples, I hope I am not too far off) or what could explain the huge discrepancy of experience.

Programming language / stack plays a big role, I presume.

CopyOnWrite|10 months ago

Fair enough. Still, I was out of luck for some fairly simple SQL statements, where the model knows 100% of the DDL statements.

stopyellingatme|10 months ago

Same here. We have a massive codebase with large classes and the LLMs are not very helpful. Frontend stuff is okay sometimes but the backend models are too complex at this point, I guess.

pdntspa|10 months ago

Tooling and available context size matter a lot. I'm having decent luck with Gemini 2.5 and Roo Code.