systemf_omega | 6 months ago

> B2B SaaS

Perhaps that's part of it.

People here work on all kinds of industries. Some of us are implementing JIT compilers, mission-critical embedded systems or distributed databases. In code bases like this you can't just wing it without breaking a million things, so LLM agents tend to perform really poorly.

sunrunner|6 months ago

> People here work on all kinds of industries.

Yes, it would be nice to have a lot more context (pun intended) when people post how many LoC they introduced.

B2B SaaS? Then can I assume that a browser is involved and that a big part of that 200k LoC is the verbose styling DSL we all use? On the other hand, Nginx, a production-grade web server, is 250k LoC (251,232 to be exact [1]). These two things are not comparable.

The point being that, as I'm sure we all agree, LoC is not a helpful metric for comparison without more context, and different projects have vastly different amounts of information/feature density per LoC.

[1] https://openhub.net/p/nginx

Fr0styMatt88|6 months ago

I primarily work in C# during the day but have been messing around with simple Android TV dev on occasion at night.

I’ve been blown away sometimes at what Copilot puts out in the context of C#, but using ChatGPT (paid) to get me started on an Android app - totally different experience.

Stuff like giving me code that’s using a mix of different APIs and sometimes just totally non-existent methods.

With Copilot I find it's sometimes brilliant, but it seems completely random as to when that will be.

godelski|6 months ago

> when people post how many LoC they introduced.

Pretty ironic you and the GP talk about lines of code.

From the article:

> Garman is also not keen on another idea about AI – measuring its value by what percentage of code it contributes at an organization.

> “It’s a silly metric,” he said, because while organizations can use AI to write “infinitely more lines of code” it could be bad code.

> “Often times fewer lines of code is way better than more lines of code,” he observed. “So I'm never really sure why that's the exciting metric that people like to brag about.”

I'm with Garman here. There's no clean metric for how productive someone is when writing code. At best, this metric is naive, but usually it is just idiotic.

Bureaucrats love LoC, commits, and/or Jira tickets because they are easy to measure, but here's the truth: to measure the quality of code you have to be capable of producing said code at (approximately) said quality or better. Data isn't just "data" that you can treat as a black box and throw into algorithms. Data requires interpretation and there's no "one size fits all" solution. Data is nothing without its context. It is always biased, and if you avoid nuance you'll quickly convince yourself of falsehoods. Even with expertise it is easy to convince yourself of falsehoods. Without expertise it is hopeless. Just go look at Reddit or any corner of the internet where there are armchair experts confidently talking about things they know nothing about. It is always void of nuance and vastly oversimplified. But humans love simplicity. We need to recognize our own biases.

drusepth|6 months ago

On the other hand, fault-intolerant codebases are also often highly defined and almost always have rigorous automated tests already, which are two contexts in which coding agents specifically excel.

JambalayaJimbo|6 months ago

I work on brain dead crud apps much of my time and get nothing from LLMs.

benjaminwootton|6 months ago

Try Claude Code. You’ll literally be able to automate 90% of the coding part of your job.

murukesh_s|6 months ago

I found the opposite - I am able to get a 50% improvement in productivity for day-to-day coding (a mix of backend and frontend), mostly in JavaScript, but it has helped in other languages too. You do have to review carefully, though, and have extremely well-written test cases if you are going to blindly generate or replace existing code.

motorest|6 months ago

> In code bases like this you can't just wing it without breaking a million things, so LLM agents tend to perform really poorly.

This is a false premise. LLMs themselves don't force you to introduce breaking changes into your code.

In fact, the inception of coding agents was lauded as a major improvement to the developer experience because they allow the LLMs themselves to automatically react to feedback from test suites, thus speeding up how code was implemented while preventing regressions.

If tweaking your code can result in breaking a million things, this is a problem with your code and how you worked to make it resilient. LLMs are only able to introduce regressions if your automated tests are unable to catch any of these million things breaking. If this is the case then your problems are far greater than LLMs existing, and at best LLMs only point out the elephant in the room.
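
For what it's worth, the test-feedback loop described in this comment can be sketched roughly as below. This is only a minimal illustration, not how any particular agent is implemented: `agent_loop`, `stub_propose`, and `stub_run_tests` are hypothetical names, and the "model" is a stub that returns a buggy function first and a corrected one once it has seen failure messages.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration: a "candidate" is just source code as a string.
ProposeFn = Callable[[List[str]], str]   # test feedback -> candidate source
TestFn = Callable[[str], List[str]]      # candidate source -> failure messages


def agent_loop(propose: ProposeFn, run_tests: TestFn,
               max_iters: int = 5) -> Tuple[Optional[str], int]:
    """Ask for a candidate, run the tests, and feed failures back until green."""
    feedback: List[str] = []
    for attempt in range(1, max_iters + 1):
        candidate = propose(feedback)
        failures = run_tests(candidate)
        if not failures:
            return candidate, attempt   # tests pass: accept the change
        feedback = failures             # otherwise the next proposal sees the failures
    return None, max_iters              # give up; nothing is accepted


def stub_propose(feedback: List[str]) -> str:
    """Stand-in for a model call (assumption): buggy first, fixed after feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def stub_run_tests(source: str) -> List[str]:
    """Stand-in for a test suite: execute the candidate and check a few cases."""
    namespace: dict = {}
    exec(source, namespace)
    add = namespace["add"]
    failures = []
    for a, b, expected in [(1, 2, 3), (0, 0, 0), (-1, 1, 0)]:
        got = add(a, b)
        if got != expected:
            failures.append(f"add({a}, {b}) returned {got}, expected {expected}")
    return failures


accepted, attempts = agent_loop(stub_propose, stub_run_tests)
```

Note that the loop accepts whatever `run_tests` lets through, so a regression the suite doesn't cover would pass straight through it, which is the point being made above.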