moezd | 1 month ago

I recall someone comparing stories of LLMs doing something useful to "I have a Canadian girlfriend" stories. I'm not trying to discredit anyone or be a pessimist, but can anyone elaborate on how exactly they use these agents while working on interdependent projects in multi-team settings, e.g. in regulated industries?

hyperadvanced|1 month ago

I’m strictly talking about “Agentic” coding here:

They are not a silver bullet or truly “you don’t need to know how to code anymore” tools. I’ve done a ton of work with Claude Code this year. I’ve gone from a “maybe one ticket a week” tier React developer to someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLMs to prototype these features rapidly and tear down the barrier to entry on a lot of simple problems that are historically too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real meat and bread of my business. This prototyping and “good enough” development has been massively impactful in my small org, where the hard problems come from complex interactions between distributed systems, monitoring across services, and lots of low-level machine traffic. LLMs let me solve easy problems and spend my most productive hours working with people to break down the hard problems into easy problems that I can solve later or pass off to someone on my team to help.

I’ve also used LLMs to get into other people’s codebases, refactor ancient tech debt, and shore up test suites from years ago that are filled with garbage and copy/paste. On testing alone, LLMs are super valuable for throwing edge cases at your code and seeing what you assumed vs. what an entropy machine would throw at it.

LLMs absolutely are not a 10x improvement in productivity on their own. They 100% cannot solve some problems in a sensible, tractable way, and they frequently do stupid things that waste time and would ruin a poor developer’s attempts at software engineering. However, they absolutely also lower the barrier to entry and dethrone “pure single tech” (ie backend only, frontend only, “I don’t know Kubernetes”, or other limited scope) software engineers who’ve previously benefited from super specialized knowledge guarding their place in the business.

Software as a discipline has shifted so far from “build functional, safe systems that solve problems” to “I make 200k bike shedding JIRA tickets that require an army of product people to come up with and manage” that LLMs can be valuable if only for their capabilities to role-compress and give people with a sense of ownership the tools they need to operate like a whole team would 10 years ago.

aflukasz|1 month ago

> However, they absolutely also lower the barrier to entry and dethrone “pure single tech” (ie backend only, frontend only, “I don’t know Kubernetes”, or other limited scope) software engineers who’ve previously benefited from super specialized knowledge guarding their place in the business.

This argument gets repeated frequently, but to me it seems to be missing a final, actionable conclusion.

If one "doesn't know Kubernetes", what exactly are they supposed to do now, having an LLM at hand, in a professional setting? They still "can't" assess the quality of the output, after all. They can't just ask the model, as they can't know whether the answer is misleading.

Assuming we are not expecting people to operate with implicit delegation of responsibility to the LLM (something that is ultimately not possible anyway - taking blame is a privilege humans will keep for the foreseeable future), I guess the argument in the form as above collapses to "it's easier to learn new things now"?

But this does not eliminate (or reduce) a need for specialization of knowledge on the employee side, and there is only so much you can specialize in.

The bottleneck may have shifted right somewhat (from time/effort of the learning stage to the cognition and the memory limits of an individual), but the output on the other side of the funnel (of learn->understand->operate->take-responsibility-for) didn't necessarily widen that much, one could argue.

netdevphoenix|1 month ago

> someone who’s shipped entire new frontend feature sets, while also managing a team. I’ve used LLM to prototype these features rapidly and tear down the barrier to entry on a lot of simple problems that are historically too big to be a single-dev item, and clear out the backlog of “nice to haves” that compete with the real meat and bread of my business. This prototyping and “good enough” development has been massively impactful in my small org

Has any senior React dev code reviewed your work? I would be very interested to see what they have to say about the quality of your code. It's a bit like using LLMs to medically self-diagnose and claiming it works because you are healthy.

Ironically enough, it does seem that the only workforce AIs will be shrinking is devs themselves. I guess in 2025, everyone can finally code.

moezd|1 month ago

That's a solid answer, I like it, thanks!

ncruces|1 month ago

I follow at least one GitHub repo (a well-respected one that's made the HN front page) where everything is now Claude coded. Things do move fast, but I'm seriously underimpressed with the quality. I've raised a few concerns; some were taken in, others seem to have been shut down with an explanation Claude produced that IMO makes no sense, but which is taken at face value.

This matches my personal experience. I was asked to help with a large Swift iOS app without knowing Swift. I had access to a frontier agent. I was able to consistently knock out a couple of tickets per week for about a month until the fire was out and the actual team could take over. Code review by the owners means the result isn't terrible, but it's not great either. I left the experience none the wiser: I gained very little knowledge of Swift, iOS development or the project. Management was happy with the productivity boost.

I think it's fleeting and dread a time where most code is produced that way, with the humans accumulating very little institutional knowledge and not knowing enough to properly review things.

piker|1 month ago

Any reason not to link to the repo in question?

PacificSpecific|1 month ago

Oh wow that's a great analogy. So many posts talking about how AI is a massive benefit for their work but no examples or further information.

srcreigh|1 month ago

This project and its website were both originally working 1 shot prototypes:

The website https://pxehost.com - via codex CLI

The actual project itself (a pxe server written in go that works on macOS) - https://github.com/pxehost/pxehost - ChatGPT put the working v1 of this in 1 message.

There was much tweaking, testing, refactoring (often manually) before releasing it.

Where AI helps is the fact that it’s possible to try 10-20 different such prototypes per day.

The end result is 1) Much more handwritten code gets produced because when I get a working prototype I usually want to go over every detail personally; 2) I can write code across much more diverse technologies; 3) The code is better, because each of its components is the best of many attempts, since attempts are so cheap.

I can give more if you like, but hope that is what you are looking for.

SkyBelow|1 month ago

I had some .csproj files that only worked with msbuild/vsbuild that I wanted to make compatible with dotnet. Copilot does a pretty good job of updating these and identifying the ones more likely to break (say web projects compared to plain dlls). It isn't a simple fire and forget, but it did make it possible without me needing to do as much research into what was changing.

Is that a net benefit? Without AI, if I really wanted to do that conversion, I would have had to become much more familiar with the inner workings of csproj files. That is a benefit I've lost, but it would've also taken longer to do so, so much longer that I might not have decided to do the conversion at all. My job doesn't really have a need for someone that deeply specialized in csproj, and it isn't a particular interest of mine, so letting AI handle it while being able to answer a few questions to sate my curiosity seemed a great compromise.

A second example: it works great as a better option to a rubber duck. I noticed some messy programming where, basically, OOP had been abandoned in favor of one massive class doing far too much work. I needed to break it down, and talking with AI about it helped come up with some design patterns that worked well. AI wasn't good enough to do the refactoring in one go, but it helped talk through the pros and cons of a few design patterns and was able to create test examples so I could get a feel for what it would look like when done. Also, when I finished, I had AI review it and it caught a few typos that weren't compile errors before I even got to the point of testing it.

None of these were things AI could do on its own, and they definitely aren't areas where I would have just blindly trusted some vibe coded output, but overall it was a productivity increase well worth the $20 or so cost.

(Now, one may argue that is the subsidized cost, and the unsubsidized cost would not have been worthwhile. To that, I can only say I'm not versed enough on the costs to be sure, but the argument does seem like a possibility.)

scott_w|1 month ago

I was at a podiatrist yesterday who explained that what he's trying to do is to "train" an LLM agent on the articles and research papers he's published to create a chatbot that can provide answers to the most common questions more quickly than his reception team can.

He's also using it to speed up writing his reports to send to patients.

Longer term, he was also quite optimistic on its ability to cut out roles like radiologists, instead having a software program interpret the images and write a report to send to a consultant. Since the consultant already checks the report against any images, the AI being more sensitive to potential issues is a positive thing: giving him the power to discard erroneous results rather than potentially miss something more malign.

engeljohnb|1 month ago

> Longer term, he was also quite optimistic on its ability to cut out roles like radiologists, instead having a software program interpret the images and write a report to send to a consultant.

As a medical imaging tech, I think this is a terrible idea. At least for the test I perform, a lot of redundancy and double-checking is necessary because results can easily be misleading without a diligent tech or critical thinking on the part of the reading physician. For instance, imaging at slightly the wrong angle can make a normal image look like pathology, or vice versa.

Maybe other tests are simpler than mine, but I doubt it. If you've ever asked an AI a question about your field of expertise and been amazed at the nonsense it spouts, why would you trust it to read your medical tests?

> Since the consultant already checks the report against any images, the AI being more sensitive to potential issues is a positive thing: giving him the power to discard erroneous results rather than potentially miss something more malign.

Unless they had the exact same schooling as the radiologist, I wouldn't trust the consultant to interpret my test, even if paired with an AI. There's a reason this is a whole specialized field -- because it's not as simple as interpreting an EKG.

heyitsguay|1 month ago

Agreed. I've never seen a concrete answer with an outcome that can be explained in clear, simple terms.

jdross|1 month ago

I work in insurance - regulated, human capital heavy, etc.

Three examples for you:

- Our policy agent extracts all coverage limits and policy details into a data ontology. This saves 10-20 mins per policy. It is more accurate and consistent than our humans.

- Our email drafting agent will pull all relevant context on an account whenever an email comes in. It will draft a reply or an email to someone else based on context and workflow. Over half of our emails are now sent without meaningfully modifying the draft, up from 20% two months ago. Hundreds of hours saved per week, now spent on more valuable work for clients.

- Our certificates agent will note when a certificate of insurance is requested over email and automatically handle the necessary checks and follow up options or resolution. Will likely save us around $500k this year.

We also now increasingly share prototypes as a way to discuss ideas, because the cost to vibe code something illustrative is very low, and it’s often much higher fidelity to have the conversation with something visual than a written document.

linkjuice4all|1 month ago

Here's some anecdata from the B2B SaaS company I work at

- Product team is generating some code with LLMs but everything has to go through human review and developers are expected to "know" what they committed - so it hasn't been a major time saver but we can spin up quicker and explore more edge cases before getting into the real work

- Marketing team is using LLMs to generate initial outlines and drafts - but even low stakes/quick turn around content (like LinkedIn posts and paid ads) still needs to be reviewed for accuracy, brand voice, etc. Projects get started quicker but still go through various human reviews before customers/the public sees them

- Similarly the Sales team can generate outreach messaging slightly faster but they still have to review for accuracy, targeting, personalization, etc. Meeting/call summaries are pretty much 'magic' and accurate-enough when you need to analyze any transcripts. You can still fall back on the actual recording for clarification.

- We're able to spin up demos much faster with 'synthetic' content/sites/visuals that are good-enough for a sales call but would never hold up in production

---

All that being said - the value seems to be speeding up discovery of actual work, but someone still needs to actually do the work. We have customers, we built a brand, we're subject to SLAs and other regulatory frameworks so we can't just let some automated workflow do whatever it wants without a ton of guardrails. We're seeing similar feedback from our customers in regard to the LLM features (RAG) that we've added to the product if that helps.

moron4hire|1 month ago

Lately, it seems like all the blogs have shifted away from talking about productivity and are now talking about how much they "enjoy" working with LLMs.

If firing up old coal plants and skyrocketing RAM prices and $5000 consumer GPUs and violating millions of developers' copyrights and occasionally coaxing someone into killing themselves is the cost of Brian From Middle Management getting to Enjoy Programming Again instead of having to blame his kids for not having any time on the weekends, I guess we have no choice but to oblige him his little treat.

cageface|1 month ago

This kind of take I find genuinely baffling. I can't see how anybody working with current frontier models isn't finding them a massive performance boost. No they can't replace a competent developer yet, but they can easily at least double your productivity.

Careful code review and a good pull request flow are important, just as they were before LLMs.

59nadir|1 month ago

People thought they were doubling their productivity and then real, actual studies showed they were actually slower. These types of claims have to be taken with entire quarries of salt at this point.

netdevphoenix|1 month ago

> double your productivity

Churning out 2x as much code is not doubling productivity. Can you perform at the same level as a dev who is considered 2x as productive as you? That's the real metric. That means comparing the ratio of quality to quantity of code, bugs caused by your PRs, actual understanding of the code in your PR, ability to think slow, ability to deal with fires, and ability to quickly deal with breaking changes accidentally caused by your changes.

Churning out more code per day is not the goal. There's no point merging code that doesn't fully work, is not properly tested, or that other humans (or you) cannot understand, etc.