top | item 41669355


binocarlos | 1 year ago

I would pay hundreds of dollars per month for the combination of Cursor and Claude. I could not get my head around it when my beginner-level colleague said "I just coded this whole thing using Cursor".

It was an entire web app, with search filters, tree based drag and drop GUIs, the backend api server, database migrations, auth and everything else.

Not once did he need to ask me a question. When I asked him "how long did this take", I expected him to say "a few weeks" (it would have taken me - a far more experienced engineer - 2 months minimum).

His answer was "a few days".

What I'm not saying is "AGI is close", but I've seen tangible evidence (only in the last 2 months) that my 20-year software engineering career is about to change, and massively for the upside. The way I see it, everyone is going to be so much more productive using these tools.


aniviacat|1 year ago

Current LLMs fail if what you're coding is not the most common of tasks. And a simple web app is about as basic as it gets.

I've tried using LLMs for some libraries I'm working on, and they failed miserably. Trying to make an LLM implement a trait with a generic type in Rust is a game of luck with very poor chances.

I'm sure LLMs can massively speed up tasks like front-end JavaScript development, simple Python scripts, or writing SQL queries (which have been written a million times before).

But for anything even mildly complex, LLMs are still not suited.

dathinab|1 year ago

I don't think complexity is the right metric.

front-end JS can easily also become very complex

I think a better metric is how close you are to reinventing a wheel for the thousandth time. Because that is what LLMs are good at: helping you write code that has already been written thousands of times in nearly the same way.

That is something you find in backend code, too.

But that is also something where we as an industry kinda failed to produce good tooling. And worse, if you are in the industry it's kinda hard to spot without very carefully taking a hundred (mental) steps back from what you are used to and what biases you might have.

znpy|1 year ago

I had similar experiences:

1. Asked ChatGPT to write a simple echo server in C, but with a twist: use io_uring rather than the classic sendmsg/recvmsg. The code it spat out wouldn't compile, let alone work. It was wrong on many points, clearly pieces of who-knows-what cut and pasted together. After having banged my head on the docs for a while I could determine which sources the io_uring code segments were coming from, but the result barely made any sense and was incorrect both syntactically and semantically.

2. Asked another LLM to write an AWS IAM policy according to some specifications. It hallucinated and used predicates that do not exist at all. I mean, I could have done it myself if I just could have made predicates up.
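
For contrast, a minimal valid policy might look like the sketch below (illustrative only: the bucket name is made up, but "s3:GetObject" is a genuine AWS action, unlike the invented predicates described above, which AWS would simply reject):

```python
import json

# Minimal IAM-style policy using a real action. A hallucinated policy might
# instead invent something like "s3:RetrieveFileContents", which does not
# exist and would fail validation.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",  # hypothetical bucket
        }
    ],
}

print(json.dumps(policy, indent=2))
```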

> But for anything even mildly complex, LLMs are still not suited.

Agreed, and I'm not sure we are anywhere close to them being suited.

PaulHoule|1 year ago

Roughly LLMs are great at things that involve a series of (near) 1-1 correspondences like “translate 同时采访了一些参与其中的活跃用户 to English” or “How do I move something up 5px in CSS without changing the rest of the layout?” but if the relationship of several parts is complex (those Rust traits or anything involving a fight with the borrow checker) or things have to go in some particular order it hasn’t seen (say US states in order of percent water area) they struggle.

SQL is a good target language because the translation from ideas (or written description) is more or less linear, the SQL engine uses entirely different techniques to turn that query into a set of relational operators which can be rewritten for efficiency and compiled or interpreted. The LLM and the SQL engine make a good team.
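
As a toy illustration of that division of labor (the schema and query here are invented for this sketch, not from the thread), the generated SQL can be handed straight to the engine, which handles planning and optimization on its own:

```python
import sqlite3

# Set up a small in-memory database to play the role of the "engine".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users (name, active) VALUES (?, ?)",
    [("ada", 1), ("bob", 0), ("carol", 1)],
)

# A request like "list the names of active users" maps almost word-for-word
# onto SQL -- the near 1-1 correspondence described above. The engine, not
# the LLM, decides how to actually execute it.
llm_style_query = "SELECT name FROM users WHERE active = 1 ORDER BY name"
rows = [row[0] for row in conn.execute(llm_style_query)]
print(rows)  # ['ada', 'carol']
```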

infecto|1 year ago

I’d bet that about 90% of software engineers today are just rewriting variations of what’s already been done. Most problems can be reduced to similar patterns. Of course, the quality of a model depends on its training data: if a library is new or the language isn’t widely used, output quality suffers. However, this is a challenge people are actively working on, and I believe it’s solvable.

LLMs are definitely suited for tasks of varying complexity, but like any tool, their effectiveness depends on knowing when and how to use them.

ben_w|1 year ago

> Current LLMs fail if what you're coding is not the most common of tasks

Succeeding on the most common tasks (which isn't exactly what you said) is identical to "they're useful".

bee_rider|1 year ago

> Not once did he need to ask me a question. When I asked him "how long did this take" and expected him to say "a few weeks" (it would have taken me - a far more experienced engineer - 2 months minimum).

> Current LLMs fail if what you're coding is not the most common of tasks. And a simple web app is about as basic as it gets.

These two complexity estimates don’t seem to line up.

fhd2|1 year ago

That's still valuable though, for problem validation. It lowers the barrier to building any sort of useful software, all of which starts simple.

Personally, I just use the hell out of Django for that. And since tools like that are already ridiculously productive, I don't see much upside from coding assistants. But by and large, so many of our tools are so surprisingly _bad_ at this that I expect the LLM hype to have a lasting impact here. Even _if_ the solutions aren't actually LLMs, but just better tools, since we've recalibrated how long something _should_ take.

gambiting|1 year ago

I work in video games, I've tried several AI assistants for C++ coding and they are all borderline useless for anything beyond writing some simple for loops. Not enough training data to be useful I bet, but I guess that's where the disparity is - web apps, python....that has tonnes of publicly available code that it can train on. Writing code that manages GPU calls on a PS5? Yeah, good luck with that.

ilaksh|1 year ago

So you are basically saying "it failed on some of my Rust tasks, and those other languages aren't even real programming languages, so it's useless".

I've used LLMs to generate quite a lot of Rust code. It can definitely run into issues sometimes. But it's not really about complexity determining whether it will succeed or not. It's the stability of features or lack thereof and the number of examples in the training dataset.

ilaksh|1 year ago

Which model exactly? You understand that every few months we are getting dramatically better models? Did you try the one that came out within the last week or so (o1-preview)?

Roark66|1 year ago

I can't understand how anyone can use these tools (copilot especially) to make entire projects from scratch and expand them later. They just lead you down the wrong path 90% of the time.

Personally I much prefer ChatGPT. I give it specific small problems to resolve and some context, at most 100 lines of code. If it gets more than that, the quality goes to shit. In fact Copilot feels like ChatGPT that was given too much context.

sensanaty|1 year ago

I hear it all the time on HN that people are producing entire apps with LLMs, but I just don't believe it.

All of my experiences with LLMs have been that anything that isn't a braindead-simple for loop comes out as unworkable garbage that takes more effort to fix than if you had just written it from scratch to begin with. And then you're immediately met with "You're using it wrong!", "You're using the wrong model!", "You're prompting it wrong!" and my favorite, "Well, it boosts my productivity a ton!".

I sat down with the "AI Guru" as he calls himself at work to see how he works with it and... He doesn't. He'll ask it something, write an insanely comprehensive prompt, and it spits out... Generic trash that looks the same as the output I ask of it when I provide it 2 sentences total, and it doesn't even work properly. But he still stands by it, even though I'm actively watching him just dump everything he just wrote up for the AI and start implementing things himself. I don't know what to call this phenomenon, but it's shocking to me.

Even something that should be in its wheelhouse like producing simple test cases, it often just isn't able to do it to a satisfactory level. I've tried every one of these shitty things available in the market because my employer pays for it (I would never in my life spend money on this crap), and it just never works. I feel like I'm going crazy reading all the hype, but I'm slowly starting to suspect that most of it is just covert shilling by vested persons.

threeseed|1 year ago

> 20 year software engineering career is about to change

I have also been developing for 20+ years.

And have heard the exact same thing about IDEs, Search Engines, Stack Overflow, Github etc.

But in my experience at least, how fast I code has never been the limiting factor in my projects' success. So LLMs are nice and all, but they aren't going to change the industry all that much.

pluc|1 year ago

There will be a whole industry of people who fix what AI has created. I don't know if it will be faster to build the wrong thing and pay to have it fixed or to build the right thing from the get go, but after having seen some shit, like you, I have a little idea.

cml123|1 year ago

Yes, but does your colleague even fully understand what was generated? Does he have a good mental map of the organization of the project?

I have a good mental map of the projects I work on because I wrote them myself. When new business problems emerge, I can picture how to solve them using the different components of those applications. If I hadn't actually written the application myself, that expertise would not exist.

Your colleague may have a working application, but I seriously doubt he understands it in the way that is usually needed for maintaining it long term. I am not trying to be pessimistic, but I _really_ worry about these tools crippling an entire generation of programmers.

alonsonic|1 year ago

AI assistants are also quite good at helping you build a high-level map of a codebase. They are able to traverse the whole project structure and explain how things are organized and what the responsibilities are. I just went back to an old project (didn't remember much about it) and used Cursor to make a small bug fix, and it helped me get it done in no time. I used it to identify where the issue might be based on logs, then elaborate on potential causes, before suggesting a solution and implementing it. It's the ultimate pair programmer setup.

skywhopper|1 year ago

I wouldn’t even be so sure the application “works”. All we heard is that it has pretty UI and an API and a database, but does it do something useful and does it do that thing correctly? I wouldn’t be surprised if it totally fails to save data in a restorable way, or to be consistent in its behavior. It certainly doesn’t integrate meaningfully with any existing systems, and as you say, no human has any expertise in how it works, how to maintain it, troubleshoot it, or update it. Worse, the LLM that created it also doesn’t have any of that expertise.

n_ary|1 year ago

> I _really_ worry about these tools crippling an entire generation of programmers.

Isn’t that the point? Degrade the user long enough that the competing user is on-par or below the competence of the tool so that you now have an indispensable product and justification of its cost and existence.

P.S. This is what I understood from a lot of AI evangelists in the news who are too busy parroting productivity gains without citing other consequences, such as loss of understanding of the task or the expertise to fact-check.

svantana|1 year ago

Me too, but a more optimistic view is that this is just a nascent form of higher-level programming languages. Gray-beards may bemoan that us "young" developers (born after 1970) can't write machine code from memory, but it's hardly a practical issue anymore. Analogously, I imagine future software dev to consist mostly of writing specs in natural language.

gtvwill|1 year ago

I've coded Python scripts that let me take CSV data from Hornresp and convert it to 3D models I can import into SketchUp. I did two coding units at uni, so whilst I can read it... I can't write it from scratch to save my life. I can debug and fix scripts GPT gives me. I did the Hornresp script in about 40 mins. It would have taken me weeks to learn what it produced.
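
A hypothetical sketch of that kind of conversion (the real Hornresp export format and the poster's script aren't shown in the thread, so the two-column CSV layout here is an assumption):

```python
import csv
import io

# Assumed format: a CSV of (axial distance, radius) pairs describing a horn
# profile. Each row becomes a simple 3D point a modeling tool could import.
sample = "x_cm,radius_cm\n0,5\n10,8\n20,13\n"

vertices = []
for row in csv.DictReader(io.StringIO(sample)):
    x, r = float(row["x_cm"]), float(row["radius_cm"])
    # One vertex per profile point; a full script would revolve the profile
    # around the axis to produce an actual mesh.
    vertices.append((x, r, 0.0))

print(vertices)  # [(0.0, 5.0, 0.0), (10.0, 8.0, 0.0), (20.0, 13.0, 0.0)]
```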

I'm not a mathematician; hell, I did general maths at school. Currently I've been talking through scripting a method to mix DSD audio files natively without converting to traditional PCM. I'm about to use GPT to craft these scripts. There is no way I could have done this myself without years of learning. Now all I have to do is wait half a day so I can use my free GPT-4o credits to code it for me (I'm broke af so can't afford subs). The productivity gains are insane. I'd pay for this in a heartbeat if I could afford it.

orwin|1 year ago

I really believe that the front-end part can be mostly automated (the HTML/CSS at least); Copilot is close imho (Microsoft + GitHub, I used both). But really they're useless for anything else complex without making too many calls, proposing bad data structures, or using bad/old code design.

skydhash|1 year ago

The frontend part was already automated. We called it Dreamweaver and RAD tools.

JanSt|1 year ago

Copilot is pretty bad compared to Cursor with Sonnet. I have used Copilot for quite a long time, so I can tell.

StefanWestfal|1 year ago

I am curious, how complex was the app? I use Cursor too and am very satisfied with it. It seems to be very good at code that must have been written many times before (think React components, Node.js REST API endpoints, etc.) but it starts to fall off when moving into specific domains.

And for me that is the best case scenario: it takes away the part where we have to solve already-solved problems again and again, so we can focus more on the other parts of software engineering beyond writing code.

rurp|1 year ago

Fairly standard greenfield projects seem to be the absolute best scenario for an LLM. It is impressive, but that's not what most professional software development work is, in my experience. Even once I know what specifically to code I spend much more time ensuring that code will be consistent and maintainable with the rest of the project than with just getting it to work. So far I haven't found LLMs to be all that good at that sort of work.

skapadia|1 year ago

Did you take a look at the code generated? Was it well designed and amenable to extension / building on top of?

I've been impressed with the ability to generate "throw away" code for testing out an idea or rapidly prototyping something.

nativeit|1 year ago

Considering the current state of the industry, and the prevailing corporate climate, are you sure your job is about to get easier, or are you about to experience cuts to both jobs and pay?

insane_dreamer|1 year ago

The problem is that it only works for basic stuff for which there is a lot of existing example code out there to work with.

In niche situations it's not helpful at all in writing code that works (or even close). It is helpful as a quick lookup for docs for libs or functions you don't use much, or for gotchas that you might otherwise search StackOverflow for answers to.

It's good for quick-and-dirty code that I need for one-off scripts, testing, and stuff like that which won't make it into production.

apwell23|1 year ago

So what is his plan for fixing all the bugs that Claude hallucinated into the code?

JanSt|1 year ago

I'm confident you have not used Cursor Composer + Claude 3.5 Sonnet. I'd say the level of bugs is no higher than that of a typical engineer - maybe even lower.

dagw|1 year ago

Claude is actually surprisingly good at fixing bugs as well. Feed it a code snippet and either the error message or a brief description of the problem and it will in many cases generate new code that works.
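
The snippet-plus-error loop described here can be sketched with a toy example (this code is illustrative, not from the thread): an off-by-one bound, and the corrected version a model would typically return.

```python
# Buggy version, which raises IndexError on the last iteration:
#
#   def pairwise_sums(xs):
#       return [xs[i] + xs[i + 1] for i in range(len(xs))]
#
# Pasting that snippet plus "IndexError: list index out of range" into the
# model typically yields the corrected loop bound:

def pairwise_sums(xs):
    """Sum each adjacent pair of elements in xs."""
    return [xs[i] + xs[i + 1] for i in range(len(xs) - 1)]

print(pairwise_sums([1, 2, 3, 4]))  # [3, 5, 7]
```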

charlie0|1 year ago

Sounds like CRUD boilerplate. Sure, it's great to have AI build this out and it saves a ton of time, but I've yet to see any examples (online or otherwise) or people building complex business rules and feature sets using AI.

The sad part is that beginners using the boilerplate code won't get any practice building apps; they'll either completely fail at the complex parts of an app, or try to use AI to build them and end up with terrible code.

skywhopper|1 year ago

I hear these stories, and I have to wonder, how useful is the app really? Was it actually built to address a need or was it built to learn the coding tool? Is it secure, maintainable, accessible, deployable, and usable? Or is it just a tweaked demo? Plenty of demo apps have all those features, but would never serve as the basis for something real or meet actual customer needs.

SJC_Hacker|1 year ago

Yeah, AI can give you a good base if it's something that's been done before (which admittedly, 99% of SE projects are), especially in the target language.

Yeah, if you want tic-tac-toe or snake, you can simply ask ChatGPT and it will spit out something reasonable.

But this is not much better than a search engine/framework to be honest.

Asking it to be "creative" or to tweak existing code however ...

JanSt|1 year ago

Yes, the value of a single engineer can easily double. Even a junior - and it's much easier for them to ask Claude for help than the senior engineer on the team (low barrier for unblock).