I've been using AI to solve isolated problems, mainly as a replacement for a search engine, specifically for programming. I'm still not convinced by the "write a whole block of code for me" type of use case. Here are my arguments against the videos from the article.

1. Snake case to camelCase. Even without AI we can already complete these tasks easily. VS Code itself has a "Transform to Camel Case" command for the selection. It is nice that the AI can figure out which text to transform based on context, but it's not too impressive. I could select one ":, use "Select All Occurrences", press left, then ctrl+shift+left to select all the keys.

2. Generate boilerplate from documentation. Boilerplate is tedious, but not really time-consuming. How many of you spend 90% of your time writing boilerplate instead of the core logic of the project? If a language/framework (Java used to be like this, not sure about now) requires me to spend that much time on boilerplate, that's a language to be ditched/fixed.

3. Turn a problem description into a block of concurrency code. Unlike the boilerplate, this code is more complicated. If I already know the area, I don't need AI's help to begin with. If I don't, how can I trust the generated code to be correct? It could miss a corner case that my question didn't specify and that I don't yet know exists. In the end, I still need to spend time learning Python concurrency, and then I'll be writing the same code myself in no time.
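For reference, the key-renaming task from point 1 is also small enough to script. The following is only a minimal sketch, not what the editor command or the AI does internally; the helper names `snake_to_camel` and `convert_keys` are made up for illustration.

```python
import re


def snake_to_camel(name: str) -> str:
    """Turn 'user_first_name' into 'userFirstName'.

    Only an underscore that follows a letter or digit and precedes a
    lowercase letter is rewritten, so leading underscores are kept.
    """
    return re.sub(r"(?<=[a-z0-9])_([a-z])", lambda m: m.group(1).upper(), name)


def convert_keys(value):
    """Recursively rename dict keys in nested dicts and lists (hypothetical helper)."""
    if isinstance(value, dict):
        return {snake_to_camel(k): convert_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_keys(item) for item in value]
    return value
```

Calling `convert_keys` on a parsed JSON payload renames every key while leaving the values alone, which is roughly the multi-cursor trick described above done in code.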

In summary, my experience with AI is that if the question is easy (e.g. it's easy to find the exact same question on Stack Overflow), its answers are highly accurate. But if it is a unique question, the accuracy drops quickly. And it is the latter case where we spend most of our time.

scosman|1 year ago

I started like this. Then I came around and can’t imagine going back.

It’s kinda like having a really smart new grad, who works instantly, and has memorized all the docs. Yes I have to code review and guide it. That’s an easy trade off to make for typing 1000 tokens/s, never losing focus, and double checking every detail in realtime.

First: it really does save a ton of time for tedious tasks. My best example is test cases. I can write a method in 3 minutes, but Sonnet will write the 8 best test cases in 4 seconds, which would have taken me 10 mins of switching back and forth, looking at branches/errors, and mocking. I can code review and run these in 30s. Often it finds a bug. It’s definitely more patient than me in writing detailed tests.

Instant and pretty great code review: it can understand what you are trying to do, find issues, and fix them quickly. Just ask it to review and fix issues.

Writing new code: it’s actually pretty great at this. I needed a util class for config that had fallbacks to config files, env vars and defaults. And I wanted type checking to work on the accessors. Nothing hard, but it would have taken time to look at docs for yaml parsing, how to find the home directory, which env vars api returns null vs error on blank, typing, etc. All easy, but takes time. Instead I described it in about 20 seconds and it wrote it (with tests) in a few seconds.
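To give a sense of the size of such a utility, here is a minimal sketch of a config class with fallbacks. It is not the code from the comment: the class name `Config`, the `MYAPP_` prefix, the `~/.myapp.json` path, and the precedence order (env var, then file, then default) are all assumptions, and it reads JSON rather than YAML to stay within the standard library.

```python
import json
import os
from pathlib import Path
from typing import Any, Mapping, Optional


class Config:
    """Look up a setting in env vars, then a config file, then a default."""

    def __init__(
        self,
        path: Optional[Path] = None,
        env: Optional[Mapping[str, str]] = None,
        env_prefix: str = "MYAPP_",  # hypothetical prefix
    ) -> None:
        self._env = os.environ if env is None else env
        self._prefix = env_prefix
        path = path or Path.home() / ".myapp.json"  # hypothetical location
        # A missing file just means "no file-level settings".
        self._file = json.loads(path.read_text()) if path.exists() else {}

    def _raw(self, key: str) -> Any:
        env_value = self._env.get(self._prefix + key.upper())
        if env_value:  # a blank env var is treated the same as an unset one
            return env_value
        return self._file.get(key)

    def get_str(self, key: str, default: str) -> str:
        value = self._raw(key)
        return default if value is None else str(value)

    def get_int(self, key: str, default: int) -> int:
        value = self._raw(key)
        return default if value is None else int(value)
```

The typed accessors (`get_str`, `get_int`) are what let a type checker see the return type at each call site; a single untyped `get` would not.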

It’s moved well past the stage “it can answer questions on stack overflow”. If it has been a while (a while=6 months in ML), try again with new sonnet 3.5.

DeathArrow|1 year ago

>My best example is test cases. I can write a method in 3 minutes, but Sonnet will write the 8 best test cases in 4 seconds

For me it doesn't work. Generated tests fail to run or they fail.

I work in large C# codebases and in each file I have lots of injected dependencies. I have one public method which can call lots of private methods in the same class.

AI either doesn't properly mock the dependencies, or ignores what happens in the private methods.

If I take a lot of time guiding it where to look, it can generate unit tests that pass. But it takes longer than if I write the unit tests myself.

tyre|1 year ago

I’ve found it better at writing tests because it tests the code you’ve written vs what you intended. I’ve caught logic bugs because it wrote tests with an assertion for a conditional that was backwards. The readable name of the test clearly pointed out that I was doing the wrong thing (the test passed?).

scosman|1 year ago

Maybe a TLDR from all the issues I'm reading in this thread:

- It's gotten way better in the last 6 months. Both models (Sonnet 3.5 and new October Sonnet 3.5), and tooling (Cursor). If you last tried Copilot, you should probably give it another look. It's also going to keep getting better. [1]

- It can make errors, so expect to do some code review and guiding. However the error rates are going way way down [1]. I'd say it's already below humans for a lot of tasks. I'm often doing 2-3 iterations before applying a diff, but a quick comment like "close, keep the test cases, but use the test fixture at the top of the file to reduce repeated code" and 5 seconds is all it takes to get a full refactor. Compared to code-review turnaround with a team, it's magic.

- You need to learn how to use it. Setting the right prompts, adding files to the context, etc. I'd say it's already worth learning.

- It just knows the docs, and that's pretty invaluable. I know 10ish languages, which also means I don't remember the system call to get an env var in any of them. It does, and can insert it a lot faster than I can google it. Again, you'll need to code review, but more and more it's nailing idiomatic error checking in each language.

- You don't need libraries for boilerplate tasks. zero_pad is the extreme/joke example, but a lot more of my code is just using system libraries.

- It can do things other tools can't. Tell it to take the visual style of one blog post and port it to another. Tell it to use a test file I wrote as a style reference, and update 12 other files to follow that style. Read the README and tests, then write pydocs for a library. Write a GitHub action to build docs and deploy to GitHub pages (including suggesting libraries, deploy actions, and offering alternatives). Again: you don't blindly trust anything, you code review, and tests are critical.

[1] https://www.anthropic.com/news/3-5-models-and-computer-use

throwup238|1 year ago

> Instant and pretty great code review: it can understand what you are trying to do, find issues, and fix them quickly. Just ask it to review and fix issues.

Cursor’s code review is surprisingly good. It’s caught many bugs for me that would have taken a while to debug, like off by one errors or improperly refactored code (like changing is_alive to is_dead and forgetting to negate conditionals)
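The `is_alive` to `is_dead` case is the kind of bug that looks fine at a glance. A tiny made-up illustration (the `Monster` class and both functions are hypothetical, not from any real codebase):

```python
class Monster:
    def __init__(self, hp: int) -> None:
        self.hp = hp

    def is_dead(self) -> bool:
        # Renamed from is_alive(), which returned self.hp > 0.
        return self.hp <= 0


def living_buggy(monsters):
    # Before the rename: [m for m in monsters if m.is_alive()]
    # The method name was updated but the condition was not negated.
    return [m for m in monsters if m.is_dead()]


def living_fixed(monsters):
    # Correct refactor: flipping the predicate means flipping every caller too.
    return [m for m in monsters if not m.is_dead()]
```

The buggy version still runs and still returns a list, which is why it can take a while to track down without a review pass or a test.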

froobrad|1 year ago

This “really smart new grad” take is completely insane to me, especially if you know how LLMs work. Look at this SQL snippet Claude (the new Sonnet) generated recently.

    -- Get recipient's push token and sender's username
    SELECT expo_push_token, p.username 
    INTO recipient_push_token, sender_username
    FROM profiles p
    WHERE p.id = NEW.recipient_id;

Seems like the world has truly gone insane and engineers are tuned into some alternate reality a la Fox News. Well…it’ll be a sobering day when the other shoe falls.

globnomulous|1 year ago

> it can understand

It can't understand. That's not what LLMs do.

sebastiansm|1 year ago

What is the best workflow to code with an AI?

Copy and paste the code to the Claude website? Or use an extension? Or something else?

scosman|1 year ago

Another fun example from yesterday: pasted a blog post in markdown into a HTML comment. Selected it and told sonnet to convert it to HTML using another blog post as a style reference.

Done in 5 seconds.

shinycode|1 year ago

I agree.

I replaced SO with cGPT and it’s the only good use case I’ve found: finding an answer I can build on. But outsourcing my thinking? That’s a dangerous path. I tried it on small projects, building a project from scratch with Cursor just to test it. Sometimes it’s spot on, but in many instances it completely misses some cases and edge cases. Impossible to trust blindly. And if I do, and don’t take proper time to read and think about the code, the consequences pile up and waste my time in the long run, because it’s prompt over prompt over prompt to refine it and sometimes it’s still not exactly right. That messes up my thinking, so I prefer to do it myself and use it as documentation on steroids. I never used Google and SO again for docs. I have the feeling that relying on it too much to write even small blocks of code will make us lose some abilities in the long run, and I don’t think that’s a good thing. Will companies allow us to use AI in code interviews for boilerplate?

Moru|1 year ago

The AIs are to a large degree trained on tutorial code, quick examples, how-tos and so on from the net. Code that really should come with a disclaimer note: "Don't use in production, only example code."

This leads to your code being littered with problematic edge-cases that you still have to learn how to fix. Or in worst case you don't even notice that there are edge cases because you just copy-pasted the code and it works for you. The edge cases your users will find with time.

bee_rider|1 year ago

I’m slightly worried that these AI tools will hurt language development. Boilerplate heavy and overly verbose languages are flawed. Coding languages should help us express things more succinctly, both as code writers and as code readers.

If AI tools let us vomit out boilerplate and syntax, I guess that sort of helps with the writing part (maybe. As long as you fully understand what the AI is writing). But it doesn’t make the resulting code any more understandable.

Of course, as is always the case, the tools we have now are the dumbest they’ll ever be. Maybe in the future we can have understandable AI that can be used as a programming language, or something. But AI as a programming language generator seems bad.

wry_discontent|1 year ago

I used to agree with this, but the proliferation of Javascript made me realize that newer/better programming languages were already not coming to save us.

kubanczyk|1 year ago

Maybe it's a rectangle between:

   seniors
   copilots
   juniors
   new languages

Wondering. Since the seniors pair with LLMs, the world needs far fewer juniors. Some juniors will go away to other industries, but some might start projects in new languages without LLM/business support.

Frankly, otherwise I don't see how any new lang corpus might get created.

rco8786|1 year ago

Before you dismiss all of this because "You could do it by hand just as easily", you should actually try using Cursor. It only takes a few minutes to set up.

I'm only 2 weeks in but it's basically impossible for me to imagine going back now.

It's not the same as GH Copilot, or any of the other "glorified auto-complete with a chatbox" tools out there. It's head and shoulders better than everything else I have seen, likely because the people behind it are actual AI experts and have built numerous custom models for specific types of interactions (vs a glorified ChatGPT prompt wrapper).

viraptor|1 year ago

> I could select one ":, use "Select All Occurrences"

Only if it's the same occurrences. Cursor can often get the idea of what you want to do with the whole block of different names. Unless you're a vim macro master, it's not easily doable.

> How many of you spend 90% of time writing boilerplate instead of the core logic of the project?

It doesn't take much time, but it's a distraction. I'd rather tab through some things quickly than context switch to the docs, finding the example, adapting it for the local script, then getting back to what I was initially trying to do. Working memory in my brain is expensive.

JamesBarney|1 year ago

Disagree.

I still spend a good amount of time on boilerplate. Stuff that's not thinking hard about the problem I'm trying to solve. Stuff like unit tests, error logging, naming classes, methods and variables. Claude is really pretty good at this, not as good as the best code I've read in my career but definitely better than average.

When I review Sonnet's code, the code is more likely to be correct than if I review my own. If I make a mistake I'll read what I intended to write, and not what I actually wrote. Whereas when I review Sonnet's there are 2 passes, so the chance an error slips through is smaller.

unknown|1 year ago

[deleted]

pylua|1 year ago

Unit tests are boilerplate?

Sateeshm|1 year ago

Completely agree. I find it fails miserably at business logic, which is where we spend most of our time. But it does great at generic stuff, which is already trivial to find on Stack Overflow.

PUSH_AX|1 year ago

This might be a prompting issue, my experience is very different, I’ve written entire services using it.

rco8786|1 year ago

> But does great at generic stuff, which is already trivial to find on stack overflow.

The major difference is that with Cursor you just hit "tab", and that thing is done. Vs breaking focus to open up a browser, searching SO, finding an applicable answer (hopefully), translating it into your editor, then reloading context in your head to keep moving.

_thisdot|1 year ago

My experience has been different. My major use case for AI tools these days is writing tests. I've found that the generated test cases are very much in line with the domain. It might be because we've been strictly using domain-driven design principles. It even generates test cases that fail, which shows what we've missed.

Hedepig|1 year ago

Have you had a go with the o1 range of models?

theshrike79|1 year ago

I have a corporation-sponsored subscription to GitHub Copilot + Rider.

When I'm writing unit tests or integration tests it can guess the boilerplate pretty well.

If I already have an AddUserSucceeds test and I start writing `public void Dele...` it usually fills in the DeleteUserSucceeds function with pretty good guesses on what Asserts I want there - most times it even guesses the API path/function correctly because it uses the whole project as context.

I can also open a fresh project I've never seen and ask "Where is DbContext initialised" and it'll give me the class and code snippet directly.

whatever1|1 year ago

Have you tried recently to start a new web app from scratch? Especially the integration of a frontend framework with styling, and the frontend-backend integration.

Oh my god get ready to waste a full weekend just to set up everything and get a formatted hello world.

earthnail|1 year ago

That’s why I use Rails for work. But I also had to write a small Nodejs project (vite/react + express) recently for a private project, and it has a lot of nice things going for it that make modern frontend dev really easy - but boy is it time consuming to set up the basics.

exe34|1 year ago

that's an indictment of the proliferation of shitty frameworks and documentation. it's not hard to figure out such a combination and then keep a template of it lying around for future projects. you don't have to reach for the latest and shiniest at the start of every project.

CalRobert|1 year ago

I’ve been pretty happy with vite, chakra, and Postgrest lately

dkersten|1 year ago

Most frontend frameworks come with usable templates. Setting up a new Vite React project and getting to a formatted hello world can be done in half an hour tops.

eru|1 year ago

> Boilerplate are tedious, but not really time-consuming.

In the aggregate, almost no programmer can think up code faster than they can type it in. But being a better typist still helps, because it cuts down on the amount you have to hold in your head.

Similar for automatically generating boilerplate.

> If I don't know, how can I trust the generated code to be correct?

Ask the AI for a proof of correctness. (And I'm only half-joking here.)

In languages like Rust the compiler gives you a lot of help in getting concurrency right, but you still have to write the code. If the Rust compiler approves of some code (AI generated or artisanally crafted), you are already pretty far along in getting concurrency right.

A great mind can take a complex problem and come up with a simple solution that's easy to understand and obviously correct. AI isn't quite there yet, but getting better all the time.

enord|1 year ago

> In the aggregate, almost no programmer can think up code faster than they can type it in.

And thank god! Code is a liability. The price of code is coming down but selling code is almost entirely supplanted by selling features (SaaS) as a business model. The early cloud services have become legacy dependencies by now (great work if you can get it). Maintaining code is becoming a central business concern in all sectors governed by IT (i.e. all sectors, eating the world and all that).

On a per-feature basis, more code means higher maintenance costs, more bugs and greater demands on developer skills and experience. Validated production code that delivers proven customer value is not something you refactor on a whim (unless you plan to go out of business), and the fact that you did it in an evening thanks to ClippyGPT means nothing: the costly part is always what comes after, demonstrating value or maintaining trust in a competitive market with a much shallower capital investment moat.

Mo’ code mo’ problems.

mewpmewp2|1 year ago

> In the aggregate, almost no programmer can think up code faster than they can type it in. But being a better typist still helps, because it cuts down on the amount you have to hold in your head.

I mean on the big picture level sure they can. Or in detail if it is something that they have good experience with. In many cases I get a visual of the whole code blocks, and then if I use copilot I can already predict what it is going to auto complete for me based on the context and then I can pretty much in a second know if it was right or wrong. Of course it is more so for the side projects since I know exactly what I want to do and so it feels most of the time it is having to just vomit all the code out. And I feel impatient, so copilot helps a lot with that.

unknown|1 year ago

[deleted]

never_inline|1 year ago

100%. Useful cases include

* figuring out how to X in an API - eg "write method dl_file(url, file) to download file from url using requests in a streaming manner"

* Brainstorming which libraries / tools / approaches exist to do a given task. Google can miss some. AI is a nice complement for Google.

dexwiz|1 year ago

I don’t even trust the API based exercises anymore unless it’s a stable and well documented API. Too many times I’ve been bitten by an AI mixing and matching method signatures from different versions, using outdated approaches, mixing in apis from similar libraries, or just completely hallucinating a method. Even if I load the entire library docs into the context, I haven’t found one that’s completely reliable.

yunwal|1 year ago

> Snake case to camelCase > VSCode itself has command of "Transform to Camel Case"

I never understand arguments like this. I have no idea what the shortcut for this command is. I could learn this shortcut, sure, but tomorrow I’ll need something totally different. Surely people can see the value of having a single interface that can complete pretty much any small-to-medium-complexity data transformation. It feels like there’s some kind of purposeful gaslighting going on about this and I don’t really get the motive behind it.

cyral|1 year ago

Exactly. I think some commenters are taking this example too literally. It's not about this specific transformation, but how often you need to do similar transformations and don't know the exact shortcut or regex or whatever to make it happen. I can describe what I want in three seconds and be done with it. Literal dropbox.png going on in this thread.

EGreg|1 year ago

If you aren’t using AI for everything, you’re using it wrong. Go learn how to use it better. It’s your job to find out how. Corporations are going to use it to replace your job.

(Just kidding. I’m just making fun of how AI maxis reply to such comments, but they do it more subtly.)

johnisgood|1 year ago

Boilerplate comes up all the time when writing Erlang with OTP behaviors though, and sometimes you have no idea if it really is the right way or not. There are Emacs skeletons for that (through tempo), but feels like they are sometimes out of date.

dkersten|1 year ago

1. Is such a taste task for me anyway that I don’t lose much just doing it by hand

2. The last time I wrote boilerplate heavy Java code, 15+ years ago, the IDE already generated most of it for me. Nowadays boilerplate comes in two forms for me: new project setup, which I find it far quicker to use a template or just copy and gut an existing project (and it’s not like I start new projects that often anyway), or new components that follow some structure, where AI might actually be useful but I tend to just copy an existing one and gut it.

3. These aren’t tasks I really trust AI for. I still attempt to use AI for them, but 9 out of 10 times come away disappointed. And the other 1 time end up having to change a lot of it anyway.

I find a lot of value from AI, like you, asking it SO style questions. I do also use it for code snippets, eg “do this in CSS”. Its results for that are usually (but not always) reasonably good. I also use it for isolated helper functions (write a function to flood fill a grid where adjacent values match was a recent one). The results for this range from a perfect solution first try, to absolute trash. It’s still overall faster than not having AI, though. And I use it A LOT for rubber ducking.
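For context, the flood-fill helper mentioned above is roughly this size. This is a minimal sketch of one common approach (breadth-first, 4-connected neighbours, filling in place), not the code the AI produced; the function name and signature are assumptions.

```python
from collections import deque


def flood_fill(grid, row, col, new_value):
    """Replace the 4-connected region of equal values containing (row, col).

    The grid (a list of lists) is modified in place and also returned.
    """
    if not grid or not grid[0]:
        return grid
    old_value = grid[row][col]
    if old_value == new_value:
        return grid  # nothing to do, and avoids looping forever
    rows, cols = len(grid), len(grid[0])
    queue = deque([(row, col)])
    grid[row][col] = new_value
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == old_value:
                grid[nr][nc] = new_value  # mark on enqueue so cells are visited once
                queue.append((nr, nc))
    return grid
```

The early return when the old and new values match is the sort of edge case that is easy to leave out of a first draft, whoever writes it.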

I find AI is a useful tool, but I find a lot of the positive stories to be overblown compared to my experience with it. I also stopped using code assistants and just keep a ChatGPT tab open. I sometimes use Claude but its conversation length limits turned me off.

Looking at the videos in OP, I find the parallelising task to be exactly the kind of tricky and tedious task that I don’t trust AI to do, based on my experience with that kind of task, and with my experience with AI and the subtly buggy results it has given me.

Myrmornis|1 year ago

Have you tried Cursor, or is this just your guess at what your evaluation would be?

yen223|1 year ago

Don't mean to be rude, but was this comment written with an LLM?

greenie_beans|1 year ago

have you tried using cursor or claude?

Writingdorky|1 year ago

[deleted]