top | item 43563270

(no title)

jdcasale | 11 months ago

I recently tried Cursor for about a week and I was disappointed. It was useful for generating code that someone else has definitely written before (boilerplate etc), but any time I tried to do something nontrivial, it failed no matter how much poking, prodding, and thoughtful prompting I tried.

Even when I tried to ask it for stuff like refactoring a relatively simple rust file to be more idiomatic or organized, it consistently generated code that did not compile and was unable to fix the compile errors on 5 or 6 repromptings.

For what it's worth, a lot of SWE work technically trivial -- it makes this much quicker so there's obviously some value there, but if we're comparing it to a pair programmer, I would definitely fire a dev who had this sort of extremely limited complexity ceiling.

It really feels to me (just vibes, obviously not scientific) like it is good at interpolating between things in its training set, but is not really able to do anything more than that. Presumably this will get better over time.

discuss

dughnut|11 months ago

If you asked a junior developer to refactor a rust program to be more idiomatic, how long would you expect that to take? Would you expect the work to compile on the first try?

I love Cline and Copilot. If you carefully specify your task, provide context for uncommon APIs, and keep the scope limited, then the results are often very good. It’s code completion for whole classes and methods or whole utility scripts for common use cases.

Refactoring to taste may be under specified.

jdcasale|11 months ago

"If you asked a junior developer to refactor a rust program to be more idiomatic, how long would you expect that to take? Would you expect the work to compile on the first try?"

The purpose of giving that task to a junior dev isn't to get the task done, it's to teach them -- I will almost always be at least an order order of magnitude faster than a junior for any given task. I don't expect juniors to be similarly productive to me, I expect them to learn.

The parent comment also referred to a 'competent pair programmer', not a junior dev.

My point was that for the tasks that I wanted to use the LLM, frequently there was no amount of specificity that could help the model solve it -- I tried for a long time, and generally if the task wasn't obvious to me, the model generally could not solve it. I'd end up in a game of trying to do nondeterministic/fuzzy programming in English instead of just writing some code to solve the problem.

Again I agree that there is significant value here, because there is a ton of SWE work that is technically trivial, boring, and just eats up time. It's also super helpful as a natural-language info-lookup interface.

Retric|11 months ago

What matters here is the communication overhead not how long between responses. If I’m indefinitely spending more time handholding a jr dev than they save me eventually I just fire em, same with code gen.