top | item 46526628

danfritz | 1 month ago

Every time I see a post like this on HN I try again, and every time I come to the same conclusion. I have never seen an agent manage to pull something off that I could instantly ship. It still ends up being very junior code.

I just tried again and asked Opus to add custom video controls around ReactPlayer. I started in Plan mode, which looked overall good (used our styling libs, existing components, icons and so on).

I let it execute the plan and behold, I have controls on the video, so far so good. I then look at the code and I see multiple issues: overuse of useEffect for trivial things, storing values in useState that should be computed at render time, failing to correctly display the time / duration of the video and so on...
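To make the derived-state point concrete, here is a minimal sketch (my own illustration, not the code Opus produced): the displayed time can be formatted at render time from the seconds the player already reports, instead of mirroring a formatted string into useState.

```javascript
// Sketch of the "compute at render time" idea. formatTime is a
// hypothetical helper name; ReactPlayer itself only hands you numbers
// like playedSeconds and duration via its callbacks.
function formatTime(totalSeconds) {
  // Guard against NaN/Infinity, which players can report before the
  // media's metadata has loaded — a common cause of broken time displays.
  if (!Number.isFinite(totalSeconds) || totalSeconds < 0) return "0:00";
  const whole = Math.floor(totalSeconds);
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// In the component you would render it directly, with no extra state:
//   <span>{formatTime(playedSeconds)} / {formatTime(duration)}</span>
```

No useEffect and no extra useState needed: the string is re-derived on every render from values you already have.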

I ask a follow-up question like "hide the controls after 2 seconds" and it starts introducing more useEffects and more state, none of which are needed (granted, you need one).
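For what it's worth, the one effect you genuinely need here is a resettable inactivity timer. A hedged, framework-free sketch (the names and wiring are my assumptions, not the generated code); keeping the timer logic out of React makes it trivially testable:

```javascript
// The single timer behind "hide the controls after 2 seconds of
// inactivity". The scheduler/canceller are injectable so the core
// logic can be tested without real timeouts.
function createHideTimer(onHide, delayMs, schedule = setTimeout, cancel = clearTimeout) {
  let handle = null;
  return {
    // Call on any pointer/keyboard activity: restart the countdown.
    poke() {
      if (handle !== null) cancel(handle);
      handle = schedule(onHide, delayMs);
    },
    // Call from the effect's cleanup so it never fires after unmount.
    dispose() {
      if (handle !== null) cancel(handle);
      handle = null;
    },
  };
}

// Rough React wiring — one effect, one piece of state (`visible`):
//   useEffect(() => {
//     const timer = createHideTimer(() => setVisible(false), 2000);
//     timer.poke();
//     return () => timer.dispose();
//   }, []);
```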

Cherry on the cake: I asked it to place the slider at the bottom and the other controls above it, and it placed the slider on top...

So I suck at prompting and will start looking for a gardening job I guess...

thewillowcat|1 month ago

These posts are never, never made by someone who is responsible for shipping production code in a large, heavily used application. It's always someone at a director+ level who stopped production coding years ago, if they ever did, and is tired of their engineers trying to explain why something will take more than an hour.

moduspol|1 month ago

It is also often low-proficiency developers with their minds blown over how quickly they can build something using frameworks / languages they never wanted to learn or understand.

Though even that group probably has some overlap with yours.

weitendorf|1 month ago

Back in the day when you found a solution to your problem on Stackoverflow, you typically had to make some minor changes and perhaps engage in some critical thinking to integrate it into your code base. It was still worth looking for those answers, though, because it was much easier to complete the fix starting from something 90% working than 0%.

The first few times in your career you found answers that solved your problem but needed non-trivial changes to apply it to your code, you might remember that it was a real struggle to complete the fix even starting from 90%. Maybe you thought that ultimately, that stackoverflow fix really was more trouble than it was worth. And then the next few times you went looking for answers on stackoverflow you were better at determining what answers were relevant to your problem/worth using, and better at going from 90% to 100% by applying their answers.

Still, nobody really uses stackoverflow anymore: https://blog.pragmaticengineer.com/stack-overflow-is-almost-...

You and most of the rest of us are all actively learning how to use their replacement.

eudamoniac|1 month ago

> it was much easier to complete the fix starting from something 90% working than 0%.

As an expert now though, it is genuinely easier and faster to complete the work starting from 0 than to modify something junky. The ReactPlayer example above I could do much faster, correctly, than I could figure out what the AI code was trying to do with all the effects and refactor it correctly. This is why I don't use AI for programming.

And for the cases where I'm not skilled, I would prefer to just gain skill, even though it takes longer than using the AI.

Forgeties79|1 month ago

The difference is you’re generally retooling for your purpose rather than scouring for multiple, easily avoidable screw-ups that, if overlooked, will cause massive headaches later on.

ammut|1 month ago

I've spent quite a bit of time with Codex recently and come to the conclusion that you can't simply say "Let's add custom video controls around ReactPlayer." You need to follow up with a set of strict requirements to set expectations, guard rails, and what the final product should do (and not do). Even then it may have a few issues, but continuing to prompt with clearly stated problems that don't meet the requirements (or you forgot to include) usually clears it up.

Code that would have taken me a week to write is done in about 10 minutes. It's likely on average better than what I could personally write as a novice-mid level programmer.

strobe|1 month ago

>You need to follow up with a set of strict requirements to set expectations, guard rails, and what the final product should do (and not do).

That's usually the very hard part, and it was possible to spend a few days on something like that in the real world even before LLMs. But with LLMs it's worse, because having those requirements isn't enough: some of them won't be followed for random reasons, and there are no 'rules' that can guarantee results. It's always 'try that' and 'probably this will work'.

Just recently I struggled with the same prompt producing different results between API calls, before I realized that just a few more '\"' characters and a few spaces in the prompt led the model down a completely different route of logic, which produced opposite answers.

danfritz|1 month ago

By the time I have figured out all those quirks and guardrails, I could have done it myself in 45 minutes tops.

lucianbr|1 month ago

It sounds like it takes you at least 10 minutes to just write the prompt with all the details you mentioned. Especially if you need to continue and prompt again (and again?).

enraged_camel|1 month ago

I find anecdotes like yours bewildering, because I've been using Opus with Vue.js and it crushes everything I throw at it. The amount of corrections I need to make tend to be minimal, and mostly cosmetic.

The tasks I give it are not trivial either. Just yesterday I had it create a full-blown WYSIWYG editor for authoring the content we serve through our app. This is something that would have taken me two weeks, give or take. Opus looked at the content definitions on the server, queried the database for examples, then started writing code and finished it in ~15 minutes, and after another 15-20 minutes of further prompting for refinement, it was ready to ship.

alternatex|1 month ago

Created a WYSIWYG editor, or copied it off the internet like your average junior would, bugs included?

If that editor is very complicated (as they usually are) it makes sense to just opt for a library. If it's simple then AI is not required and would only reduce familiarity with how it works. The third option is what you did and I feel like it's the option with the lowest probability of ending up with a quality solution.

phn|1 month ago

Have you tried Roo Code in "Orchestrator" mode? I find it generally "chews" the tasks I give it to then spoon feed into sub-tasks in "Code" (or others) mode, leaving less room to stray from very focused "bite-sized" changes.

I do need to steer it sometimes, but since it doesn't change a lot at a time, I can usually guide the agent and stop the disaster before it spreads.

A big caveat is I haven't tried heavy front-end stuff with it, more django stuff, and I'm pretty happy with the output.

andai|1 month ago

I have a vanilla JS project. I find that very small LLMs are able to work on it with no issue (including complete rewrites). But when I ask even large LLMs to port it to React, they all consistently fail: basic functionality broken, rapid memory leaks.

So I just stuck with vanilla JS.

n = 1, but React might not be a great thing to test this stuff with. True for the man and the machine! I tried and failed to learn React properly like 8 times, but I've shipped multiple full-stack things in like 5 other languages no problem.

doubleorseven|1 month ago

Usually for me, what comes after a good plan is 90% solid, working code. The problems do arise when you ask it to change the colors it chose, like light grey text over a white background. This thing still can't see, and it's a huge drawback for those who got used to just prompting away their problems.

mnky9800n|1 month ago

I always assume the person either hasn't used coding agents in a while or it's their first time. Don't get me wrong, I love Claude Code, but my students are still better at getting stuff done that I can just approve and not micromanage. That's what I think everyone is missing from their commentary: you have to micromanage a coding agent. You don't have to micromanage a good student. When you don't need to micromanage anymore at all, that's when the floor falls out and everyone has a team of agents doing whatever they want to make them all billionaires, or whatever it is AI is promising to do these days.

PaulHoule|1 month ago

Around a Uni I think a lot about what students are good at and what they aren't good at.

I wouldn't even think about hiring a student to do marketing work. They just don't understand how hard it is to break through people's indifference and lack the hustle. I want 10-100x more than I get out of them.

Photos in The Cornell Daily Sun make me depressed. Students take a step out the door, take a snap, then upload it. I think the campus is breathtakingly beautiful and students just don't do the work to take good photos that show it.

In coding it is across the map. Even when I am happy with the results they still do the first 80% that takes another 80% to put in front of customers. I can be really proud of how it turned out in the end despite them missing the point of the design document they were handed.

I was in a game design hackathon where most of the winners were adults or teams with an adult on them. My team won player's choice. I'll take credit for my startup veteran talent of fearlessly demonstrating broken software on stage and making it look great and doing project management with that in mind. One student was solid on C# and making platformers in Unity. I was the backup programmer who worked like a junior other than driving them crazy slowing them down with relentlessly practical project management. The other student made art that fit our game.

We were at each other's throats at the end and shocked that we won. I think I understood the value everybody brought but I'm not sure my teammates did.

hu3|1 month ago

Yep. It sucks. People are delusional. Let's ignore LLMs and carry on...

On a more serious note:

1) Split tasks into smaller tasks just like a human would do

Would you bash your keyboard for an hour, adding all video controls at once, before even testing if anything works at all? Of course not. You would start by adding a slider and test it until you are satisfied, then move to the next video control, and so on. LLMs are the same. Sometimes they can one-shot many related changes in a single prompt, but the common reality is what you experienced: it works sometimes, but the code is suboptimal.

2) Document desirable and undesirable coding patterns in AGENTS.md (or CLAUDE.md)

If you found overuse of useEffect, document it in AGENTS.md so that next time the LLM knows your preference.
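Such an entry might look something like this (an illustrative sketch of the conventions discussed in this thread, not a prescribed format — agents just read the file as plain instructions):

```markdown
## React conventions

- Prefer deriving values at render time over mirroring them in `useState`.
- Do not reach for `useEffect` for logic that can run during render;
  reserve effects for external subscriptions, timers, and imperative APIs.
- Reuse the existing styling library, shared components, and icon set.
```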

I have been using LLMs since Sonnet 3.5 for large enterprise projects (1M+ lines of code, 1k+ database tables). I just don't ask them to "draw the rest of the owl", as the saying goes.

jf22|1 month ago

So? Getting a month's worth of junior-level code in an hour is still unbelievable.

danfritz|1 month ago

What's the improvement here? I spend more time fixing it than doing it myself anyway. And I have less confidence in the code Opus generates.