top | item 46703442

prettygood | 1 month ago

I'm so happy someone else says this, because I'm doing exactly the same. I tried to use agent mode in VS Code and the output was still bad. You read simple claims like "we use it to write tests", so I gave it a very simple repository, told it to write tests, and the result wasn't usable at all. I really wonder if I'm doing it wrong.

kace91|1 month ago

I'm not particularly pro-AI, but I struggle with the mentality some engineers seem to bring to trying these tools.

If you read someone say “I don’t know what’s the big deal with vim, I ran it and pressed some keys and it didn’t write text at all” they’d be mocked for it.

But with these tools there seems to be an attitude of “if I don’t get results straight away it’s bad”. Why the difference?

Macha|1 month ago

There isn't a bunch of managers metaphorically asking people if they're using vim enough, nor as many blog posts proclaiming vim as the only future for building software.

alkonaut|1 month ago

I don't understand how to get even bad results. Or any results at all. I'm at a level where I'm going "This can't just be me not having read the manual".

I get the same change applied multiple times, the agent using some absurd method of applying changes that conflicts with what I tell it, like some git merge from hell, and so on. I can't get it to understand even the simplest of contexts.

It's not really that the code it writes might not work. I just can't get past the actual tool use. In fact, I don't think I'm even at the stage where the AI output is the problem yet.

neumann|1 month ago

I agree to a degree, but I am in that camp. I subscribe to alphasignal, and every morning there are three new agent tools, two new features, and a new agentic approach, and I am left wondering: where is the production stuff?

galaxyLogic|1 month ago

Well, one could say that since it's AI, the AI should be able to tell us what we're doing wrong. No?

AI is supposed to make our work easier.

chewz|1 month ago

Some people just shouldn't be engineers in the first place, I guess.

embedding-shape|1 month ago

You didn't actually just say "write tests" though right? What was the actual prompt you used?

I feel like that matters more than the tooling at this point.

I can't really understand letting LLMs decide what to test or not; they seem to completely miss the boat when it comes to testing. Half of the tests are useless because they duplicate what they test, and the other half don't test what they should be testing. So many shortcuts. LLMs require A LOT of hand-holding when writing tests, more so than with other code, I'd wager.

Balinares|1 month ago

There are a lot of comments on HN and other places breathlessly gushing about agents totally doing everything end to end, so I couldn't blame someone new to this space for naively assuming that agents would be able to handle a well-bounded problem such as test coverage reasonably well.

prettygood|1 month ago

No, that was an exaggeration. The prompt was decent: I explained the point of the repository, that I wanted full test coverage, and that it could keep going until it worked. Maybe that was still not enough. With how others talk about it, I must be missing something.

threecheese|1 month ago

“Write tests” may not be enough; provide it with a test harness, and instruct it to “write tests until they pass”. Next would be “your feature isn't complete without N% coverage”. These require the ‘agentic’ piece, which is at its simplest some prompts run in a loop until an exit condition is met.

tasuki|1 month ago

> I gave it a very simple repository, said to write tests, and the result wasn't usable at all. Really wonder if I'm doing it wrong.

I think so. The humans should be writing the spec. The AI can then (try to) make the tests pass.
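That division of labour can be shown with a toy example (hypothetical function; the human authors the test as the spec, and the model's only job is to make it pass):

```python
import re

# Human-authored spec: a test that pins down the behaviour we want.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces  ") == "spaces"

# Model-authored implementation: anything that makes the spec pass.
def slugify(text: str) -> str:
    text = text.strip().lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)  # non-alphanumerics become dashes
    return text.strip("-")

test_slugify()  # spec satisfied
```

Flipping it around, as in the parent comment's experiment, asks the model to invent the spec itself, which is where it tends to go wrong.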

sixtyj|1 month ago

No, you have a similar experience to a lot of people.

LLMs just fail (hallucinate) in less well-known fields of expertise.

Funny: today I asked Claude for the syntax to run Claude Code, and its answer was totally wrong :) So you go to the documentation… and parts of it are obsolete as well.

LLM development is done in "move fast and break things" style.

So in a few years there will be many repos full of gibberish code, because "everybody is a coder now", even basketball players or taxi drivers (no offense, ofc, just an example).

It is like giving an F1 car to me :)

agumonkey|1 month ago

you need to write a test suite to check its test generation (soft /s)