coltonv | 6 months ago
> I mentioned in another comment the major flaw in your productivity calculation, is that you aren’t accounting for the work that wouldn’t have gotten done otherwise. That’s where my improvements are almost universally coming from. I can improve the codebase in ways that weren’t justifiable before in places that do not suffer from the coordination costs you rightly point out.
I'm a bit confused by this. There is work that apparently is unlocking big productivity boosts but was somehow not justified before? Are you referring to places like my ESLint rule example, where eliminating the startup costs of learning how to write one allows you to do things you wouldn't previously have bothered with? If so, I feel like I covered this pretty well in the article, and we probably largely agree on the value of that productivity boost. My point still stands that that doesn't scale. If this is not what you mean, feel free to correct me.
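(For readers unfamiliar with the startup cost being discussed: an ESLint rule is a plain object with a `create()` function that returns AST visitors. The rule below is a hypothetical minimal example, not the one from the article, just to illustrate the boilerplate an author has to learn before writing their first rule.)

```javascript
// Hypothetical minimal ESLint rule: disallow console.log calls.
// A rule is a plain object: `meta` describes it, `create(context)`
// returns an object of visitor functions keyed by AST node type.
const noConsoleLogRule = {
  meta: {
    type: "suggestion",
    docs: { description: "disallow console.log calls" },
    schema: [], // rule takes no options
  },
  create(context) {
    return {
      // Called once for every function/method call in the file.
      CallExpression(node) {
        const callee = node.callee;
        if (
          callee.type === "MemberExpression" &&
          callee.object.name === "console" &&
          callee.property.name === "log"
        ) {
          context.report({ node, message: "Unexpected console.log." });
        }
      },
    };
  },
};

module.exports = noConsoleLogRule;
```

In a plugin this object would be registered under `rules` and enabled in the project's ESLint config; the point in the thread is that learning this visitor-object shape is exactly the kind of one-time startup cost an agent can skip past.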
Appreciate your thoughts on hallucinations. My guess is that the difference between our experiences is that in your code hallucinations are still happening but get corrected after tests are run, whereas my agents typically get stuck in these write-and-test loops and can't figure out how to solve the problem, or they "solve" it by deleting the tests or something like that. I've seen videos and viewed open-source AI PRs that end up in loops similar to what I've experienced, so I think what I see is common.
Perhaps that's an indication that we're trying to solve different problems with agents, or using different languages/libraries, and that explains the divergence of experiences. Either way, I still contend that this kind of productivity boost is likely going to be hard to scale and will get tougher to realize as time goes on. If you keep seeing it, I'd really love to hear more about your methods to see what I'm missing. One thing that has been frustrating me is that people rarely share their workflows after making big claims. This is unlike previous hype cycles, where people would share descriptions of exactly what they did ("we rewrote it in Rust, here's how we did it", etc.). Feel free to email me at the address on my about page[1] or send me a request on LinkedIn or whatever. I'm being 100% genuine that I'd love to learn from you!
kasey_junk | 6 months ago
This may be a definition problem, then. I don't think "the agent did a dumb thing that it can't reason its way out of" is a hallucination. To me a hallucination is a pretty specific failure mode: the model invents something that doesn't exist. Models still do that for me, but the build-test loop sets them right on that nearly perfectly. So I guess the model is still hallucinating, but the agent isn't, so the output is unimpacted. So I don't care.
For the "agent is dumb" scenario, I aggressively delete and reprompt. This is something I've actually gotten much better at with time and experience, both so that it doesn't happen often and so that I can course-correct quickly. I find it works nearly as well for teaching me about the problem domain as my own mistakes do, but it's much faster to get there.
But if I were going to be pithy: aggressively deleting work output from an agent is part of their value proposition. They don't get offended and they don't need explanations why. Of course, they don't learn well either; that's on you.
coltonv | 6 months ago
Deleting and re-prompting is fine. I do that too. But even one cycle of that often means the whole prompting exercise takes me longer than if I just wrote the code myself.
samtp | 6 months ago
Good luck ever getting that. I've asked that about a dozen times on here from people making these claims and have never received a response. And I'm genuinely curious as well, so I will continue asking.
tptacek | 6 months ago
What people aren't doing is proving to you that their workflows work as well as they say they do. If you want proof, you can DM people for their rate card and see what that costs.
[1] https://news.ycombinator.com/item?id=44159166