(no title)
darkxanthos | 8 months ago
My thinking now is removed from the gory details and is a step or two up. How can I validate the changes are working? Can I understand this code? How should it be structured so I can better understand it? Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?
Last night I had a file with 38 mypy errors. I turned it over to the agent and went and had a conversation with my wife for 15 minutes. I came back, it summarized the changes it made and why, I debated one of the changes with it but ultimately decided it was right.
Mypy passed. Good to go.
I'm currently trying to get my team to really understand the power here. There's a lot of skeptics and the AI still isn't perfect and people who are against the AI era will latch onto that as validation but it's exactly opposite the correct reaction. It's really validation because as a friend of mine says
"Today is the worst day you will have with this technology for the rest of your life."
ajdidbdbsgs|8 months ago
Fixing type checker errors should be one the least time consuming things you do. This was previously consuming a lot of your time?
A lot of the AI discourse would be more effective if we could all see the actual work one another is doing with it (similar to the cloudflare post).
diggan|8 months ago
Yes, this is a frequent problem both here and everywhere else. The discussions need to include things like exact model version, inference parameters, what system prompt you used, what user prompt, what code you gave it, what exactly it replied and so much more details, as currently almost every comment is "Well, I used Sonnet last week and it worked great" without any details. Not to mention discussions around local models missing basic stuff like what quantization (if any) and what hardware you're running it on. People just write out "Wow fast model" or stuff like that, and call it a day.
Although I understand why, every comment be huge if everyone always add sufficient context. I don't know the solution to this, but it does frustrate me.
dimal|8 months ago
These tools have turned out to be great at this stuff. I don’t think I’ve turned over any interesting problems to an LLM and had it go well, but by using them to take care of drudgery, I have a lot more time to think about the interesting problems.
I would suggest that instead of asking people to post their work, try it out on whatever bullshit tasks you’ve been avoiding. And I specifically mean “tasks”. Stuff where the problem has already been solved a thousand times before.
bubblyworld|8 months ago
GardenLetter27|8 months ago
diggan|8 months ago
I think the excellent error messages in Rust also help as much humans as it does LLMs, but some of the weaker models get misdirected by some of the "helpful" tips, like some error message suggest "Why don't you try .clone here?" when the actual way to address the issue was something else.
redman25|8 months ago
andnand|8 months ago
SparkyMcUnicorn|8 months ago
Write good base instructions for your agent[0][1] and keep them up to date. Have your agent help you write and critique it.
Start tasks by planning with your agent (e.g. "do not write any code."), and have your agent propose 2-3 ways to implement what you want. Jumping straight into something with a big prompt is hit or miss, especially with increased task complexity. Planning also gives your agent a chance to read and understand the context/files/code involved.
Apologies if I'm giving you info you're already aware of.
[0] https://code.visualstudio.com/docs/copilot/copilot-customiza...
[1] Claude Code `/init`
namaria|8 months ago
But I can run commands on my local linux box that generate boilerplate in seconds. Why do I need to subscribe to access gpu farms for that? Then the agent gets stuck at some simple bug and goes back and forth saying "yes, I figured out and solved it now" and it keeps changing between two broken states.
The rabid prose, the Fly.io post deriding detractors... To me it seems same hype as usual. Lots of words about it, the first few steps look super impressive, then it gets stuck banging against a wall. If almost all that is said is prognostication and preaching, and we haven't seen teams and organizations racing ahead on top of this new engine of growth... maybe it can't actually carry loads outside of the demo track?
It can be useful. Does it merit 100 billion dollar outlays and datacenter-cum-nuclear-powerplant projects? I hardly think so.
8note|8 months ago
when it has a work plan, track the workplan as a checklist that it fills out as it works.
you can also atart your conversations by asking it to summarize the code base
polskibus|8 months ago
apwell23|8 months ago
I can easily tell from git history which commits were heavily AI generated
sitkack|8 months ago
Your employer, if it is not you, will now expect this level of output.
km144|8 months ago
Forgive my ignorance, but is this just a file you're adding to the context of every agent turn or this a formal convention in the VS code copilot agent? And I'm curious if there's any resources you used to determine the structure of that document or if it was just a refinement over time based on mistakes the AI was repeating?
jnwatson|8 months ago
It is the same stuff you'd tell a new developer on your team: here are the design docs, here are the tools, the code, and this is how you build and test, and here are the parts you might get hung up on.
In hindsight, it is the doc I should have already written.
namaria|8 months ago
Why do we trust corporations to keep making things better all of a sudden?
The most jarring effect of this hype cycle is that all appear to refers to some imaginary set of corporate entities.