top | item 44256178

(no title)

I stumbled into Agentic Coding in VS Code Nightlys with co-pilot using Claude Sonnet 4 and I've been silly productive. Even when half my day is meetings, you wouldn't be able to tell from my git history.

My thinking now is removed from the gory details and is a step or two up. How can I validate the changes are working? Can I understand this code? How should it be structured so I can better understand it? Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?

Last night I had a file with 38 mypy errors. I turned it over to the agent and went and had a conversation with my wife for 15 minutes. I came back, it summarized the changes it made and why, I debated one of the changes with it but ultimately decided it was right.

Mypy passed. Good to go.

I'm currently trying to get my team to really understand the power here. There's a lot of skeptics and the AI still isn't perfect and people who are against the AI era will latch onto that as validation but it's exactly opposite the correct reaction. It's really validation because as a friend of mine says

"Today is the worst day you will have with this technology for the rest of your life."

discuss

ajdidbdbsgs|8 months ago

> Last night I had a file with 38 mypy errors

Fixing type checker errors should be one the least time consuming things you do. This was previously consuming a lot of your time?

A lot of the AI discourse would be more effective if we could all see the actual work one another is doing with it (similar to the cloudflare post).

diggan|8 months ago

> AI discourse would be more effective if we could all see the actual work one another is doing with it

Yes, this is a frequent problem both here and everywhere else. The discussions need to include things like exact model version, inference parameters, what system prompt you used, what user prompt, what code you gave it, what exactly it replied and so much more details, as currently almost every comment is "Well, I used Sonnet last week and it worked great" without any details. Not to mention discussions around local models missing basic stuff like what quantization (if any) and what hardware you're running it on. People just write out "Wow fast model" or stuff like that, and call it a day.

Although I understand why, every comment be huge if everyone always add sufficient context. I don't know the solution to this, but it does frustrate me.

dimal|8 months ago

I don’t know about fixing python types, but fixing typescript types can be very time consuming. A LOT of programming work is like this —- not solving anything interesting or difficult, but just time-consuming drudgery.

These tools have turned out to be great at this stuff. I don’t think I’ve turned over any interesting problems to an LLM and had it go well, but by using them to take care of drudgery, I have a lot more time to think about the interesting problems.

I would suggest that instead of asking people to post their work, try it out on whatever bullshit tasks you’ve been avoiding. And I specifically mean “tasks”. Stuff where the problem has already been solved a thousand times before.

bubblyworld|8 months ago

For me comments are for discussions, not essays - from my perspective you went straight into snark about the parent's coding abilities, which kinda kills any hope of a conversation.

GardenLetter27|8 months ago

I trust it more with Rust than Python tbh, because with Python you need to make sure it runs every code path as the static analysis isn't as good as clippy + rust-analyzer.

diggan|8 months ago

I agree, had more luck with various models writing Rust than Python, but only in the case where they have tools available so one way or another it can run `cargo check` and see the nice errors, otherwise it's pretty equal between the two.

I think the excellent error messages in Rust also help as much humans as it does LLMs, but some of the weaker models get misdirected by some of the "helpful" tips, like some error message suggest "Why don't you try .clone here?" when the actual way to address the issue was something else.

redman25|8 months ago

That's true typed languages seem to handle the slop better. One thing I've noticed specifically with rust is that agents tend to overcomplicate things though. They tend to start digging into the gnarlier bits of the language much quicker than they probably need to.

andnand|8 months ago

Whats your workflow? Ive been playing with Claude Code for personal use. Usually new projects for experimentation. We have Copilot licenses through work so I've been playing around with VS Code agent mode for the last week. Usually using 3.5, 3.7 Sonnet or 04-mini. This is in a large Go project. Its been abysmal at everything other than tests. I've been trying to figure out if I'm just using the tooling wrong but I feel like I've tried all the "best practices" currently. Contexts, switching models for planning and coding, rules, better prompting. Nothings worked so far.

SparkyMcUnicorn|8 months ago

Switch to using Sonnet 4 (it's available in VS Code Insiders for me at least). I'm not 100% sure but a Github org admin and/or you might need to enable this model in the Github web interface.

Write good base instructions for your agent[0][1] and keep them up to date. Have your agent help you write and critique it.

Start tasks by planning with your agent (e.g. "do not write any code."), and have your agent propose 2-3 ways to implement what you want. Jumping straight into something with a big prompt is hit or miss, especially with increased task complexity. Planning also gives your agent a chance to read and understand the context/files/code involved.

Apologies if I'm giving you info you're already aware of.

[0] https://code.visualstudio.com/docs/copilot/copilot-customiza...

[1] Claude Code `/init`

namaria|8 months ago

I really don't get it. I've tested some agents and they can generate boilerplate. It looks quite impressive if you look at the logs, actually seems like an autonomous intelligent agent.

But I can run commands on my local linux box that generate boilerplate in seconds. Why do I need to subscribe to access gpu farms for that? Then the agent gets stuck at some simple bug and goes back and forth saying "yes, I figured out and solved it now" and it keeps changing between two broken states.

The rabid prose, the Fly.io post deriding detractors... To me it seems same hype as usual. Lots of words about it, the first few steps look super impressive, then it gets stuck banging against a wall. If almost all that is said is prognostication and preaching, and we haven't seen teams and organizations racing ahead on top of this new engine of growth... maybe it can't actually carry loads outside of the demo track?

It can be useful. Does it merit 100 billion dollar outlays and datacenter-cum-nuclear-powerplant projects? I hardly think so.

8note|8 months ago

make sure it writes a requirements and design doc for the change its gonna make, and review those. and, ask it to ask you questions about where there's ambiguity, and to record those responses.

when it has a work plan, track the workplan as a checklist that it fills out as it works.

you can also atart your conversations by asking it to summarize the code base

polskibus|8 months ago

My experiments with copilot and Claude desktop via mcp on the same codebase suggest that copilot is trimming the context much more than desktop. Using the same model the outputs are just less informed.

apwell23|8 months ago

> you wouldn't be able to tell from my git history.

I can easily tell from git history which commits were heavily AI generated

sitkack|8 months ago

> Even when half my day is meetings, you wouldn't be able to tell from my git history.

Your employer, if it is not you, will now expect this level of output.

km144|8 months ago

> Is there more we can add to the AI conventions markdown in the repo to guide the Agent to make fewer mistaken assumptions?

Forgive my ignorance, but is this just a file you're adding to the context of every agent turn or this a formal convention in the VS code copilot agent? And I'm curious if there's any resources you used to determine the structure of that document or if it was just a refinement over time based on mistakes the AI was repeating?

jnwatson|8 months ago

I just finished writing one. It is essentially the onboarding doc for your project.

It is the same stuff you'd tell a new developer on your team: here are the design docs, here are the tools, the code, and this is how you build and test, and here are the parts you might get hung up on.

In hindsight, it is the doc I should have already written.

namaria|8 months ago

> "Today is the worst day you will have with this technology for the rest of your life."

Why do we trust corporations to keep making things better all of a sudden?

The most jarring effect of this hype cycle is that all appear to refers to some imaginary set of corporate entities.