I’m biased [0], but I think we should be scripting around LLM-agnostic open source agents. This technology is changing software development at its foundations; we need to ensure we continue to control how we work.
This looks like a good resource. There are some pretty powerful models that will run on an NVIDIA 4090 with 24 GB of VRAM, like Devstral and Qwen 3. Ollama makes it simple to run them on your own hardware, but the cost of the GPU is a significant investment. Then again, if you are paying $250 a month for a proprietary tool, it would pay for itself pretty quickly.
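For anyone wanting to try this, a minimal Ollama setup looks roughly like the following. The model tags are examples and may not match what the Ollama library currently publishes, so check before pulling:

```shell
# Install Ollama (Linux; see ollama.com for macOS/Windows installers),
# then pull and run a local coding model entirely on your own hardware.
curl -fsSL https://ollama.com/install.sh | sh
ollama pull devstral   # Mistral's agentic coding model (~24B)
ollama pull qwen3      # Qwen 3; pick a tag that fits in 24 GB of VRAM
ollama run devstral "Summarize what this function does: fn add(a: i32, b: i32) -> i32 { a + b }"
```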
This article is a bit all over the place. First, a slide deck to describe a codebase is not that useful. There's a reason why no one ever uses a slide deck for anything besides supporting an oral presentation.
Most of these things in the post aren't new capabilities. The automation of workflows is indeed valuable and cool. Not sure what AGI has anything to do with it.
Also I don't trust it. They touched on that I think (I only skimmed).
Plus you shouldn't need an LLM to understand a codebase. Just make it more understandable! Of course capital likes shortcuts and hacks to get the next feature out in Q3.
The number one thing I have found LLMs useful for is producing mermaidjs diagrams of code. They are not always perfect, but the results have been "good enough" many times, and I have never seen hallucinations here, only omissions. If I notice something missing, it's super easy to tell it to amend the diagram.
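For anyone who hasn't tried this, the model emits plain Mermaid source that any renderer (GitHub, mermaid.live, etc.) can display. A hand-written, illustrative example of the kind of diagram it produces for a small codebase:

```mermaid
flowchart TD
    CLI[cli.py] --> Parser[parser.py]
    Parser --> Core[core.py]
    Core --> Store[(storage)]
    Core --> HTTP[api client]
```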
Judging from the tone of the article, they’re using the term AGI in a jokey way and not taking themselves too seriously, which is refreshing.
I mean like, it wouldn’t be refreshing if the article didn’t also have useful information, but I do actually think a slide deck could be a useful way to understand a codebase. It’s exactly the kind of nice-to-have that I’d never want a junior wasting time on, but if it costs like $5 and gets me something minorly useful, that’s pretty cool.
Part of the mind-expanding transition to using LLMs involves recognizing that there are some things we used to dislike because of how much effort they took relative to their worth. But if you don’t need to do the thing yourself or burn through a team member’s time/sanity doing it, it can make you start to go “yeah fuck it, trawl the codebase and try to write a markdown document describing all of the features and requirements in a tabular format. Maybe it’ll go better than I expect, and if it doesn’t then on to something else.”
Great article! I have similar observations and techniques, and Claude Code is exceptionally good. Most days I'm working on multiple things at once (thanks to git worktrees), each going faster than ever. That's really crazy.
For the "sub agents" thing, I must admit that Claude Code calling o3 via sigoden/aichat has saved me countless times!
There are just issues that o3 excels at (race conditions, bug hunting, anything that requires a lot of context and really high reasoning ability).
But I'm using it less since Opus 4 came out. And of course it's not really the sub-agent thing at all.
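For reference, the wiring for that pattern can be as simple as having Claude Code shell out to aichat. The flags follow aichat's `-m client:model` convention, but treat the exact model name as an assumption:

```shell
# Delegate a hard problem to a stronger reasoning model by shelling
# out to sigoden/aichat. The model name is illustrative.
cat src/scheduler.rs | aichat -m openai:o3 \
  "Find the race condition in this code and explain the fix."
```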
Exactly. It has access to literally everything including any MCP server. It's so awesome having claude code check my database using a read-only user, or have it open a puppeteer browser and check whether its CSS changes look weird or not.
It's the perfect interface and anthropic nailed it.
It can even debug my k8s cluster using kubectl commands and check prometheus over the API, how awesome is this?
Sort of, except I think the future of LLMs will be to have the LLM make five separate attempts at a fix in parallel, since LLM time is cheaper than human time... and once you introduce this aspect into the workflow, you'll want to spin up multiple containers, and the benefits of the terminal aren't as strong anymore.
Asking it to explain the Rust borrow checker is one of the worst examples to demonstrate its ability to read code. There are piles of that in its training data.
Agreed, ask it to explain how exceptions are handled in python asyncio tasks, even given all the code, and it will vacillate like the worst intern in the world. What's more, there's no way to "teach" it, and even if there was, it would not last beyond the current context.
A complete waste of time for important but relatively simple tasks.
Such a weird complaint. If you were to explain the rust borrow checker to me, should I complain that it doesn't count because you had read explanations of the borrow checker? That it was "in your training data"? I mean, do you think you just understand the borrow checker without being taught about it in some form?
I mean, I get what you're kind of saying: there isn't much evidence that these tools can generate new ideas, and the sheer amount of knowledge they have obscures the detection of that phenomenon. But practically speaking I don't care, because it is useful and helpful (within its hallucinatory framework).
Assuming attention to detail is one of the best signs people give a fuck about craftsmanship, isn't the fact that Anthropic's legal terms are logically impossible to satisfy a bad sign for their ability to be trusted as careful stewards of ASI?
Not exactly “three laws safe” if we can’t use the thing for work without violating their competitive use prohibition
I can’t speak for their legal department, but their product, Claude Code, bears signs of lavish attention to detail. Right down to running Haiku on the context to come up with cute appropriate verbs for the “working…” indicators.
> Claude Code feels more powerful than Cursor, but why? One of the reasons seems to be its ability to be scripted. At the end of the day, Cursor is an editor, while Claude Code is a Swiss Army knife (on steroids).
Agreed, and I find that I use Claude Code on more than traditional code bases. I run it in my Obsidian vault for all kinds of things. I run it to build local custom keyboard bindings with scripts that publish screenshots to my CDN and give me a markdown link, or to build a program that talks to Ollama to summarize my terminal commands for the last day.
I remember the old days of needing to figure out if the formatting changes I wanted to make to a file were sufficient to build a script or just do them manually - now I just run Claude in the directory and have it done for me. It's useful for so many things.
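For the curious, the non-interactive version of this workflow is Claude Code's print mode (`claude -p`), which runs one prompt against the current directory and exits. The prompts below are just illustrations:

```shell
# One-off tasks without opening an interactive session.
claude -p "Convert every date in notes.md to ISO 8601 format"

# Pipe context in, get a summary out (e.g. a weekly changelog).
git log --since="1 week ago" --oneline | claude -p "Summarize this week's commits as terse bullet points"
```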
The thing is, Claude Code only makes sense if you have the subscription plan. It's prohibitively expensive on the API, and it makes me wonder if $100/month is truly enough. I use it all day every day now, and I must be consuming a whole lot more than my $100 is worth.
Having tried everything I settled on a $100/month Anthropic "Max" plan to use Claude Code. Then I learned how Claude Opus 4 is currently their best but most expensive model for my situation (math code and research). I limited out of a five hour session, switched to their API, and burned $20 in an hour. So I upgraded to $200/month "Max" and haven't hit limits yet.
Models matter. All these stories are like "I met a person who wasn't that smart." Duh!
This article is inspiring. I haven’t had the moment to get my head out of the Cursor + biz logic water until now. Very cool to think about LLMs automagically creating changelogs, testing packaging when dependencies are bumped, forcing unit tests on features.
Is anyone aware of something like this? Maybe in the GitHub actions or pre-commit world?
Gonna be a bit blunt here and ask why hooking up an agentic CLI tool to one or more other software tools is the top post on HN right now... sure, some of these ideas are interesting, but at the end of the day literally all of them have been explored / revisited by various MCP tools (or can be done in more or less scripted / hacked ways, as the author shows here).
I don't know, it just feels like a weird community response to something that is, to me, the equivalent of bash piping...
Not trying to be rude here, but that `last_week.md` is horrible to me. I can't imagine having to read that let alone listen to the computer say it to me. It's so much blah blah and fluff that reads like a bad PR piece. I'd much rather scan through commits of the last week.
I've found this generally with AI summaries...usually their writing style is terrible, and I feel like I cannot really trust them to get the facts right, and reading the original text is often faster and better.
## Instructions
* Be concise
* Use simple sentences. But feel free to use technical jargon.
* Do NOT overexplain basic concepts. Assume the user is technically proficient.
* AVOID flattering, corporate-ish or marketing language. Maintain a neutral viewpoint.
* AVOID vague and / or generic claims which may seem correct but are not substantiated by the context.
You can't completely avoid hallucinations, and it's best to avoid AI for text that's used for human-to-human communication. But this makes AI answers to coding and technical questions easier to read.
I felt the same thing about the onboarding. Like what future are we trying to build for ourselves here, exactly? The kind where instead of sitting down with a coworker to learn about a codebase, instead we get an ai generated PowerPoint to read alone????
Yup, you can always tell LLMs just from the ridiculous output most of the time. Like 8-20 sentences minimum, for the most basic thing.
Even Gemini/gpt4o/etc are all guilty of this. Maybe they'll tighten things up at some point - if I ask an assistant a simple question like "is it possible to put apples into a pie?" what I want is "Yes, it is possible to put apples into a pie. Would you like to know more?"
But not "Yes, absolutely — putting apples into a pie is not only possible, it's classic! Apple pie is one of the most well-known and traditional fruit pies. Typically, sliced apples are mixed with sugar, cinnamon, nutmeg, and sometimes lemon juice or flour, then baked inside a buttery crust. You can use various types of apples depending on the flavor and texture you want (like Granny Smith for tartness or Honeycrisp for sweetness). Would you like a recipe or tips on which apples work best?" (from gpt4).
> Python, a journey that began with an initial commit and evolved through a series of careful refinements to establish a robust foundation for the project.
Wow yeah what a waste. That is exactly the opposite of saving time.
If this was meant to be read, I might've agreed, but:
1) This was supposed to be piped through TTS and listened to in the background, and...
2) People like podcasts.
Your typical podcast is much worse than this. It's "blah blah" and "hahaha <interaction>" and "ooh <emoting>" and "<irrelevant anecdote>" and "<turning facts upside down and injecting a lie for humorous effect>", and maybe some of the actual topic mixed in between, and yet for some reason, people love it.
I honestly doubt this specific thing would be useful for me, but I'm not going to assume it's plain dumb, because again, podcasts are worse, and people love it.
Remember the sycophancy bug? Maybe making the user FEEL GOOD is part of what makes it feel smart or like a good experience. Is the reward function being smart? Is it maximizing interaction? Does that conflict with being accurate?
The vilification of juniors and the abandonment of the idea that teaching and mentoring are worthwhile are single-handedly making me speedrun burnout. May a hundred years of Microsoft Visio befall anybody who thinks that way.
A constant reminder: you can't have wizards without having noobs.
Every wizard was once a noob. No one is born that way; they were forged. It's in everybody's interest to train them. If they leave, you still benefit from the other companies who trained theirs, so the costs even out. And if they keep leaving, there are probably better retention levers you haven't considered (e.g., have you considered not paying new juniors more than a junior who has been with the company for a few years? They should be able to get a pay bump without leaving).
I spent a lot of time in my career, honestly some of the most impactful stuff I've done, mentoring college students and junior developers. I think you are dead on about the skills being very similar. Being verbose, not making assumptions about existing context, and generalized warnings against pitfalls when doing the sort of thing you're asking it to do goes a long long way.
Just make sure you talk to Claude in addition to the humans and not instead of.
As it has been since that was originally published, over three years ago.
I'm continuously surprised both by how fast the models themselves evolve, and how slow their use patterns are. We're still barely playing with the patterns that were obvious and thoroughly discussed back before GPT-4 was a thing.
Right now, the whole industry is obsessed with "agents", a.k.a. giving LLMs function calls and limited control over the loop they're running under. How many years before the industry gets to the point of giving LLMs proper control over the top-level loop and managing the context, plus an ability to "shell out" to "subagents" as a matter of course?
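A toy sketch of that inversion, with the model's decision-making stubbed out as `fake_llm` (all names here are illustrative; no real model API is involved):

```python
# Sketch of an LLM-owned top-level loop: the model (stubbed as
# `fake_llm`) decides each step whether to call a tool, spawn a
# subagent with fresh context, or stop.
def fake_llm(history: list[str]) -> str:
    # A real implementation would send `history` to a model and parse
    # its reply; this stub just walks through a fixed plan.
    plan = ["tool:list_files", "subagent:summarize", "done"]
    return plan[len(history)]

def run_subagent(task: str) -> str:
    # A subagent would be another loop with its own context window.
    return f"subagent finished: {task}"

TOOLS = {"list_files": lambda: "main.py util.py"}

def agent_loop() -> list[str]:
    history: list[str] = []
    while True:
        action = fake_llm(history)
        if action == "done":
            return history
        kind, _, arg = action.partition(":")
        if kind == "tool":
            history.append(TOOLS[arg]())
        elif kind == "subagent":
            history.append(run_subagent(arg))
```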
In general, "reader mode". I don't use Chrome but Google suggests that it's in a menu <https://support.google.com/chrome/answer/14218344?hl=en>. Many Chrome-alikes provide it built-in (Brave calls it Speedreader), and many extensions can add it for you (Readability was the OG one).
I've actually stumbled upon a novel way of using Claude Code that I don't think anybody else is doing that's insanely better. I'll release it soon.
I played around with agents yesterday, now I'm hooked.
I got Claude Code (with Cline and VS Code) to do a task for a personal project. It did it about 5x faster than I'd have been able to do manually, including running bash commands, e.g. to install dependencies for new npm packages.
These things can do real work. If you have things in plain text formats like markdown, CSV spreadsheets, etc., a lot of what normal human employees do today could be somewhat automated.
You currently still need a human to supervise the agent and what it's doing, but that won't be needed in the not-so-distant future.
rbren|8 months ago
[0] https://github.com/all-hands-ai/openhands
robotbikes|8 months ago
handfuloflight|8 months ago
ProofHouse|8 months ago
tinyhouse|8 months ago
bravesoul2|8 months ago
sandos|8 months ago
Uehreka|8 months ago
jumski|8 months ago
I use this prompt @included in the main CLAUDE.md: https://github.com/pgflow-dev/pgflow/blob/main/.claude/advan...
sigoden/aichat: https://github.com/sigoden/aichat
_1tem|8 months ago
jasonthorsness|8 months ago
ed_mercer|8 months ago
drcode|8 months ago
ldjkfkdsjnv|8 months ago
mountainriver|8 months ago
Do you not want to edit your code after it’s generated?
blahgeek|8 months ago
dundarious|8 months ago
gilbetron|8 months ago
bionhoward|8 months ago
alwa|8 months ago
abhisheksp1993|8 months ago
This made me chuckle
SamPatt|8 months ago
Aeolun|8 months ago
jjice|8 months ago
cpard|8 months ago
AstroBen|8 months ago
AstroBen|8 months ago
thunkle|8 months ago
jsjohnst|8 months ago
Syzygies|8 months ago
beigebrucewayne|8 months ago
dirtbag__dad|8 months ago
pjm331|8 months ago
citizenpaul|8 months ago
Yeah now companies that paid lip service to those things can still not have them but pretend they do cause the AI did it....
dweinus|8 months ago
It's at least decent though, right?
> "What emerged over these seven days was more than just code..."
Yeesh, ok, but is it accurate?
> Over time this will likely degrade the performance and truthfulness
Sure, but it's cheap right?
> $250 a month.
Well at least it's not horrible for the environment and built on top of massive copyright violations, right?
Right?
tom_m|8 months ago
citizenpaul|8 months ago
Lol, I guess their AI is too good for a redactor. Better have humans do it.
rikschennink|8 months ago
beigebrucewayne|8 months ago
fullstackchris|8 months ago
mjrbrennan|8 months ago
never_inline|8 months ago
WD-42|8 months ago
I'm so over this timeline.
fennecfoxy|8 months ago
fullstackchris|8 months ago
ozim|8 months ago
block_dagger|8 months ago
TeMPOraL|8 months ago
TZubiri|8 months ago
rsynnott|8 months ago
I suppose preferences differ, but really, does anyone _like_ this sort of writing style?
beigebrucewayne|8 months ago
1. I shouldn't have used a newly created repo that had no real work over the course of the last week.
2. I should have put more time into the prompt to make it sound less like nails on a chalkboard.
hoppp|8 months ago
jvanderbot|8 months ago
tra3|8 months ago
eru|8 months ago
distortionfield|8 months ago
BoredPositron|8 months ago
sorcerer-mar|8 months ago
And since they're human, the juniors themselves do not have the patience of an LLM.
I really would not want to be a junior dev right now... Very unfair and undesirable situation they've landed in.
qsort|8 months ago
godelski|8 months ago
jayofdoom|8 months ago
dwohnitmok|8 months ago
On the other hand, every time people are just spinning off sub-agents I am reminded of this: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality...
It's simultaneously the obvious next step and portends a potentially very dangerous future.
TeMPOraL|8 months ago
lubujackson|8 months ago
> ${SUGESTION}
And recognized it wouldn't do anything because of a typo? Alas, my kind is not long for this world...
CGamesPlay|8 months ago
konexis007|8 months ago
jilles|8 months ago
johnwheeler|8 months ago
throwawayoldie|8 months ago
aussieguy1234|8 months ago