
mkw5053 | 18 days ago

Aider [0] wrote a piece about this [1] way back in Oct 2023!

I stumbled upon it in late 2023 when investigating ways to give OpenHands [2] better context dynamically.

[0] https://aider.chat/

[1] https://aider.chat/2023/10/22/repomap.html

[2] https://openhands.dev/


emporas|18 days ago

Aider's repomap is a great idea. I remember participating in the discussion back then.

The unfortunate thing for Python and other untyped/duck-typed languages, which the repomap article mentions, is that function signatures do not convey much.

When it comes to Rust, it's a totally different story: function and method signatures convey a lot of important information. As a general rule, in every LLM query I include at most one function/method implementation, and everything else is function/method signatures.

By not mindlessly giving LLMs whole files and implementations, I have never used more than 200,000 tokens/day, counting input and output. That covers 30 queries for a whole day of programming, and costs less than a dollar per day no matter which model I use.

Anyway, having the agent build the repomap doesn't sound like such a great idea. Agents are horribly inefficient. It is better to build the repomap deterministically using something like ast-grep, and then let the agent read the resulting repomap.
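A minimal sketch of the deterministic approach described above, assuming Python's standard `ast` module as a stand-in for ast-grep or tree-sitter. The names `signature_of` and `build_repomap` are hypothetical, and the output format is only an illustration, not Aider's actual repomap format:

```python
import ast

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def signature_of(node):
    """Render a function header without its body."""
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
    return f"{prefix} {node.name}({ast.unparse(node.args)}){returns}"


def build_repomap(source, filename="<source>"):
    """Return a signatures-only outline of one Python source string.

    Hypothetical helper: lists top-level functions, classes, and the
    methods defined directly inside each class. Bodies are omitted.
    """
    tree = ast.parse(source, filename=filename)
    lines = [f"{filename}:"]
    for node in tree.body:
        if isinstance(node, FUNC_TYPES):
            lines.append(f"  {signature_of(node)}")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"  class {node.name}")
            for item in node.body:
                if isinstance(item, FUNC_TYPES):
                    lines.append(f"    {signature_of(item)}")
    return "\n".join(lines)
```

Because the outline comes from a parser rather than a model, building it costs no tokens, and the same input always yields the same map. It needs `ast.unparse`, so Python 3.9 or later.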

jared_stewart|18 days ago

Typed languages definitely provide richer signal in their signatures - and my experience has been that I get more reliable generations from those languages.

On the efficiency point, the agent isn't doing any expensive exploration here. A standalone server builds and maintains the index; the agent only queries it. So it's closer to the deterministic approach implemented in aider (at least conceptually), with the added benefit that the LLM can execute targeted queries recursively.

jared_stewart|18 days ago

Aider's repo-map concept is great! Thanks for sharing, I hadn't been aware of it. Using tree-sitter to give the LLM structural awareness is the right foundation IMO. The key difference is how that information gets to the model.

Aider builds a static map, with some importance ranking, and then stuffs the most relevant part into the context window upfront. That's smart - but it is still the model receiving a fixed snapshot before it starts working.

What the RLM paper crystallized for me is that the agent could query the structure interactively as it works. A live index exposed through an API lets the agent decide what to look at, how deep to go, and when it has enough. When I watch it work, it's not one or two lookups but many, each informed by what the previous one revealed. The recursive exploration pattern is the core difference.

anotherpaulg|18 days ago

Aider actually prompts the model to say if it needs to see additional files. Whenever the model mentions file names, aider asks the user if they should be added to context.

As well, any files or symbols mentioned by the model are noted. They influence the repomap ranking algorithm, so subsequent requests have even more relevant repository context.

This is designed as a sort of implicit search and ranking flow. The blog article doesn’t get into any of this detail, but much of this has been around and working well since 2023.
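A toy illustration of how mentions could feed back into ranking. This is not Aider's actual algorithm: `rank_symbols`, the additive `boost`, and the base scores are all assumptions made up for the sketch:

```python
import re
from collections import Counter

# Matches identifiers and simple file names such as "utils.py".
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def rank_symbols(base_scores, model_replies, boost=1.0):
    """Order known files/symbols, favoring ones the model has mentioned.

    base_scores: hypothetical mapping of name -> importance score.
    model_replies: earlier model messages to scan for mentions.
    Each reply that mentions a known name adds `boost` to its score.
    """
    mentions = Counter()
    for reply in model_replies:
        # Strip trailing dots so "parse_args." at a sentence end still matches.
        tokens = {t.rstrip(".") for t in TOKEN.findall(reply)}
        for token in tokens:
            if token in base_scores:
                mentions[token] += 1
    scored = {n: s + boost * mentions[n] for n, s in base_scores.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda n: (-scored[n], n))
```

The point is only the feedback loop: what the model talks about in one turn changes which parts of the repository surface in the next.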

mohsen1|18 days ago

I am planning to add similar concepts to Yek. Either tree-sitter or ast-grep. Your work here and Aider's work would be my guiding prior art. Thank you for sharing!

https://github.com/mohsen1/yek

aitchnyu|17 days ago

Hey, are you planning to update docs for end users of your CLI? I was an Aider user who switched to Opencode, but I want to experiment with token- and time-efficient agents, and I'm assuming OpenHands is one.