item 46716564

AmiteK | 1 month ago

I agree - that’s on me for the wording. I’m not claiming repeatability of agent inference or LLM sessions.

By “repeatability” I mean the extraction itself: given the same repo state + config, the derived semantic artifact is identical every time. That gives CI and agents a stable reference point, but it doesn’t make agent behavior deterministic.

The value is in not having to re-infer structure from raw source each run - not in making inference runs repeatable.
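The repeatability claim above can be illustrated with a minimal sketch (this is not the commenter's actual tool; the artifact shape and helper names here are hypothetical): derive a structural summary from source with Python's `ast` module, serialize it canonically, and hash it. Given the same source, the artifact and its digest are identical every run, which is what gives CI a stable reference point.

```python
# Minimal sketch, NOT the tool under discussion: deterministic extraction
# of a semantic artifact from source, plus a canonical digest for CI.
import ast
import hashlib
import json

SOURCE = """
def load(path):
    pass

class Store:
    def get(self, key):
        pass
"""

def extract_artifact(source: str) -> dict:
    """Walk the AST, recording top-level functions and classes with methods."""
    tree = ast.parse(source)
    artifact = {"functions": [], "classes": {}}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            artifact["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body if isinstance(n, ast.FunctionDef)]
            artifact["classes"][node.name] = sorted(methods)
    artifact["functions"].sort()  # canonical ordering, independent of any dict/walk quirks
    return artifact

def artifact_digest(artifact: dict) -> str:
    # sort_keys=True makes the serialization canonical, so equal
    # artifacts always produce equal digests
    blob = json.dumps(artifact, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

a1 = extract_artifact(SOURCE)
a2 = extract_artifact(SOURCE)
assert artifact_digest(a1) == artifact_digest(a2)  # extraction is repeatable
```

The determinism comes from the canonical ordering and serialization, not from anything about how the model later consumes the artifact, which is exactly the distinction being drawn.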

verdverm | 1 month ago

> The value is in not having to re-infer structure

Except this is not the case: the model has to re-infer structure on every new session, and you are providing an index that supposedly speeds that up. But the model still has to infer something from your index as it processes the tokenized version - it's not automagically injected into its understanding.

I agree that having something like this helps a lot. I don't agree that auto-generating it from the code and providing that comprehensive list to the model is helpful. I tried across this whole spectrum, from no index at all to one as detailed as yours. There is a point where these indexes become more noise than help, which is why (1) I keep hounding on evals, because mine show a different conclusion than the one you are drawing, and (2) having a curated version of this in the agents.md files was more than sufficient to noticeably improve performance - format doesn't matter much.

The other drawback I've experienced from doing this is that the model tends to go look things up based on the index even when it doesn't need them for the task at hand. It ends up making more tool calls and spending more tokens in the long run.