Yea, but the goal is not to bloat the context space.
Here you "waste" context by providing non-useful information.
What they did instead is put an index of the documentation into the context; the LLM can then fetch the documentation it needs. This is the same idea as skills, but it apparently works better without the agentic part of skills.
Furthermore, instead of a plain index pointing to the docs, they compressed it.
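A minimal sketch of the index-then-fetch pattern described above. The index names, file paths, and `fetch_doc` tool are all hypothetical, made up for illustration; the point is that only the compact index lives in the context window, and a full doc enters only when requested.

```python
# Hypothetical doc-index approach: a compact index sits in the system
# prompt, and the model calls a tool to pull in a full page on demand.
DOC_INDEX = {
    "auth": "docs/auth.md",          # authentication flows (made-up path)
    "pagination": "docs/paging.md",  # cursor-based pagination (made-up path)
}

def build_system_prompt() -> str:
    # Only the index (a few tokens per entry) is put into the context.
    lines = [f"- {topic}: fetch_doc('{topic}')" for topic in DOC_INDEX]
    return "Available docs (call fetch_doc to load one):\n" + "\n".join(lines)

def fetch_doc(topic: str) -> str:
    # Tool the LLM would call; only now does the full doc cost any tokens.
    path = DOC_INDEX[topic]
    return f"[contents of {path}]"  # placeholder instead of real file I/O

print(build_system_prompt())
print(fetch_doc("auth"))
```

The tradeoff the thread discusses: this spends an extra round trip (a tool call) per fetched doc, in exchange for not front-loading documentation the model may never use.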
chr15m|1 month ago
Their approach is still agentic in the sense that the LLM must make a tool call to load a particular doc. The most efficient approach would be to know ahead of time which parts of the documentation will be needed, and then give the LLM a compressed version of exactly those docs. That doesn't require an agentic tool call.
Of course, it's a tradeoff.
PKop|1 month ago
LLMs have always been limited in the number of tokens they can process at one time. This limit keeps increasing, but one problem is that chat threads continually grow as you send messages back and forth: within any session or thread, the full conversation is sent to the LLM with every message (aside from particular optimizations that compact or prune it). This also increases costs, which are charged per token. Efficiency of cost and of performance/precision/accuracy dictates using the context window judiciously.
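To make the growth concrete, here is an illustrative sketch (with made-up numbers) of why resending the whole thread each turn makes total billed tokens grow quadratically in the number of turns, even though each individual message stays the same size:

```python
# Illustrative only: assume every message (user or assistant) costs a
# flat 100 tokens. Each turn resends the entire accumulated history,
# so per-turn input grows linearly and total billed tokens grow
# quadratically with the number of turns.
TOKENS_PER_MESSAGE = 100

def billed_tokens(num_turns: int) -> int:
    total = 0
    history = 0
    for _ in range(num_turns):
        history += TOKENS_PER_MESSAGE  # user message appended to thread
        total += history               # full history sent as model input
        history += TOKENS_PER_MESSAGE  # assistant reply appended to thread
    return total

print(billed_tokens(1))   # 100
print(billed_tokens(10))  # 10000: 100 + 300 + 500 + ... + 1900
```

This is why compaction/pruning optimizations exist, and why keeping bulky documentation out of the thread (as in the index-and-fetch approach discussed above) pays off on every subsequent message, not just the first.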