

wooders | 8 months ago

I think the "memory blocks" are essentially what you are describing - to have an infinite session (which systems like Letta are designed for), you have to have a mechanism for organizing the important information and persisting it for future interactions. This organization can be done via tool calls (which is what MemGPT did) or by other agents in the background. While the message buffer continues to grow and old messages get evicted, the memory blocks are fixed-size and always in context.
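To make the distinction concrete, here is a minimal sketch of the idea, not Letta's or MemGPT's actual API. The names `MemoryBlock`, `Agent`, `memory_replace`, and `build_context`, the character-based limit, and the message cap are all assumptions made up for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryBlock:
    """A labeled, fixed-size slot that is always placed in the prompt."""
    label: str
    limit: int  # hypothetical size cap, in characters
    value: str = ""

    def write(self, text: str) -> None:
        # The block never grows past its limit; oversized writes are rejected.
        if len(text) > self.limit:
            raise ValueError(f"{self.label!r} block is limited to {self.limit} chars")
        self.value = text


@dataclass
class Agent:
    blocks: dict
    max_messages: int = 4  # hypothetical cap on the message buffer
    messages: deque = field(default_factory=deque)

    def add_message(self, text: str) -> None:
        # The buffer keeps growing until the cap, then the oldest messages are evicted.
        self.messages.append(text)
        while len(self.messages) > self.max_messages:
            self.messages.popleft()

    def memory_replace(self, label: str, text: str) -> None:
        # Stand-in for a tool call the model would issue to persist a fact.
        self.blocks[label].write(text)

    def build_context(self) -> str:
        # Memory blocks always come first; only surviving messages follow.
        parts = [f"<{b.label}>{b.value}</{b.label}>" for b in self.blocks.values()]
        parts.extend(self.messages)
        return "\n".join(parts)


agent = Agent(blocks={"human": MemoryBlock("human", limit=50)})
agent.memory_replace("human", "Prefers short answers.")
for i in range(6):
    agent.add_message(f"msg {i}")
context = agent.build_context()
```

After six messages, the two oldest have been evicted from the buffer, but the fact written into the `human` block is still in `context`. Note that this only controls what goes into the prompt; the whole prompt is still sent to the model on every call.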


alganet | 8 months ago

Your answer is too vague for the details I asked about.

I could design an autoexec.bat to remember the programs that were opened after reboot, all automatically. If I open something, it goes there. If I close, I remove it from autoexec.bat. MacOS does this. But that's not really the persistence that saves me time and money. MacOS is good because _I rarely need to reboot it_, and the "reopen windows after reboot" option is barely used.

There's one question I placed there that perfectly encapsulates my doubts:

_Can I use this "context engineering" to mitigate the cost of the time to first token?_

If I cannot, then it's just like rebooting an OS, and it is merely the illusion of persistence. I can totally do this on my own, just like I can craft hacky autoexec.bat scripts; there's nothing special about it.

I've seen attempts at "snapshotting" parts of GPU memory, which is similar to pausing a VM after boot and then restoring it. That's also not what I'm talking about: it is just an optimization of the reboot process and does not improve much on the time to first token (there's a time penalty either way).