charlesding2024 | 2 months ago
A technical question on the v6 offline support (Phi-2): Does the CLI keep the model loaded in RAM in the background (daemon mode), or does it have to load the weights from disk every time an error occurs? I'm curious about the latency trade-off for that 'instant fix' feel.
taklaxbr | 2 months ago
To answer your question about v6/Phi-2: It uses a session-based RAM residency approach rather than a background daemon or per-request loading.
When you toggle offline mode on (or if the shell starts in that mode), the OfflineModelManager class loads the weights into memory once. Since the shell runs in a continuous while True loop, the model stays 'hot' in RAM for the rest of that session.
This removes the cold-start latency from each error correction, which is what makes the 'self-healing' feel instantaneous. The trade-off, of course, is sustained RAM usage for as long as the shell is open, but I found that preferable to waiting 10+ seconds for a reload on every failed command.
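For anyone curious what that pattern looks like, here's a minimal Python sketch under stated assumptions. Only the OfflineModelManager name comes from the comment above. The enable/suggest_fix methods, fake_phi2_loader, and run_shell are hypothetical, and the loader is a stub rather than real Phi-2 weights. It shows the weights being read once when offline mode is enabled and then reused across every failure inside the while True loop:

```python
import time


class OfflineModelManager:
    """Hypothetical sketch: keeps model weights in RAM for the life of the shell session."""

    def __init__(self, loader):
        self._loader = loader   # callable that reads weights from disk (stubbed below)
        self._model = None
        self.load_count = 0     # how many times weights were actually loaded

    def enable(self):
        # Called when offline mode is toggled on: read weights from disk once.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1

    def suggest_fix(self, error_text):
        # Reuses the in-RAM model; no disk read per error.
        if self._model is None:
            raise RuntimeError("offline mode is not enabled")
        return self._model(error_text)


def fake_phi2_loader():
    # Hypothetical stand-in for loading Phi-2 weights; a real load takes seconds.
    time.sleep(0.01)
    return lambda err: f"suggested fix for: {err}"


def run_shell(commands, manager):
    manager.enable()             # shell starts in offline mode
    fixes = []
    inputs = iter(commands)
    while True:                  # session loop; the manager outlives every iteration
        cmd = next(inputs, "exit")
        if cmd == "exit":
            break
        if cmd.startswith("bad"):    # pretend this command failed
            fixes.append(manager.suggest_fix(cmd))
    return fixes


manager = OfflineModelManager(fake_phi2_loader)
fixes = run_shell(["ls", "bad-cmd-1", "pwd", "bad-cmd-2", "bad-cmd-3"], manager)
print(len(fixes), manager.load_count)  # 3 1
```

Three failures, one load. With per-request loading and a roughly 10 s load, those same three failures would cost roughly 30 s of loading instead of roughly 10 s paid once up front, at the price of holding the weights in memory the whole time.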
charlesding2024 | 2 months ago