taklaxbr | 2 months ago
To answer your question about v6/Phi-2: It uses a session-based RAM residency approach rather than a background daemon or per-request loading.
When you toggle the offline mode (or if it starts in that mode), the OfflineModelManager class loads the weights into memory once. Since the shell runs in a continuous while True loop, the model stays 'hot' in RAM for the duration of that session.
This eliminates the cold-start latency on every error correction, which makes the 'self-healing' feel instantaneous. The trade-off, of course, is sustained RAM usage while the shell is open, but I found that preferable to waiting 10+ seconds for a reload on every command failure.
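For anyone who wants the shape of it, here is a minimal sketch of the load-once pattern, not the actual code. Only the OfflineModelManager name comes from the project; get_model, unload, load_count, run_shell, and the injected loader are hypothetical names made up for illustration, and the finite command list stands in for the real while True loop.

```python
from typing import Callable, List, Optional

# Assumption: a "model" is just something callable that maps a failed
# command to a suggested fix. The real Phi-2 loading code is not shown.
Model = Callable[[str], str]


class OfflineModelManager:
    """Illustrative sketch: keeps one model instance resident for the session."""

    def __init__(self, loader: Callable[[], Model]) -> None:
        self._loader = loader              # the expensive step (reading weights)
        self._model: Optional[Model] = None
        self.load_count = 0                # hypothetical counter, for demonstration

    def get_model(self) -> Model:
        # Load on first use; every later call reuses the resident instance.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def unload(self) -> None:
        # Drop the reference so the memory can be reclaimed,
        # e.g. when offline mode is toggled off.
        self._model = None


def run_shell(manager: OfflineModelManager, commands: List[str]) -> List[str]:
    """Stand-in for the shell loop; commands starting with 'bad' count as failures."""
    suggestions = []
    for cmd in commands:
        if cmd.startswith("bad"):
            model = manager.get_model()    # no reload after the first failure
            suggestions.append(model(cmd))
    return suggestions
```

With a dummy loader such as `lambda: (lambda cmd: "fix: " + cmd)`, running several failing commands through run_shell calls the loader only once. The cost is that the instance stays referenced (and in RAM) until unload is called or the process exits.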
charlesding2024 | 2 months ago
taklaxbr | 2 months ago