guerython | 3 days ago
One metric that helped us was retrieval precision@k against a small gold set of "must-return" facts from prior sessions. Drift in that score exposed degradation earlier than latency or token metrics did.
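For anyone who wants to try this, here is a minimal sketch of precision@k against a gold set. The names `precision_at_k`, `mean_precision_at_k`, and the `retrieve(query, k)` callable are hypothetical stand-ins for whatever your memory system exposes, and the gold set is assumed to be a dict mapping each query to the IDs of its must-return facts.

```python
def precision_at_k(retrieved_ids, gold_ids, k):
    """Fraction of the top-k retrieved items that appear in the gold set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved_ids[:k]
    hits = sum(1 for item in top_k if item in gold_ids)
    return hits / k


def mean_precision_at_k(gold_set, retrieve, k=5):
    """Average precision@k over every query in the gold set.

    gold_set: dict of query -> iterable of must-return fact IDs (assumed shape).
    retrieve: hypothetical callable (query, k) -> ranked list of fact IDs.
    """
    if not gold_set:
        raise ValueError("gold set is empty")
    scores = [
        precision_at_k(retrieve(query, k), set(must_return), k)
        for query, must_return in gold_set.items()
    ]
    return sum(scores) / len(scores)
```

Logging the mean on a schedule and alerting on a sustained drop is one way to catch the drift described above. Note that when a query has fewer than k gold facts, its precision@k cannot reach 1.0, so compare the score against its own history rather than against an absolute target.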
If you haven’t already, adding write-amplification + duplicate-rate tracking is useful too. We found many systems look healthy while gradually filling with near-duplicate notes that poison recall.
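A rough sketch of both counters, under stated assumptions: near-duplicates are detected here with word-level Jaccard similarity and a 0.8 threshold, which is only one possible choice, and write amplification is taken to mean writes issued divided by distinct notes retained. All function names are hypothetical.

```python
import re


def _tokens(text):
    """Lowercased alphanumeric word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two notes."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def duplicate_rate(notes, threshold=0.8):
    """Fraction of notes that near-duplicate an earlier, already kept note."""
    kept = []
    dupes = 0
    for note in notes:
        if any(jaccard(note, earlier) >= threshold for earlier in kept):
            dupes += 1
        else:
            kept.append(note)
    return dupes / len(notes) if notes else 0.0


def write_amplification(writes_issued, distinct_notes):
    """Writes issued per distinct note retained (assumed definition)."""
    return writes_issued / distinct_notes if distinct_notes else 0.0
```

The pairwise comparison is quadratic in the number of notes, so on a large store you would likely sample or swap in an approximate method such as MinHash. Tracking both numbers over time matters more than their absolute values.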
sukinai | 3 days ago
Retrieval precision@k against a small gold set is a very strong suggestion. That feels like a much better early warning signal than just latency or token usage, because those can look fine while memory quality is quietly degrading.
Write amplification and duplicate-rate tracking also make a lot of sense. Near-duplicate buildup is exactly the kind of thing that makes a memory system look healthy on the outside while slowly poisoning recall underneath.
I have basic duplicate detection in /nemp:health, but I haven’t yet framed it in terms of retrieval quality metrics the way you described. That’s a really good direction. Thank you.