"Confidently drifting into a structurally stable manifold" is the harder failure mode. The geometry looks clean because the model isn't collapsing; it's just navigating to a coherent but factually detached region of latent space. Geometric drift detection is blind to it by design.
This is exactly where output-layer validation becomes non-negotiable rather than complementary. If the structural signal is absent, the semantic signal is all you have. Rule-based checks on absolute claims, unattributed statistics, and banned assertions catch the confident hallucination that geometry misses entirely.
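Here is a minimal sketch of that rule-based layer. It assumes plain-text model output and uses hypothetical regex rules for the three categories above (absolute claims, unattributed statistics, banned assertions). The patterns, the naive sentence splitter, and the `validate` / `Violation` names are illustrative assumptions, not a tested policy:

```python
import re
from dataclasses import dataclass

# Hypothetical rule patterns for illustration only; a real deployment
# would tune these against its own domain and false-positive budget.
ABSOLUTE_CLAIM = re.compile(
    r"\b(?:always|never|guaranteed|certainly|proven|undeniably)\b", re.IGNORECASE
)
# A percentage ("73%", "4.5 %") or a ratio ("3 in 4", "9 out of 10").
STATISTIC = re.compile(r"\b\d+(?:\.\d+)?\s?%|\b\d+\s+(?:in|out of)\s+\d+\b")
# Crude attribution markers: "according to", "source:", or a bracketed citation like "[2]".
ATTRIBUTION = re.compile(r"\baccording to\b|\bsource:|\[\d+\]", re.IGNORECASE)


@dataclass(frozen=True)
class Violation:
    rule: str
    sentence: str


def validate(text: str, banned: list[str]) -> list[Violation]:
    """Return rule violations found in `text`, checked sentence by sentence."""
    violations: list[Violation] = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        if ABSOLUTE_CLAIM.search(sentence):
            violations.append(Violation("absolute_claim", sentence))
        if STATISTIC.search(sentence) and not ATTRIBUTION.search(sentence):
            violations.append(Violation("unattributed_statistic", sentence))
        lowered = sentence.lower()
        if any(phrase.lower() in lowered for phrase in banned):
            violations.append(Violation("banned_assertion", sentence))
    return violations


if __name__ == "__main__":
    sample = (
        "This treatment always works. 73% of patients recovered. "
        "According to the 2021 trial, 12% relapsed."
    )
    for v in validate(sample, banned=["cures cancer"]):
        print(f"{v.rule}: {v.sentence}")
```

On that sample, the first two sentences get flagged (an absolute claim and an unattributed percentage). The third passes because it carries an attribution marker. Note that these checks never look at the model's internals, which is the point: they still fire when the geometry looks perfectly healthy.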
The hybrid architecture you're describing, SIB for process detection and rule-based checks for output validation, covers both failure modes. It's also worth considering confidence scoring at the rule level: flagging outputs where the geometric signal is clean but semantic violations cluster, since that pattern may itself be a reliable indicator of the stable-manifold drift you're investigating.
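One way that confidence scoring might look, as a rough sketch: it assumes the geometric detector emits a drift score normalized to [0, 1] (higher meaning more drift) and that the rule layer reports a violation count per output. The thresholds, the `score_output` / `Verdict` names, and the suspicion formula are all hypothetical choices for illustration, not part of any existing SIB tooling:

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    REVIEW = "review"  # isolated semantic violation, clean geometry
    GEOMETRIC_DRIFT = "geometric_drift"  # process-level detector already fired
    STABLE_MANIFOLD_SUSPECT = "stable_manifold_suspect"  # clean geometry, clustered violations


@dataclass(frozen=True)
class HybridResult:
    verdict: Verdict
    # 0..1 heuristic: high when geometry is clean AND violations are dense.
    suspicion: float


def score_output(
    drift_score: float,
    violations: int,
    sentence_count: int,
    drift_threshold: float = 0.3,  # hypothetical; calibrate per model
    density_threshold: float = 0.25,  # hypothetical; calibrate per domain
) -> HybridResult:
    """Combine an assumed geometric drift score in [0, 1] with rule-violation counts."""
    if not 0.0 <= drift_score <= 1.0:
        raise ValueError("drift_score is assumed to be normalized to [0, 1]")
    if drift_score >= drift_threshold:
        # Geometry already flags the output; this is not the stable-manifold case.
        return HybridResult(Verdict.GEOMETRIC_DRIFT, 0.0)
    # Violations per sentence, capped at 1.0.
    density = min(violations / max(sentence_count, 1), 1.0)
    # 1.0 when drift_score == 0, falling toward 0.0 as it nears the threshold.
    cleanliness = 1.0 - drift_score / drift_threshold
    suspicion = density * cleanliness
    if density >= density_threshold:
        return HybridResult(Verdict.STABLE_MANIFOLD_SUSPECT, suspicion)
    if violations > 0:
        return HybridResult(Verdict.REVIEW, suspicion)
    return HybridResult(Verdict.PASS, suspicion)
```

For example, `score_output(drift_score=0.05, violations=3, sentence_count=6)` returns `STABLE_MANIFOLD_SUSPECT` with a suspicion of roughly 0.42. Scaling by geometric cleanliness is just one possible design: it encodes the intuition that violations say the most when the geometry looks most stable. Logging these scores against labeled outcomes would show whether the pattern actually predicts the drift, rather than assuming it does.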
Solid research direction.