(no title)
gormen|5 days ago
Most interpretability methods fail for LLMs because they try to explain outputs without modeling the intent, constraints, or internal structure that produced them.
Token‑level attribution is useful, but without a framework for how the model reasons, you’re still explaining shadows on the wall.
IshKebab|5 days ago
The demo just says "Wikipedia" or "arXiv". That's pretty broad and maybe not that useful. Can it get more specific than that, like the actual pages?