Conversely, we can also diagnose the LLM itself: its activations are its EEG, its gradients are its BOLD signal, and if you are willing to pay the computational cost, you can even calculate its true variational free energy, that is, the KL divergence.
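To make the last measurement concrete, here is a minimal sketch of a KL divergence between two next-token distributions. The logits, the four-token vocabulary, and the function names are hypothetical and chosen only for illustration; in practice both distributions would come from the model under study and a reference, and which distribution plays which role is an assumption, not something stated above.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token logits over a four-token vocabulary.
reference_logits = [2.0, 0.5, -1.0, 0.0]
model_logits = [1.5, 0.7, -0.5, 0.1]

p = softmax(reference_logits)
q = softmax(model_logits)

print(f"KL(p || q) = {kl_divergence(p, q):.4f} nats")
```

The value is zero when the two distributions match and grows as they diverge. The cost mentioned above comes from repeating this over a full vocabulary at every token position.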
"Don't just train your model, understand its mind."