(no title)
jaehong747 | 11 months ago
I believe this phenomenon occurs because high-performance LLMs already encode the probability distribution over future tokens in their internal representations, which shows up as elevated activation values in the network's neurons. It arises during the process of predicting the probability distribution over the vocabulary for the next (and subsequent) output tokens.
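One rough way to probe this idea is a "logit lens"-style sketch (a hypothetical illustration, not a method from the comment above): project each layer's intermediate hidden state through the model's own output head and watch how early the next-token distribution takes shape. The model choice (`gpt2` via HuggingFace `transformers`) and the prompt are assumptions for the sake of the example.

```python
# Hypothetical sketch: inspect whether intermediate activations already
# "contain" the next-token distribution, logit-lens style.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (n_layers + 1) tensors, each [batch, seq, d_model]
for layer_idx, h in enumerate(out.hidden_states):
    # Apply the final layer norm and the unembedding matrix to the
    # hidden state at the last position, as if this layer were the output.
    h_last = model.transformer.ln_f(h[:, -1, :])
    logits = model.lm_head(h_last)
    probs = torch.softmax(logits, dim=-1)
    top_p, top_id = probs.max(dim=-1)
    print(f"layer {layer_idx:2d}: top token = "
          f"{tokenizer.decode(top_id)!r} (p = {top_p.item():.3f})")
```

If the commenter's intuition holds, the correct continuation should dominate the projected distribution several layers before the final one, with its probability mass (and the corresponding activations) growing layer by layer.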