(no title)
Moosdijk | 2 months ago
Instead of big models that “brute force” the right answer by knowing a lot of possible outcomes, this model seems to come to results with less knowledge but more wisdom.
Kind of like having a database of most possible frames in a video game and blending between them instead of rendering the scene.
omneity|2 months ago
ctoa|2 months ago
The notion of context window applies to the sequence, it doesn't really affect that, each iteration sees and attends over the whole sequence.
nl|2 months ago
I'm not sure what you mean here, but there isn't a difference in the number of times a model runs during inference.
Moosdijk|2 months ago
unknown|2 months ago
[deleted]