top | item 45668961

(no title)

achr2 | 4 months ago

My assumption is that there could be higher dimensional 'token' representations that can be used instead of vision tokens (though obviously not human interpretable). I wonder if there is an optimisation for compression that takes the vector space at a specific level in the network to provide the most context for minimal memory space.

discuss

No comments yet.