
ToJans | 4 months ago

A series of tokens is one-dimensional (a sequence). An image is 2-dimensional. What about 3D/4D/... representations (until we end up with an LLM-dimensional solution, ofc)?


dvt | 4 months ago

This isn't exactly true: tokens live in the embedding space, which is n-dimensional, like 256 or 512 or whatever (so you might see one word, but it's actually an array of a bunch of numbers). With that said, I think it's pretty intuitive that continuous tokens are more efficient than discrete ones, simply because the LLM itself is basically a continuous function (with coefficients/parameters ∈ ℝ).
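A minimal sketch of that lookup, with assumed sizes (50k vocabulary, 512-wide embeddings) and random weights standing in for trained ones:

    import numpy as np

    vocab_size, d_model = 50_000, 512               # assumed sizes, not any real model's
    rng = np.random.default_rng(0)
    E = rng.standard_normal((vocab_size, d_model))  # embedding table: one row per token

    token_id = 1234                                 # one discrete token
    vec = E[token_id]                               # its continuous representation
    print(vec.shape)                                # (512,) -- "one word" is 512 real numbers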

wongarsu | 4 months ago

We call an embedding space n-dimensional, but in this context I would consider it 1-dimensional, as in it's a 1D vector of n values. The terminology just sucks. If we described images the same way we describe embeddings, a 2-megapixel image would have to be called 2-million-dimensional (or 8-million-dimensional if we count RGBA as four separate values).
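To make the terminology clash concrete, shapes only (values are arbitrary):

    import numpy as np

    embedding = np.zeros(512)              # "512-dimensional" in embedding-speak
    image = np.zeros((1000, 2000, 4))      # a 2-megapixel RGBA image
    print(embedding.ndim, embedding.size)  # 1 axis, 512 values
    print(image.ndim, image.size)          # 3 axes, 8,000,000 values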

I would also argue tokens are outside the embedding space, and that a large part of the magic of LLMs (and many other neural network types) is the ability to map sequences of rather crude inputs (tokens) into a more meaningful embedding space, and then map from that embedding space back to tokens we humans understand.
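A toy sketch of that round trip, with random matrices standing in for a trained model; W_out is an assumed output projection (in real models it is often tied to the embedding table), and the transformer layers in between are elided:

    import numpy as np

    vocab_size, d_model = 50_000, 512
    rng = np.random.default_rng(0)
    E = rng.standard_normal((vocab_size, d_model))      # tokens -> embedding space
    W_out = rng.standard_normal((vocab_size, d_model))  # embedding space -> token logits

    token_ids = np.array([17, 4242, 9])    # crude discrete inputs
    hidden = E[token_ids]                  # (3, 512): into the meaningful space
    # ... transformer layers would refine `hidden` here ...
    logits = hidden @ W_out.T              # (3, 50000): scores over the vocabulary
    next_token = int(logits[-1].argmax())  # back to a token a human can read
    print(next_token)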