top | item 41926909

SEGyges | 1 year ago

Tokens are on average four characters, and the number of residual streams (and therefore RAM) the LLM allocates to a given sequence is proportional to the number of units of input. The FLOPs in the attention calculation are proportional to the square of that number.

You can hypothetically try to ameliorate this by other means, but if you just naively drop from tokenization to character- or byte-level models, this is what goes wrong.

discuss

p1esk | 1 year ago

4x seq length expansion doesn’t sound that bad.

lechatonnoir | 1 year ago

I mean, it's not completely fatal, but it means roughly a 16x increase in attention runtime cost (4x the sequence length, squared), if I'm not mistaken. That's probably not worth it just to solve letter counting in most applications.
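The arithmetic behind the thread can be sketched quickly. This is a toy cost model, not a real benchmark: `seq_costs` and `CHARS_PER_TOKEN` are hypothetical names, and the ~4 chars/token figure is the rough average quoted above.

```python
# Toy cost model (hypothetical): token-level vs byte-level sequence costs,
# assuming ~4 characters per token on average, as stated in the thread.

def seq_costs(n_units: int) -> dict:
    """Asymptotic cost proxies for a sequence of n_units input units."""
    return {
        "memory": n_units,         # residual streams / RAM ~ O(n)
        "attn_flops": n_units ** 2 # attention score matrix ~ O(n^2)
    }

CHARS_PER_TOKEN = 4  # rough average from the parent comment

token_level = seq_costs(1000)                       # 1000-token sequence
byte_level  = seq_costs(1000 * CHARS_PER_TOKEN)     # same text, byte-level

mem_ratio   = byte_level["memory"] / token_level["memory"]          # 4x RAM
flops_ratio = byte_level["attn_flops"] / token_level["attn_flops"]  # 16x attention FLOPs
print(mem_ratio, flops_ratio)
```

So dropping tokenization multiplies memory linearly (4x) but attention FLOPs quadratically (16x), which is the cost p1esk and lechatonnoir are weighing.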