top | item 36055134

(no title)

BorisTheBrave | 2 years ago

I think you've misunderstood. Megabyte scales better with context window length. I don't know if they're saying the training data / compute are any more efficient.

discuss

order

No comments yet.