top | item 36055134 (no title) BorisTheBrave | 2 years ago I think you've misunderstood. Megabyte scales better with context window length. I don't know if they're saying the training data / compute are any more efficient. discuss order hn newest No comments yet.
No comments yet.