CuriousJ | 1 year ago
This paper shows that 200-800 is the ideal chunk size; if you go above, the model starts getting confused / distracted. https://arxiv.org/pdf/2406.14497

zaptrem | 1 year ago
Makes sense. Thanks!