euclaise | 1 year ago
This one does have attention, it's just chunked into segments of 4096.
cs702 | 1 year ago
Yes, but the claim is about "unlimited context length." I doubt attention over each segment can be as good at recall as attention over the full input context.