I don't think recall really addresses it sufficiently: the main issue I see is answers getting "muddy", as if the model is being pulled in too many directions and averaging them.
Page 8 of the technical paper [1] is especially informative.
The first chart (Cumulative Average NLL for Long Documents) shows a deviation from the trend and an increase in accuracy when working with >=1M tokens. The 1.0 graph is overlaid and supports the experience of 'muddiness'.
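For anyone unsure what that chart is plotting, here is a rough sketch of how a cumulative average NLL curve can be computed from per-token probabilities. The function name and the probabilities are made up for illustration and are not taken from the paper; the idea is only that a lower running average means the model is predicting later tokens better as context grows.

```python
import math
from itertools import accumulate

def cumulative_average_nll(token_probs):
    """Return the running mean of -log p(token) up to each position."""
    # Negative log-likelihood of each observed token.
    nlls = [-math.log(p) for p in token_probs]
    # Running sum of NLLs, divided by the number of tokens seen so far.
    running_sums = accumulate(nlls)
    return [total / (i + 1) for i, total in enumerate(running_sums)]

# Hypothetical probabilities: the model grows more confident with more context.
probs = [0.1, 0.2, 0.4, 0.8]
curve = cumulative_average_nll(probs)
```

A curve that keeps falling means extra context is still helping; a curve that flattens or rises would be one way the "muddy" behaviour could show up.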
westoncb|2 years ago
a_wild_dandan|2 years ago
andy_ppp|2 years ago
tcdent|2 years ago
[1] https://storage.googleapis.com/deepmind-media/gemini/gemini_...
moffkalast|2 years ago
[deleted]