yalok
|
4 months ago
The paper only reports evals for Ruler 4K and 16K. I wish they'd gone further and measured longer context windows. I was wondering whether this method would show a gain over the baseline (no compression), and their Qwen results on Ruler 16K seem to suggest it does: at small compression ratios the evals look better than baseline, which means they are not just improving inference speed and memory, but also addressing the attention-dilution problem.