Might look small, but the needle-in-a-haystack numbers they report in the model card addenda at 200k are also a massive improvement toward "proving a negative", i.e. recognizing that the answer does not exist in the text: 99.7% vs 98.3% for Opus.
https://cdn.sanity.io/files/4zrzovbb/website/fed9cc193a14b84...
Could you explain how these two are related? That benchmark seems to ask for very specific information inside a large body of text, which for LLMs seems quite a different task from proving a negative. Any improvement on proving a negative would mean fewer hallucinations, which would be a huge deal.
maeil | 1 year ago