
Google's Gist: Greedy Independent Set Thresholding for Retrieval Explained

2 points | aggeeinn | 1 month ago | websiteaiscore.com

1 comment

aggeeinn|1 month ago

OP here.

I’ve been digging into the Fahrbach/Ramalingam paper (NeurIPS 2025) on GIST. The core finding suggests Google is moving away from pure ranking toward 'max-min diversity' sampling for AI Overviews, primarily to reduce the compute cost of processing redundant tokens in RAG.

We ran a Python simulation (code in the post) to test the exclusion radius. In that simulation, a draft with high semantic overlap with a seed node like Wikipedia (cosine similarity above roughly 0.85) gets filtered out of the selection set regardless of its domain authority.
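
For anyone who wants to poke at the exclusion-radius idea without opening the post, here is a minimal sketch. It is not the simulation from the post and not the paper's full algorithm, just a greedy pass that takes items in descending score order and skips anything too similar to an item already picked. The function name, the toy embeddings, and the "authority" scores are all made up for illustration; the 0.85 cutoff is the rough figure mentioned above.

```python
import numpy as np

def greedy_threshold_select(embeddings, scores, k, max_sim=0.85):
    """Pick up to k items by descending score, skipping any item whose cosine
    similarity to an already-selected item exceeds max_sim (hypothetical helper)."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize each row
    selected = []
    for i in np.argsort(-np.asarray(scores, dtype=float)):
        if len(selected) == k:
            break
        # Dot product of unit vectors is the cosine similarity.
        if all(X[i] @ X[j] <= max_sim for j in selected):
            selected.append(int(i))
    return selected

# Toy data (assumed, not from the paper or the post):
# item 0 = "seed", item 1 = near-duplicate of the seed with a high score,
# items 2 and 3 = lower-scored but semantically distinct.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.95, 0.3, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.7],
]
toy_scores = [0.9, 0.8, 0.3, 0.2]

picked = greedy_threshold_select(toy_embeddings, toy_scores, k=3)
print(picked)
```

With these toy numbers the near-duplicate (item 1) is skipped despite having the second-highest score, and the two lower-scored but distinct items take the remaining slots. Raising `max_sim` close to 1.0 lets the near-duplicate back in.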

Curious to hear if anyone working in search/retrieval is seeing this hard filtering in production yet, or if it's still just research-side.