Interesting approach with the cascade. How do you decide when to escalate from fuzzy matching to LLM?

Fuzzy matching only makes sense if you expect the two columns to hold more or less the same data; otherwise you can skip that step.

Then you have to pick a threshold: if the similarity of two strings is above that threshold, it's a match; otherwise it's not. The threshold should be high to prevent false positives. The LLM will take care of the non-matches.
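
Roughly, the escalation rule could look like the sketch below. This is only an illustration, not the actual implementation: `similarity`, `cascade_match`, the 0.9 default, and the `llm_fallback` callable are all made-up names and values, and `difflib.SequenceMatcher` from the standard library stands in for whatever fuzzy matcher you really use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after light normalization (assumed: lowercase + strip)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def cascade_match(value, candidates, threshold=0.9, llm_fallback=None):
    """Return (match, stage). Try fuzzy matching first, escalate only on a miss.

    threshold    -- hypothetical default; keep it high to avoid false positives
    llm_fallback -- hypothetical callable (value, candidates) -> match or None
    """
    # Pick the candidate most similar to the value (None if there are no candidates).
    best = max(candidates, key=lambda c: similarity(value, c), default=None)

    # Accept the fuzzy match only when it clears the threshold.
    if best is not None and similarity(value, best) >= threshold:
        return best, "fuzzy"

    # Everything below the threshold is handed to the LLM step, if one is provided.
    if llm_fallback is not None:
        return llm_fallback(value, candidates), "llm"

    return None, "unmatched"
```

With a high threshold, near-identical strings such as "Acme Corp." and "Acme Corp" are resolved by the cheap fuzzy step, and only the leftovers pay for an LLM call.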