top | item 46724008

mckennameyer | 1 month ago

Interesting approach with the cascade. How do you decide when to escalate from fuzzy matching to the LLM?

parad0x0n | 1 month ago

Fuzzy matching only makes sense if you expect the two columns to contain more or less the same data; otherwise you can skip that step.

Then you have to pick a threshold: if the similarity of two strings is above that threshold, it's a match; otherwise it's not. The threshold should be high to prevent false positives. The LLM will take care of the non-matches.
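
For illustration, here is a rough sketch of that thresholded cascade. It assumes Python's standard-library difflib as the similarity measure, since the thread doesn't say which metric is actually used. The names `similarity`, `cascade_match`, and `llm_match`, and the 0.9 threshold, are hypothetical; `llm_match` stands in for whatever LLM call you would plug in.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def cascade_match(
    value: str,
    candidates: Sequence[str],
    llm_match: Callable[[str, Sequence[str]], Optional[str]],
    threshold: float = 0.9,  # assumed value; keep it high to avoid false positives
) -> Tuple[Optional[str], str]:
    """Return (match, stage), where stage is "fuzzy" or "llm"."""
    # Stage 1: cheap fuzzy match against every candidate.
    best = max(candidates, key=lambda c: similarity(value, c), default=None)
    if best is not None and similarity(value, best) >= threshold:
        return best, "fuzzy"
    # Stage 2: anything that does not clear the threshold is escalated.
    return llm_match(value, candidates), "llm"


def fake_llm(value: str, candidates: Sequence[str]) -> Optional[str]:
    """Placeholder for a real LLM call (hypothetical): always gives up."""
    return None


if __name__ == "__main__":
    names = ["Acme Corp", "Apex Ltd"]
    print(cascade_match("Acme Corp.", names, fake_llm))
    print(cascade_match("ACME Corporation of America", names, fake_llm))
```

With a high threshold, near-identical strings are settled in the fuzzy stage and everything else goes to the LLM, so the threshold mostly controls cost rather than recall.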