ellenhp | 6 months ago

There are a few pieces of this that rely on proprietary data, especially the FastText training step, so that's a dead end unfortunately (would love to be proven wrong). For something FOSS without access to tons of user data, I'd consider subbing in a small BERT model with a classifier head, but then you lose the ability to serve high QPS.

mips_avatar|6 months ago

I guess not having that would only break forward geocoding from an address?

ellenhp|6 months ago

My guess is that they're using FastText for semantic search, so losing it is more likely to break queries like "coffee near me" than address search, which is probably handled by tantivy. For context, I've also written a geocoder [0] based on tantivy. :)

[0] https://github.com/ellenhp/airmail