There are a few pieces of this that rely on proprietary data, especially the FastText training step, so that's unfortunately a dead end (I'd love to be proven wrong). For something FOSS without access to tons of user data, I'd consider subbing in a small BERT model with a classifier head, but then you lose the ability to serve high QPS.
mips_avatar|6 months ago
ellenhp|6 months ago
[0] https://github.com/ellenhp/airmail