Has anyone successfully worked with non-English text with FTS5 in SQLite? I could not find any reference for German, for example, and the default stemming does not seem to work properly (given some short tests).
sgbeal|1 year ago
> Has anyone successfully worked with non-English text with FTS5 in SQLite? I could not find any reference for German, for example,
We use it in the Fossil SCM project and users have reported success with Chinese and Russian, so it presumably works fine with any European/Germanic language.
> and the default stemming does not seem to work properly (given some short tests).
The Porter Stemmer is documented as only being useful for English.
djhn|1 year ago
It has pretty much the same support for other languages as most text mining tools and Elasticsearch via the snowball stemmer: https://github.com/abiliojr/fts5-snowball
Should work well for German; I’m using it with Nordic languages.
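For anyone who wants to reproduce the "short tests" from the question, here is a minimal sketch of the English-only behaviour of the built-in `porter` tokenizer. It assumes Python's bundled SQLite was compiled with FTS5 (true for most builds, but not guaranteed); the table name `docs` and the helper `count_matches` are made up for illustration.

```python
import sqlite3

# Assumption: the SQLite library behind Python's sqlite3 module includes FTS5.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO docs(body) VALUES (?)",
    [("She was running home",), ("Die Häuser sind alt",)],
)


def count_matches(term):
    """Return how many rows match a single term (hypothetical helper)."""
    query = '"' + term + '"'  # quote the term so it is parsed as one FTS5 string
    row = conn.execute(
        "SELECT count(*) FROM docs WHERE docs MATCH ?", (query,)
    ).fetchone()
    return row[0]


# English inflections are reduced to a common stem ("runs" and "running").
english_hits = count_matches("runs")

# German plural "Häuser" and singular "Haus" do not share a Porter stem.
german_hits = count_matches("Haus")

print("english:", english_hits, "german:", german_hits)
```

On a stock build this should report one English hit and no German hit, which is consistent with the Porter stemmer being documented as English-only.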
atoav|1 year ago
Just that you would need to tokenize the right characters for your target language (e.g. ÜüÖöÄäßẞ¹); maybe those are already included in unicode61.
¹: Yes, there is now a capital Eszett (ẞ).
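A quick way to check how `unicode61` treats umlauts and ß is to index a German sentence and query it. This is only a sketch, assuming an FTS5-enabled SQLite behind Python's `sqlite3`; the table names `folded` and `exact` and the helper `hits` are hypothetical. It compares the default tokenizer settings with `remove_diacritics 0`.

```python
import sqlite3

# Assumption: the SQLite library behind Python's sqlite3 module includes FTS5.
conn = sqlite3.connect(":memory:")

# Default unicode61: case folding plus diacritic removal.
conn.execute("CREATE VIRTUAL TABLE folded USING fts5(body, tokenize='unicode61')")
# Same tokenizer, but diacritics are kept as-is.
conn.execute(
    "CREATE VIRTUAL TABLE exact USING fts5("
    "body, tokenize='unicode61 remove_diacritics 0')"
)

sentence = "Herr Müller wohnt in der Hauptstraße"
for table in ("folded", "exact"):
    conn.execute(f"INSERT INTO {table}(body) VALUES (?)", (sentence,))


def hits(table, term):
    """Count rows in `table` matching one quoted term (hypothetical helper)."""
    query = '"' + term + '"'
    row = conn.execute(
        f"SELECT count(*) FROM {table} WHERE {table} MATCH ?", (query,)
    ).fetchone()
    return row[0]


# ß is treated as a letter, so the whole word stays one token.
eszett_hits = hits("folded", "hauptstraße")

# With the defaults, "ü" is folded to "u", so the plain spelling matches.
folded_plain_hits = hits("folded", "muller")

# With remove_diacritics 0, only the spelling with the umlaut matches.
exact_plain_hits = hits("exact", "muller")
exact_umlaut_hits = hits("exact", "MÜLLER")

print(eszett_hits, folded_plain_hits, exact_plain_hits, exact_umlaut_hits)
```

Whether folding "ü" to "u" is desirable for German is a judgment call; keeping diacritics avoids conflating words that differ only by an umlaut, at the cost of missing queries typed without them. The capital ẞ is deliberately left out of the checks here, since I have not verified how it is case-folded.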