In this release, I introduced synthetic data generation for text classification. As a result, one can build a classifier from scratch, even without a dataset. On 5 benchmarks, classifiers trained on synthetic data achieve results competitive with those trained on real data.
Many research and implementation directions remain open for the future:
- research on synthetic data generation algorithms that yield higher performance
- an agentic workflow for model evaluation, error analysis, and model improvement
- multilingual support
If you are interested, please give it a star, try it out, and contribute.
kenhktsui|1 year ago
Detailed blog post: https://huggingface.co/blog/kenhktsui/anyclassifier
Best, Ken