Show HN: FlashTokenizer – 10x faster C++ tokenizer for Python

5 points | springkim | 11 months ago | github.com

I built a tokenizer in C++ with Python bindings that runs 10x faster than HuggingFace tokenizers on large inputs. It's optimized for low latency and minimal memory usage.

Benchmarks and comparisons are included in the README. Would love feedback or contributions!
