
Show HN: Shimmytok – Pure Rust GGUF tokenizer (no C++, no extra files)

2 points | MKuykendall | 1 month ago | github.com

I got tired of needing llama.cpp bindings or separate tokenizer.json files just to tokenize text, so I wrote a pure Rust library that reads the tokenizer directly from your GGUF model file. It produces the same output as llama.cpp, with zero C++ in the dependency tree.
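To give a feel for what "reads the tokenizer directly from your GGUF model file" involves, here is a minimal sketch of the first step any GGUF reader has to take: parsing the fixed-size file header. This is not Shimmytok's code or API. The names `GgufHeader` and `read_gguf_header` are hypothetical, and the sketch assumes a little-endian GGUF file of version 2 or later, where the tensor and metadata counts are 64-bit. The tokenizer data itself lives in the metadata key/value section that follows this header (under keys such as `tokenizer.ggml.tokens`), which the sketch does not parse.

```rust
use std::io::{self, Read};

/// Fixed-size fields at the start of a GGUF file (version 2 and later).
/// Hypothetical type for illustration; not part of any published crate.
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Reads the GGUF header from any byte source (a file, a buffer, ...).
/// Assumes little-endian encoding, which is the common case.
pub fn read_gguf_header<R: Read>(mut reader: R) -> io::Result<GgufHeader> {
    // The file starts with the 4-byte ASCII magic "GGUF".
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing GGUF magic",
        ));
    }

    // Next comes the format version as a u32.
    let mut buf32 = [0u8; 4];
    reader.read_exact(&mut buf32)?;
    let version = u32::from_le_bytes(buf32);
    if version < 2 {
        // Version 1 used 32-bit counts; this sketch does not handle it.
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "unsupported GGUF version",
        ));
    }

    // Then the number of tensors and the number of metadata key/value pairs.
    let mut buf64 = [0u8; 8];
    reader.read_exact(&mut buf64)?;
    let tensor_count = u64::from_le_bytes(buf64);
    reader.read_exact(&mut buf64)?;
    let metadata_kv_count = u64::from_le_bytes(buf64);

    Ok(GgufHeader {
        version,
        tensor_count,
        metadata_kv_count,
    })
}
```

Because the function takes any `Read`, it can be pointed at a `std::fs::File` opened on a model, or at an in-memory byte slice for testing. A real tokenizer loader would continue from here by walking the `metadata_kv_count` entries and picking out the tokenizer keys.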

2 comments

holg|1 month ago

Great work, this sounds very promising. I have done some AI work with burn and candle and always needed to create or convert my own models for that purpose. This could really help, thanks.

MKuykendall|1 month ago

Thanks! Rust needed it.