henry_pulver | 1 year ago
Token counts for inputs allow comparison of different data formats (YAML, JSON, TS) and give a crude measure of prompt importance weighting. For outputs, token counts give a relative measure of generation speed between prompts (tok/s varies by time of day) and a crude measure of the compute spent on a response (which is why "think step-by-step" works). Token count also determines the cost of a prompt.
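To make the format-comparison point concrete, here is a minimal sketch. It uses a crude whitespace/punctuation split as a stand-in for a real BPE tokenizer (absolute numbers would differ with a real one, and the hand-written YAML string is just an illustrative equivalent of the JSON):

```python
import json
import re

# The same record serialized two ways.
record = {"name": "ada", "age": 36, "tags": ["math", "computing"]}
as_json = json.dumps(record)
as_yaml = "name: ada\nage: 36\ntags:\n  - math\n  - computing\n"

def crude_tokens(text: str) -> int:
    # Split on every non-word character, keeping the separators,
    # roughly mimicking how BPE fragments punctuation; whitespace
    # and empty fragments are dropped.
    return len([t for t in re.split(r"(\W)", text) if t.strip()])

print("JSON:", crude_tokens(as_json))  # more tokens: braces, quotes, commas
print("YAML:", crude_tokens(as_yaml))  # fewer tokens: less punctuation
```

Even this rough proxy shows why the same data can cost noticeably more tokens as JSON than as YAML: the punctuation-heavy syntax fragments into many extra tokens.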
Since there’s no equivalent for other providers, I built one for Mistral & Anthropic. If it’s useful, I can add other providers too - let me know which you’d like.
Zambyte | 1 year ago

henry_pulver | 1 year ago
This tokenizer is correct for the 7B open model and the 8x7B MoE model. It's probably the closest match to the ones their proprietary, API-only models use.