Hi folks – I work at OpenAI and helped build this page, awesome to see it on here!
Heads up that it's a bit out of date, as GPT-4 has a different tokenizer than GPT-3. I'd recommend checking out tiktoken (https://github.com/openai/tiktoken) or this other excellent app that a community member made (https://tiktokenizer.vercel.app).
egorfine|2 years ago
A test phrase could be something like "Жизнь прекрасна и удивительна" ("Life is beautiful and amazing" in Russian).
My assumption is that it's the implementation on the page that is broken, not the actual tokenizer. The reason: Russian works perfectly in GPT-3, which I'd guess wouldn't be the case if tokenization actually happened as presented on the page.
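A minimal sketch of why Cyrillic text can look mangled in a byte-level tokenizer display: GPT-2/GPT-3-style BPE operates on UTF-8 bytes rather than characters, and every Cyrillic character occupies 2 bytes, so a buggy renderer (or an unmerged byte sequence) can splinter the phrase into sub-character fragments. This is only an illustration of the byte/character mismatch, not the page's actual code:

```python
# Byte-level BPE (the approach used by the GPT-2/GPT-3 tokenizer) sees UTF-8
# bytes, not characters. Cyrillic letters are 2 bytes each in UTF-8, so the
# byte count is nearly double the character count for this phrase.
phrase = "Жизнь прекрасна и удивительна"  # the test phrase from the comment

chars = len(phrase)                        # characters, as a human counts them
utf8_bytes = len(phrase.encode("utf-8"))   # what a byte-level tokenizer sees

print(chars, utf8_bytes)  # 29 characters, 55 bytes

# To count the actual tokens, tiktoken (https://github.com/openai/tiktoken)
# can be used; "cl100k_base" is the GPT-4 encoding and "r50k_base" the GPT-3 one:
# import tiktoken
# enc = tiktoken.get_encoding("r50k_base")
# print(len(enc.encode(phrase)))
```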