All major tokenisers have explicit support for encoding arbitrary byte sequences. There is usually a consecutive range of 256 tokens reserved for the byte values 0x00 through 0xFF, so any novel UTF-8 word or structure can be encoded with them — including emoji and characters that weren't part of the model's initial training, provided you show it some examples.
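A minimal sketch of how such byte-fallback encoding works, assuming a hypothetical vocabulary where a contiguous block of ids (starting at a made-up `BYTE_BASE` offset, after special tokens) maps one-to-one onto the byte values 0x00 through 0xFF:

```python
# Hypothetical offset of the byte-token block in the vocabulary,
# e.g. after special tokens like <pad>, <s>, </s>.
BYTE_BASE = 3

def byte_fallback_encode(text: str) -> list[int]:
    """Encode any string as byte tokens: one token per UTF-8 byte."""
    return [BYTE_BASE + b for b in text.encode("utf-8")]

def byte_fallback_decode(ids: list[int]) -> str:
    """Reassemble byte tokens back into the original UTF-8 string."""
    return bytes(i - BYTE_BASE for i in ids).decode("utf-8")

# An emoji outside the learned vocabulary still round-trips:
ids = byte_fallback_encode("🦙")   # 4 UTF-8 bytes → 4 byte tokens
assert byte_fallback_decode(ids) == "🦙"
```

Real tokenisers only fall back to these byte tokens for spans no learned merge covers, so common text still compresses to far fewer tokens than one per byte.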