According to [1], the byte pair encoding for “Apoploe vesrreaitais” (the words that produce bird images) is "apo, plo, e</w>, ve, sr, re, ait, ais</w>", and Apo-didae & Plo-ceidae are families of birds.
On the other hand, the OpenAI tokenizer gives me a different tokenization: ap - opl - oe [0]. If you capitalize the A, the result is A - pop - loe. The DALL-E 2 paper only specifies that it uses a BPE encoding; I would assume they used the same one as for GPT-3.
[0] https://beta.openai.com/tokenizer
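To illustrate why two BPE tokenizers can disagree on the same string, here is a minimal sketch of greedy BPE. The merge tables `table_a` and `table_b` are made up for this example; they are not the real vocabularies of either tokenizer, and end-of-word markers like `</w>` are left out.

```python
def bpe(word, merges):
    """Greedy BPE: repeatedly apply the highest-priority merge that is present."""
    ranks = {pair: i for i, pair in enumerate(merges)}  # lower index = higher priority
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge tables, chosen only to reproduce the two splits discussed above.
table_a = [("a", "p"), ("ap", "o"), ("p", "l"), ("pl", "o")]
table_b = [("a", "p"), ("p", "l"), ("o", "pl"), ("o", "e")]

print(bpe("apoploe", table_a))  # ['apo', 'plo', 'e']
print(bpe("apoploe", table_b))  # ['ap', 'opl', 'oe']
print(bpe("Apoploe", table_b))  # ['A', 'p', 'opl', 'oe']
```

The last call shows the capitalization effect: "A" and "a" are different symbols, so the ("a", "p") merge no longer fires and the rest of the word splits differently.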
DalasNoin|3 years ago
karmasimida|3 years ago
And for the record, they used BPE dropout for DALL-E 1, see https://arxiv.org/pdf/2102.12092.pdf
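For anyone unfamiliar with BPE dropout, a rough sketch of the idea: at each step every applicable merge is skipped with probability `p`, so the same word can come out segmented differently on different passes. The merge table and the default `p=0.1` below are illustrative assumptions, not values taken from the DALL-E code.

```python
import random


def bpe_dropout(word, merges, p=0.1, rng=None):
    """BPE where each candidate merge is independently dropped with probability p."""
    rng = rng or random.Random()
    ranks = {pair: i for i, pair in enumerate(merges)}  # lower index = higher priority
    symbols = list(word)
    while len(symbols) > 1:
        # Collect (rank, position) for every applicable merge that survives dropout.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks and rng.random() >= p
        ]
        if not candidates:
            break  # nothing applicable, or everything was dropped this step
        _, i = min(candidates)  # best-ranked surviving merge, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical merge table, for illustration only.
merges = [("a", "p"), ("ap", "o"), ("p", "l"), ("pl", "o")]

print(bpe_dropout("apoploe", merges, p=0.0))  # ['apo', 'plo', 'e'] (plain BPE)
print(bpe_dropout("apoploe", merges, p=1.0))  # single characters, no merges
print(bpe_dropout("apoploe", merges, p=0.5, rng=random.Random(0)))  # some random split
```

With `p=0` this reduces to ordinary BPE, and with `p=1` it falls back to characters. If the model was trained this way, it would have seen many alternative segmentations of the same words, which makes it harder to reason from a single "canonical" tokenization.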