Your explanation makes it sound like an incredibly stupid decision. I imagine what you're getting at is that 3 bytes were/are sufficient for the basic multilingual plane, which is incidentally also what can be represented in a single utf-16 byte pair. So they imposed the same limitation as utf-16 had on utf-8. This would have seemed logical in a world where utf-16 was the default and utf-8 was some annoying exception they had to get out of the way.
evanelias|16 days ago
There wasn't a clear winner in utf-8 at the time, especially given its 6-byte-max representation back then. Memory and storage were a lot more limited.
And yes while 6 bytes was the maximum, a bunch of critical paths (e.g. sorting logic) in old MySQL required allocating a worst-case buffer size, so this would have been prohibitively expensive.
booi|3 days ago
Even if there were no codepoints in the 4-byte range yet, they could and should have implemented it anyway. It literally does not take any more storage because it is a variable width encoding.