ynik|6 months ago
Python 3 internally uses UTF-32.
When exchanging data with the outside world, it uses the "default encoding" which it derives from various system settings. This usually ends up being UTF-8 on non-Windows systems, but on weird enough systems (and almost always on Windows), you can end up with a default encoding other than UTF-8.
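To see what a given system actually ends up with, something like the following should work. It is only a rough sketch: `describe_text_encoding` is a name made up for this example, and the values it reports depend on the OS, the locale, and the Python version.

```python
import locale
import sys


def describe_text_encoding():
    """Collect the settings that influence Python's default text encoding."""
    return {
        # 1 when UTF-8 mode (PEP 540) is enabled, 0 otherwise.
        "utf8_mode": sys.flags.utf8_mode,
        # Encoding used by open() when no encoding= argument is passed.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used to convert file names to and from bytes.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


for name, value in describe_text_encoding().items():
    print(f"{name}: {value}")
```

Passing `encoding="utf-8"` explicitly to `open()` avoids depending on any of these settings.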
"UTF-8 mode" (https://peps.python.org/pep-0540/) fixes this but it's not yet enabled by default (this is planned for Python 3.15).
arcticbull|6 months ago
It uses Latin-1 for strings whose code points all fit in one byte (which includes ASCII strings), UCS-2 for strings that contain code points elsewhere in the BMP, and UCS-4 only for strings that contain code points outside the BMP (see PEP 393).
It would be pretty silly for them to explode all strings to 4-byte characters.
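You can check this on CPython by measuring how much a string grows as you add characters. A minimal sketch, assuming CPython 3.3 or later (other implementations may store strings differently); `bytes_per_char` is a helper invented for this example.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by comparing two string lengths.

    Subtracting the sizes cancels out the fixed object header, leaving
    only the cost of the n extra characters.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


print(bytes_per_char("a"))           # ASCII
print(bytes_per_char("\u00e9"))      # Latin-1 range, non-ASCII
print(bytes_per_char("\u20ac"))      # BMP, outside Latin-1
print(bytes_per_char("\U0001F600"))  # outside the BMP
```

On CPython this should print 1, 1, 2 and 4. The width is chosen per string, so one astral character makes the whole string use 4 bytes per code point.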
jibal|6 months ago
account42|6 months ago