ilovetux|6 months ago
> The first 128 characters of Unicode, which are the same as the ASCII character set (characters 0-127), are encoded in UTF-8 using a single byte with the exact same binary value as their ASCII representation. This means that any file containing only ASCII characters is also a valid UTF-8 file
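A quick way to check the quoted claim is a small Python sketch. Python is just the illustration language here, and the helper names `ascii_matches_utf8` and `is_pure_ascii` are hypothetical, not from any library. It only uses the built-in `str.encode` / `bytes.decode` codecs.

```python
# Sketch: confirm that every ASCII code point (0-127) encodes to the same single byte in UTF-8.
def ascii_matches_utf8() -> bool:
    for cp in range(128):
        ch = chr(cp)
        # Both encodings should yield exactly one byte whose value equals the code point.
        if ch.encode("ascii") != bytes([cp]) or ch.encode("utf-8") != bytes([cp]):
            return False
    return True


def is_pure_ascii(data: bytes) -> bool:
    # A byte stream is 7-bit ASCII iff no byte has its high bit (0x80) set.
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    sample = b"Hello, world!\n"
    print(ascii_matches_utf8())   # True
    print(is_pure_ascii(sample))  # True
    # Every ASCII byte is a complete one-byte UTF-8 sequence, so decoding as UTF-8 cannot fail.
    print(sample.decode("utf-8") == sample.decode("ascii"))  # True
```

The converse doesn't hold: a UTF-8 file with any byte of 0x80 or above is not ASCII.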
staplung|6 months ago
Now, technically, ASCII only concerns the lower 128 characters (0-127); it's a 7-bit code. ASCII itself doesn't define what the upper half of the byte space represents at all, so technically it's true that all valid ASCII files are valid UTF-8. By the same logic, however, the box-drawing characters are not ASCII. They're actually part of something called code page 437, which maps those bit patterns to box-drawing characters. With other code pages they map to something else, often non-Latin characters or ones with accents.
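To make the code page point concrete, here's a minimal Python sketch using the standard `cp437`, `cp1252` and `utf-8` codecs. The byte values 0xC9 0xCD 0xBB and the helper name `decode_as` are my own picks for illustration. The same three bytes are box drawing under code page 437, accented Latin letters under Windows-1252, and not valid UTF-8 at all.

```python
from typing import Optional

# Three bytes with the high bit set, i.e. outside 7-bit ASCII.
RAW = bytes([0xC9, 0xCD, 0xBB])


def decode_as(raw: bytes, codec: str) -> Optional[str]:
    """Decode raw with the given codec, or return None if the bytes are invalid for it."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(decode_as(RAW, "cp437"))   # ╔═╗ (code page 437 box drawing)
    print(decode_as(RAW, "cp1252"))  # ÉÍ» (Windows-1252)
    # None: 0xC9 starts a two-byte UTF-8 sequence, but 0xCD is not a continuation byte.
    print(decode_as(RAW, "utf-8"))
    # The same box-drawing character takes three bytes when encoded as UTF-8.
    print("\u2554".encode("utf-8"))  # b'\xe2\x95\x94'
```

So a "box drawing" file saved in code page 437 and opened as UTF-8 will either error out or show replacement characters, depending on the reader.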
So, the name ASCII flow is misleading, and so are the output options. ;-) Basically, in UTF-8 any byte with the high bit set is part of a multi-byte sequence, meaning more than one byte is needed to represent that code point.
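A rough Python sketch of the high-bit rule: it classifies each byte by its leading bits (0xxxxxxx single, 10xxxxxx continuation, 110/1110/11110 lead bytes). The function name `utf8_role` is hypothetical. It only looks at bit patterns and doesn't reject every malformed case (e.g. overlong lead bytes 0xC0/0xC1 or 0xF5-0xF7), so treat it as an illustration rather than a validator.

```python
def utf8_role(byte: int) -> str:
    """Classify one UTF-8 byte by its high-order bits (bit patterns only, not full validation)."""
    if byte < 0x80:            # 0xxxxxxx: high bit clear, a complete one-byte (ASCII) character
        return "single"
    if byte >> 6 == 0b10:      # 10xxxxxx: continuation byte inside a multi-byte sequence
        return "continuation"
    if byte >> 5 == 0b110:     # 110xxxxx: lead byte of a 2-byte sequence
        return "lead-2"
    if byte >> 4 == 0b1110:    # 1110xxxx: lead byte of a 3-byte sequence
        return "lead-3"
    if byte >> 3 == 0b11110:   # 11110xxx: lead byte of a 4-byte sequence
        return "lead-4"
    return "invalid"           # 11111xxx (0xF8-0xFF) is never a valid UTF-8 byte


if __name__ == "__main__":
    # "A", "é", "╔" and "😀", written as escapes so the source stays ASCII.
    for ch in "A\u00e9\u2554\U0001F600":
        encoded = ch.encode("utf-8")
        print(f"U+{ord(ch):04X}", [f"{b:08b}" for b in encoded], [utf8_role(b) for b in encoded])
```

This prints one, two, three and four bytes respectively, and every byte of the multi-byte characters has its high bit set.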
ilovetux|6 months ago
em3rgent0rdr|6 months ago
https://en.wikipedia.org/wiki/UTF-8