
pravus | 2 years ago

> I'm guessing due to some backwards compatibility idiocy that seemed like it made sense at some point ...

> ... making a compelling reason to fuck over the future in favor of optimisation now

> I never questioned the competence of past engineers

False, based on your opening volley of toxic spew alone. Backwards compatibility is an engineering decision, and it was made by very competent people so their systems could interoperate with a large number of others. The future has never been fucked over.

You don't seem to understand how ASCII is encoded. It is organized around bit-groups, so the numeric ranges for character groupings can be determined using very simple (and fast) bit-wise operations. All of the basic C functions that test single-byte characters, such as `isalpha()`, `isdigit()`, `islower()`, and `isupper()`, rely on this fact. You can then optimize these into grouped instructions and pipeline them. Pull up `man ascii` and pay attention to the hex encodings at the start of each major symbol group. This is still useful today!
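As a minimal sketch of the kind of checks that layout enables (hand-rolled stand-ins for the libc functions, not their actual implementations): digits sit in one contiguous run, and upper/lower case differ only in bit 0x20, so classification and case-folding collapse into a compare or a single OR.

```c
#include <stdio.h>

/* Digits occupy 0x30-0x39, so a subtract-and-compare suffices. */
static int ascii_isdigit(unsigned char c) {
    return (unsigned)(c - '0') <= 9;
}

/* Upper and lower case differ only in bit 0x20 ('A' = 0x41, 'a' = 0x61),
   so forcing that bit on folds both cases into one range check. */
static int ascii_isalpha(unsigned char c) {
    return (unsigned)((c | 0x20) - 'a') <= 'z' - 'a';
}

/* Case conversion for letters is likewise a single bit operation. */
static unsigned char ascii_tolower(unsigned char c) {
    return ascii_isalpha(c) ? (unsigned char)(c | 0x20) : c;
}

int main(void) {
    printf("%d %d %c\n", ascii_isdigit('7'), ascii_isalpha('Q'), ascii_tolower('Q'));
    return 0;
}
```

Real libc implementations often use a 256-entry lookup table instead; both approaches lean on the fact that each character class is a couple of contiguous ranges.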

No, the biggest fuckage of the internet age has been Unicode, which absolutely destroys this mapping. We no longer have any semblance of a 1:1 translation between a set of input bytes and a set of character attributes, and that translation is exactly what is required to get simple language idioms correct. The best you can do is use bit-groupings to detect encoding errors (à la UTF-8) or carry a larger translation table that handles surrogates (UTF-16, UTF-32, etc.). They all suffer from the same "performance" problem, called the "real world".
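To the point about bit-groupings catching encoding errors: here is a sketch of a UTF-8 sequence check, assuming only the standard lead/continuation bit patterns. The function name is made up for illustration, and a real validator would also have to reject overlong forms, surrogate code points, and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns the sequence length (1-4) at s, or 0 on a malformed sequence.
   Checks only the lead/continuation bit patterns: 0xxxxxxx, 110xxxxx,
   1110xxxx, or 11110xxx, followed by the right number of 10xxxxxx bytes. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail) {
    size_t len;
    if (avail == 0)                 return 0;
    if (s[0] < 0x80)                len = 1;   /* 0xxxxxxx: plain ASCII  */
    else if ((s[0] & 0xE0) == 0xC0) len = 2;   /* 110xxxxx               */
    else if ((s[0] & 0xF0) == 0xE0) len = 3;   /* 1110xxxx               */
    else if ((s[0] & 0xF8) == 0xF0) len = 4;   /* 11110xxx               */
    else                            return 0;  /* stray continuation byte */
    if (len > avail) return 0;                 /* truncated sequence      */
    for (size_t i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;   /* must be 10xxxxxx        */
    return len;
}

int main(void) {
    const unsigned char ok[]  = { 0xE2, 0x82, 0xAC }; /* U+20AC, euro sign */
    const unsigned char bad[] = { 0xE2, 0x28, 0xAC }; /* broken 2nd byte   */
    printf("%zu %zu\n", utf8_seq_len(ok, 3), utf8_seq_len(bad, 3));
    return 0;
}
```

Even this error check tells you nothing about the character's attributes; for that you still need the big Unicode property tables, which is exactly the cost being complained about above.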
