> The lut is zeroed except for the indices corresponding to the vowels, which contain it self
So in the comparison vs. the original str, vowels get a true and consonants get a false. Doesn't the "==" then returns true if all the chars are vowels?
I was curious if there were any interesting bit-masking patterns in ASCII for vowels that could be exploited, so I shamelessly asked ChatGPT and it gave a pretty nice insight:
- Setting bit 5 high forces lowercase for the input letter
- masking with `& 31` gives an index from 0-25 for the input letter
Then you can the `bt` instruction (in x86_64) to bit-test against the set of bits for a,e,i,o,u (after lowercasing) and return whether it matches, in a single instruction.
I'm sure there's other cool ways to test multiple vowels at once using AVX2 or AVX-512, I didn't really get that far. I just thought the bit-test trick was pretty sweet.
You can do extremely trivial binary checks on ASCII, UTF-8 (and most? other) encoding schemes. All vowels including W and Y contain a 1 in the lowest bit. Then by comparing bits 1-4, you can do a trivial case-insensitive comparison. You detect case by checking bit 5. 0 is upper, 1 is lower.
fedsocpuppet|8 months ago
camel-cdr|8 months ago
str|0x32 if you want case-insensitive.
This works because the lowest five bits, more specifically bits two to five, of the vowels are distinct.
The lut is zeroed except for the indices corresponding to the vowels, which contain it self: lut[vowel&31]=vowel or lut[(vowel&31)>>1]=vowel
teo_zero|8 months ago
So in the comparison vs. the original str, vowels get a true and consonants get a false. Doesn't the "==" then returns true if all the chars are vowels?
ninkendo|8 months ago
- Setting bit 5 high forces lowercase for the input letter
- masking with `& 31` gives an index from 0-25 for the input letter
Then you can the `bt` instruction (in x86_64) to bit-test against the set of bits for a,e,i,o,u (after lowercasing) and return whether it matches, in a single instruction.
It came up with this, which I thought was pretty nice: https://godbolt.org/z/KjMdz99be
I'm sure there's other cool ways to test multiple vowels at once using AVX2 or AVX-512, I didn't really get that far. I just thought the bit-test trick was pretty sweet.
Chat transcript is here (it failed pretty spectacularly the first couple times, tripping over AT&T syntax and getting an off-by-one error, but still pretty good) https://chatgpt.com/share/684c8b39-a9c4-8012-8bb6-74e1f8b6d0...
mystified5016|8 months ago
You can do extremely trivial binary checks on ASCII, UTF-8 (and most? other) encoding schemes. All vowels including W and Y contain a 1 in the lowest bit. Then by comparing bits 1-4, you can do a trivial case-insensitive comparison. You detect case by checking bit 5. 0 is upper, 1 is lower.
rurban|8 months ago
tough|8 months ago
dmoy|8 months ago