top | item 33967317

(no title)

opk | 3 years ago

I assume the author is realigning to get only one bit for each letter. This relies on having an alphabet of 26 characters where 26 is less than the 32 available in a u32. In most real-world problems, you need to allow for the full ASCII or Unicode range and this wouldn't be possible.

Using codepoints directly, there is overlap. 'f' ^ 'd' will give you the same bit pattern as 'b'. You could keep an xor value for each window size smaller than the full window but that effectively brings back the inner loop that using xor is avoiding and you could just use equality. With codepoints, there may be a solution similar to a bloom filter so efficiently determine whether a duplicate is possible but I've not thought through that fully.

discuss

order

ribit|3 years ago

You are right. Total brainfart on my side. I skimmed the article too fast and completely misunderstood the approach.