top | item 38742893

(no title)

missblit | 2 years ago

So this is pure speculation, but more people should be aware of parser differentials (same thing as that email thing the other day) so let me say what I mean...

Hypothetically say a website has an internal service to index posts for keywords for search, that just so happens to unescape HTML entities during keyword normalization due to a seemingly harmless bug.

Plus a second internal service to identify keyword spam that _doesn't_ do any HTML entity unescaping (because why would you?)

Then you could end up in a situation where a spammer uses HTML entities to avoid spam detection while still showing up in search results. They hope that the user ignores the nonsense text and just clicks their link based on the image (a list of big shopping brands in the middle east) instead.

discuss

No comments yet.