Chardetng: A More Compact Character Encoding Detector for the Legacy Web

63 points| hsivonen | 5 years ago |hsivonen.fi

4 comments

donatj|5 years ago

About a year ago I had a webpage that was interpreted in the wrong encoding, and I was taken aback to find that Chrome no longer lets you override a page's encoding.

I think it’s interesting how far we have come with UTF-8 adoption: that was the first time I had reached for that menu in probably nearly a decade.

BiteCode_dev|5 years ago

Fantastic write-up.

I regularly use the Python port of the original chardet (https://pypi.org/project/chardet/). In fact, most Python devs do, since it comes with requests.

This post is full of gems. E.g., I learned that it's important for your meta charset to be in the first 1024 bytes of your HTML :)

jdashg|5 years ago

FWIW, Firefox issues a warning if it finds your charset declaration late, outside the first 1024 bytes. Long copyright or license headers can cause this problem, annoyingly.
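
For anyone who wants to check their own pages, here is a rough sketch of that 1024-byte rule in Python. It is not how Firefox or chardetng does it: the regex is a simplified stand-in for the real HTML prescan (for instance, it would happily match a meta tag inside a comment), and the names `find_meta_charset` and `declared_in_time` are made up for illustration.

```python
import re

# The limit quoted in the comments above: the declaration should sit
# inside the first 1024 bytes of the document.
PRESCAN_LIMIT = 1024

# Simplified pattern (an assumption, not the real prescan algorithm).
# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def find_meta_charset(html: bytes):
    """Return (label, end_offset) for the first meta charset, or None."""
    match = META_CHARSET.search(html)
    if match is None:
        return None
    return match.group(1).decode("ascii"), match.end()


def declared_in_time(html: bytes, limit: int = PRESCAN_LIMIT) -> bool:
    """True if a charset label is found and ends within the first `limit` bytes."""
    found = find_meta_charset(html)
    if found is None:
        return False
    _label, end_offset = found
    return end_offset <= limit


if __name__ == "__main__":
    early = b'<!doctype html><meta charset="utf-8"><title>ok</title>'
    # A long license header pushes the declaration past the limit.
    late = b"<!--" + b"x" * 2000 + b"-->" + b'<meta charset="utf-8">'
    print(declared_in_time(early))
    print(declared_in_time(late))
```

Run it against the raw bytes of a page, before any decoding, since the whole point is that the encoding is not known yet.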

camgunz|5 years ago

This is super cool and interesting. Great write-up, thanks.