top | item 41390983

(no title)

dn3500 | 1 year ago

If no character set is specified, plain text content is assumed to be Windows-1252. This probably extends to application/javascript as well, but I'd have to check to be sure.
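A rough sketch of that fallback in Python, for illustration only. The `charset_for` helper and the `FALLBACK_CHARSET` constant are names I made up, and the fallback value just encodes the assumption described above; it is not quoted from any spec. The header parsing leans on the standard library's `email.message.Message`.

```python
from email.message import Message

# Assumption from the comment above: undeclared text is treated as Windows-1252.
FALLBACK_CHARSET = "windows-1252"

def charset_for(content_type: str) -> str:
    """Return the charset declared in a Content-Type value, or the fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header does not declare one.
    return msg.get_content_charset() or FALLBACK_CHARSET

def decode_body(body: bytes, content_type: str) -> str:
    """Decode a response body using the declared or assumed charset."""
    return body.decode(charset_for(content_type))
```

So `decode_body(b"\x93hi\x94", "text/plain")` comes back with curly quotes around `hi`, because nothing was declared and the 1252 guess kicks in.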

The web pre-dates UTF-8, although not by much. Ken Thompson introduced UTF-8 at the winter USENIX conference in 1993 and CERN released the web into the public domain that April, but it would be several more years before UTF-8 became common. The early web was ISO 8859-1 by default. But people were pretty lazy about specifying character sets back then (still are, actually), and Microsoft started sending or assuming their Windows-1252 character set where ISO 8859-1 was required by the spec. Eventually the spec was changed to match de facto behavior. I guess the assumption was that if you're too stupid or lazy to say what character set you're using, then it's probably 1252. (Today the assumption would be that it's probably UTF-8.) I'm not sure what the specs say today, but I think HTML is assumed to be UTF-8 and everything else is assumed to be 1252, if the character set is not explicitly declared.
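To see why the 1252 versus 8859-1 distinction mattered in practice, here is a small Python sketch. The sample bytes are my own, picked for illustration: 0x93 and 0x94 are curly quotes in Windows-1252 but invisible C1 control characters in ISO 8859-1, and UTF-8 bytes read as 1252 give the familiar mojibake.

```python
# Bytes a Windows editor of the era might have produced for "quoted"
# wrapped in curly quotes.
raw = b"\x93quoted\x94"

# Windows-1252 maps 0x93/0x94 to the curly quotes U+201C/U+201D.
as_1252 = raw.decode("cp1252")

# ISO 8859-1 maps the same bytes to C1 control characters U+0093/U+0094.
as_8859_1 = raw.decode("latin-1")

# The reverse mistake: UTF-8 bytes decoded with the 1252 assumption.
utf8_bytes = "é".encode("utf-8")        # two bytes: 0xC3 0xA9
mojibake = utf8_bytes.decode("cp1252")  # each byte becomes its own character

print(repr(as_1252))
print(repr(as_8859_1))
print(repr(mojibake))
```

Treating a page labeled 8859-1 as 1252 is mostly harmless, since the two only differ in the 0x80 to 0x9F range, which is probably part of why the de facto behavior stuck.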
