dmsnell | 6 months ago
It’s helpful to recognize that the inner script tags are not actual script tags. Yes, once entering a script element, the browser switches parsers and wants to skip everything until a closing script tag appears. The STYLE element, TITLE, TEXTAREA, and a few others do this. Once they chop up the HTML like this they send the contents to the separate inner parser (in this case, the JS engine). SCRIPT is unique due to the legacy behavior^1.
HTML5 specifies these “inner” tags as transitions into escape modes. The entire goal is to allow JavaScript to contain the string “</script>” without it leaking to the outer parser. The early pattern of hiding inside an HTML comment is what determined the escaping mechanism rather than making some special syntax (which today does exist as noted in the post).
The opening script tag inside the comment is actually what triggers the escaping mode, and so it’s less an HTML tag and more some kind of pseudo JS syntax. The inner closing tag is therefore the escaped string value and simultaneously closes the escaped mode.
Consider the use of double quotes inside a string. We have to close the outer quote, but if the inner quote is escaped like `\"` then we don’t have to close it; it’s merely data and not syntax.
There is only one level of nesting, and eight opening tags would still be “closed” by the single closing tag.
^1: (edit) This is one reason HTML and XML (XHTML) are incompatible. The content of SCRIPT and STYLE elements is essentially just bytes. In XML it must be well-formed markup. XML parsers cannot parse HTML.
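To make the escape modes described above concrete, here is a rough Python sketch of how a tokenizer might find the end of a SCRIPT element. It is a simplification of the script data states in the HTML standard, not a conforming tokenizer. The function name `find_script_end`, the helper `_tag_at`, and the state labels are hypothetical names chosen for illustration. It takes the text that follows an opening script tag and returns the index of the closing tag that actually ends the element.

```python
# Characters that may follow "<script" or "</script" for it to count as a tag.
TERMINATORS = " \t\n\f\r/>"


def _tag_at(content: str, i: int, tag: str) -> bool:
    """True if `tag` starts at index i (case-insensitive) and is followed by a terminator."""
    end = i + len(tag)
    return (
        content[i:end].lower() == tag
        and end < len(content)
        and content[end] in TERMINATORS
    )


def find_script_end(content: str) -> int:
    """Return the index of the "</script" that closes the element, or -1 if none does."""
    state = "data"  # hypothetical labels: "data", "escaped", "double-escaped"
    i = 0
    while i < len(content):
        if state == "data":
            if _tag_at(content, i, "</script"):
                return i
            if content.startswith("<!--", i):
                state = "escaped"
                i += 2  # step onto the "--" so that "<!-->" can leave the mode again
                continue
        elif state == "escaped":
            if _tag_at(content, i, "</script"):
                return i
            if _tag_at(content, i, "<script"):
                state = "double-escaped"  # the inner "opening tag" only flips a flag
            elif content.startswith("-->", i):
                state = "data"
        else:  # double-escaped: there is no deeper level to enter
            if _tag_at(content, i, "</script"):
                state = "escaped"  # this closer is data, and it also ends the mode
            elif content.startswith("-->", i):
                state = "data"
        i += 1
    return -1


# Eight inner openers are undone by a single closer; the next closer ends the element.
body = "<!--" + "<script>" * 8 + "</script>"
print(find_script_end(body))                # no real closing tag yet
print(find_script_end(body + "</script>"))  # index of the second closer
```

Because the double-escaped mode is a flag rather than a counter, the number of inner opening tags makes no difference to where the element ends.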
tannhaeuser|6 months ago
socalgal2|6 months ago
robocat|6 months ago
Bullshit - Navigator and IE didn't have HTTP/2. I'm guessing you didn't use dialup where your external CSS or JavaScript regularly failed to load. You didn't add extra dependencies because IE would only open two concurrent connections to load files.
It's easy to criticize past mistakes from your armchair, but I suggest you try to be a little fairer towards the people who made those decisions, especially when overall HTML has been a resounding success.
dullcrisp|6 months ago
dmsnell|6 months ago
In this link we can see the expectation that the HTML comment surrounds a call to document.write() which inserts a new SCRIPT element. The tags are balanced.
https://stackoverflow.com/questions/236073/why-split-the-scr...
The HTML 4.01 spec recommends using HTML comments to hide the script contents from being rendered, which is where we start to get the notion of using these to hide markup from display.
https://www.w3.org/TR/html401/interact/scripts.html
Some drafts of the HTML standard attempted to escape differently and didn’t have the double escape state.
https://www.w3.org/TR/2016/WD-html52-20161206/semantics-scri...
My guess is that at some point the parsers looked for balanced tags, as evidenced in the note in the last link above, but then practical issues with improperly-generated scripts led to the idea that a single SCRIPT closing tag ends the escaping. Maybe people were attempting to concatenate script contents wrong and getting stacks of opening tags that were never closed. I don’t know, but I suppose it’s recorded somewhere.
Many things in today’s HTML arose because of widespread issues with how people generated the content. The same is true of XML and XHTML by the way. Early XML mailing lists were full of people parsing XML with naive Perl regular expressions and suggesting that anyone who wants to “fix” broken markup should do it with string-based find-and-replace.
The main difference is that the HTML spec went in the direction of saying, _if we can agree how to handle these errors then in the face of some errors we can display some content_ and we can all do it in the same way. XML is worse in some regards: certain kinds of errors are still ambiguous and up to the parser to determine how to handle, whether they are non-recoverable or recoverable. For those non-recoverable, the presence of a single error destroys the entire document, like being refused a withdrawal at the bank because you didn’t cross a 7.
At least with HTML5, it’s agreed upon what to do when errors are present and all parsers can produce the same output document; XML parsers routinely handle malformed content and do so in different ways (though most at least provide or default to a strict mode). It’s better than the early web, but not that much better.
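As a small illustration of the difference in error handling, the Python sketch below feeds the same mismatched markup to a strict XML parser and to a lenient HTML parser from the standard library. Note the assumptions: `html.parser` is a tolerant event-based parser, not an implementation of the HTML5 parsing algorithm, so it only stands in for the "display some content anyway" behavior. The names `BROKEN`, `xml_accepts`, `TextCollector`, and `html_text` are hypothetical helpers for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Mismatched tags: <b> is never closed before </p>.
BROKEN = "<p>one<b>two</p>"


def xml_accepts(markup: str) -> bool:
    """True if the strict XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # A single well-formedness error rejects the whole document.
        return False


class TextCollector(HTMLParser):
    """Collects text content and ignores tag structure."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def html_text(markup: str) -> str:
    """Return whatever text the lenient HTML parser can recover."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


print(xml_accepts(BROKEN))  # False: the XML parser refuses the document
print(html_text(BROKEN))    # onetwo: the HTML parser still yields the text
```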