HiPhish|1 month ago
I learned HTML quite late, when HTML 5 was already all the rage, and I never understood why the more strict rules of XML for HTML never took off. They seem so much saner than whatever soup of special rules and exceptions we currently have. HTML 5 was an opportunity to make a clear cut between legacy HTML and the future of HTML. Even though I don't have to, I strive to adhere to the stricter rules of closing all tags, closing self-closing tags and only using lower-case tag names.
pwdisswordfishy|1 month ago
Internet Explorer failing to support XHTML at all (which also forced everyone to serve XHTML with the HTML media type and avoid incompatible syntaxes like self-closing <script />), Firefox at first failing to support progressive rendering of XHTML, a dearth of tooling to emit well-formed XHTML (remember, those were the days of PHP emitting markup by string concatenation) and the resulting fear of pages entirely failing to render (the so-called Yellow Screen of Death), and a side helping of the WHATWG cartel^W organization declaring XHTML "obsolete". It probably didn't help that XHTML did not offer any new features over tag-soup HTML syntax.
I think most of those are actually no longer relevant, so I still kind of hope that XHTML could have a resurgence, and that the tag-soup syntax could be finally discarded. It's long overdue.
xg15|1 month ago
Meanwhile, in any other formal language (including JS and CSS!), the standard assumption is that syntax errors are fatal, the responsibility for fixing lies with the page author, but also that fixing those errors is not a difficult problem.
Why is this a problem for HTML - and only HTML?
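To make the contrast concrete, here is a minimal Python sketch, not a model of any browser. It feeds the same broken markup to a strict XML parser (`xml.etree.ElementTree`) and to the standard library's lenient `html.parser` tokenizer. The names `TagCollector`, `parses_as_xml`, `tag_events` and `BROKEN` are made up for illustration, and `html.parser` only tokenizes: it does not implement the HTML Standard's tree-building or error-recovery rules.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start and end tags in the order the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def parses_as_xml(markup: str) -> bool:
    """Strict path: any well-formedness error is fatal."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def tag_events(markup: str) -> list:
    """Lenient path: the tokenizer reports what it sees and never raises here."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


# Hypothetical input: an unclosed <p> and an end tag that matches nothing.
BROKEN = "<p>one<p>two</div>"

if __name__ == "__main__":
    print(parses_as_xml(BROKEN))
    print(tag_events(BROKEN))
```

The strict path rejects the whole document, while the lenient path hands back every tag it saw and leaves the recovery to whoever consumes the events.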
hinkley|1 month ago
Netscape started this. NCSA was in favor of XML style rules over SGML, but Netscape embraced SGML leniency fully and several tools of that era generated web pages that only rendered properly in Netscape. So people voted with their feet and went to the panderers. If I had a dollar for every time someone told me, “well it works in Netscape” I’d be retired by now.
pwdisswordfishy|1 month ago
Well, this is not entirely true: XML namespaces enabled attaching arbitrary data to XHTML elements in a much more elegant, orthogonal way than the half-assed solution HTML5 ended up with (the data-* attribute set), and embedding other XML applications like XForms, SVG and MathML (though I am not sure how widely supported this was at the time; some of this was backported into HTML5 anyway, in a way that later led to CVEs). But this is rather niche.
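A rough illustration of the difference, as a Python sketch under stated assumptions: the namespace URI `http://example.com/app`, the `app:price` attribute and the helper names are all hypothetical. The XML parser reports the attribute with its namespace attached, while the HTML side only has the flat `data-` naming convention.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

APP_NS = "http://example.com/app"  # hypothetical namespace, for illustration only

XHTML_DOC = (
    '<div xmlns="http://www.w3.org/1999/xhtml" '
    f'xmlns:app="{APP_NS}" app:price="42">Widget</div>'
)
HTML_DOC = '<div data-price="42">Widget</div>'


def namespaced_attributes(markup: str) -> dict:
    """Attributes of the root element, keyed as '{namespace}local-name'."""
    return dict(ET.fromstring(markup).attrib)


class AttrCollector(HTMLParser):
    """Collects data-* attributes from every start tag."""

    def __init__(self):
        super().__init__()
        self.data_attrs = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("data-"):
                self.data_attrs[name] = value


def data_attributes(markup: str) -> dict:
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.data_attrs


if __name__ == "__main__":
    print(namespaced_attributes(XHTML_DOC))
    print(data_attributes(HTML_DOC))
```

With namespaces, two vocabularies can both define a `price` attribute without colliding; with `data-*`, avoiding collisions is purely a naming discipline.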
Sankozi|1 month ago
bazoom42|1 month ago
Original SGML was actually closer to markdown. It had various options to shorten and simplify the syntax, making it easy to write and edit by hand, while still having an unambiguous structure.
The verbose and explicit structure of xhtml makes it easier to process by tools, but more tedious for humans.
nathankleyn|1 month ago
It’s kind of a huge deal that I can give a Markdown file of plain text content to somebody non-technical and they aren’t overwhelmed by it in raw form.
HTML fails that same test.
Pxtl|1 month ago
And Markdown tables are harder to write than HTML tables. However, they are generally easier to read, unless a cell spans multiple lines.
oneeyedpigeon|1 month ago
thisislife2|1 month ago
kbolino|1 month ago
A p or li tag, at least when used and nested properly, logically ends where either the next one begins or the enclosing block ends. Closing li also creates the opportunity for nonsensical content inside of a list but not in any list item. Of course all of these corner cases are now well specified because people did close their tags sometimes.
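The "ends where the next one begins or the enclosing block ends" rule can be sketched in a few lines of Python. This is a deliberately simplified model, not the HTML Standard's algorithm: it only knows that a new `li` closes an open `li` and a new `p` closes an open `p`, it drops attributes, and it does not handle void elements. `ImpliedEndTags`, `IMPLIED_END` and `close_implied` are names invented for the sketch.

```python
from html.parser import HTMLParser

# Simplifying assumption: only these two rules are modelled.
# Opening the key implicitly closes a currently open element from the value set.
IMPLIED_END = {"li": {"li"}, "p": {"p"}}


class ImpliedEndTags(HTMLParser):
    """Re-emits markup with the implied end tags written out explicitly."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of currently open elements
        self.out = []    # output fragments

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped to keep the sketch short.
        if self.stack and self.stack[-1] in IMPLIED_END.get(tag, ()):
            self.out.append(f"</{self.stack.pop()}>")
        self.stack.append(tag)
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignored in this sketch
        # Closing the enclosing block also closes everything still open inside it.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)


def close_implied(markup: str) -> str:
    parser = ImpliedEndTags()
    parser.feed(markup)
    parser.close()
    # End of input closes whatever is still open.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


if __name__ == "__main__":
    print(close_implied("<ul><li>a<li>b</ul>"))
    print(close_implied("<p>one<p>two"))
```

Note that in this model a `span` opened inside a `p` does not close the `p`; it is the later `</p>` that closes both.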
afavour|1 month ago
While this is true I’ve never liked it.
A second <p> opened inside an unclosed <p> implies a closing </p> in the middle. But a <span> opened in the same position does not. Obviously, knowing the difference between what span and p represent, I understand why, but in terms of pure markup it’s always left a bad taste in my mouth. I’ll always close tags whenever relevant, even if it’s not necessary.
Pxtl|1 month ago
So we'll add another syntax for browsers to handle.
https://xkcd.com/927/
dragonwriter|1 month ago
Because of the vast quantity of legacy HTML content, largely.
> HTML 5 was an opportunity to make a clear cut between legacy HTML and the future of HTML.
WHATWG and its living standard were a direct result of the failure of XHTML and of the idea of a clear cut between legacy HTML and the future of HTML. W3C took various versions of that living standard, made changes to them, and called the results HTML 5, 5.1, etc., to pretend that it was still relevant in HTML, before finally giving up on that entirely. HTML 5 was a direct reaction against the “clear cut” approach based on experience, not an opportunity to repeat its mistakes. (Instead of a clear break, HTML incorporated the “more strict rules of XML” via the XML serialization for HTML; for the applications where that approach offers value, it is available and supported and has an object model 100% compatible with the more common form, and they are maintained together rather than competing.)
mgr86|1 month ago
MarsIronPI|1 month ago
Besides, at this point technologies like tree-sitter make editor integration a moot point: once tree-sitter knows how to parse it, the editor does too.
interactivecode|1 month ago
HTML, CSS and JS got used so much because you could mess around and still get something to work, while other languages that people use to write “serious” applications just screamed at you for not being smart enough to know how to allocate memory correctly.
HTML and CSS are not competitors to C. They're more like an alternative to file formats like txt or rtf, meant to be written by hand in a text editor to get styled pages. So easy and forgiving your mom could do it! (And did, just like everyone else in the MySpace days.)
vimax|1 month ago
coffeefirst|1 month ago
But.
The future of HTML will forever contain content that was first handtyped in Notepad++ in 2001 or created in Wordpress in 2008. It's the right move for the browser to stay forgiving, even if you have rules in your personal styleguide.
ndiddy|1 month ago
XHTML came out at a time when Internet Explorer, the most popular browser, was essentially frozen apart from security fixes because Microsoft knew that if the web took off as a viable application platform it would threaten Windows' dominance. XHTML 1.0 Transitional was essentially HTML 4.01 except that if it wasn't also well-formed XML, the spec required the browser to display a yellow "parsing error" page rather than display the content. This meant that any "working" XHTML site might not display because the page author didn't test in your browser. It also meant that any XHTML site might break at any time because a content writer used a noncompliant browser like IE 6 to write an article, or because the developers missed an edge case that caused invalid syntax.
XHTML 2.0 was a far more radical design. Because IE 6 was frozen, XHTML 2.0 was written with the expectation that no current web browser would implement it, and instead was a ground-up redesign of the web written "the right way" that would eventually entirely replace all existing web browsers. For example, forms were gone, frames were gone, and all presentational elements like <b> and <i> were gone in favor of semantic elements like <strong> and <samp> that made it possible for a page to be reasoned about automatically by a program. This required different processing from existing HTML and XHTML documents, but there was no way to differentiate between "old" and "new" documents, meaning no thought was given to adding XHTML 2.0 support to browsers that supported existing web technologies. Even by the mid-2000s, asking everyone to restart the web from scratch was obviously unrealistic compared to incrementally improving it. See here for a good overview of XHTML 2.0's failure from a web browser implementor's perspective: https://dbaron.org/log/20090707-ex-html
jrm4|1 month ago
Wowfunhappy|1 month ago
It's still a little annoying to put <p> before each paragraph, but not by that much. By contrast, once you start adding closing tags, you're much closer to computer code.
I'm not sure if that makes sense but it's the way I think about it.
SoftTalker|1 month ago
Any time I have to write Markdown I have to open a cheat sheet for reference. With HTML, which I have used for years, I just write it.
onion2k|1 month ago
ndiddy|1 month ago
> On void elements, [the trailing slash] does not mark the start tag as self-closing but instead is unnecessary and has no effect of any kind. For such void elements, it should be used only with caution — especially since, if directly preceded by an unquoted attribute value, it becomes part of the attribute value rather than being discarded by the parser.
It was mainly added to HTML5 to make it easier to convert XHTML pages to HTML5. IMO using the trailing slash in new pages is a mistake. It makes it appear as though the slash is what closes the element when in reality it does nothing and the element is self-closing because it's part of a hardcoded set of void elements. See here for more information: https://github.com/validator/validator/wiki/Markup-%C2%BB-Vo...
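Both points can be shown with a small Python sketch. It is only an approximation under stated assumptions: `VOID_ELEMENTS` is the list as I understand it from the HTML Standard at the time of writing, `needs_end_tag` and `read_unquoted_value` are hypothetical helpers, and the second one models only the tokenizer's "unquoted attribute value" state (it ignores character references).

```python
# Void elements as I understand the HTML Standard's list at the time of writing;
# treat this as an assumption, since the list has changed over the years.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def needs_end_tag(tag: str, trailing_slash: bool) -> bool:
    """Whether an HTML element needs an end tag.

    trailing_slash is deliberately unused: in HTML syntax the slash has no
    effect, and only membership in the hardcoded void set matters.
    """
    return tag.lower() not in VOID_ELEMENTS


def read_unquoted_value(source: str, start: int) -> str:
    """Read an unquoted attribute value starting at index `start`.

    The value runs until whitespace or '>', so a '/' that directly follows
    the value is swallowed into it.
    """
    end = start
    while end < len(source) and source[end] not in " \t\n\f>":
        end += 1
    return source[start:end]


if __name__ == "__main__":
    for tag_source in ("<img src=a.png/>", "<img src=a.png />"):
        value_start = tag_source.index("=") + 1
        print(repr(read_unquoted_value(tag_source, value_start)))
    print(needs_end_tag("br", trailing_slash=True))
    print(needs_end_tag("div", trailing_slash=True))
```

The first `img` ends up with the value `a.png/`, which is the trap the quoted spec text warns about; quoting the value or adding a space before the slash avoids it.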
Hendrikto|1 month ago
jakelazaroff|1 month ago
vbezhenar|1 month ago
Because browsers close some tags automatically. And if your closing tag is wrong, it'll generate an empty element instead of being ignored, without even emitting a warning in the developer console. So by closing tags you risk introducing very subtle DOM bugs.
If you want to close tags, make sure that your building or testing pipeline ensures strict validation of produced HTML.
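As a rough idea of what such a pipeline check could look like, here is a minimal Python sketch that flags end tags with no matching open element. It is not a real validator: `StrayEndTagLinter` and `find_stray_end_tags` are made-up names, the void-element list is an assumption based on the HTML Standard at the time of writing, and implied end tags are not modelled, so a `</p>` left stray after a block element has implicitly closed the paragraph is not caught. A real setup would more likely run a conformance checker.

```python
from html.parser import HTMLParser

# Assumed void-element list (HTML Standard at the time of writing).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class StrayEndTagLinter(HTMLParser):
    """Collects end tags that do not match any currently open element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.problems = []  # (tag name, line number) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # "<br/>" style: HTML ignores the slash, so treat it as a start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Pop back to the matching element, closing anything nested in it.
            while self.open_elements.pop() != tag:
                pass
        else:
            self.problems.append((tag, self.getpos()[0]))


def find_stray_end_tags(markup: str) -> list:
    linter = StrayEndTagLinter()
    linter.feed(markup)
    linter.close()
    return linter.problems


if __name__ == "__main__":
    page = "<div>\n<p>text</p>\n</p>\n</div>"
    for tag, line in find_stray_end_tags(page):
        print(f"line {line}: stray </{tag}>")
```

Failing the build when `find_stray_end_tags` returns anything is one way to turn the silent empty-element case into a visible error.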