ICWiener | 10 years ago

Did you just summarize thousands of sentences in a single one?

Neither <input type=button/> nor <input><type>button</type></input> means anything without a spec of your language, and inferring that <input>foo</input> or <type>bar</type> is valid relies only on some preconception about the underlying semantics you gave to your data (you chose familiar tag names, after all).

If I gave you a sample XML file from work, you'd have a very hard time knowing whether a given attribute or element is optional for (or bound to) an element. You'd need a schema, a meta-model, or at least an informal spec.

With attributes and elements, you have two ways of defining data where only one is necessary: (input (type button)). The closest analogy I can think of is a Lisp that is not homoiconic but has two different syntaxes for lists at its core, [] and (), without a simple way to abstract over them. Your data format would then be cumbersome to use with no added benefit: whether or not a []-list means something different from a ()-list depends on the specification of your data.
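
To make the "only one is necessary" point concrete, here is a minimal sketch in Python. It assumes the standard-library xml.etree.ElementTree parser; the helper name to_sexpr is hypothetical, and the attribute value is quoted because the unquoted form above is not well-formed XML. Both spellings collapse to the same nested tuple, the Python stand-in for (input (type button)):

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an element as a nested tuple, treating attributes and children alike.

    Hypothetical helper for illustration; text that follows a child element
    (its "tail") is ignored to keep the sketch short.
    """
    parts = [elem.tag]
    # An attribute becomes a (name, value) pair, the same shape that a
    # child element holding only text produces below.
    for name, value in elem.attrib.items():
        parts.append((name, value))
    text = (elem.text or "").strip()
    if text:
        parts.append(text)
    for child in elem:
        parts.append(to_sexpr(child))
    return tuple(parts)

as_attribute = to_sexpr(ET.fromstring('<input type="button"/>'))
as_element = to_sexpr(ET.fromstring('<input><type>button</type></input>'))

print(as_attribute)                # ('input', ('type', 'button'))
print(as_attribute == as_element)  # True
```

Once the two forms are flattened like this, nothing in the data says which one the author originally picked; only a spec could tell you whether that choice mattered.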

lisper|10 years ago

> Did you just summarize thousands of sentences in a single one?

Could be. The comment format demands a certain succinctness.

> Neither <input type=button/> nor <input><type>button</type></input> means anything without a spec of your language, and inferring that <input>foo</input> or <type>bar</type> is valid relies only on some preconception about the underlying semantics you gave to your data (you chose familiar tag names, after all).

Yes, that's true. Semantics is ultimately a very deep rabbit hole. My main claim is that it's useful to have an unambiguous syntactic distinction between data and metadata so that you can tell that the BAZ in <FOO BAZ=BAR>BING</FOO> is INTENDED to be a MODIFIER for FOO and NOT part of the CONTENT that FOO is marking up.

Actually, let me try to be excruciatingly precise: my claim is that it is not self-evident that a syntactic distinction between data and metadata is useless. Hence a rendering of XML into sexprs that preserves this distinction is defensible, and dismissing it in the pejorative terms that Erik uses is not justified.
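
For illustration only, a rendering that preserves the distinction could look like the sketch below. The "@" marker is borrowed from SXML-style notation and is an assumption here, not something this thread specifies; the function name is hypothetical, and the attribute value is quoted so that the standard-library parser accepts it:

```python
import xml.etree.ElementTree as ET

def to_sexpr_keeping_attrs(elem):
    """Render an element as a nested tuple, keeping attributes in an "@" group."""
    parts = [elem.tag]
    if elem.attrib:
        # Attributes are grouped under "@" so they stay distinguishable
        # from child elements that happen to have the same name.
        parts.append(("@",) + tuple(elem.attrib.items()))
    text = (elem.text or "").strip()
    if text:
        parts.append(text)
    for child in elem:
        parts.append(to_sexpr_keeping_attrs(child))
        # Text that follows a child element is kept as content too.
        tail = (child.tail or "").strip()
        if tail:
            parts.append(tail)
    return tuple(parts)

marked = to_sexpr_keeping_attrs(ET.fromstring('<FOO BAZ="BAR">BING</FOO>'))
nested = to_sexpr_keeping_attrs(ET.fromstring('<FOO><BAZ>BAR</BAZ>BING</FOO>'))

print(marked)  # ('FOO', ('@', ('BAZ', 'BAR')), 'BING')
print(nested)  # ('FOO', ('BAZ', 'BAR'), 'BING')
```

Here the two inputs stay distinct, so a consumer can still tell that BAZ was written as a modifier of FOO; whether that extra information is worth the second syntax is exactly what is in dispute.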

ICWiener|10 years ago

Just as code is data and data is code, meta-data is another form of data. Erik's quotes are all taken from https://groups.google.com/d/msg/comp.lang.lisp/8eUxiibm_zA/C...:

> [...] is INTENDED to be a MODIFIER for FOO and NOT part of the CONTENT that FOO is marking up

"The intended use has less to do with it than the notion that you can define what is meta-information and what is information at the time you want to decide whether something goes in an attribute or a sub-element. My argument is that this is impossible. Whether it is meta-information or information is a reflection of the actual use, not the intended use.

However, given that the mechanism was created, and I will argue that it was not so much created as it was never thought possible to be any other way, it was used to define several language properties. "Now that we have this, would it not also be nice to have that." This means that several of the attribute types grew very far apart from the contents of sub-elements and you sort of "had" to use them as attributes, but only sort of, because the application can and does define the semantics of everything, and if you want ID and IDREF, you can make the same choice as you would in Common Lisp to use symbols or a hash table of strings."

> Actually, let me try to be excruciatingly precise: my claim is that it is not self-evident that a syntactic distinction between data and metadata is useless. Hence a rendering of XML into sexprs that preserves this distinction is defensible, and dismissing it in the pejorative terms that Erik uses is not justified.

"No, there is nothing that requires there to be element attributes as a distinct concept from element contents. There are, however, a number of practical things that follow from making that arbitrary distinction which can look like rationales, but if you ask yourself "why can it not be a subelement", there are no real answers, only appeals to the idea that there somehow _have_ to be a distinction. It took me years to figure out that the whole attribute idea is completely vacuous, and I worked with the creator of SGML himself for several years on several SGML-related standards and projects. I started writing "A conceptual introduction to SGML" back in 1994, but as I had pained my way through five chapters, I had to realize that it was all wrong. There was a basic design mistake in the whole language framework. That mistake is that simply put: "what is good enough for the users of the language is not good enough for its creators". Each and every level of "containership" in SGML has its own syntax, optimized for the task. Each and every level has a different syntax for "the writing on the box" as opposed to "the contents of the box". This follows from a very simple, yet amazingly elusive principle in its design: Meta-data is conceptually incompatible with data. This is in fact wrong. Meta-data is only data viewed from a different angle, and vice versa. SGML forces you to remain loyal to your chosen angle of view."

So the opinion seems to be that the arbitrary line you draw between meta-data and data at the moment you create the data is not necessarily well suited to the people using it: then, despite being useful, the syntactic distinction becomes an annoyance.

By the way, Erik's dismissal is (1) the result of years of experience working on SGML, and (2) not pejorative but argued. He is the one arguing against the "self-evidence" of attributes.