Lots of comments here about XML vs. JSON... but there are areas where these two don't collide. I'm thinking about text/document encoding (real annotated text, things like books, etc).
Even though XML is still king here (see TEI and other standards), some of its limitations are a problem. Consider the following text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Now say you want to qualify a part of it:
Lorem ipsum <sometag>dolor sit amet</sometag>, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Now say you want to qualify another part, but it overlaps with the previous part:
Lorem ipsum <sometag>dolor sit <someothertag>amet</sometag>, consectetur</someothertag> adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Of course, this is illegal XML... so we have to do dirty hacks like this:
Lorem ipsum <start someid="part1"/>dolor sit <start someid="part2"/>amet<end someid="part1"/>, consectetur<end someid="part2"/> adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Which means rather inefficient queries afterwards :-/
A strategy I've seen for dealing with the inability of XML to handle overlapping tags is to treat the tagging as an annotation layer on top of the node with the data:
<doc>
<data type="text">
This is some sample text.
</data>
<annotations>
<tag1 start="1" end="3" comment="foo"/>
<tag2 start="2" end="4" type="bar" />
</annotations>
</doc>
The start and end are usually byte offsets from the start of the text content in the data node. It still sucks, but at least you can apply the same general strategy to more than just text data - I've seen it used with audio/video, where the offsets are treated as time offsets into the media.
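A minimal sketch of the read side, assuming character offsets into the text content (real schemes often use byte offsets, and whitespace handling varies; the offsets and attribute names here are illustrative, not from any standard):

```python
import xml.etree.ElementTree as ET

# Resolve stand-off annotations back to text spans. Each annotation
# element carries start/end offsets into the <data> node's text.
doc = ET.fromstring("""
<doc>
  <data type="text">This is some sample text.</data>
  <annotations>
    <tag1 start="0" end="4" comment="foo"/>
    <tag2 start="5" end="7" type="bar"/>
  </annotations>
</doc>
""")

text = doc.find("data").text
for ann in doc.find("annotations"):
    start, end = int(ann.get("start")), int(ann.get("end"))
    print(ann.tag, repr(text[start:end]), dict(ann.attrib))
```

Since the annotations are just elements, overlapping ranges are no problem; the cost is that every consumer has to do this offset arithmetic itself.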
I would argue that the inline way of annotating things in XML is actually ok-ish if one absolutely needs human edit-ability, but otherwise bad design.
{"text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.",
 "annotations": [{"tag": "sometag", "ranges": [{"from": 12, "to": 26}]},
                 {"tag": "someothertag", "ranges": [{"from": 21, "to": 39}]}]}
Note that this also removes the limitation that annotations have to be consecutive.
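The same stand-off idea is just as easy to consume from JSON; a sketch (the text is shortened and the offsets chosen for this sample, with the two ranges deliberately overlapping):

```python
import json

# Stand-off annotations in JSON: offsets are character offsets into
# `text`. The two ranges overlap, which inline XML cannot express.
doc = json.loads("""
{"text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
 "annotations": [
   {"tag": "sometag",      "ranges": [{"from": 12, "to": 26}]},
   {"tag": "someothertag", "ranges": [{"from": 22, "to": 39}]}
 ]}
""")

text = doc["text"]
for ann in doc["annotations"]:
    for r in ann["ranges"]:
        print(ann["tag"], repr(text[r["from"]:r["to"]]))
```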
You are absolutely right that XML is better for document structures.
My current theory is that Yjs [0] is the new JSON+XML. It gives you both JSON and XML types in one nested structure, all with conflict-free merging via incremental updates.
Also, you note the issue with XML and overlapping inline markup. Yjs has an answer for that with its text type: you can apply attributes (for styling or anything else) over arbitrary ranges, and they can overlap.
Obviously I'm being a little hyperbolic suggesting it will replace JSON - the beauty of JSON is its simplicity - but for many systems, building on Yjs or similar CRDT-based serialisation systems is the future.
This is actually one of the things processing instructions are useful for - but you would need to define the data within the PI, since they don't have attributes.
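For instance, something like the following (a hypothetical convention, not a standard - the pseudo-attributes inside each PI are opaque data that your own processor has to parse):

```xml
<!-- Overlap markers as processing instructions. PIs have no real
     attributes, so id="..." here is just a convention inside the PI
     data; a generic XML parser hands it over as an unparsed string. -->
<p>Lorem ipsum <?ann-start id="part1"?>dolor sit <?ann-start id="part2"?>amet<?ann-end id="part1"?>,
consectetur<?ann-end id="part2"?> adipiscing elit.</p>
```

The upside over the `<start/>`/`<end/>` element hack is that PIs don't disturb the element structure or any schema validation; the downside is that standard query tools (XPath etc.) give you little help matching the markers up.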
Nobody deserves XML! In all seriousness I get the idea behind XML and I have used a couple of SOAP services which were absolutely brilliant, but as someone who has spent a decade “linking” data from various sources in non-tech enterprise… Well… let’s just say that I’m being kind if I say that 5% of the services which used XML were doing it in a way that was nice to work with.
Which is why JSON’s simplicity is such a win for our industry. Because it’s always easy to handle. Sure, you can build it pretty terribly, but you’re not going to do this: <y value=x> and then later do <y>x</y>, which I’m not convinced you didn’t do in XML because you’re chaotic evil. And you’re not going to run into an issue where some Adobe LiveCycle schema doesn’t work with .NET for reasons I never really found out, because why wouldn’t an XML schema work in .NET? Anyway, I’m sure the intentions behind XML were quite brilliant, but in my anecdotal experience it just doesn’t work for the Thursday-afternoon programmer mindset, and JSON does.
A catchy but meaningless phrase. JSON is a dumpster fire. Probably in even more ways than XML is. Maybe you deserve it... I feel like I'm being punished by the stupid people who make me use it, in a way similar to the sham court hearings from Planet of the Apes.
Whenever two or more are gathered together, they shall argue about JSON vs XML.
Personally I like the simplicity of JSON and also the expressive power of XML. But then I tend to only use each for the task it was primarily intended: application data-on-the-wire in JSON and "documents" in XML. It seems like a lot of the recurrent discussion around these technologies happens when they're pushed to do things outside their comfort zone. And I wonder if some of this is down to siloing of developer knowledge.
There was a comment on HN a few days ago (not by me, and I can't find it now) to the effect that web development has historically attracted self-taught developers or those who have come to it by routes like bootcamps. It went on to say that they perhaps consequently lack some knowledge of existing techniques and solutions, and therefore tend to recreate solutions that may already exist (and not always well). And this drives the well-known churn in webdev tech: of which bolting schemas onto JSON is arguably an example.
I wonder what people think of this? Personally I think it has some merit, but that the "churn" has also generated (along with much wheel-reinvention) some great innovations. And I say that as someone who works mainly on back-end stuff.
It is interesting that people love json (now with schema), but hate XML while loving HTML at the same time. It is all pretty boring and largely the same imo.
The absolute worst bit of XML is the confused implementations. What should be an attribute on a tag, and what should go between tags? Even worse, nothing is sanely typed without an xsd. Different systems will treat the following differently:
<some>true</some>
versus
<some>1</some>
Some systems require the token "true", others will only treat 1 as the boolean true.
For example, MS claims that for Exchange, XSD boolean values must be the integer 1 or 0 [0], but then links to a W3C spec that also allows the tokens true and false [1].
At least with JSON and HTML, you don't need a separate definition file for basic, primitive data types.
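For what it's worth, XSD's xs:boolean lexical space is exactly {"true", "false", "1", "0"}, so a tolerant reader ends up normalising all four forms; a sketch:

```python
import xml.etree.ElementTree as ET

# XSD allows four lexical forms for xs:boolean, but individual systems
# often accept only a subset. A tolerant reader normalises them all.
def xsd_bool(text):
    value = text.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not an xs:boolean: {value!r}")

for fragment in ("<some>true</some>", "<some>1</some>"):
    print(fragment, "->", xsd_bool(ET.fromstring(fragment).text))
```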
JSON is a much better serialization format, since XML was designed as a document format. For example, there is no standardized way to serialize a string with a null character in XML, even if you escape it (such strings are allowed in many programming languages). JSON just says use “\u0000” and calls it a day. I’m not sure if it’s better for users, but it’s certainly easier to work with as a dev.
HTML isn’t trying to serialize abstract data and is doing what XML does best in being a document/GUI format. It doesn’t matter all that much that it can’t represent null characters in a standard way because it isn’t a printable character.
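A quick illustration of the JSON side (Python's json module follows the spec here): every control character has a standard \uXXXX escape, so a string containing NUL round-trips, whereas U+0000 is simply not a legal character in XML 1.0 at all.

```python
import json

# NUL survives a JSON round-trip via the standard \u0000 escape.
s = "binary\x00data"
encoded = json.dumps(s)
print(encoded)
assert json.loads(encoded) == s
```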
It's very easy to understand why people prefer JSON.
95% of developers know exactly what JSON is without ever having read anything technical about it. It's obvious.
XML on the other hand... Who here can say they actually know anything substantial about XML besides the syntax? My guess is <10%.
XML suffers from too many options and useless bells and whistles.
E.g. the attribute vs. element topic is a source of confusion without adding much value, especially if the source and target are object-oriented and/or a relational DB. What's the point?
Then there are namespaces. Sure, there are probably lots of places where they are really needed, but I never encountered one - yet because they are the default, you need to work with them or your queries do not work. Super confusing for beginners and annoying as heck.
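The classic beginner trap, sketched with Python's ElementTree (the namespace URI is made up): once a document declares a default namespace, a plain path query silently stops matching.

```python
import xml.etree.ElementTree as ET

# A default namespace applies to every element, so the unqualified
# query finds nothing; you must pass an explicit prefix mapping.
doc = ET.fromstring(
    '<root xmlns="http://example.com/ns"><item>hello</item></root>'
)

print(doc.find("item"))              # None: the query ignores the namespace
ns = {"ex": "http://example.com/ns"}
print(doc.find("ex:item", ns).text)  # the qualified query matches
```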
These are actually what IntelliJ uses to validate all sorts of config files behind the scenes.
For work, we even do code generation off of the Meltano (ETL tool) spec and use it to validate reads and writes to the file (which we edit at application runtime) to catch errors as close to when they actually occur.
There are some YAML-based schemas there too. How does this work - is there a canonical YAML->JSON transformation, or does the JSON Schema spec have explicit YAML support?
YAML is effectively a superset of JSON although the syntax used in YAML is often different. So you can't translate all YAML to JSON, but all JSON can be represented as YAML.
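A small illustration of that superset relationship (the keys are made up for the example): JSON's brace-and-bracket syntax is valid YAML "flow style", so a YAML 1.2 parser accepts a JSON document unchanged, while YAML's indentation-based block style has no JSON equivalent.

```yaml
# The same mapping twice; a YAML 1.2 parser reads both forms identically.
block_style:
  name: example
  retries: 3
flow_style: {"name": "example", "retries": 3}   # plain JSON, also valid YAML
```

This is why "validate YAML against a JSON Schema" tools generally just parse the YAML into the same in-memory structure JSON would produce and validate that.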
but XML allowed an easy way to have distributed schemas without needing a central registry: the schema URI could be made a resource that existed at that URI.
The wording gets complex because the URI does not need to exist on the web, and that need for exact wording is, I suspect, one reason XML is perceived as complex.
Can JSON Schema be used to describe, say, the schema of an RDBMS table?
Is there some standardization here, so I might use a JSON schema that already covers a lot of the fields needed to describe columns, constraints, etc.?
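Not that I know of a single standard vocabulary for tables, but as a sketch, a plain JSON Schema for one table's rows might look like this (all names here are invented for the example):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "users",
  "type": "object",
  "required": ["id", "email"],
  "properties": {
    "id":    {"type": "integer", "minimum": 1},
    "email": {"type": "string", "maxLength": 255},
    "age":   {"type": ["integer", "null"], "minimum": 0}
  },
  "additionalProperties": false
}
```

Per-column constraints (NOT NULL, length, range) map fairly naturally; cross-row constraints such as uniqueness or foreign keys fall outside what JSON Schema validates.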
Maybe what we need is a YjsSchema...
https://github.com/yjs/yjs/
It has made writing resumes with Copilot super powerful.
Some previous discussion: https://news.ycombinator.com/item?id=23988269
It's a pity the catalog format doesn't support an 'import' or relative URLs for schemas - would have made local extensions a bit easier.
[0] https://learn.microsoft.com/en-us/openspecs/exchange_server_...
[1] https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#boolean
But I think complexity is always 90% culture. It's pretty arbitrary what kind of culture grows around a particular technology.
JSON is just pure data.
Or maybe even a way to discover related statically typed definitions based on the validation rules?
It would be really nice not to have to define parts of a data model that provide little to no business value - but where you can easily “stub your toe”.
edit: skipping the theoretical foundations, there seems to be at least this tool that claims to validate yaml against json schema: https://github.com/json-schema-everywhere/pajv
Can JSON Schema capture relations between fields?