JSON Schema Store

[+] JodieBenitez|2 years ago|reply

Lots of comments here about XML vs. JSON... but there are areas where these two don't collide. I'm thinking about text/document encoding (real annotated text, things like books, etc).

Even though XML is still king here (see TEI and other norms), some of its limitations are a problem. Consider the following text:

    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Now say you want to qualify a part of it:

    Lorem ipsum <sometag>dolor sit amet</sometag>, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Now say you want to qualify another part, but it's overlapping with previous part:

    Lorem ipsum <sometag>dolor sit <someothertag>amet</sometag>, consectetur</someothertag> adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Of course, this is illegal XML... so we have to do dirty hacks like this:

    Lorem ipsum <start someid="part1"/>dolor sit <start someid="part2"/>amet<end someid="part1"/>, consectetur<end someid="part2"> adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Which means rather inefficient queries afterwards :-/

[+] meepmorp|2 years ago|reply

A strategy I've seen for dealing the inability of XML to handle overlapping tags, is to treat the tagging as an annotation layer on top of the node with the data:

  <doc>
  <data type="text">
    This is some sample text.
  </data>
  <annotations>
    <tag1 start="1" end="3" comment="foo"/>
    <tag2 start="2" end="4" type="bar" />
  </annotations>
  </doc>

The start and end are usually byte offsets from the start of the text content in the data node. It still sucks, but at least you could apply the same general stragegy to more than just text data - I've seen it used with audio/video where the offsets are treated as time offsets into the media.

[+] j-pb|2 years ago|reply

I would argue that the inline way of annotating things in XML is actually ok-ish if one absolutely needs human edit-ability, but otherwise bad design.

  {text: "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.",
   annotations: [{tag: "sometag", ranges: [{from: 12, to: 26}]},
                 {tag: "sometothertag", ranges: [{from: 21, to: 39}]}

Note that this also removes the limitation that annotations have to be consecutive.

[+] samwillis|2 years ago|reply

You are absolutely right that XML is better for document structures.

My current theory is that Yjs [0] is the new JSON+XML. It gives you both JSON and XML types in one nested structure, all with conflict free merging via incremental updates.

Also, you note the issue with XML and overlapping inline markup. Yjs has an answer for that with its text type, you can apply attributes (for styling or anything else) via arbatary ranges. They can overlap.

Obviously I'm being a little hypabolic suggesting it will replace JSON, the beauty of JSON is is simplicity, but for many systems building on Yjs or similar CRDT based serialisation systems is the future.

Maybe what we need is a YjsSchema...

https://github.com/yjs/yjs/

[+] unknown|2 years ago|reply

[deleted]

[+] dwaite|2 years ago|reply

This is actually one of the things processing instructions are useful for - but you would need to define the data within the PI, since they don't have attributes.

[+] thomasfromcdnjs|2 years ago|reply

JSON Resume uses a defined schema. (listed on schemastore.org)

It has made writing resumes with co pilot super powerful.

[+] nbbaier|2 years ago|reply

Do you have an example of how you've done this?

[+] sleepytree|2 years ago|reply

Can you share your general process for that? Trying to do more AI for this type of thing.

[+] ChrisArchitect|2 years ago|reply

(2020)?

Some previous discussion: https://news.ycombinator.com/item?id=23988269

[+] seanp2k2|2 years ago|reply

JSON is the version of XML we deserve.

[+] devjab|2 years ago|reply

Nobody deserves XML! In all seriousness I get the idea behind XML and I have used a couple of SOAP services which were absolutely brilliant, but as someone who has spent a decade “linking” data from various sources in non-tech enterprise… Well… let’s just say that I’m being kind if I say that 5% of the services which used XML were doing it in a way that was nice to work with.

Which is why JSON’s simplicity is such a win for our industry. Because it’s always easy to handle. Sure you can build it pretty terrible, but you’re not going to do this: <y value=x> and then later do <y>x</y> which I’m not convinced you didn’t do in XML because you’re chaotic evil. And you’re not going to run into an issue where some Adobe Lifecycle schema doesn’t work with .Net because of reasons I never really found out because why wouldn’t an XML schema work in .Net? Anyway, I’m sure the intentions behind XML were quite brilliant but in my anecdotal experience it just doesn’t work for the Thursday afternoon programmer mindset and JSON does.

[+] crabbone|2 years ago|reply

A catchy but a meaningless phrase. JSON is a dumpster on fire. Probably in even more ways than XML is. Maybe you deserve it... I feel like I'm being punished by the stupid people who make me use it in a way similar to the sham court hearings from The Planet of Apes.

[+] andyjohnson0|2 years ago|reply

Whenever two or more are gathered together, they shall argue about JSON vs XML.

Personally I like the simplicity of JSON and also the expressive power of XML. But then I tend to only use each for the task it was primarily intended: application data-on-the-wire in JSON and "documents" in XML. It seems like a lot of the recurrent discussion around these technologies happens when they're pushed to do things outside their comfort zone. And I wonder if some of this is down to siloing of developer knowledge.

There was a comment on HN a few days ago (not by me, and I can't find it now) to the effect that web development has historically attracted self-taught developers or those who have come to it by routes like bootcamps. It went on to say that they perhaps consequently lack some knowledge of existing techniques and solutions, and therefore tend to recreate solutions that may already exist (and not always well). And this drives the well-known churn in webdev tech: of which bolting schemas onto JSON is arguably an example.

I wonder what people think of this? Personally I think it has some merit, but that the "churn" has also generated (along with much wheel-reinvention) some great innovations. And I say that as someone who works mainly on back-end stuff.

Thoughts?

[+] guideamigo|2 years ago|reply

I wish it had comments. And for that reason, I prefer yaml.

[+] jhoechtl|2 years ago|reply

IS there an on-premise alternative? Not necessarily speaking of schemastore.org on prem but a service comparable in spirit.

[+] unilynx|2 years ago|reply

The 'service' is basically hosting this file: https://www.schemastore.org/api/json/catalog.json - you could host that locally and point your software to it, modifying the other URLs where needed

It's a pity the catalog format doesn't support an 'import' or relative URLs for schemas - would have made local extensions a bit easier.

[+] relequestual|2 years ago|reply

We (JSON Schema) did a case study/interview with the guy behind it https://www.youtube.com/watch?v=-yYTxLZZk58&list=PLHVhS4Tj1Y...

[+] osigurdson|2 years ago|reply

It is interesting that people love json (now with schema), but hate XML while loving HTML at the same time. It is all pretty boring and largely the same imo.

[+] zdragnar|2 years ago|reply

The absolute worst bit of XML is the confused implementations. What should be an attribute on a tag, and what should go between tags? Even worse, nothing is sanely typed without an xsd. Different systems will treat the following differently:

    <some>true</some>

versus

    <some>1</some>

Some systems require the token "true", others will only treat 1 as the boolean true.

For example, MS claims that for exchange ASD boolean values must be integer 1 or 0 [0], but then links to a W3C spec that allows for the tokens true and false [1]

At least with JSON and HTML, you don't need a separate definition file for basic, primitive data types.

[0] https://learn.microsoft.com/en-us/openspecs/exchange_server_...

[1] https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#boolean

[+] slaymaker1907|2 years ago|reply

JSON is a much better serialization format since XML was designed as a document format. For example, there is no standardized way to serialize a string with a null character even if you escape it (this is allowed in many programming languages). JSON just says do “\0” and calls it a day. I’m not sure if it’s better for users, but it’s certainly easier to work with as a dev.

HTML isn’t trying to serialize abstract data and is doing what XML does best in being a document/GUI format. It doesn’t matter all that much that it can’t represent null characters in a standard way because it isn’t a printable character.

[+] fauigerzigerk|2 years ago|reply

JSON is far simpler because it has no namespaces and no entities.

But I think complexity is always 90% culture. It's pretty arbitrary what kind of culture grows around a particular technology.

[+] geysersam|2 years ago|reply

It's very easy to understand why people prefer JSON. 95% of developers know exactly what JSON is without ever having read anything technical about it. It's obvious.

XML on the other hand... Who here can say they actually know anything substantial about XML besides the syntax? My guess is <10%.

[+] HdS84|2 years ago|reply

XML suffers from too many options and useless bells and whistles. E.g. the attribute vs Parameter topic is a source of confusion, without adding much value, especially if the source and target are object oriented and/ or a relational db. What's the point?

Then there are namespaces, sure there are probably lots of places where you need to use them. But I never encountered a place where they are really needed, but because they are the default you need to work with them or your queries do not work. Super confusing for beginners and annoying as heck.

[+] throwawaymaths|2 years ago|reply

Which is better xml design for a pure data payload (not textual content)?

    <foo>something</foo>

Or

    <foo value="something"/>

When you get back with a coherent universal argument, we'll revisit the json vs xml question.

[+] revskill|2 years ago|reply

Look, with JSX you can put anything as props.

JSON is just pure data.

[+] neverrroot|2 years ago|reply

A repository of over 700 JSON schemas for various file types. Quite useful.

[+] lolive|2 years ago|reply

Oh my good. This Semantic Web stuff is going live !

[+] yawnxyz|2 years ago|reply

Took me an embarrassingly long time to figure out you could scroll that list

[+] andenacitelli|2 years ago|reply

JSON Schemas are great.

These are actually what IntelliJ uses to validate all sorts of config files behind the scenes.

For work, we even do code generation off of the Meltano (ETL tool) spec and use it to validate reads and writes to the file (which we edit at application runtime) to catch errors as close to when they actually occur.

[+] mooreed|2 years ago|reply

Does anyone know of a typescript translation for each of those validation models?

Or maybe even a way to discover related statically typed definitions based on the validation rules?

It would be really nice to not define parts of a data model that provide little to no business value - but where you can easily “stub your toe”.

[+] zurn|2 years ago|reply

There are some YAML based schemas there too. How does this work, is there a canonical YAML->JSON transformation, or does JSON schema spec have explicit YAML support?

edit: skipping the theoretical foundations, there seems to be at least this tool that claims to validate yaml against json schema: https://github.com/json-schema-everywhere/pajv

[+] michaelmior|2 years ago|reply

YAML is effectively a superset of JSON although the syntax used in YAML is often different. So you can't translate all YAML to JSON, but all JSON can be represented as YAML.

[+] andirk|2 years ago|reply

Does this have any relation to https://jsonapi.org/ ?

[+] LispSporks22|2 years ago|reply

Heh I remember we did similar registry for XML things.

[+] pasc1878|2 years ago|reply

but XML allowed an easy way to have distributed schemas not needing a central place. The schema. URI could be made a resource that existed at the URI.

The wording gets complex as the URI does not need to exist on the web and that need for exact wording is I suspect a reason for XML to be perceived as complex

[+] osigurdson|2 years ago|reply

See, Steve Ballmer was right after all.

[+] albert_e|2 years ago|reply

can JSON achema be used to describe say the schema of a RDBMS table?

is these some standardization here so I might use a JSON schema that already covers a lot of the fields that are needed to describe columns, constraints etc?

can JSON schema capture relations between fields?

[+] albert_e|2 years ago|reply

is there a good tool or library to create a JSON Schema manually/programmatically ?

146 comments