top | item 12364393

The Fixing-JSON Conversation

49 points| robin_reala | 9 years ago |tbray.org | reply

54 comments

[+] Zardoz84|9 years ago|reply
SDLang !!!! : https://sdlang.org/

Full example : https://github.com/Abscissa/SDLang-D/wiki/Language-Guide#exa...

Examples:

Creating a Tree

    plants {
        trees {
            deciduous {
                elm
                oak
            }
        }
    }
Creating a Matrix

    myMatrix {
       4  2  5
       2  8  2
       4  2  1
    }
A Tree of Nodes with Values and Attributes

    folder "myFiles" color="yellow" protection=on {
        folder "my images" {
            file "myHouse.jpg" color=true date=2005/11/05
            file "myCar.jpg" color=false date=2002/01/05
        }
        folder "my documents" {
            document "resume.pdf"
        }
    }
Date and Date/Time Literals (and comments!)

    # create a tag called "date" with a date value of Dec 5, 2005
    date 2005/12/05

    # a date time literal without a timezone
    here 2005/12/05 14:12:23.345

    # a date time literal with a timezone
    in_japan 2005/12/05 14:12:23.345-JST
[+] _pmf_|9 years ago|reply
That's beautiful!
[+] RoryH|9 years ago|reply
I think now is a good time to re-quote the man himself, Douglas Crockford:

https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...

  I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.  I know that the lack of comments makes some people sad, but it shouldn't. 

  Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
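
A minimal sketch of that "strip comments, then parse" pipeline in Python (a hypothetical strip_comments helper standing in for JSMin; it only handles the basic cases):

```python
import json

def strip_comments(text):
    """Remove // and /* */ comments outside of string literals,
    so the result can be fed to a standard JSON parser."""
    out, i, n = [], 0, len(text)
    in_string = False
    while i < n:
        c = text[i]
        if in_string:
            out.append(c)
            if c == "\\" and i + 1 < n:   # keep the escaped char verbatim
                out.append(text[i + 1])
                i += 1
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
            out.append(c)
        elif text.startswith("//", i):    # line comment: skip to newline
            i = text.find("\n", i)
            if i == -1:
                break
            continue
        elif text.startswith("/*", i):    # block comment: skip past */
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        else:
            out.append(c)
        i += 1
    return "".join(out)

doc = '{"port": 8080, /* default */ "host": "127.0.0.1" // loopback\n}'
print(json.loads(strip_comments(doc)))
```

Tracking string state matters: a naive regex would mangle values like "http://example.com".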
[+] coldtea|9 years ago|reply
>I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.

I call BS. If people want custom parsing directives, they can send them out of band, encode them in the filename, or whatever. But they don't. And I've not seen this happening with most other serialisation formats either, so why would JSON be a particular target? After all, its value comes from being trivially parsable across languages, and that would be killed by custom parsing directives. Those wanting them would end up implementing their own parsers anyway.

Addition: Besides, reading comments to decide how to parse implies either "comments at the top of the file" or "2-stage parsing".

With 2-stage parsing, you could implement comments and whatever else yourself, even in pure JSON anyway.

As for "comments at the top of the file", well, just disallow them (only allow comments after the first JSON object starts), and there's no issue with "parsing directives" anymore...

[+] moonshinefe|9 years ago|reply
Okay, so the reasoning is we remove a highly useful feature that most people who use JSON regularly want, because some people were abusing it and using terrible practices?

That's terrible reasoning.

[+] realharo|9 years ago|reply
The second part basically says "don't use JSON for config files, use some unspecified JSON superset".
[+] DonHopkins|9 years ago|reply
Ugh. Comments are also useful for disabling things without actually deleting them.

My hacky in-band workaround is for the persistence layer (and other code) to ignore any dictionaries in an array that have the "//" key, so I can put a "//": "DISABLED" key at the top of a dict to disable it (and document that it's disabled, and why).
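
The trick above can be sketched in Python (hypothetical drop_disabled helper; the data and key names are illustrative):

```python
import json

def drop_disabled(node):
    """Recursively drop, from any list, every dict that carries a "//" key,
    mirroring the in-band disable trick described above."""
    if isinstance(node, list):
        return [drop_disabled(x) for x in node
                if not (isinstance(x, dict) and "//" in x)]
    if isinstance(node, dict):
        return {k: drop_disabled(v) for k, v in node.items()}
    return node

data = json.loads('[{"id": 1}, {"//": "DISABLED", "id": 2}, {"id": 3}]')
print(drop_disabled(data))   # the second entry is filtered out
```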

[+] thymelord|9 years ago|reply
Timestamps are so complicated once you factor in timezones and daylight saving time that they don't belong in JSON. Time zones are not static. They can change from country to country, or even between states within a country. Ditto for when daylight saving time is enacted during the year - even that changes over the years. There is no rhyme or reason to any of this. The data has to be stored in tables, and time zone meanings can change retroactively. The only reliable time stamp is UTC without leap seconds. (Speaking of leap seconds, who thought seconds going from 0 - 60 rather than 0 - 59 was a good idea?)

Accurate time is one of the most difficult things to model in computer science.
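
As a concrete illustration of the leap-second pain, most standard datetime libraries cannot even represent one; a sketch in Python:

```python
from datetime import datetime, timezone

# POSIX-style libraries generally cannot represent a leap second:
# second=60 is simply out of range for datetime, so the real instant
# 2016-12-31T23:59:60Z is unrepresentable.
try:
    datetime(2016, 12, 31, 23, 59, 60, tzinfo=timezone.utc)
except ValueError as e:
    print("rejected:", e)

# In practice the leap second is smeared or mapped onto 23:59:59,
# so two distinct real instants end up sharing one timestamp.
t = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
print(t.timestamp())
```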

[+] realharo|9 years ago|reply
Time is actually quite simple if you have a good mental model of what you're trying to represent and don't try to mix different concepts into a single value.

This talk explains it VERY nicely: https://www.youtube.com/watch?v=2rnIHsqABfM

Basically just decide whether you're trying to store an absolute time (a timestamp will do) or a civil time (year, month, day, etc.) and treat them as two separate data types.

(If you just use "civil time + offset from UTC" like RFC 3339 does, then you can convert it to an absolute time, but you can convert only that one specific value using that offset, and not any other - i.e. that offset is not a substitute for an actual timezone identifier.)
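
A sketch of that two-types distinction in Python (uses the stdlib zoneinfo module, Python 3.9+ with a tz database installed; the dates are illustrative):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# An absolute time: an instant pinned to UTC; a plain timestamp will do.
instant = datetime(2016, 11, 5, 12, 0, tzinfo=timezone.utc)

# A civil time: wall-clock fields plus a *timezone identifier*.
# The identifier knows the DST rules; a bare UTC offset does not.
civil = datetime(2016, 11, 5, 12, 0, tzinfo=ZoneInfo("America/New_York"))

# 2016-11-05 in New York is EDT (UTC-4); US DST ended the next morning,
# so the same identifier resolves to EST (UTC-5) a day later.
print(civil.isoformat())                          # offset -04:00 (EDT)
print((civil + timedelta(days=1)).isoformat())    # offset -05:00 (EST)
```

This is exactly why a stored "-04:00" offset cannot substitute for "America/New_York": the offset is only valid for the one instant it was recorded with.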

[+] espadrine|9 years ago|reply
> The only reliable time stamp is UTC without leap seconds.

That doesn't make a lot of sense, as UTC does have leap seconds. It is similar to saying that the fastest car has no wheels, when you really mean that the fastest vehicle is a rocket.

TAI is the most reliable and easiest to work with. It relies on atomic clock seconds at sea level. https://en.wikipedia.org/wiki/International_Atomic_Time

However, it already has a difference of roughly 40 seconds with UTC (and therefore civil time), and dropping leap seconds in civil time will shift midnight to later in the day.

But as the frequency of leap seconds rapidly increases, maintaining UTC will become harder. They will consider dropping leap seconds from UTC in 2023. It is unlikely that people care about having the sun rise at midnight in 30000 years.

[+] thaumasiotes|9 years ago|reply
> (Speaking of leap seconds, who thought seconds going from 0 - 60 rather than 0 - 59 was a good idea?)

Who thought February going from 1-29 instead of 1-28 was a good idea?

I don't understand why everyone seems to believe that the phenomena are so inherently different.

[+] wtbob|9 years ago|reply
I think the idea of treating commas as whitespace is kinda hilarious, because it would mean one could write:

    ["a" "b" "c"]
or:

    {"a": "foo" "b": "bar" "c": "baz"}
The advantage over:

    (a b c)
or:

    (a foo b bar c baz)
or:

    ((a foo) (b bar) (c baz))
or:

    ((a . foo) (b . bar) (c . baz))
seems … non-existent.
[+] tarnacious_|9 years ago|reply
I don't think # or // for comments is a very good idea, as it would also make newline characters significant. I find it useful to be able to store a JSON object per line.
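
For illustration, a minimal newline-delimited JSON reader in Python; any comment syntax terminated by a newline would collide with this one-object-per-line framing:

```python
import json

# Newline-delimited JSON: one complete object per line, so a plain
# line split is enough to stream it without a full incremental parser.
ndjson = '{"id": 1, "ok": true}\n{"id": 2, "ok": false}\n'
records = [json.loads(line) for line in ndjson.splitlines() if line]
print(records)
```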
[+] sanqui|9 years ago|reply
Personally, I would really like to see integer object keys (as opposed to only string keys). For simple numeric transformations, string keys feel really heavy and require annoying conversions in many languages. E.g. {"10": 60, "42": 2}.
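
A quick Python illustration of the round-trip problem with integer keys:

```python
import json

# JSON object keys are always strings, so integer keys do not
# survive a round trip and need an explicit conversion back.
table = {10: 60, 42: 2}
encoded = json.dumps(table)        # keys silently become strings
print(encoded)                     # {"10": 60, "42": 2}

decoded = json.loads(encoded)      # comes back with string keys
restored = {int(k): v for k, v in decoded.items()}
print(restored == table)           # True, after manual conversion
```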
[+] niftich|9 years ago|reply
The flipside is that an integer-keyed map is similar in meaning to an array, which associates, by virtue of placement, an integer with the value sitting at that index.

While it's possible to spec the format to forbid this interpretation, Lua has made it a language feature, and it would become impossible to construct an unambiguous parser/printer in Lua for this new format.

[+] fiatjaf|9 years ago|reply
This guy is an idiot anyway. There's no way to "fix JSON". All you can do is create a new language; it doesn't matter if you call it JSON 2.0, it will still be incompatible with all the JSON parsers of today. I don't get why he is so mad at people suggesting he use one of the JSON supersets that exist today.
[+] willvarfar|9 years ago|reply
If // and /* are used as comments, then most of this new extended JSON will still be valid JavaScript.

If # is used for comments, then documents stop being valid JavaScript.

The post says "don't eval() JSON ever", but that's like Crockford leaving out comments originally in order to stop them being abused as processor directives...

[+] daenney|9 years ago|reply
Like the post says, JSON is already not guaranteed to be valid JS, so this isn't really a problem. The fact that just eval-ing it works 99% of the time is great, and granted, the "feature" that triggers this incompatibility is a bit obscure.

But if you just do the right thing from the start, you'll never have anything to worry about in the first place, # comments or not.
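
The incompatibility most often cited here is U+2028/U+2029: both are legal unescaped inside JSON strings, but before ES2019 they were line terminators in JavaScript, so eval() on such a document was a syntax error. A quick check with a standard JSON parser:

```python
import json

# U+2028 (LINE SEPARATOR) appears unescaped inside a JSON string.
# Any conforming JSON parser accepts it, even though the same bytes
# were not a valid JavaScript string literal before ES2019.
doc = '{"text": "line one\u2028line two"}'
parsed = json.loads(doc)
print("\u2028" in parsed["text"])   # True
```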

[+] velox_io|9 years ago|reply
JSON (for the most part) is a nice format to work with, aside from the loosely defined datetimes, as mentioned.

The two areas where I believe the format can be greatly improved: #1, having a standard way to define the structure (sometimes schemas can be handy!); #2, a standard binary format. Yes, right now we have UBJSON (which doesn't have a date format; this is worse in binary) and BSON (which contains some MongoDB-specific stuff).

I'm not saying they don't have their place, but... Protocol Buffers are more akin to .NET or Java serialization, in that they're quite fragile if used with different versions and/or with different vendors.

[+] fiatjaf|9 years ago|reply

    “Just use X” · For values of X including Hjson, Amazon Ion, edn, Transit, YAML, and TOML. ¶
    Nah, most of them are way, way richer than JSON, often with fully-worked-out type systems and Conceptual Tutorials and so on.
What? MOST OF THEM? YAML is not, Hjson is not, TOML is not.
[+] gengkev|9 years ago|reply
YAML... really? Looking at the examples in the Wikipedia article (https://en.wikipedia.org/wiki/YAML) gives me a headache. Fortunately, most actual YAML files I've seen are not that complicated.
[+] niftich|9 years ago|reply
He summarized the most upvoted posts from the last thread [1] really well.

[1] https://news.ycombinator.com/item?id=12328088

Regarding datetimes, it's worth pointing out the conversation that TOML had about it. It's a pretty long read [2][3][4][5] with lots of points raised for and against, but it also shows some of the process by which consensus was eventually forged: through trial and error, some enlightening realizations, expert opinions, and a willingness to leave some aspects of the behavior up to the parser, to avoid requiring all other languages to reimplement half of Java 8 Time.

[2] https://github.com/toml-lang/toml/pull/414

[3] https://github.com/toml-lang/toml/pull/362

[4] https://github.com/toml-lang/toml/issues/412

[5] https://github.com/toml-lang/toml/issues/263

The salient point being that RFC 3339 does not in truth describe exactly one datatype, so you can't just reference the spec and hope everyone reads it the same way. EDIT: Specifically, RFC 3339 says:

"Date and time expressions indicate an instant in time. Description of time periods, or intervals, is not covered here.", but then goes on to define [6] a number of different syntaxes in ABNF, to indicate the subsets of ISO 8601 that "SHOULD be used in new protocols on the Internet." It essentially never defines what a 'valid' RFC 3339 object looks like, it doesn't explicitly say which ones are considered complete representations, so it's not clear if, say, '2016' is a valid RFC 3339 object... but the ones towards the bottom contain more than one discrete term, and can be presumed to be 'complete' representations. These are:

[A] partial-time: HH:MM:SS(.SSS)

[B] full-date: YYYY-MM-DD

[C] full-time: 'partial-time' +/- offsetFromUTC(HH:MM)

[D] date-time: 'full-date' "T" 'full-time'

Out of these, [D] is clearly a timestamp of an absolute instant in time, but the rest are debatable.

[6] https://tools.ietf.org/html/rfc3339#section-5.6

[+] outsidetheparty|9 years ago|reply
> He summarized the most upvoted posts from the last thread [1] really well.

I feel like he glossed right past the objections to the biggest and (to my mind) most destructive proposed change, the commas-to-whitespace thing; in fact, he doubles down on it (let's just declare that commas are whitespace! That surely won't confuse anyone!)