
YAML: It's Time to Move On

265 points | firearm-halter | 4 years ago | nestedtext.org

392 comments

[+] thefifthsetpin|4 years ago|reply
I don't like YAML and would like to move on, but I hope we don't move onto this.

I think it's crazy that when I add a string to an inline list, I may need to convert that inline list to a list because this string needs different handling. I think it's crazy that "convert an inline list to a list" is a coherent statement, but that is the nomenclature that they chose.

I don't like that a truncated document is a complete and valid document.

But what is most unappealing is their whitespace handling. I couldn't even figure out how to encode a string with CR line endings. So, I downloaded their python client to see how it did it. Turns out, they couldn't figure it out either:

    >>> nt.loads(nt.dumps("\r"),top="str")
    '\n'
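
For contrast, the same round-trip check can be run against a format that escapes control characters. This is a minimal sketch using Python's standard `json` module; the helper name `survives_round_trip` is made up for illustration, and it says nothing about how NestedText is implemented internally.

```python
import json

def survives_round_trip(value):
    """Return True if value is unchanged after encode -> decode."""
    return json.loads(json.dumps(value)) == value

# JSON writes a carriage return as an escape sequence inside the string,
# so the decoder can restore it exactly.
print(json.dumps("\r"))
print(survives_round_trip("\r"))
print(survives_round_trip("line one\r\nline two\rline three"))
```

The same helper, pointed at any other `dumps`/`loads` pair, makes a cheap regression test for whitespace handling.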

[+] nanis|4 years ago|reply
I wish people would stop trying to write programs for which there are no interpreters, compilers, or linters:

    name: Install dependencies
    run:
        > python -m pip install --upgrade pip
        > pip install pytest
        > if [ -f 'requirements.txt' ]; then pip install -r requirements.txt; fi

That is a program that is hiding in the bowels of a "nestedtext" document ... It is no better than a program that is hiding in the bowels of a JSON or YAML document.

We all have to deal with this, but it is beyond stupid.

    [Install Dependencies]
    run=/path/to/install-script
Then, write `install-script` in whatever language you want ... verify it works. It should have tests. etc etc etc.
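
A minimal sketch of the consuming side, assuming the INI-style file above is read with Python's standard `configparser`. The function name `load_steps` is hypothetical, and the path is the placeholder from the comment, not a real script.

```python
import configparser

CONFIG_TEXT = """
[Install Dependencies]
run = /path/to/install-script
"""

def load_steps(text):
    """Map each section name to the script path stored under its `run` key."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: parser[section]["run"] for section in parser.sections()}

steps = load_steps(CONFIG_TEXT)
print(steps)
```

The config then only names the script; a runner could hand each path to `subprocess.run([path], check=True)`, and the script itself can be linted and tested like any other program.
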
[+] ljm|4 years ago|reply
It would be nice if YAML wasn't horrendously abused the way it is. You have CI pipelines that let you construct DAGs to represent your builds, but you need several thousand lines of YAML and a load of custom parsing to get programming constructs in the string types, for example. And then each provider has its own way of providing those.

I don't have to re-read manuals describing how to do if/else in Ruby or Java or Lisp, but as soon as yaml and some 'devops' tooling is involved, I have to constantly jump back and forth between the reference and my config.

The main point being that the problem isn't the file format but the products that continue to push it, presumably because hacking stuff on top of `YAML.parse` is less effort than designing something that fits the purpose.

[+] tialaramex|4 years ago|reply
> I don't like that a truncated document is a complete and valid document.

Me either. If your documents have this property you're likely to tempt people to start trying to process partial documents.

When they do that, they violate Full Recognition Before Processing and likely there's a latent security bug as a result.
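
The difference can be sketched in a few lines. This assumes a toy line-oriented `key: value` format (the `parse_lines` parser is hypothetical, not NestedText or YAML) and compares it with the standard `json` module, which has closing delimiters.

```python
import json

FULL_JSON = '{"name": "build", "steps": ["lint", "test"]}'
FULL_LINES = "name: build\nsteps: lint test\n"

def parse_lines(text):
    """Toy line-oriented parser: one 'key: value' pair per line."""
    result = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            result[key.strip()] = value.strip()
    return result

def json_is_valid(text):
    """Return True if text is a complete JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Cut both documents off partway through, as a dropped connection might.
truncated_json = FULL_JSON[:20]
truncated_lines = FULL_LINES[:12]

print(json_is_valid(truncated_json))   # the missing closing brace is detected
print(parse_lines(truncated_lines))    # the missing second line goes unnoticed
```

With the line-oriented format, the reader has no way to tell a short document from a damaged one unless length or a checksum is carried out of band.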

[+] jrochkind1|4 years ago|reply
> So, I downloaded their python client to see how it did it.

Who are you suggesting the python client belongs to, who is 'they' in 'their'?

[+] stillicidious|4 years ago|reply
Author seems to use misfeatures of a particular implementation to tar all implementations with. The round-tripping issue is not a statement about YAML as a markup language, much in the way a rendering bug in Firefox is not a statement about the web.

Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on. Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy. They're prone to opinion and style, as if replacing some part or other will make the high level problem (that's us) go away. Fretting over perfection in UI is an utterly pointless waste of time.

I don't know what NestedText is and find it very difficult to care; there are far more important problems in life to be concerned with than yet another incremental retake on serialization. I find it hard to consider contributions like this to be helpful or represent progress in any way.

[+] spicybright|4 years ago|reply
I actually disagree it's bike shedding.

If you can write a bad YAML document because of those mis-features/edge cases, I'd say you've already lost.

Humans are messy, but at the end of the day the data has to go to a program, so a concise and super simple interface has a lot of power to it for humans.

Working at a typical software company with average skill level engineers (including myself), no one likes writing YAML. But everyone is fine with JSON.

I think it's a case of conceptual purity vs what an average engineer would actually want to use. And JSON wins that. If YAML was really better than JSON, we'd all be using that right now.

So does it really matter if YAML is superior if >80% of engineers pick JSON instead?

[+] throwaway81523|4 years ago|reply
> Stepping back a bit, YAML is good enough, and this problem has been incrementally bikeshedded since at least the 1970s, it is time to move on

Nah, in the 1970s we had Lisp S-expressions that completely solved the problem, and everything since then has been regressions on S-expressions due to parenthesis phobia.

After hearing that thing about the country code for Norway, I became convinced that YAML has to just die. Become an ex-markup language. Pine for the fjords. Be a syntax that wouldn't VOOM if you put 4 million volts through it. Join the choir invisible, etc.

This is good: https://noyaml.com/

Erik Naggum had a notoriously NSFW rant about XML (over the top even for him) that I better not link to here, but lots of it applies to YAML as well.
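
For readers who have not seen the Norway problem: the YAML 1.1 boolean type accepts many spellings, including `no` and `NO`, so an unquoted country code can come back as `False`. The sketch below is a toy resolver for plain scalars written from that spelling list, not a YAML parser; `resolve_plain_scalar` is a hypothetical name.

```python
import re

# Spellings that the YAML 1.1 boolean type accepts for unquoted scalars.
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
TRUTHY = {"y", "yes", "true", "on"}

def resolve_plain_scalar(text):
    """Return a bool for YAML 1.1 boolean spellings, else the original string."""
    if YAML11_BOOL.match(text):
        return text.lower() in TRUTHY
    return text

countries = ["SE", "DK", "NO"]
print([resolve_plain_scalar(code) for code in countries])
```

Quoting the value (`"NO"`) avoids the problem, but only if the author already knows that the problem exists.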

[+] AYBABTME|4 years ago|reply
It'd be bikeshedding if the status quo was good. But it isn't.
[+] dmitriid|4 years ago|reply
> Author seems to use misfeatures of a particular implementation to tar all implementations with.

There's no canonical YAML implementation, and the YAML spec is enormous (doubly so if you need to work with things like unquoted strings).

[+] Aloha|4 years ago|reply
If you use YAML in situations where it may need hand editing, it means you actively hate your users.

YAML is patently unsuitable for any use case where the resulting output may require hand editing.

[+] tannhaeuser|4 years ago|reply
> YAML as a markup language

YAML ain't markup language.

[+] Banana699|4 years ago|reply
>Human-convenient interfaces (like YAML, bash, perl) are fundamentally messy because we are messy

I don't know what to make of this statement, it has so much handwaving built-in. The most charitable interpretation I can find is that by 'Human-convenient' you simply meant the quick-and-dirty ideology expressed in Worse Is Better: Does job, makes users contemplate suicide only once per month, isn't too boat-rocking for current infrastructure and tooling.

Taken at face value (without special charitable parsing), this statement is trivially false. Python is often used as a paragon of 'Human-convenience'. I sometimes find this trope tiring, but whatever Python's merits and vices, it's _definitely_ NOT messy in design.

Perl is the C++ of scripting languages, it's a very [badly|un] designed language widely mocked by both language designers and users. Lua and tcl instead are languages literally created for the sole exact purpose of (non-) programmers expressing configuration inside of a fixed kernel of code created by other programmers, and look at their design : the whole of tcl's syntax and semantics is a single human-readable sentence, while lua thought it would be funny if 70% of the language involved dictionaries for some reason. These are extremely elegant and minimal designs, and they are brutally efficient and successful at their niches : tcl is EDA's and Network Administration's darling, and lua is used by game artists utterly uninterested in programming to express level design.

'Humans are messy' isn't a satisfactory way to put it. 'Humans love simple rules that get the job done' is more like it. But because the world is very complex and exception-laden, simple rules don't hug its contours well. There are two responses to this:

- you can declare it a free-for-all and just have people make up simple rules on the fly as situations come up, that's the Worse Is Better approach. It doesn't work for long because very soon the sheer mountain of simple rules interact and create lovecraftian horrors more complex than anything the world would have thrown at you. Remember that the world itself is animated by extremely simple rules (Maxwell's equations, Evolution by Natural Selection, etc...), it's the multitude and interaction of those simple rules that give it its gargantuan complexity and variety.

- you stop and think about The One Simple Rule To Rule All Rules, a kernel of order that can be extended and added to gradually, consistently and beautifully.

The first approach can be called the 'raster ideology', it's a way of approximating reality by dividing it into a huge number of small, simple 'pixels' and describing each one separately by simple rules. I'm not sure it's 'easy' or 'convenient', maybe seductive. It promises you can always come up with more rules to describe new patterns and situations, and never ever throw away the old rules. This doesn't work if your problem is the sheer multitude and inconsistency of rules. The second approach is the 'vector ideology', it promises you that there is a small basis of simple rules that will describe your pattern in entirety, and can always be tweaked or added to (consistently!) when new patterns arise, the only catch is that you have to think hard about it first.

[+] posharma|4 years ago|reply
It's really sad to see the pervasiveness of JSON. For one thing its usage as a config file is disturbing. Config files need to have comments. Second, even as a data transfer format the lack of schema is even more disturbing. I really wish JSON didn't happen and now these malpractices are so widespread that it's hurting everyone.
[+] jackjeff|4 years ago|reply
JSONC. JSON with comments. And even if your favorite parser does not support it natively it’s not so hard to add with a very simple pre-lexer step.

JSON schemas exist and they’re ok for relatively simple things. For more complex cases I find myself wishing I could just turn Typescript into some kind of schema validation for JSON.
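
The "very simple pre-lexer step" could look something like the following. This is a minimal sketch that assumes only `//` line comments and `/* */` block comments need handling; it tracks string literals so that slashes inside strings survive, and it does not deal with trailing commas. The function name `strip_jsonc_comments` is hypothetical.

```python
import json

def strip_jsonc_comments(text):
    """Remove // and /* */ comments that appear outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # copy the escaped character untouched
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < n and text[i] != "\n":
                i += 1
            continue
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)

SOURCE = """
{
  // which site to fetch
  "url": "http://example.com/a",  /* slashes inside strings survive */
  "retries": 3
}
"""
config = json.loads(strip_jsonc_comments(SOURCE))
print(config)
```

The string tracking is what keeps this from being a one-line regex: a naive pattern would cut `"http://example.com/a"` off at the double slash.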

[+] matja|4 years ago|reply
Seems to me that YAML just needs type/schema support to be less of a hurdle.

As an alternative, an encoding/decoding round trip through protobuf seems reasonable to me: it catches the footgun of floating-point version numbers (they become a parse error), makes whitespace and multiline concatenation more obvious, and allows comments (unlike JSON):

  ( cat << EOF
  # yes, comments are allowed
  name: "Python package"
  on: "push"
  build {
    python_version: ["3.6", "3.7", "3.8", "3.9", "3.10"]
    steps: [
      {
        name: "Install dependencies"
        run:
          "python -m pip install --upgrade pip\n"
          "pip install pytest\n"
          "if [ -f 'requirements.txt' ]; then pip install -r requirements.txt; fi\n"
      },
      {
        name: "Test with pytest"
        run: "pytest\n"
      }
    ]
  }
  EOF
  ) | protoc --encode=Config config.proto  | protoc --decode=Config config.proto
  
  name: "Python package"
  on: "push"
  build {
    python_version: "3.6"
    python_version: "3.7"
    python_version: "3.8"
    python_version: "3.9"
    python_version: "3.10"
    steps {
      name: "Install dependencies"
      run: "python -m pip install --upgrade pip\npip install pytest\nif [ -f \'requirements.txt\' ]; then pip install -r requirements.txt; fi\n"
    }
    steps {
      name: "Test with pytest"
      run: "pytest\n"
    }
  }
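
The floating-point version footgun mentioned above is easy to reproduce without any config format at all. A minimal sketch in plain Python, standing in for any loader that turns an unquoted `3.10` into a number:

```python
versions_as_written = ["3.9", "3.10"]

# What a loader produces if it decides these scalars look like numbers.
as_floats = [float(v) for v in versions_as_written]
print(as_floats)

# Converting back to text does not restore the trailing zero.
print([str(v) for v in as_floats])
```

A schema that declares the field as a string, as in the protobuf example, turns the unquoted form into an error instead of a silently different version.
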
[+] afavour|4 years ago|reply
JSON Schema is an official thing that exists and has implementations in all major languages. Personally I’m very glad that it’s an opt-in addition rather than a requirement.

(I agree with you about comments though)

[+] IshKebab|4 years ago|reply
I agree, but I would recommend JSON5 as the solution. Not YAML or this abomination.

JSON5 has many advantages:

* Superset of JSON without being wildly different. I know YAML is a superset of JSON but it's completely different too. Insane.

* Unambiguous grammar. YAML has way too many big structure decisions that are made by unclear and minor formatting differences. My work's YAML data is full of single-element lists that shouldn't be lists for example.

* Comments, trailing commas

* It's a subset of Javascript so basically nothing new to learn.

* It has an unambiguous extension (.json5). I think JSONC would be a reasonable option but everyone uses the same extension as JSON (.json) so you can never be sure which you are using. E.g. `tsconfig.json` is JSONC but `package.json` is just JSON (to everyone's annoyance).

* Doesn't add too much of Javascript. I wouldn't recommend JSON6 because it's just making the format too complicated for little benefit.

[+] runarberg|4 years ago|reply
Tools that use JSON as configuration format could simply allow certain unused keys (e.g. all keys starting with #) and promise never to use them. Then author can write their comments with something like:

    {
      "name": "my-tool",
      "#comment-1": "Don’t change the version!",
      "version": "42.1337.0"
    }
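
On the tool side, honoring that promise is a small hook. A minimal sketch with Python's standard `json` module, assuming the `#` prefix convention from the example above; `drop_comment_keys` is a hypothetical name.

```python
import json

def drop_comment_keys(obj):
    """object_hook that removes keys starting with '#' from each decoded object."""
    return {key: value for key, value in obj.items() if not key.startswith("#")}

SOURCE = """
{
  "name": "my-tool",
  "#comment-1": "Don't change the version!",
  "version": "42.1337.0"
}
"""
config = json.loads(SOURCE, object_hook=drop_comment_keys)
print(config)
```

Because `object_hook` runs for every nested object, comment keys are dropped at any depth, not only at the top level.
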
[+] benibela|4 years ago|reply
And I am just writing a JSON de/serializer to move my config from the current system to JSON. I worked on it today and yesterday and several days some time ago.

This situation makes me feel rather silly

[+] umvi|4 years ago|reply
So you prefer the "good old" XML days? I'll take comment-less JSON over XML any day

(and it doesn't have to be comment-less... JSON with comments is a thing and VSCode has syntax highlighting for it - just strip out the comments before parsing).

[+] Waterluvian|4 years ago|reply
My opinion only: I love JSON because it lacks so many foot guns of yaml. If you’re doing lots of clever stuff with yaml you probably want a scripting language instead. Django using Python for configs made me fall in love with this. Spending years with the unmitigated disaster that is ROS xml launchfiles and rosparams makes me love it even more.

Yaml and toml are fine if you keep it simple. JSON direly needs comments support (but of course wasn’t designed to be used as a human config file format so that’s kind of on us). And not just “Jsonc that sometimes might work in places.”

Beyond that, I think we generally have all the things we need and I don’t personally think we need yet another yaml. =)

[+] woodruffw|4 years ago|reply
These aren't foot-guns per se, but I can think of another handful of grievances I have with JSON:

* JSON streaming is a bit of a mess. You can either do JSONL, or keep the entire document in memory at once. I usually end up going with JSONL.

* JSON itself doesn't permit trailing commas. I can measure the amount of time that I've wasted re-opening JSON files after accidentally adding a comma in days, not hours.

* JSON has weakly specified numbers. The specification itself defines the number type symbolically, as (essentially) `[0-9]+`. It's consequently possible (and common) for different parsers to behave differently on large numbers. YAML also, unfortunately, has this problem.

* Similarly: JSON doesn't clearly specify how parsers should behave in the presence of duplicate keys. More opportunity for confusion and bugs.
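
The last two points can be seen with Python's standard `json` module. This is a sketch of one parser's behavior, not a statement about what other parsers do; `reject_duplicates` is a hypothetical helper.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently keeping one value."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

DOC = '{"retries": 1, "retries": 5}'

# By default, Python's json module keeps the last value for a repeated key.
print(json.loads(DOC))

try:
    json.loads(DOC, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)

# Large integers: Python keeps them exact, while a parser that stores every
# number as a double cannot represent 2**53 + 1.
big = "9007199254740993"
print(json.loads(big))
print(json.loads(big, parse_int=float))
```

Two parsers that disagree on either point will read the same bytes as different data, which is where the confusion and bugs come from.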

[+] iamleppert|4 years ago|reply
I’ve never liked YAML. For whatever reason, it always feels like working in a mine field. It comes from the same cargo cult of people who think the problem with human machine formats is that it needs to be “clean”.

Clean, of course, to them means some bizarre aesthetic notion of removing as much as possible. Only it’s taken to an extreme. I wonder if the same people also think books would be better with all punctuation removed to make them look “clean”?

It’s unhealthy minimalism that causes more problems than it solves. As soon as I see a project using YAML I cringe and try to find an alternative because god knows what other poor choices the developer has made. In that sense, YAML can be considered a red flag, and I’m usually right. The last project I used that adopted an overly complex and build-breaking YAML configuration syntax had other problems hiding under the covers, and in some cases couldn’t parse its own syntax due to YAML’s overly broad but at the same time opinionated syntax.

Just say no to YAML.

[+] avsteele|4 years ago|reply
I'll give my opinion as someone who had to choose among JSON, XML, TOML, and YAML about two years ago for a new project. Whatever I chose had to be easy for end users who don't know the specification to understand later.

Here were my thoughts on the options.

JSON - No comments -> impossible

XML - Unreadable

YAML - 2nd place. Meaningful indentation also made me worried someone was going to not understand why their file didn't work. The lack of quotes around strings was frustrating.

TOML - 1st place. Simpler than YAML to read & parse. It truly seems 'obvious' like the name says.

I haven't encountered any situations where I wish I had more than TOML offers.

[+] resonious|4 years ago|reply
A lot of people have really strong opinions towards syntax things like YAML vs JSON vs XML, HTML, even programming languages. I think at some point we assign way too much importance to this kind of stuff.

I recently read a piece by Joel Spolsky that resonated with me (even though my career is not nearly as long as his).

> I took a few stupid years trying to be the CEO of a growing company during which I didn’t have time to code, and when I came back to web programming, after a break of about 10 years, I found Node, React, and other goodies, which are, don’t get me wrong, amazing? Really really great? But I also found that it took approximately the same amount of work to make a CRUD web app as it always has, and that there were some things (like handing a file upload, or centering) that were, shockingly, still just as randomly difficult as they were in VBScript twenty years ago. [0]

It makes me wonder if we're really focusing on the right stuff. Maybe there's lower hanging fruit somewhere that's more valuable than focusing on fundamentally subjective things like syntax.

[0]: https://www.joelonsoftware.com/2021/06/02/kinda-a-big-announ...

[+] georgewfraser|4 years ago|reply
A radically different alternative with a lot going for it is Starlark: https://github.com/bazelbuild/starlark

It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.

[+] DonHopkins|4 years ago|reply
How about simply using pure full blown JavaScript or Python for config files, and not hiring people who you can't trust not to write infinite loops?

Or if you really must, then simply interrupt processes that loop infinitely, and fix the bugs that caused it.

You know, like you already do when you have an infinite loop.

Infinite loops are not the end of the world, you know. Processes can be interrupted, and computers have reset buttons.

[+] OskarS|4 years ago|reply
> Starlark is a dialect of Python. Like Python, it is a dynamically typed language with high-level data types, first-class functions with lexical scope, and garbage collection.

If it has first-class functions, how can you avoid infinite recursion? Like, what stops me from running the omega combinator in it? This is why Meson (a similar language) does not allow those kinds of shenanigans, to keep the language non-Turing-complete.

[+] civilized|4 years ago|reply
Not a bad idea but only implemented in Rust, Go, and Java so far. Meanwhile, all sorts of languages can interpret JSON and YAML.

It's a cool idea to do configuration in a subset of Python but now you have to go implement that subset in every language.

[+] vlovich123|4 years ago|reply
Have you had any experience building on top of it directly outside of blaze/bazel?
[+] remram|4 years ago|reply
Interesting! I started using jsonnet this year, but found that the language was needlessly quirky (e.g. the `::`, purely functional aspect, and no one wants to learn a new language to write configuration in the first place). More importantly, it is extremely slow (lazy evaluation without memoization...): rendering the Kubernetes YAML of my 5-container app taking over 10 seconds...

I will look into this further.

[+] xiaq|4 years ago|reply
> It’s a deterministic subset of Python. This means that if you have complex or repetitive configurations, you can use loops and functions to structure them. But it’s impossible to write an infinite loop or recursion.

Starlark is indeed deterministic and guaranteed to terminate (the Go implementation has a flag that allows recursion, but it's off by default), but these are two orthogonal properties.

[+] seedless-sensat|4 years ago|reply
Plenty of tools are lacking in the Starlark environment, e.g. for generating Starlark files or machine-editing Starlark maps.
[+] im3w1l|4 years ago|reply
So one thing I wasn't sure of is: If you have a Starlark program, how is the value of it decided? Is it simply the value of the last expression? And where does the print output end up? Is it just for diagnostics, with no influence on the value?
[+] account-5|4 years ago|reply
I like INI. It's simple, it's readable, and it leaves the data types up to the application to interpret. It's also really easy to parse: I can work out how to do it, whereas parsing JSON is beyond me.

I like CSV (and similar delimited files) it's less verbose than anything else for tabular data.

I like JSON for data transfer, you know the data types, it's succinct, and readable.

I personally don't need anything else.

[+] zmmmmm|4 years ago|reply
I have to say I hate the fact that I have low confidence when editing YAML that the result will be what I intend. It's kind of the number one job of such a format. And I routinely run into people using advanced features and then I have no idea at all how to safely edit it. It is interesting that it seems so difficult to pick a good tradeoff between flexibility and complexity with these kinds of languages.
[+] pjmlp|4 years ago|reply
I just stick to XML unless forced to use something else.

Schema validation, code completion on IDEs, endless amount of tooling including graphical visualisation, a language for data transformation and queries, and.... wait for it... comments!

[+] diob|4 years ago|reply
What is the obsession with removing braces? I will never find the lack of clear demarcations (relying on indent) easier than braces.
[+] todd8|4 years ago|reply
I was surprised the first time I saw Daniel J. Bernstein's qmail configuration. Qmail uses separate configuration files for each parameter being set. The directory /var/qmail/control contains most of these files.

For example, to set the maximum message size to 10 MB and the timeout to 30 seconds:

    echo 10000000 > /var/qmail/control/databytes
    echo 30 > /var/qmail/control/timeoutsmtpd
There are many more files like this that hold simple values. /var/qmail/control/locals is a file that is a list of domain names, one per line.

Dictionaries are just subdirectories with one file per entry, for example this is how aliases are defined to qmail:

    echo fred > /var/qmail/alias/.qmail-postmaster
    echo fred > /var/qmail/alias/.qmail-mailer-daemon
See [1] for more about qmail.
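
Reading such a directory back takes very little code. The sketch below is not how qmail itself reads its control files; it is a minimal Python illustration of the one-file-per-setting idea, with a temporary directory standing in for /var/qmail/control and made-up domain names in `locals`.

```python
import tempfile
from pathlib import Path

def load_control_dir(directory):
    """Read every regular file in a directory as one setting: name -> content."""
    settings = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            lines = path.read_text().splitlines()
            # Single-line files become scalars; anything else becomes a list.
            settings[path.name] = lines[0] if len(lines) == 1 else lines
    return settings

with tempfile.TemporaryDirectory() as control:
    Path(control, "databytes").write_text("10000000\n")
    Path(control, "timeoutsmtpd").write_text("30\n")
    Path(control, "locals").write_text("example.com\nmail.example.com\n")
    settings = load_control_dir(control)

print(settings)
```

There is no syntax to get wrong: the file system supplies the keys, and each value is just the file's text.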

DJB also created a simple, portable encoding for serializing data called netstrings, see [2]. XML, YAML, JSON, TOML, and INI files all have some advantages over netstrings, but netstrings are simple to understand and simple to parse correctly.
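
A netstring is the payload's length in decimal, a colon, the payload bytes, and a trailing comma. A minimal encoder and decoder sketch in Python (function names are hypothetical; the decoder does not reject leading zeros in the length, which a stricter implementation might):

```python
from typing import Tuple

def encode_netstring(payload: bytes) -> bytes:
    """Encode bytes as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def decode_netstring(data: bytes) -> Tuple[bytes, bytes]:
    """Decode one netstring from the front of data; return (payload, rest)."""
    length_text, sep, remainder = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text)
    if len(remainder) < length + 1:
        raise ValueError("truncated netstring")
    if remainder[length:length + 1] != b",":
        raise ValueError("missing trailing comma")
    return remainder[:length], remainder[length + 1:]

encoded = encode_netstring(b"hello world!")
print(encoded)
print(decode_netstring(encoded + b"5:extra,"))
```

Because the length comes first, the payload needs no escaping, and a truncated message is detected rather than silently accepted.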

[1] https://www.oreilly.com/library/view/qmail/1565926285/ch04.h...

[2] https://en.wikipedia.org/wiki/Netstring

[+] badrabbit|4 years ago|reply
My opinion: I can live with yaml and json. Toml,tjson if I have to. Xml with a gun to my head. But I don't want yet another markup language (ironically that's what YAML stands for)
[+] it_does_follow|4 years ago|reply
> YAML is considered by many to be a human friendly alternative to JSON

I'm not disagreeing with the author here, but as someone old enough to remember the rise of XML as a data transmission format (and Erik Naggum's masterful rant against it[0]), it's strange because historically speaking both XML and JSON were also popularized as more "human readable".

I would be curious how many HNers (and even more so newer developers outside the HN-o-sphere) have worked extensively with or even written parsers for binary (or otherwise non-human readable) file formats. Writing an MP3 metadata parser used to be a standard exercise for devs looking to level up their programming skills a bit.

It personally feels weird to me that we would keep pushing for more "human readable" data formats when the world is increasingly removed from one where non-programmer humans need to read data. Keep your data in whatever format makes sense and let software handle transforming it to a more readable or more efficient format depending on the needs, even if humans can't read it (they shouldn't need to!).

On top of all that my experience has been that JSON leads to more atrocities than XML (while fully agreeing with all of Erik Naggum's points about that) and YAML creates even worse horrors than JSON. It seems we'll soon be approaching eldritch horrors if we continue to pursue human readable data exchange formats.

0. https://www.schnada.de/grapt/eriknaggum-xmlrant.html

[+] rendall|4 years ago|reply
I don't understand this. YAML has limitations. All formats have limitations. If a format is too limiting, don't use it. Pick one more suitable, or come up with another one, like NestedText (or whatever). What is this need to tell everyone else to "move on" from using some format because it doesn't suit your specific preferences or use case?
[+] bussierem|4 years ago|reply
As a fun overview of the problem we're discussing, here's a rough list of the various mentioned languages in this comment section:

  - YAML
  - JSON
  - JSONC
  - XML
  - TOML
  - INI
  - CSV
  - NestedText
  - Starlark
  - Python
  - Dhall
  - Cue
  - Jsonnet
  - DADL
  - EDN
  - HCL
[+] bschwindHN|4 years ago|reply
I post this pretty much every time this topic comes up:

JSON5 exists, and is quite nice. I've picked it up for configs on a work project and haven't once had an issue due to misconfiguration, unexpected parsing, or friction with leaving a trailing comma or a comment.

The nesting in JSON5 is simple and familiar to pretty much all programmers, unlike deep nesting in TOML which is a huge pain.

[+] morelisp|4 years ago|reply
The introduction keeps citing "no need for escaping or quoting" as a major advantage, but provides no examples of what a key with a colon, or value beginning with "[", or any datum with leading or trailing whitespace would look like.

Also, the changelog is quite frightening!

> [In 3.0], `[ ]` now represents a list that contains an empty string, whereas previously it represented an empty list.

[+] shruubi|4 years ago|reply
Looking at the comparison examples between TOML and YAML/NestedText, I fail to see how anyone can look at the YAML/NestedText and think "yeah, this is way easier to read and reason about than TOML".

I'm not even a Rust person. I've never worked in Rust in my life, so there is no "preference bias" in my comparing the two. I just don't find YAML, or this "improvement", as "human-readable" as people make it out to be.