Fastest JSON parser in the world is a D project?

97 points | micaeloliveira | 10 years ago | forum.dlang.org

70 comments

[+] _Codemonkeyism|10 years ago|reply
Sounds to me like VW or Nvidia:

(Why so fast)

"On the downside I did not validate the unused side-structures. I think it is not necessary to validate data you are not using. So basically I only scan them so much as to find where they end. Granted it is a bit of optimization for a benchmark, but is actually handy in real-life as well."
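The "scan only far enough to find where they end" idea in the quote can be sketched roughly as follows. This is a Python illustration, not the code from the "fast" library; `skip_value` and `skip_string` are hypothetical helper names, and the sketch assumes `pos` already points at the first character of a value.

```python
def skip_string(text: str, pos: int) -> int:
    """Return the index just past the string literal that starts at text[pos]."""
    i = pos + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2          # skip the escaped character without inspecting it
        elif text[i] == '"':
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated string")


def skip_value(text: str, pos: int) -> int:
    """Return the index just past the JSON value that starts at text[pos].

    Only the end of the value is located; its contents are not validated.
    """
    ch = text[pos]
    if ch == '"':
        return skip_string(text, pos)
    if ch in "{[":
        depth = 0
        i = pos
        while i < len(text):
            c = text[i]
            if c == '"':
                # Brackets inside strings must not affect the nesting depth.
                i = skip_string(text, i)
                continue
            if c in "{[":
                depth += 1
            elif c in "}]":
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
        raise ValueError("unterminated container")
    # Number, true, false or null: run to the next delimiter.
    i = pos
    while i < len(text) and text[i] not in ",}] \t\r\n":
        i += 1
    return i


# The skipped object contains the invalid token `tru`, which goes unnoticed.
doc = '{"skip": {"a": [1, 2, "x]}"], "b": tru}, "keep": 42}'
start = doc.index("{", 1)
end = skip_value(doc, start)
print(doc[end:])  # prints: , "keep": 42}
```

The trade-off is visible in the demo: the malformed `tru` inside the unused object is never reported, which is exactly the behaviour being debated in this subthread.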

[+] kal31dic|10 years ago|reply
Not really, because the only people that benefited from VW were VW managers, whereas as a user I am quite happy to trade off speed for not validating useless fields when processing terabytes of JSON. And he is quite open about the tradeoff, so it's a perfect analogy except for what's different!
[+] simgidacav|10 years ago|reply
So, how is it going with D? Last time I gave it a look I really liked it, but I sadly left the place when I hit the multiple standard libraries problems.

Anyone using D in real life, among hackers here?

[+] chipsy|10 years ago|reply
I started working with it for DSP and game code recently. It hasn't posed any major issues yet. I am getting what I wanted out of the language - something that is more modern than C or C++, but retains much of the root lineage. There is a lot of room to configure things to your liking and disable things you don't want.

The standard library forking issue is far in the past now - which doesn't mean that it's as complete and comprehensive as it could be yet - for an example of one I ran into the other day, the "pure" annotation is absent from some math functions because they call out to C standard libraries which use global state for error codes.

But the things that are the focus now are mainly "nice-to-have" technologies that will be good for productivity - check out "std.experimental.allocator" for an idea of what's cooking. There are also multiple implementations of the compiler tech rolling around now, not just the Digital Mars one, which is a good sign for future quality.

[+] EvenThisAcronym|10 years ago|reply
I never use anything else anymore for personal projects, unless I absolutely have to due to missing a certain library. A well-known phenomenon in the D community is that it "spoils" you such that when you have to use another language all you can think about is how much easier it would be to do your task in D, and how you miss certain features from D (such as the phenomenally better template syntax compared to C++). I've experienced this many times, such as when writing a large-ish Python script, or when working on a mobile game in C# + Unity. I really wish I could use D at work.
[+] jeremiep|10 years ago|reply
Using D for most of my personal projects, which are almost all pet game engines. It's a real joy to use. I get to write lightning-fast code with the productivity I'm used to with high-level languages. It's really the best of both worlds.

I couldn't dream of writing all of the features I now have in another systems language because of all the extra scaffolding it would require, and I couldn't hope to get anywhere near the performance I have in higher-level languages because I'd lose value types and manual memory management.

The multiple stdlib problem has been solved for years now.

[+] kal31dic|10 years ago|reply
Hi. I only joined Hacker News relatively recently, but I have been programming since 1983. My first 'open source' contribution was to Tom Jennings' bulletin board routing algorithm in 1989.

I'm in the hedge fund world, and my day job is investing, but I use technology to help me do that. Andy Smith gave a talk on using D at a 20bn+ hedge fund at DConf. My background is at similar large funds, but I am now using D to develop some tools to help the investment process at a smaller but decent-sized fund. A couple of D people will be helping me. So it's ready for real work, and the combination of high productivity with efficiency and correctness is a killer feature for my problem set. Fast compilation is also important, as it's a dynamic environment and you want to iterate quickly.

[+] qznc|10 years ago|reply
That must have been years ago. Since version 2.0 there is only one standard library.

Using it for small personal projects. Happy. :)

[+] Ace17|10 years ago|reply
I'm using D (gdc) for all my personal and professional projects. At the moment, the only reason I sometimes miss C++ is Emscripten, which only accepts C and C++ as input (although that might change one day thanks to the ldc compiler).

Everything else works: calling C functions, ctags, syntax highlighting, automatic make dependency generation, integration in Visual Studio, step by step debugging, profiling (oprofile), valgrind, etc. As a bonus, D makes a fantastic "scripting" language (= no explicit compilation step): at work, we're progressively replacing all of our bash scripts with D scripts.

[+] bachmeier|10 years ago|reply
I use it for econometrics. I write some parts in R and as much as I want in D. I create a dynamic library of the D code and call those functions trivially from R.
[+] sgt|10 years ago|reply
On another note, I see Scala is doing really poorly in the JSON benchmark test: https://github.com/kostya/benchmarks#json
[+] waxjar|10 years ago|reply
Those aren't very fair to the dynamic languages and JIT-compiled languages (this includes Scala).

* For the dynamic languages execution time includes the time it takes to lex, parse and interpret the source code.

* For language implementations with a JIT execution time includes the time the JIT takes to properly optimise hot code paths. Generally you start benchmarking after a warm up period in such cases.

The only fair comparisons are those between ahead of time compiled languages.
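The warm-up practice mentioned in the second bullet can be sketched as a small timing harness. This is a Python illustration under stated assumptions: `benchmark`, its `warmup` and `runs` parameters, and the sample payload are all made up for the example and are not taken from the kostya/benchmarks repository. CPython itself has no JIT, so the sketch only shows the shape of the measurement.

```python
import json
import time


def benchmark(fn, warmup=5, runs=20):
    """Return the best wall-clock time over `runs` calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: gives a JIT (where one exists) a chance to compile hot paths
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical payload, parsed from memory so that process start-up,
# source parsing and file I/O stay outside the timed region.
payload = json.dumps({"points": [{"x": i, "y": i, "z": i} for i in range(1000)]})
elapsed = benchmark(lambda: json.loads(payload))
print(f"best of 20 runs: {elapsed:.6f}s")
```

Timing a whole process from launch, by contrast, folds interpreter start-up and JIT compilation into the reported number, which is the objection raised above.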

[+] valarauca1|10 years ago|reply
The benchmark suite used the STDLIB json parser, which is slow as hell. How slow is it? In 2011 there was a proposal to deprecate it, which never happened [1].

Why? The STDLIB parser loads the file line by line, then copies it line by line into a new buffer that joins the strings together. Only then is the parser called. The Scala community normally circumvents this by using non-standard, community-developed solutions.

Furthermore JIT vs Compiled language benchmarks are pretty unfair to JIT'd languages. Especially for the JVM which doesn't start to compile sections until >10,000 calls.

[1] https://groups.google.com/forum/m/#!msg/scala-user/P7-8PEUUj...

[+] dbcfd|10 years ago|reply
Although D may have a fast json parser, this benchmark is a horrible comparison. Really need a better way to do cross language comparisons.
[+] EvenThisAcronym|10 years ago|reply
The repo owner is accepting pull requests for the benchmarks so at least it's fixable.
[+] mtanski|10 years ago|reply
If I had a nickel for every time I've seen a language benchmark be a very specialized contrived problem (in this case specific JSON with specific access pattern) I'd have a lot of nickels.
[+] tacone|10 years ago|reply
> Yep, that's right. stdx.data.json's pull parser finally beats the dynamic languages with native efficiency. (I used the default options here that provide you with an Exception and line number on errors.)

Nice to see that it only took a couple of years for the D community to beat the scripting languages on speed.

[+] nkozyra|10 years ago|reply
It's literally the next paragraph that extends this:

"A few days ago I decided to get some practical use out of my pet project 'fast' by implementing a JSON parser myself, that could rival even the by then fastest JSON parser, RapidJSON. The result can be seen in the benchmark results right now:

https://github.com/kostya/benchmarks#json

fast: 0.34s, 226.7Mb (GDC)
RapidJSON: 0.79s, 687.1Mb (GCC)"

[+] ma2rten|10 years ago|reply
Python's json parser is written in C.
[+] z3t4|10 years ago|reply
The D people laugh at the dynamically typed languages, yet they have "auto" in front of all variables ;)
[+] geofft|10 years ago|reply
To give a little more detail on the difference between type inference and dynamic typing, the following code is valid in a dynamic language, but a static language rejects it:

    auto x = 15;
    if (some_condition) {
        x = "Hello world!";
    }
    print(x);
And this solves real errors. I believe there's even a function somewhere in the Python standard library that, if it finds one result, returns a string, and if it finds multiple results, returns a list of strings. Of course a string is itself iterable, so duck-typing goes horribly wrong.
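The duck-typing hazard described here can be reproduced in a few lines of Python. `find_names` is a hypothetical function invented for this sketch; it is not the standard-library function the comment alludes to, which is left unnamed.

```python
def find_names(matches):
    """Hypothetical lookup: returns a str for one hit and a list of str for several."""
    return matches[0] if len(matches) == 1 else matches


# Both calls "work", because a str is iterable just like a list of str.
several = [name.upper() for name in find_names(["alice", "bob"])]
single = [name.upper() for name in find_names(["alice"])]

print(several)  # ['ALICE', 'BOB']
print(single)   # ['A', 'L', 'I', 'C', 'E']
```

No exception is raised in the single-result case; the loop silently iterates over characters instead of names. A statically typed signature with inference would force the function to pick one return type.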
[+] coldtea|10 years ago|reply
Auto is just a keyword to invoke/assist type inference in the compiler -- the objects are still statically typed. C++11 (formerly known as C++0x) uses the same.

You were probably thinking of something like the "dynamic" type that C# has.

[+] Drup|10 years ago|reply
And apparently, you have no idea what type inference is. ;)
[+] qznc|10 years ago|reply
An inferred type is different from a dynamic type.
[+] wfunction|10 years ago|reply
> Yep, that's right. stdx.data.json's pull parser finally beats the dynamic languages with native efficiency.

'dynamic' is the key word here. The title is misleading.

[+] masklinn|10 years ago|reply
You may want to read past that part. The author was just pointing out that only around DMD 2.067 (that's March 2015 or later) did D finally get a JSON library which did genuinely better than those in dynamic language stdlibs, which hadn't been the case beforehand (the stdlib DMD 2.067 rivaling but not being better than Python's json on the author's system).

That's setup for the reveal of "fast" which not only does better than dynamic language stdlibs and than the previous fastest D library but does better than any other JSON parser.

The point of the historical recap is how fast things improved for the D ecosystem: in 7 months the best-case option went from parsing JSON 2~3 times slower than Ruby to parsing it in half the time of RapidJSON, a 2 orders of magnitude improvement in speed.

[+] vamega|10 years ago|reply
This post is about the fast project, which is claimed to beat RapidJson, a very fast C++ json parser.
[+] syllogism|10 years ago|reply
The fast Python json libraries are all C extensions, though. The only "dynamic" thing they have to do is create the actual Python objects when they're done.
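For the standard-library `json` module specifically, one way to check whether the C accelerator is in use is sketched below. It relies on the `c_make_scanner` and `c_scanstring` names in CPython's `json.scanner` and `json.decoder` modules, which fall back to `None` when the `_json` extension cannot be imported; other Python implementations may differ, so treat this as an assumption about CPython.

```python
import json.decoder
import json.scanner

# Both names are None when the C accelerator module (_json) is unavailable,
# in which case the pure-Python implementations are used instead.
uses_c_scanner = json.scanner.c_make_scanner is not None
uses_c_strings = json.decoder.c_scanstring is not None

print("C scanner in use:", uses_c_scanner)
print("C string scanner in use:", uses_c_strings)
```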
[+] merb|10 years ago|reply
I don't know, but the benchmarks are awful, since Scala and Python are using their built-in JSON parsers and not the fastest parsers available, and he is comparing some other super fast libraries against D and says "hoho", it's so fast.

Benchmarks are for the poor people who can't live in the real world.

[+] dbcfd|10 years ago|reply
Benchmarks would be fine if they could somehow actually compare performance between languages. I have yet to see any cross-language benchmarks whose results come even close to what you would actually see in practice.
[+] kal31dic|10 years ago|reply
A strange way of putting it. He started out using the standard library for D with dmd compiler, and then people submitted pull requests to give better information about choices and he incorporated these.

Anyone else is free to do the same.

If you know of a faster parser in any language, I would love to hear, because getting the job done well matters more to me.