marknadal | 6 years ago
Meanwhile I can barely get Chrome/NodeJS to parse 20MB in less than 100ms :(.
How useful (or useless) would simdjson be as a native addon for V8? I assume transferring the object into JS land would kill all the speed gains?
I wrote my own JSON parser just last week to see if I could improve the NodeJS situation, and discovered some really interesting things:
(A) JSON.parse is CPU-blocking, so if you get a large object, your server cannot handle any other web request until parsing finishes. This sucks.
(B) At first I fixed this by using setImmediate/shim, but discovered two annoying issues:
(1) Scheduling too many setImmediates will cause the event loop to block at the "check" phase; you actually have to load-balance across turns of the event loop like so (https://twitter.com/marknadal/status/1242476619752591360)
(2) Doing the above makes your code way slower, so the trick is to skip setImmediate and instead invoke your code 3333 times (some divisor of NodeJS's ~11K stack depth limit) or for 1ms before doing a real setImmediate.
(C) Now that we can parse without blocking, our parser's while loop (https://github.com/amark/gun/blob/master/lib/yson.js) advances X bytes at a time (I found 32KB to be a sweet spot, not sure why).
(D) I'm seeing this pure JS parser be ~2.5X slower than native for big complex JSON objects (20MB).
(E) Interestingly enough, I'm seeing 10X~20X faster than native, for parsing JSON records that have large values (ex, embedded image, etc.).
(F) Why? The speedup appeared when I switched my parser to skip per-byte checks on encountering `"` and jump straight to the next indexOf. So it would seem V8's built-in JSON parser is still checking every character for a token, which slows it down?
(G) I hate switch statements, but woah, I got a minor but noticeable speed boost going from if/else token checks to a switch statement.
Happy to answer any other Qs!
But compared to OP's 2.5GB/s parsing?! Ha, mine is a joke.
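The batching trick described in (B)(2) can be sketched like this; the function name and exact numbers are mine, not the actual gun/yson code:

```javascript
// Run a synchronous step function in batches, yielding to the event loop
// only after ~3333 iterations or ~1ms of work, whichever comes first.
// step() should return true to continue, false when the work is finished.
function runChunked(step, done) {
  function turn() {
    const start = Date.now();
    for (let i = 0; i < 3333; i++) {
      if (!step()) { return done(); }      // finished: report back
      if (Date.now() - start >= 1) break;  // ~1ms budget used up this turn
    }
    setImmediate(turn); // yield so other requests can be served in between
  }
  turn();
}
```

A parser built this way keeps one "cursor" object as state and makes `step()` consume the next chunk of input, so a 20MB parse is spread across many event-loop turns instead of one long block.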
> JSON parse is CPU-blocking, so if you get a large object, your server cannot handle any other web request until it finishes parsing
Well, your CPU core is busy on one request or another, so I don't understand why this is an issue as long as you're guarding against maliciously large bodies. Blocking I/O is different because your core is partially idle while other hardware is doing async work. Using Node.js' cluster module lets you keep more cores busy. Chunking CPU-limited work increases total CPU time and memory required. (This is a pet peeve of mine and a hill I'm willing to die on :-) .)
imtringued|6 years ago
[0] https://github.com/luizperes/simdjson_nodejs/issues/5
ksherlock|6 years ago
Q: What happens when you parse "\\" ?
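This is exactly where a naive indexOf-to-the-next-quote skip breaks: in the text `"a\"b"` the first `"` found is escaped, and in `"\\"` the backslash before the closing quote is itself escaped, so the quote really does end the string. One way to handle both cases (my helper name, a sketch rather than the yson.js implementation) is to count the run of backslashes immediately before each candidate quote; an odd count means the quote is escaped:

```javascript
// Return the index of the unescaped '"' that closes a JSON string,
// or -1 if none. `start` is the index just past the opening quote.
function findStringEnd(json, start) {
  let i = start;
  while ((i = json.indexOf('"', i)) !== -1) {
    let slashes = 0;
    for (let j = i - 1; j >= start && json[j] === "\\"; j--) slashes++;
    if (slashes % 2 === 0) return i; // even run: quote is not escaped
    i++; // odd run: the quote is escaped, keep scanning
  }
  return -1;
}
```

So for the input `"\\"` (quote, backslash, backslash, quote) the two backslashes pair up as one escaped backslash, and the closing quote at index 3 is correctly accepted.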