
dlsspy | 12 years ago

Did you consider running the go client against the scala server and vice versa?

Also, that's kind of a lot of code. Here's my rewrite of the server: http://play.golang.org/p/hKztKKQf7v

It doesn't return the exact same result, but since you're not verifying the results, it is effectively the same (4 bytes in, 4 bytes back out). I did slightly better with a hand-crafted one.
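The playground links may rot, but the server being described is essentially a byte echo. A minimal sketch in that spirit, with a throwaway in-process client added so it runs end to end (the function name and address are illustrative, not the original code):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
)

// echoOnce starts a throwaway TCP listener, sends msg to it, and
// returns whatever comes back. The server side just copies bytes
// straight back: since the benchmark never inspects the reply,
// echoing "Ping" is effectively the same as answering "Pong"
// (4 bytes in, 4 bytes back out).
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		io.Copy(c, c) // the whole "server"
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer c.Close()

	if _, err := c.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(c, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	reply, err := echoOnce("Ping")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("4 bytes in, 4 bytes back out: %q\n", reply)
}
```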

A little cleanup on the client here: http://play.golang.org/p/vRNMzBFOs5

I'm guessing scala's hiding some magic, though.

I made a small change to the way the client works, buffering reads and writes independently (this can be observed in the linked code), and I get similar numbers (my local runs dropped from ~12 to 0.038). This is that version: http://play.golang.org/p/8fR6-y6EBy

Now, I don't know scala, but based on the constraints of the program, these actually all do the same thing. They time how long it takes to write 4 bytes * N and read 4 bytes * N. (my version adds error checking). The go version is reporting a bit more latency going in and out of the stack for individual syscalls.

I suspect the scala version isn't even making those, as it likely doesn't need to observe the answers.

You just get more options in a lower level language.

jongraehl | 12 years ago

I think you're on the right track in supposing that there can't be a huge performance difference in such a simple task, given that both languages are compiled and reasonably low-level. The most plausible explanation would amount essentially to a misconfigured library, not a fundamental advantage due to say, advanced JVM JIT. Your suggestion to try server-{a,b} x client-{a,b} is also a good one.

Your modified Go server doesn't return "Pong" for "Ping". It returns "Ping". And the "small change" version is nonsense; it's fundamentally different: you're firing off all your requests before waiting for any replies, and so hiding the latency in the more common RPC-style request-response chain, which is a real problem.

You speculate a lot ("hiding some magic" "likely doesn't need to observe the answers") when you haven't offered any insight.

EDIT: Nagle doesn't matter here: it doesn't delay any writes once you're reading (waiting on the server response). It only affects two or more consecutive small writes (here I'm trusting http://en.wikipedia.org/wiki/Nagle's_algorithm; my own recollection was fuzzy). If Go sleeps client threads between the ping and the read-response call, then I suppose it would matter (but only a little? and other comments say that Go defaults to no Nagle algorithm anyway).

bsdetector | 12 years ago

> The most plausible explanation would amount essentially to a misconfigured library, not a fundamental advantage due to say, advanced JVM JIT.

Really, the most plausible explanation? I'd say the most plausible explanation is that M:N scheduling has always been bad at latency and fair scheduling. That's why everybody else abandoned it where those things matter. It's basically only useful when fair and efficient scheduling doesn't matter, such as in math-heavy workloads, which is why it's still used in Haskell and Rust. I wouldn't be surprised to see Rust at least abandon M:N soon, though, once they start really optimizing performance.

shin_lao | 12 years ago

> The most plausible explanation would amount essentially to a misconfigured library, not a fundamental advantage due to say, advanced JVM JIT.

Configuration rarely impacts such trivial cases. I would rather bet on thread affinity or page locality.

dlsspy | 12 years ago

> Your modified Go server doesn't return "Pong" for "Ping".

The program doesn't read the result, so it doesn't matter. Returning Pong isn't harder, but why write all that code if it's going to be ignored anyway?

> It's fundamentally different: you're firing off all your requests before waiting for any replies, and so hiding the latency in the more common RPC-style request-response chain, which is a real problem.

As I said, the program isn't correlating the responses with the requests in the first place -- or even validating it got one. I don't know scala, but I've done enough benchmarking to watch even less sophisticated compilers do weird things with ignored values.

I made a small change that produced semantically the same program (same validation, etc...). It had similar performance to the scala one. If you don't think that's helpful, then add further constraints.