
Go concurrency isn't parallelism: Real world lessons with Monte Carlo sims

57 points | soroushjp | 11 years ago | soroushjp.com

52 comments

[+] bkeroack|11 years ago|reply
TLDR: Adjust GOMAXPROCS if you want a speedup from multiple goroutines.

http://golang.org/pkg/runtime/

"GOMAXPROCS sets the maximum number of CPUs that can be executing simultaneously and returns the previous setting. If n < 1, it does not change the current setting. The number of logical CPUs on the local machine can be queried with NumCPU. This call will go away when the scheduler improves."

It will be nice when this requirement is eliminated.

[+] rakoo|11 years ago|reply
Actually, the most interesting part of the article is not about using GOMAXPROCS, but about how setting GOMAXPROCS doesn't automagically turn your program into a parallel one: only a real analysis (helped by the helpful profiler) tells you whether you've achieved true parallelism. The author's program wasn't truly parallel until he found the bottleneck in the global mutex lock.

Said differently: concurrency is easy ("just" go func() all the things), parallelism is hard (GOMAXPROCS is not enough, you'll have to go deeper)
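
A minimal sketch of what "going deeper" can look like, assuming the contention comes from goroutines sharing one locked generator, as the top-level math/rand functions did in Go releases of this era. The names countInside and estimatePi and the per-worker seeds are hypothetical, not taken from the article; the idea is only that each goroutine owns a private rand.Rand, so no lock is shared between workers.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// countInside draws n points in the unit square from a private generator
// and returns how many land inside the quarter circle.
func countInside(n int, seed int64) int {
	rng := rand.New(rand.NewSource(seed)) // owned by one goroutine, no shared lock
	inside := 0
	for i := 0; i < n; i++ {
		x, y := rng.Float64(), rng.Float64()
		if x*x+y*y <= 1 {
			inside++
		}
	}
	return inside
}

// estimatePi splits the samples evenly across the given number of goroutines.
func estimatePi(samples, workers int) float64 {
	per := samples / workers
	counts := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1) // registered before the goroutine starts
		go func(w int) {
			defer wg.Done()
			counts[w] = countInside(per, int64(w)+1) // hypothetical seeds 1..workers
		}(w)
	}
	wg.Wait()
	total := 0
	for _, c := range counts {
		total += c
	}
	return 4 * float64(total) / float64(per*workers)
}

func main() {
	fmt.Println(estimatePi(4000000, 4))
}
```

Each worker writes only its own slot in counts, so the slice needs no extra locking. Whether this actually runs in parallel still depends on GOMAXPROCS and on what the profiler shows.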

[+] dingdingdang|11 years ago|reply
Yup, here's what I normally use for these situations (posted it in disqus too at end of article):

  // Initialize to use all available CPU cores
  func init() {
      runtime.GOMAXPROCS(runtime.NumCPU())
  }
[+] spullara|11 years ago|reply
I think that the default of single threaded is going to bite them hard in the long run. I've already discovered libraries that were never tested with GOMAXPROCS > 1 and are not thread safe. They should default it to at least 2 to make sure these bugs are shaken out.
[+] bdarnell|11 years ago|reply
The go race detector (go test -race) is actually really good at finding these kinds of issues, regardless of GOMAXPROCS. I've gotten a lot more value from running my tests with the race detector in a single process than running the tests with multiple processes and hoping to encounter thread-safety issues in a debuggable way.
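
For readers who haven't met this class of bug, here is a small sketch, assuming a shared counter updated from many goroutines. countTo is a hypothetical name. The version below is the safe one; swapping the atomic call for a plain total++ gives the kind of unsynchronized write that the race detector is designed to report.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countTo increments a shared counter from n goroutines and returns it.
func countTo(n int) int64 {
	var total int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&total, 1) // a plain total++ here would be a data race
		}()
	}
	wg.Wait() // all increments happen before this returns
	return total
}

func main() {
	fmt.Println(countTo(1000))
}
```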
[+] sarnowski|11 years ago|reply
Actually that is exactly the case in a library of mine[0]. It's not a bug in my code directly; it comes from non-POSIX-compliant behavior in Linux that triggers only with multiple threads (setuid sets the uid only for the calling thread and not for the others in the same process, unlike what the manual page and POSIX state). It's a corner case, but I also explicitly raise GOMAXPROCS in the test case to trigger it[1].

[0] https://github.com/sarnowski/mitigation

[1] https://github.com/sarnowski/mitigation/blob/master/mitigati...
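
A sketch of raising GOMAXPROCS only for the duration of a test body, assuming the goal is simply to force multi-threaded scheduling and then put the old setting back. withProcs and currentProcs are hypothetical helpers for illustration, not code from the linked library.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reports GOMAXPROCS without changing it (n < 1 is a query).
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

// withProcs runs fn with GOMAXPROCS set to n, then restores the old value.
func withProcs(n int, fn func()) {
	prev := runtime.GOMAXPROCS(n) // returns the previous setting
	defer runtime.GOMAXPROCS(prev)
	fn()
}

func main() {
	before := currentProcs()
	withProcs(4, func() {
		fmt.Println("inside:", currentProcs())
	})
	fmt.Println("restored:", currentProcs() == before)
}
```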

[+] BuckRogers|11 years ago|reply
You should assume Go's creators thought of this. There are performance implications, as it adds overhead if it's not necessary. Most people when writing libraries will decide if it's necessary or useful enough to use CPU multithreading. Single core is common in cloud servers. This is a really bad idea that will never happen.
[+] dilap|11 years ago|reply
Yes! They know it's bad (the documentation says something like "we'll do something smarter later"), but even just a default of 2 would be much better IMO.
[+] tshadwell|11 years ago|reply
Isn't this a misleading title? The article is essentially the author forgetting to set GOMAXPROCS, not really a lack of parallelism in Go.
[+] chinpokomon|11 years ago|reply
I didn't take it as a mistake in forgetting. I think this was more of an experiment in understanding. Clearly the author was aware of previous Parallelism vs. Concurrency discussions and this was just an applied test to witness the difference first hand.
[+] soroushjp|11 years ago|reply
tshadwell, the article was meant to help people who don't even know about GOMAXPROCS. People are equating concurrency and parallelism, and the title is a reference to Rob Pike's talk concerning this exact topic.
[+] EugeneOZ|11 years ago|reply
Rob Pike's presentation has the same title.
[+] intortus|11 years ago|reply
Another error the author made is adding to a sync.WaitGroup in a different goroutine than the one that waits. This is another rookie mistake that go test -race would probably catch.
[+] Twirrim|11 years ago|reply
It does indeed catch it

  $ go test -race -bench=.
  
  ...
  
  WARNING: DATA RACE
  Write by goroutine 4:
    sync.raceWrite()
        /usr/lib/go/src/pkg/sync/race.go:41 +0x35
    sync.(*WaitGroup).Wait()
        /usr/lib/go/src/pkg/sync/waitgroup.go:120 +0x16d
    _/home/twirrim/monte.GetPiMulti()
        /home/twirrim/monte/monte.go:56 +0x23a
    _/home/twirrim/monte.BenchmarkGetPiMulti()
        /home/twirrim/monte/monte_test.go:17 +0x62
    testing.(*B).runN()
        /usr/lib/go/src/pkg/testing/benchmark.go:119 +0xc0
    testing.(*B).launch()
        /usr/lib/go/src/pkg/testing/benchmark.go:207 +0x1ba
  
  ...
And so on.
[+] Strom|11 years ago|reply
Indeed. Additionally, waitgroups aren't even needed, the channel usage in the code is already enough sync.
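
A minimal sketch of the channel-only pattern, assuming one result per goroutine. sumSquares is a hypothetical stand-in for the article's sampling work, not its actual code; the point is that receiving exactly as many values as goroutines were started already waits for all of them.

```go
package main

import "fmt"

// sumSquares starts one goroutine per input and collects the results over
// a channel. Receiving len(inputs) values is the only synchronization used.
func sumSquares(inputs []int) int {
	results := make(chan int)
	for _, v := range inputs {
		go func(v int) {
			results <- v * v
		}(v)
	}
	total := 0
	for range inputs {
		total += <-results // blocks until some worker has sent
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}))
}
```

If a WaitGroup is used anyway, calling Add before the go statement, in the goroutine that later calls Wait, avoids the race reported above.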
[+] tux1968|11 years ago|reply
OT: The way we use the terms parallel and concurrent in computer science seems completely backward to me. The dictionary says "concurrent" means at the same time, and parallel lines need not be drawn at the same moment...

Yet in CS we talk of things being concurrent even if they're executed as cooperative threads on a single core and parallel only applies when executing concurrently (at the same time).

[+] freyr|11 years ago|reply
Perhaps concurrent could be used in place of parallel, but parallel could not be used in place of concurrent.

Parallel lines are non-intersecting lines, i.e. lines traveling in identical directions. This is a nice allusion to the way parallelism works by running identical processes that do not interact. The fact that these processes can run simultaneously is a by-product of their parallel structure.

But yeah, the term concurrent is confusing because it can be applied to things that never actually overlap in time. But I can't think of a better term off the top of my head.

[+] replicant|11 years ago|reply
Unrelated question, isn't it a bad idea to update the seed for every sample?
[+] danbruc|11 years ago|reply
It is in multiple ways.

First, you are wasting cycles, and with so little work done between reseeds, as in the code shown, it probably matters quite a bit.

Second, some random number generators need some warm-up time and produce lower-quality random numbers at the beginning.

Third, if you are reseeding faster than your seed changes, you will repeatedly consume the same sequence of random numbers. I am not sure what the resolution of Now() is, but unless it is on the order of nanoseconds this will heavily affect the shown implementation.

If the resolution is one millisecond and it took 15 seconds to execute on a single thread, then the generated random values changed only every 74 iterations.
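
The third point can be sketched with a simulated clock, assuming the seed comes from a timer that ticks only once every ticksEvery iterations. Both function names and the integer "clock" are hypothetical stand-ins for a coarse Now() reading; the sketch just counts how many different values come out in each case.

```go
package main

import (
	"fmt"
	"math/rand"
)

// distinctWhenReseeding reseeds before every draw from a simulated clock
// that advances once every ticksEvery iterations, and counts how many
// different values are produced.
func distinctWhenReseeding(n, ticksEvery int) int {
	seen := make(map[float64]bool)
	for i := 0; i < n; i++ {
		clock := int64(i / ticksEvery) // stand-in for a coarse timestamp
		rng := rand.New(rand.NewSource(clock))
		seen[rng.Float64()] = true // same seed, same first draw
	}
	return len(seen)
}

// distinctWhenSeededOnce seeds a single generator and draws n values from it.
func distinctWhenSeededOnce(n int, seed int64) int {
	rng := rand.New(rand.NewSource(seed))
	seen := make(map[float64]bool)
	for i := 0; i < n; i++ {
		seen[rng.Float64()] = true
	}
	return len(seen)
}

func main() {
	fmt.Println("reseeding each iteration:", distinctWhenReseeding(1000, 50))
	fmt.Println("seeded once:", distinctWhenSeededOnce(1000, 1))
}
```

With reseeding, the number of distinct values is bounded by the number of clock ticks rather than by the number of iterations.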

[+] xeno42|11 years ago|reply
Additionally - Pushing the seed operation out of the loop decreased the run time for 1,000,000 samples from ~10 seconds to ~50ms on my machine.
[+] reikonomusha|11 years ago|reply
It is wasteful and nullifies the benefits offered by a PRNG.

For Monte Carlo simulations, it's in fact very bad practice to continually reseed a computation, as it makes them unrepeatable.
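
A sketch of the repeatable alternative, assuming the seed is passed in explicitly and the generator is created once, outside the sampling loop. estimatePi and the seed value 42 are illustrative choices, not the article's code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// estimatePi seeds one generator up front and reuses it for every sample,
// so the same seed reproduces the same estimate.
func estimatePi(samples int, seed int64) float64 {
	rng := rand.New(rand.NewSource(seed)) // seeded once, outside the loop
	inside := 0
	for i := 0; i < samples; i++ {
		x, y := rng.Float64(), rng.Float64()
		if x*x+y*y <= 1 {
			inside++
		}
	}
	return 4 * float64(inside) / float64(samples)
}

func main() {
	a := estimatePi(1000000, 42)
	b := estimatePi(1000000, 42)
	fmt.Println(a, b, a == b)
}
```

Logging the seed alongside the result is usually enough to rerun a suspicious simulation exactly.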

[+] acadien|11 years ago|reply
Updating the seed adds pointless extra operations, but if your PRNG is 'good' it shouldn't cause major numerical problems.

Besides, the author is only sampling the PRNG 1 million times; this is hardly enough to stumble upon any periodicity in a modern PRNG. I have absolutely no idea whether the PRNG provided by Go is any good or what method it is based on.

[+] SixSigma|11 years ago|reply
No one has ever claimed it is. In fact, they specifically tell you it isn't, multiple times:

A Jan 2013 Go lang blog post

http://blog.golang.org/concurrency-is-not-parallelism

reminding people of the Jan 2012 talk Rob Pike did on the subject

https://talks.golang.org/2012/waza.slide

Feb 2011 : Rob Pike on Parallelism and Concurrency in Programming Languages

http://www.infoq.com/interviews/pike-concurrency

I'll skip all the intermediate steps from there back to

Tony Hoare

http://www.usingcsp.com/

[+] soroushjp|11 years ago|reply
Yep, and the title is a reference to Rob's talk on this.

I refer to it in the post:

"Rob Pike, one of the creators of the Go language, dedicates an entire talk, "Concurrency is not Parallelism" to this ...."

Wrote the post as a real world example to demonstrate Rob's point.

[+] cubano|11 years ago|reply
I have just started a Go project for a core aspect of my business, so this information was well timed, for me at least, and gave me a quick overview of concurrency/parallelism in Go.

I will be developing both concurrent and parallel threads in my app, so this was very enlightening.

Thanks to the author for his clear writing style and efforts to educate.

[+] pbnjay|11 years ago|reply
It's a decent intro, but I think instead of just jumping into parallel code, it's better to read the docs beforehand. There are plenty of references to GOMAXPROCS and the thread safety of math/rand (and most of the stdlib).
[+] soroushjp|11 years ago|reply
Completely agree, nothing beats the docs.
[+] omni|11 years ago|reply
This is a somewhat trivial suggestion, but it would be much more clear to your readers what the speedup was if you aligned the values in the benchmark results.
[+] nickbauman|11 years ago|reply
Parallelism is like hunting elephants. The languages we use like Java, Ruby, Python and C++, for example, give you weapons for the hunt on the level of a very strong toothpick at best. Go has apparently upgraded the situation to a 3" pocket knife. Clojure to the level of a proper spear. But we need languages that allow us to completely avoid hunting elephants in the first place.
[+] soroushjp|11 years ago|reply
Thanks for all the feedback everyone. Incorporated everything to make the code as good as possible, want readers to learn as much as possible. Just to be clear, this article was my way of highlighting Rob Pike's point that concurrency isn't parallelism, not to make it seem as if Go didn't support or was falsely claiming true parallelism.
[+] wpeterson|11 years ago|reply
It looks like you didn't allow for using more than a single process by setting GOMAXPROCS.

Additionally, it looks like you're re-seeding your random engine inside your sample loop, which is very slow. You only need to seed the engine outside the loop at the beginning.

[+] soroushjp|11 years ago|reply
Fixed the issue, thanks wpeterson
[+] zzzcpan|11 years ago|reply
I wouldn't call this "real world". In the real world you are better off distributing these kinds of tasks across multiple machines. Multicore parallelism per se is overrated and overhyped.