top | item 17284238

RyanZAG | 7 years ago

Kind of defeats the purpose of using golang for a task like this. The whole point of golang is using the little greenlet threads, but actually using them in this case is terrible for performance.

The remaining performance being left on the table is all in memory allocation and garbage collection - something you could optimize relatively easily if it were written in C, for example by using a memory pool so that you wouldn't need allocations or garbage collection at all.

Of course if performance isn't a big issue for your task, then none of this is really important.

_ph_|7 years ago

Using Go doesn’t mean you have to use many goroutines, or that you can’t do some manual memory management where it is the right thing to do.

This article nicely shows how optimizing your program yields more speed than randomly throwing goroutines at it. In the end it does use goroutines to good effect, but only after proper consideration.

coldtea|7 years ago

>Kind of defeats the purpose of using golang for a task like this. The whole point of golang is using the little greenlet threads, but actually using them in this case is terrible for performance.

The point of Golang is using them intelligently, not merely throwing them at any problem as if all you've got is a hammer...

iainmerrick|7 years ago

It's surprising that the per-file Goroutines were so expensive, though. (The original per-line Goroutine, sure, that's excessive if you care about performance.) Just using long-lived workers seems non-idiomatic for Go, but it certainly pays big dividends in this example.

jerf|7 years ago

The per-file approach may have had other problems not related to the Go runtime, such as IO contention. I'm not going to check it, but it would be easy to verify that just by using a limited number of them at a time. Spawning a new goroutine in that case is not strictly necessary, but would still be good software engineering.

One of the problems I see repeatedly when people try to benchmark things with concurrency is that they don't specify a problem that is CPU-intensive enough, so it ends up blocked on other elements of the machine. For a task like this, I'd expect optimized Go to easily keep up with a conventional hard drive, and with just a bit of work, come within perhaps a factor of 2 or 3 of keeping up with the memory bandwidth on a consumer machine (allowing for the fact that since you're going to read a bit, then write some stuff, you're not going to get the full sequential read performance out of your RAM). That's not because Go is teh awesomez, but because the problem isn't that hard. To get big concurrency wins, you need a problem where the CPU is chewing away at something but isn't constantly hitting RAM or disk or network for it, such that those systems become the bottleneck.

RyanZAG|7 years ago

Yes, it does feel like something is wrong somewhere, but I can't figure out where. Nobody would use the idiomatic goroutine-per-task pattern with that kind of overhead, yet it's one of the most common building blocks of golang projects.

val_deleplace|7 years ago

Hi iainmerrick, just for info: the measured per-file cost didn't include reading from the filesystem. Only the in-memory parsing was taken into account.

weberc2|7 years ago

I write single-threaded Go all the time, and you can use most of the same optimizations in Go that you could use in C (including pools). It's pretty easy to opt out of GC in Go. And you still keep all of the other benefits of using a modern, higher-level language (security, memory safety, straightforward tooling, etc).

emperorcezar|7 years ago

It only does so because the author is tuning for their specific environment. At some number of cores, the concurrent version will outperform the sequential one.

Ideally, in this case, I would think one would want to check the number of cores and decide which route to take.

scott_s|7 years ago

For the point at which the author first removed parallelism, I don't think this is the case. Running things in parallel has a cost. That cost often comes in the form of memory allocations and data copies so that the unit of work can be stored and shared with another thread, plus the synchronization costs of scheduling threads. If that aggregate cost is greater than the computational cost of what you're computing, you'll never win.

For the point at which the author removed parallelism, and the sequential code was faster, I think this was the case. The computation was too fine-grained. The author successfully took advantage of parallelism by applying it at a coarser granularity; each thread did more work. At this point, the author also does tune the solution for the execution environment, as he uses a fixed set of go-routines to process a bunch of messages rather than one go-routine per message.