top | item 9161366

The ups and downs of porting 50k lines of C++ to Go

218 points | logicchains | 11 years ago | togototo.wordpress.com | reply

105 comments

[+] acqq|11 years ago|reply
Do read the answer of the author regarding the performance:

"The throughput of the Go program is quite competitive with the C++ one, although the server’s IO-bound so most of the time is just spent in socket write/read syscalls. The latency is at least an order of magnitude worse, due to Go’s garbage collector, which is amplified by the use of an older Go version. If the server was latency-critical I don’t think it could have been written in Go, at least not until the new GC planned for 1.5 or 1.6 is released (assuming we could upgrade to a newer kernel by the time its released)."

[+] logicchains|11 years ago|reply
Author here. Just a note that by latency-critical, I'm referring to >10 millisecond latencies. If you can tolerate occasional pauses of 400-500 milliseconds, then the GC wouldn't be a problem. Also note that the GC slowness came from having to scan a fairly large heap (a lot of cached stuff); it could be avoided by storing all that off-heap, but I suspect that would complicate the code significantly.

Finally, note that by "at least an order of magnitude worse" I'm comparing it to hyper-optimised C++ that's designed for sub-millisecond latencies, as the C++ server used the same framework used in latency-critical HFT software.

[+] more_original|11 years ago|reply
> Go also forced me to write readable code: the language makes it impossible to think something like “hey, the >8=3 operator in this obscure paper on ouroboromorphic sapphotriplets could save me 10 lines of code, I’d better include it. My coworkers won’t have trouble understanding it as the meaning is clearly expressed in the type signature: (PrimMonad W, PoshFunctor Y, ReichsLens S) => W Y S ((I -> W) -> Y) -> G -> Bool”.

I like this description. It's one of the reasons why I prefer OCaml to Haskell, but I've found it hard to verbalise.

[+] sfk|11 years ago|reply
I'm still trying to force myself to like Go. The basic problem is that Go is too regular for me, which makes it painful to read other people's code.

I don't have this problem with functional languages or C written in a free flowing coding style like djb's.

Is it even possible in Go to have an individual style?

[+] p0nce|11 years ago|reply
> we had for instance a library for interacting with a particular service that was generated from an XML schema, making the code perfectly type-safe, with different functions for each datatype. In many languages that allow compile-time metaprogramming, like C++ and D, IO cannot be performed at compile time, so such schema-driven code generation would not be possible.

Actually this would be possible in D since you can read files at compile-time with the "import(filename)" syntax. Then you can use compile-time parser generators to parse it.

[+] logicchains|11 years ago|reply
I didn't realise that was possible; I've updated the post to reflect that.
[+] inglor|11 years ago|reply
"“hey, the >8=3 operator in this obscure paper on ouroboromorphic sapphotriplets could save me 10 lines of code, I’d better include it. My coworkers won’t have trouble understanding it as the meaning is clearly expressed in the type signature: (PrimMonad W, PoshFunctor Y, ReichsLens S) => W Y S ((I -> W) -> Y) -> G -> Bool”."

This alone was worth it - the fact that a language's idioms say "don't write clever code" is extremely positive.

[+] pjmlp|11 years ago|reply
Especially when reviewing that latest code drop from the off-shoring partner company.
[+] zeekay|11 years ago|reply
I've also found the lack of parametric polymorphism a huge pain point. Type safety is constantly sacrificed to allow for code-reuse and nicer APIs, leading to really awful code using type switching at best, inscrutable amounts of reflection at worst. This seems to plague the Google developers as well, just look at the Go App Engine APIs.
[+] learc83|11 years ago|reply
I ported part of a distributed system I have in production to Go about 2 years ago. I remember using a lot of reflection trying to factor out some common SQL code. Like this:

    func CreateRecord(record interface{}) (err error) {
        t := reflect.TypeOf(record)
        v := reflect.ValueOf(record)
        // ... build and execute an INSERT from t's field names and v's values ...
    }
Where I named the struct the same thing as the table it mapped to.
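A self-contained sketch of where a snippet like that is typically headed, assuming the struct-name-as-table-name convention described above (`BuildInsert` and `User` are illustrative names, not from the original code):

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// BuildInsert derives an INSERT statement and its argument list from a
// struct's type: the lowercased struct name becomes the table name and
// the lowercased field names become the columns.
func BuildInsert(record interface{}) (string, []interface{}) {
	t := reflect.TypeOf(record)
	v := reflect.ValueOf(record)
	cols := make([]string, t.NumField())
	ph := make([]string, t.NumField())
	args := make([]interface{}, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		cols[i] = strings.ToLower(t.Field(i).Name)
		ph[i] = fmt.Sprintf("$%d", i+1)
		args[i] = v.Field(i).Interface()
	}
	query := fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		strings.ToLower(t.Name()),
		strings.Join(cols, ", "),
		strings.Join(ph, ", "))
	return query, args
}

type User struct {
	Name  string
	Email string
}

func main() {
	q, args := BuildInsert(User{Name: "a", Email: "b"})
	fmt.Println(q)    // INSERT INTO user (name, email) VALUES ($1, $2)
	fmt.Println(args) // the field values, ready to pass to db.Exec
}
```

Note this trades compile-time type safety for reuse, which is exactly the complaint in the parent comment.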
[+] coliveira|11 years ago|reply
The lack of polymorphism and parameterized types makes Go the C of the 2010s. This practically means that in a few years we will have someone, somewhere, creating Go++, and the story will repeat itself.
[+] humanrebar|11 years ago|reply
Go supports several types of runtime polymorphism, including interfaces and closures. It just doesn't support polymorphism through OO-style inheritance.

C supports runtime polymorphism, for that matter, it just requires a bit of boilerplate to set up and use function pointers and tagged dispatch.

You might be right about Go++ (ObjectiveGo?) being inevitable, though.
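For readers unfamiliar with Go, a minimal sketch of the interface-based runtime polymorphism described above (`Shape`, `Rect`, and `Circle` are illustrative names):

```go
package main

import "fmt"

// Shape is satisfied implicitly by any type with an Area method;
// no inheritance declaration is needed.
type Shape interface {
	Area() float64
}

type Rect struct{ W, H float64 }

func (r Rect) Area() float64 { return r.W * r.H }

type Circle struct{ R float64 }

func (c Circle) Area() float64 { return 3.14159 * c.R * c.R }

func totalArea(shapes []Shape) float64 {
	var sum float64
	for _, s := range shapes {
		sum += s.Area() // dynamic dispatch through the interface
	}
	return sum
}

func main() {
	fmt.Println(totalArea([]Shape{Rect{W: 2, H: 3}, Circle{R: 1}}))
}
```

This gives runtime polymorphism without an inheritance hierarchy: the dispatch happens through the interface value, not a superclass.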

[+] proveanegative|11 years ago|reply
Swift looks a bit like "Go++" to me, at least on a purely syntactic level. I expect Apple to open source it sooner rather than later, at which point it will become a viable option to develop web-based services in.
[+] fit2rule|11 years ago|reply
>>Simple, regular syntax. When I found myself desiring to add the name of the enclosing function to the start of every log string, an Emacs regexp find-replace was sufficient, whereas more complex languages would require use of a parser to achieve this.

It would be really wonderful to have a series of tutorials on this subject. It might be a good reason for me to learn to use Emacs, anyway ..

[+] jqgatsby|11 years ago|reply
If you are already using emacs, I recommend setting regular expression find/replace as your default search mode. Ctrl-s, etc, so it becomes second nature.
[+] Dewie|11 years ago|reply
Another way to use search in Emacs is to go to a place that you need to edit/insert text. Instead of using the movement commands or the mouse, use forward or backward search to find the point you need to be at: like "(str" in "(String s) ...".

I don't know if it is faster for me, but it can feel more ergonomic, since you don't have to put so much effort into going to a specific point. I'm getting better at it, though: maybe soon it will become second nature.

[+] stcredzero|11 years ago|reply
> No inheritance. I’ve personally come to view inheritance-based OO as somewhat of an antipattern in many cases, bloating and obscuring code for little benefit

So everywhere you would use inheritance, you use composition instead? The stuff you'd have stuck in the superclass, you stick somewhere else and stick in your struct?

[+] autarch|11 years ago|reply
> So everywhere you would use inheritance, you use composition instead? The stuff you'd have stuck in the superclass, you stick somewhere else and stick in your struct?

No, Go provides interfaces (aka traits or roles), which you can use to share composable functions (aka methods).

[+] tapirl|11 years ago|reply
I feel many of the pros and cons listed in this article are not related to the porting at all.

BTW, I think Go is really not a replacement for C++. In my experience, Go is more a replacement for Java, to improve development speed.

[+] logicchains|11 years ago|reply
The pros and cons are all based on what was learned from the porting process. Unfortunately the confidential nature of the software prevents discussing it in greater detail.

Go can be a replacement for C++ for programs that didn't need to be written in C++. There aren't many programs like that around nowadays, however, as most of the time C++ is only used when it's really necessary, such as for extremely latency-sensitive applications or applications requiring precise memory/allocation control.

[+] hitlin37|11 years ago|reply
i think in the long run, having to maintain code that is readable helps a lot. even though c++ itself is easy to follow, it starts to get complicated once you get deeper and deeper into oo where everything is inherited from something else. this is one thing i like in python modules. they are highly readable. and then write something in cpython if its time critical. same with Go, the code feels very clean and easy to maintain. i haven't done parametric polymorphism in c++, so no idea about it.
[+] ayrx|11 years ago|reply
> and then write something in cpython if its time critical

I believe you mean Cython? :)

[+] kakakiki|11 years ago|reply
"Since one of my reasons for getting into programming was the opportunity to get paid to use Emacs, this is definitely a huge plus."

Wow! Wish I could say the same!

[+] blt|11 years ago|reply
"one of my reasons for getting into auto repair was the opportunity to get paid to use Snap-On tools"

I feel text editor loyalty too, but this statement does feel a bit strange :)

[+] xjia|11 years ago|reply
Please, use Dialyzer for Erlang.

BTW, I don't know Go, but Erlang has per-process GC, so there won't be a large heap to scan.

[+] masklinn|11 years ago|reply
Go doesn't have per-process GCs, because goroutines share memory. Structures are not copied or moved across channels; a pointer to the structure is copied, and both sender and receiver get access to the same object in memory.
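A short demonstration of this: sending a pointer over a channel gives sender and receiver access to the same object (`Msg` is an illustrative type):

```go
package main

import "fmt"

type Msg struct{ N int }

func main() {
	ch := make(chan *Msg)
	done := make(chan struct{})

	m := &Msg{N: 1}
	go func() {
		got := <-ch
		got.N = 42 // mutates the very object the sender still holds
		done <- struct{}{}
	}()

	ch <- m // only the pointer is copied, not the struct
	<-done
	fmt.Println(m.N) // 42: both goroutines saw one shared object
}
```

This sharing is why the whole heap is visible to one collector, unlike Erlang's isolated per-process heaps.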
[+] Kiro|11 years ago|reply
I'm a PHP programmer. Can someone explain why the lack of parametric polymorphism is a big deal?
[+] one-more-minute|11 years ago|reply
Say you need a Vector2D type:

    type Vector2D struct {
        X int
        Y int
    }

Except, hold on, I have a routine that needs floats. In a dynamic language, I'd leave off the type hints; with a decent type system I'd parameterise the `int` type; in Go I have to reimplement the whole type:

    type Vector2DFloat struct {
        X float64
        Y float64
    }

Lather, rinse and repeat for complex numbers, vectors of vectors, etc. The only way around this is (a) to use the `interface{}` type (in which case you're just using a very verbose dynamic language) or (b) to rely on lots of text-based code generation.
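For illustration, a sketch of workaround (a): an `interface{}`-based vector with a runtime type switch. This reflects the pre-generics Go under discussion; the names and the float-coercion helper are illustrative:

```go
package main

import "fmt"

// Vector2D holds components of any type; the type system no longer
// guarantees X and Y are numbers, so every use must check at runtime.
type Vector2D struct {
	X, Y interface{}
}

// SumAsFloat coerces both components to float64 via a type switch,
// returning an error for unsupported component types.
func (v Vector2D) SumAsFloat() (float64, error) {
	toF := func(n interface{}) (float64, error) {
		switch x := n.(type) {
		case int:
			return float64(x), nil
		case float64:
			return x, nil
		default:
			return 0, fmt.Errorf("unsupported component type %T", n)
		}
	}
	a, err := toF(v.X)
	if err != nil {
		return 0, err
	}
	b, err := toF(v.Y)
	if err != nil {
		return 0, err
	}
	return a + b, nil
}

func main() {
	s, _ := (Vector2D{X: 1, Y: 2.5}).SumAsFloat()
	fmt.Println(s) // 3.5
}
```

The boilerplate per operation is exactly the "very verbose dynamic language" cost the parent comment describes.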
[+] shawn-butler|11 years ago|reply
What is the fascination of software people with LOC as a code measure / metric?

It seems indicative of nothing, not quality, especially not readability nor maintainability.

I've never understood this apart from the very early days of punch cards and memory/storage limitations which placed physical limitations on computation.

[+] frostmatthew|11 years ago|reply
> It seems indicative of nothing, not quality, especially not readability nor maintainability.

It's not meant to be indicative of any of those things (though I'd argue, all else being equal, maintaining an application with more LOC is harder than one with fewer).

It's indicative of complexity and scale. A programmer can read through an entire 500 LOC program and will know everything about the program. This becomes much more difficult for a 50K LOC program and outright impossible for a 5 million LOC program.

Taking a 50K LOC program and bringing it down to 10K (whether by refactoring, removing unneeded code, or rewriting in a new language) makes it much easier for each developer to know/understand a larger portion of the program.

Prior to my current job I had only worked on relatively small applications (<25K LOC) and I was blown away by the difference between working on things like that and working on something measured in the millions.

[And in this specific context I doubt it would be on the front page of HN if somebody took 500 lines of C++ and rewrote them in a hundred lines of Go, i.e. knowing the LOC is useful to determine if this was a meaningful undertaking or not]

[+] lomnakkus|11 years ago|reply
AFAIR one of the few things that software engineering research has consistently shown is that bug count is correlated with LOC, esp. with churn, i.e. the number of lines changed. (I can't recall exactly how robust the effect is, but I remember reading about it in "Making Software: What Really Works, and Why We Believe It" and you can probably find a few cites in there.)
[+] dr_zoidberg|11 years ago|reply
Writing 50k lines of code takes a lot of time, and every line has a chance of spawning a bug or undesired behaviour. Of course it's not a perfect metric of a system, and of course a line of code written by an experienced C guru will be completely different from one written by a newcomer to the language. But it is also true that the fewer the lines of code, the fewer the potential points of failure in a program.

Also, if you were given the task to read and analyze a piece of code, you'd surely wish it were shorter! As long as the software performs the desired tasks and readability is cared for, shorter tends to be better.

[+] coliveira|11 years ago|reply
Most software engineers still view software construction as the task of manipulating text files. That is why they have such a fascination with lines of code. People coming from a different programming tradition, such as Smalltalk or APL, know that this makes no sense.
[+] Animats|11 years ago|reply
"It also allows parallel/async code to be written in the exact same way as concurrent code, simply by setting GOMAXPROCS to 1."

Aargh! If your code has race conditions with GOMAXPROCS > 1, it's broken. "Share by communicating, not by sharing". (Ignore the bad examples in "Ineffective Go". Send the actual data over the channel, not a reference to it. Don't try to use channels as a locking mechanism.)
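A small sketch of the "send the actual data over the channel" style this comment recommends (`Sample` and `sumSamples` are illustrative names):

```go
package main

import "fmt"

type Sample struct{ ID, Value int }

// sumSamples receives Sample values from ch. Because each send copies
// the struct, the receiver owns its copy outright: there is no shared
// state for the sender to race on after the send.
func sumSamples(ch <-chan Sample) int {
	sum := 0
	for s := range ch {
		sum += s.Value
	}
	return sum
}

func main() {
	ch := make(chan Sample, 3)
	for i := 1; i <= 3; i++ {
		ch <- Sample{ID: i, Value: i * 10} // value, not pointer
	}
	close(ch)
	fmt.Println(sumSamples(ch)) // 60
}
```

Contrast with sending `*Sample`: the program would still compile and usually run, but correctness would then silently depend on GOMAXPROCS and scheduling.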

[+] faragon|11 years ago|reply
What's the point of using GC in high performance software? Seriously. In my opinion, it makes no sense.