felixge | 1 year ago

Thanks! And to answer your question: No, it won't speed up Go programs for now. This was mostly a fun research project for me.

The low-hanging fruit for speeding up stack unwinding in the Go runtime is to switch to frame pointer unwinding in more places. In go1.21 we contributed patches to do this for the execution tracer. For the upcoming go1.23 release, my colleague Nick contributed patches to upgrade the block and mutex profilers. Once the go1.24 tree opens, we're hoping to tackle the memory profiler as well as copystack. The latter would benefit all Go programs, even those not using profiling. But it's likely going to be a relatively small win (<= 1%).

Once all of this is done, shadow stacks have the potential to make things even faster. But the problem is that we'll be deeply in diminishing returns territory at that point. Speeding up stack capturing is great when it makes up 80-90% of your overhead (this was the case for the execution tracer before frame pointers). But once we're down to 1-2% (the current situation for the execution tracer), another 8x speedup is not going to buy us much, especially when it has downsides.

The only future in which shadow stacks could speed up real Go programs is one where we decide to drop frame pointer support in the compiler, which could provide 1-2% speedup for all Go programs. Once hardware shadow stacks become widely available and accessible, I think that would be worth considering. But that's likely to be a few years down the road from now.

aerfio | 1 year ago

Do you think of, or know of, any areas in the Go codebase that would enable a jump in performance bigger than, e.g., 10%? I'm very grateful for any work done on the Go codebase; for me this language is plenty fast. I'm just curious what the state of Go's internals is: are there any techniques left to speed it up significantly, or are some parts of the codebase or old architectural decisions holding it back? And thank you for your work!

felixge | 1 year ago

I don't think any obvious 10%+ opportunities have been overlooked. Go is optimizing for fast and simple builds, which is a bit at odds with optimal code gen. So I think the biggest opportunity is to use Go implementations that are based on aggressively optimizing compilers such as LLVM and GCC. But those implementations tend to be a few major versions behind and are likely to be less stable than the official compiler.

That being said, I'm sure there are a lot of remaining incremental optimization opportunities that could add up to 10% over time, for example a faster map implementation [1]. And there is surely more.

Another recent perf opportunity is using PGO [2], which can get you 10% in some cases. Shameless plug: we recently GA'ed our support for it at Datadog [3].

[1] https://github.com/golang/go/issues/54766 [2] https://go.dev/doc/pgo [3] https://www.datadoghq.com/blog/datadog-pgo-go/

neonsunset | 1 year ago

Go's limitation is that it's a high-level language with a very simple compiler. Providing true zero-cost abstractions (full monomorphization rather than GC shape stenciling) and advanced optimizations is a bridge it won't cross, because that would mean much greater engineering effort spent on the compiler and roughly a 5x increase in its LOC, especially if the compiler wants to preserve its throughput.

Though I find it unfortunate that the industry considers Go a choice for performance-sensitive scenarios when C# exists, which went the above route and doesn't sacrifice performance or the ability to offer performance-specific APIs (like cross-platform SIMD), at the price of a higher-effort, more complex compiler implementation. It also does in-runtime PGO (Dynamic PGO): since long-running server workloads usually run under a JIT where one is available, you don't need to carefully craft a sample workload and hope it matches production behavior. The JIT does it for you, and it yields anything from 10% to 35% depending on how abstraction-heavy the codebase is.

dolmen | 1 year ago

Reminder: the Go team considers optimizations in code generation only insofar as the compiler is kept fast. That's why the Go compiler doesn't have as many optimization phases as C/C++/Rust compilers.

As a developer I like that approach: it keeps the developer experience great, helps me stay focused, and gives me great productivity.