aengelke | 6 months ago

There's a longer paragraph on that topic in Section 8. We also previously built an LLVM back-end using that approach [1]. While that approach leads to even faster compilation, run-time performance is much worse (2.5x slower than LLVM -O0) due to more-or-less impossible register allocation for the snippets.

[1]: https://home.cit.tum.de/~engelke/pubs/2403-cc.pdf

debugnik|6 months ago

> run-time performance is much worse (2.5x slower than LLVM -O0)

How come? The Copy-and-Patch Compilation paper reports:

> The generated code runs [...] 14% faster than LLVM -O0.

I don't have time right now to compare your approach and benchmark to theirs, but I would have expected comparable performance from what I had read back then.

aengelke|6 months ago

The paper is rather selective about its benchmarks and baselines. They do two comparisons against LLVM (3 microbenchmarks and a re-implementation of a few rather simple database queries) and wrote all benchmarks themselves in their own framework. These benchmarks start from their custom AST data structures, and they have their own way of generating LLVM-IR. For the non-optimizing LLVM back-end, performance obviously depends strongly on how the IR is generated, and they might not have put a lot of effort into generating "good IR" (=IR similar to what Clang generates).

The fact that they don't do a comparison against LLVM on larger benchmarks/functions or any other code they haven't written themselves makes that single number rather questionable for a general claim of being faster than LLVM -O0.

t0b1|6 months ago

This refers to their TPC-H benchmark, and the result can be due to a variety of reasons. My guess is that they can generate stencils for whole operators, which can be transformed into more efficient code at stencil-generation time, while LLVM -O0 gets the operator in LLVM-IR form and can do no such transformation. I can't verify this, though, because their benchmark setup seems a bit more involved.

When used in a C/C++ compiler, the stencils correspond to individual (or a few) LLVM-IR instructions, which leads to bad run-time performance. Also, as mentioned, register allocation becomes a problem for the Copy-and-Patch approach on larger functions.
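
To illustrate the point about per-instruction stencils, here is a minimal sketch, not the implementation from either paper. It assumes three hand-written x86-64 stencils, each with a single 32-bit hole for a stack offset. The names `Stencil`, `emit`, and `compile_add` are hypothetical. Since every stencil is compiled in isolation, it cannot know which registers its neighbours use, so in this sketch each value goes through a fixed register and a stack slot:

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Stencil:
    code: bytes  # pre-generated machine code with a zeroed 32-bit hole
    hole: int    # byte offset of the hole inside `code`


# Hypothetical stencil library. Every stencil hard-codes eax, because a
# stencil is generated ahead of time without knowledge of its neighbours.
LOAD = Stencil(bytes.fromhex("8b8500000000"), 2)   # mov eax, [rbp+disp32]
ADD = Stencil(bytes.fromhex("038500000000"), 2)    # add eax, [rbp+disp32]
STORE = Stencil(bytes.fromhex("898500000000"), 2)  # mov [rbp+disp32], eax


def emit(stencil: Stencil, disp: int) -> bytes:
    """Copy the stencil and patch its hole with a signed 32-bit offset."""
    buf = bytearray(stencil.code)
    buf[stencil.hole:stencil.hole + 4] = struct.pack("<i", disp)
    return bytes(buf)


def compile_add(dst: int, lhs: int, rhs: int) -> bytes:
    """Compile `dst = lhs + rhs`, where all three are rbp-relative slots."""
    return emit(LOAD, lhs) + emit(ADD, rhs) + emit(STORE, dst)


code = compile_add(dst=-12, lhs=-4, rhs=-8)
print(code.hex())
```

Code generation here is only a copy and a 4-byte patch per stencil, which is why compilation is fast. The cost is that a result used by the next operation is stored and reloaded instead of staying in a register, which is the kind of overhead described above.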

procrast33|6 months ago

Apologies! I did do a text search, but in pdfs... I should have known better.

Your work is greatly appreciated. With unit tests everywhere, faster compilation is more important than ever.

PoignardAzur|6 months ago

Wait, so what's the difference between TDPE and Copy-and-Patch?

I thought they used the same technique (pre-generating machine code snippets in a high-level language)?