top | item 46705109

(no title)

shihab | 1 month ago

Hi, I just wanted to note that e3nn is more of an academic software that's a bit high-level by design. A better baseline for comparison would be Nvidia's cuEquivariance, which does pretty much the same thing as you did- take e3nn and optimize it for GPU.

As a HPC developer, it breaks my heart how worse academic software performance is compared to vendor libraries (from Intel or Nvidia). We need to start aiming much higher.

discuss

bee_rider|1 month ago

I took a lot longer than I should have to finish my PhD because I wanted to beat well written/properly used vendor code. I wouldn’t recommend it, TBH.

It did make my defense a lot easier because I could just point at the graphs and say “see I beat MKL, whatever I did must work.” But I did a lot of little MPI tricks and tuning, which doesn’t add much to the scientific record. It was fun though.

I don’t know. Mixed feelings. To some extent I don’t really see how somebody could put all the effort into getting a PhD and not go on a little “I want to tune the heck out of these MPI routines” jaunt.

shihab|1 month ago

To be practically useful, we don't need to beat vendors, just getting close would be enough, by the virtue of being open-source (and often portable). But I found, as an example, PETSc to be ~10x slower than MKL on CPU and CUDA on GPU; It still doesn't have native shared memory parallelism support on CPU etc.

PerryStyle|1 month ago

I would love to do this in the future, but knowing me I’d get caught up making sure I’m benchmarking properly then actually writing code.

teddykoker|1 month ago

OpenEquivariance [1] is another good baseline for with kernels for the Clebsch-Gordon tensor product and convolution, and it is fully open source. Both kernel implementations have been successfully implemented into existing machine learning interatomic potentials, e.g. [2,3].

[1] https://github.com/PASSIONLab/OpenEquivariance

[2] https://arxiv.org/abs/2504.16068

[3] https://arxiv.org/abs/2508.16067

physicsguy|1 month ago

> As a HPC developer, it breaks my heart how worse academic software performance is compared to vendor libraries (from Intel or Nvidia). We need to start aiming much higher.

They're optimising for different things really.

Intel/Nvidia have the resources to (a) optimise across a wide range of hardware in their libraries (b) often use less well documented things (c) don't have to make their source code publicly accessible.

Take MKL for example - it's a great library, but implementing dynamic dispatch for all the different processor types is why it gets such good performance across x86-64 machines, it's not running the same code on each processor. No academic team can really compete with that.

shihab|1 month ago

I'm not asking an academic program first published 8 year ago (e3nn) to beat actively developed CuEquivariance library. An academic proposing new algorithms doesn't need to worry too much about performance. But any new work which focuses on performance, that includes this blog and a huge number of academic papers published every year, should absolutely use latest vendor libraries as baseline.

rapatel0|1 month ago

I think this is the difference between research and industry. Industry should try to grind out obvious improvements through brute force iteration. I really wish the culture of academia was more of an aim towards moonshots (high risk, high reward).

geremiiah|1 month ago

cuEquivariance is unfortunately close sourced (the acutal .cu kernels), but OP's work is targetting a consumer GPU and also a very small particle system so its hard to compare, anyway.