galdauts | 4 days ago

Thank you for the article! We're mainly interested in floating-point performance and energy consumption with respect to solving differential equations and tridiagonal systems of equations while running on a 128-core compute node. Our current results will likely only be presented in May, but here are last year's results: https://www.cs.uni-potsdam.de/bs/research/docs/papers/2025/l...
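For context on the tridiagonal part: Julia's LinearAlgebra standard library dispatches `\` to a specialized O(n) solver when the matrix is stored as a `Tridiagonal` type. A minimal sketch with synthetic data (not the benchmark's actual system):

```julia
using LinearAlgebra

n = 1000
# Diagonally dominant tridiagonal matrix; `\` on a Tridiagonal uses a
# specialized O(n) factorization instead of a dense solve.
T = Tridiagonal(fill(-1.0, n - 1), fill(4.0, n), fill(-1.0, n - 1))
b = rand(n)
x = T \ b
maximum(abs.(T * x .- b))  # residual should be near machine epsilon
```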

Our Julia code is parallelised with FLoops.jl, but so far Numba has shown surprising performance benefits when executing code in parallel, despite being slower when executed sequentially. Therefore I can imagine that Julia might yield better results when run in a regular desktop environment.
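For readers unfamiliar with FLoops.jl: it parallelizes loops via the `@floop` macro with `@reduce` for reductions. A minimal sketch of the pattern (illustrative only, not the commenter's actual code):

```julia
using FLoops

# Parallel reduction: @floop splits the iteration across tasks and
# @reduce combines the per-task partial sums.
function parallel_sum(xs)
    @floop for x in xs
        @reduce(s += x)
    end
    return s
end

parallel_sum(1:1000)  # == 500500
```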

Alexander-Barth | 3 days ago

Are you using this code for Julia?

https://github.com/JuliaParallel/rodinia/tree/master/julia_m...

It was last touched 9 years ago, but maybe you have ported it to current standards. I don't think we had multithreading in Julia at that time, only multiprocessing.

Are your Julia implementations available somewhere? (Sorry if it's in your paper and I missed it.) I vaguely remember that, in the past, working with threads led to some additional allocations compared to the serial code. Maybe that is biting us here as well?

ChrisRackauckas | 4 days ago

Are you using Polyester.jl? Large numbers of threads are not well optimized with Base threading due to GC interactions, and the hierarchical threading adds overhead compared to "unsafe" threading techniques that don't support work sharing. Polyester.jl is therefore needed to get very low-overhead threading that matches the performance of non-worksharing scenarios.
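For anyone who hasn't used it: Polyester.jl's `@batch` macro is roughly a drop-in replacement for `Threads.@threads` with much lower per-loop overhead. A hedged sketch (the `axpy_batch!` name and data are mine, for illustration):

```julia
using Polyester

# Threaded y .= a .* x .+ y using Polyester's low-overhead @batch loop
# instead of Base's task-based Threads.@threads.
function axpy_batch!(y, a, x)
    @batch for i in eachindex(y, x)
        y[i] = muladd(a, x[i], y[i])
    end
    return y
end

axpy_batch!(ones(4), 2.0, [1.0, 2.0, 3.0, 4.0])  # == [3.0, 5.0, 7.0, 9.0]
```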

jabl | 4 days ago

I have a small benchmark program doing tight-binding calculations of carbon nanostructures that I have implemented in C++ with Eigen, C++ with Armadillo, Fortran, Python/NumPy, and Julia. It's been a while since I tested it, but IIRC all the other implementations were about on par, except for Python, which was about half the speed of the others. Haven't tried with Numba.

To bring Julia performance on par with the compiled languages I had to do a little bit of profiling and tweaking using @views.
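For those unfamiliar with the `@views` trick: slicing like `A[:, j]` allocates a copy by default in Julia, and `@views` turns those slices into non-allocating views, which matters in hot loops. A small illustrative sketch (not from the linked repo):

```julia
# Column sums of a matrix. Without @views, A[:, j] would allocate a
# fresh vector on every iteration; with it, each slice is a cheap view.
function colsums(A)
    s = zeros(eltype(A), size(A, 2))
    @views for j in axes(A, 2)
        s[j] = sum(A[:, j])
    end
    return s
end

colsums([1.0 2.0; 3.0 4.0])  # == [4.0, 6.0]
```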

https://gitlab.com/jabl/tb

jondea | 4 days ago

The JuliaParallel/rodinia repo says that the focus of those benchmarks is the CUDA versions. I suspect that the CPU versions have not had much optimization effort spent on them. Julia isn't a magic wand, but you can usually get within a factor of 2 of C++ with similar effort.

dandanua | 4 days ago

A cluster environment with virtualized cores may cause slower performance for Julia's parallel code. People recommend ThreadPinning.jl to address these issues.
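The usage is a one-liner; a sketch of the typical incantation (assuming ThreadPinning.jl is installed and Julia was started with multiple threads):

```julia
using ThreadPinning

# Pin each Julia thread to a physical core so the OS scheduler doesn't
# migrate them across (possibly virtualized) cores mid-benchmark.
pinthreads(:cores)

threadinfo()  # print a summary of the resulting thread-to-core mapping
```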

Certhas | 4 days ago

That seems very unlike what everyone else is seeing. There's really no reason why Julia should be slower than Numba...