heisig|2 years ago
Thanks for clarifying. I will definitely write down the specifics of setting up distributed computing once it works. However, support for distributed computing will still take some time; the current step is to iron out the remaining issues of parallelizing within a single CPU socket.
semi-extrinsic|2 years ago
To the best of my knowledge, shared memory approaches have been mostly abandoned in the HPC community. It seems that none of the codes that went hybrid MPI+OpenMP, for example, ever saw a substantial performance benefit over pure MPI; at least not enough to justify the increased code complexity. If you search for "hybrid MPI/OpenMP" on Google Scholar, you'll see most results are 10-20 years old.
Part of the reason for this is that on modern CPU cores, with the amount of cache available, you typically want to keep at least something like 200 000 degrees of freedom per core. That's e.g. a 36^3 grid for u, v, w, p if you're doing fluid mechanics. Then the amount to communicate per core is just 8% of the total data. Furthermore, you can easily do other work, like computing auxiliary variables, while you are waiting on communication.
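A back-of-the-envelope check of these numbers (my reading of the claim; the ~8% figure comes out if each shared face is attributed once per neighbor pair, i.e. 3 of the 6 faces per core):

```python
# Degrees of freedom per core for a 36^3 grid with 4 variables (u, v, w, p).
n = 36                       # grid points per dimension on one core
variables = 4                # u, v, w, p
dof = variables * n**3
print(dof)                   # 186624, i.e. roughly 200 000 DOF

# Halo data as a fraction of the interior data. A single ghost layer on a
# face holds n^2 points; attributing each shared face once per neighbor
# pair leaves 3 faces per core on average.
faces_per_pair = 3
halo_fraction = 100 * faces_per_pair * n**2 / n**3
print(round(halo_fraction, 1))  # 8.3, i.e. ~8% of the data crosses the network
```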
I will also say that it feels a bit weird to call something "peta-" and "HPC" if using more than one socket is relatively far off into the future. For the randomly-wandering PhD students out there, it would be nice to tell them this up front in the Readme :)
heisig|2 years ago
There is no special casing here; most of that code will also be used for the distributed parallelization. I agree with your remarks on hybrid MPI+OpenMP, and, in fact, Petalisp doesn't use shared memory anywhere, but always generates ghost layers and explicit communication.
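Not Petalisp's actual code, but a minimal sketch of the ghost-layer idea in one dimension: each worker owns its interior points plus one ghost cell per side, and neighbor data is copied in explicitly instead of being read through shared memory.

```python
# Hypothetical 1D ghost-layer (halo) update. Each chunk is laid out as
# [ghost, interior..., ghost]; the exchange copies each neighbor's outermost
# interior value into the adjacent ghost cell.

def exchange_ghost_layers(chunks):
    """Fill each chunk's ghost cells from its neighbors' boundary values."""
    for i, chunk in enumerate(chunks):
        if i > 0:
            chunk[0] = chunks[i - 1][-2]   # left ghost <- left neighbor's last interior cell
        if i < len(chunks) - 1:
            chunk[-1] = chunks[i + 1][1]   # right ghost <- right neighbor's first interior cell

# Two chunks with interiors [1, 2] and [3, 4], ghost cells initialized to 0:
a = [0, 1, 2, 0]
b = [0, 3, 4, 0]
exchange_ghost_layers([a, b])
print(a, b)  # [0, 1, 2, 3] [2, 3, 4, 0]
```

In a distributed setting the same copies would become explicit sends and receives between processes, which is what distinguishes this scheme from a shared-memory one.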
> I will also say that it feels a bit weird to call something "peta-" and "HPC" if using more than one socket is relatively far off into the future. For the randomly-wandering PhD students out there, it would be nice to tell them this up front in the Readme :)
I can do that. But let me explain the rationale for naming Petalisp this way: I wanted to create a programming language that is novel, and that has the potential to scale to petaflop systems. And I wanted to create a robust implementation of that language. I think I have achieved the former part, but the latter simply takes time.
Final remark: good HPC practice is to first get single-core performance right, then multicore performance, and only then scale up to multiple nodes. Anything else is an enormous waste of electricity.