appeldorian | 2 years ago
He uses the same data structure in both the Python and Rust code, so I imagine that he can get an extra 4x speedup at least if he rewrites his code with memory layout in mind.
tremon|2 years ago
ohr|2 years ago
I couldn't get into this in the article (it would have been too long), but this is a great point, and the original library does this in a lot of places.
One problem in our use case is that the actual struct members are pretty big and that we need to group/regroup them a lot.
The fastest approach for us was to do something like what's in the article for the initial filtering, then build a hashmap of SoAs (struct-of-arrays) with the needed data, and do the heavier math on that.
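Roughly, the shape is something like the sketch below. This is a simplified illustration, not the real library code: `Item`, `Columns`, `group_into_soa`, `dot`, and all field names are made up, and the "heavier math" is stood in for by a plain dot product.

```rust
use std::collections::HashMap;

// Hypothetical array-of-structs record. The real structs are much bigger;
// these fields are placeholders for illustration only.
#[derive(Debug, Clone)]
struct Item {
    group: u32, // key used for grouping/regrouping
    x: f64,
    y: f64,
    keep: bool, // stands in for whatever the initial filter checks
}

// Struct-of-arrays: only the fields the heavy math needs, stored contiguously.
#[derive(Debug, Default)]
struct Columns {
    xs: Vec<f64>,
    ys: Vec<f64>,
}

// Filter the original structs, then copy the needed fields into one SoA per group.
fn group_into_soa(items: &[Item]) -> HashMap<u32, Columns> {
    let mut groups: HashMap<u32, Columns> = HashMap::new();
    for item in items.iter().filter(|i| i.keep) {
        let cols = groups.entry(item.group).or_default();
        cols.xs.push(item.x);
        cols.ys.push(item.y);
    }
    groups
}

// Placeholder for the heavier math: a dot product over two contiguous columns.
fn dot(cols: &Columns) -> f64 {
    cols.xs.iter().zip(&cols.ys).map(|(x, y)| x * y).sum()
}
```

Calling `group_into_soa(&items)` and then running `dot` on each entry keeps the inner loop on tightly packed `f64` slices, no matter how large the original structs are. Whether the extra copy pays off depends on the data, so it is worth measuring.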
Yoric|2 years ago
snicker7|2 years ago
prirun|2 years ago
It's easy to make intuitive guesses about how things are working that seem completely reasonable. But you have to benchmark because modern CPUs are so complex that reasoning and intuition mostly don't work.
Programs used for testing are below. I ran everything twice because my system wasn't always idle, so take the lower of the 2 runs.
lights0123|2 years ago