Interesting article. The article mentions "...the NumPy implementation illustrates a marked improvement over the naive algorithm...", but I couldn't find a NumPy implementation in the article.
Yes, they are really great at abstracting the SIMD operations, but the abstraction has only very few common methods. I'm not sure how much real world benefits those abstractions have.
Once you need more complex operations, you need to use the specific operations from System.Runtime.Intrinsics.(X86|ARM) based on the current architecture. And you need to adjust your implementation on the CPUs capabilities. There are still a lot of older x64 CPUs around that don't have AVX512 for example.
andix|1 year ago
Once you need more complex operations, you need to use the specific operations from System.Runtime.Intrinsics.(X86|ARM) based on the current architecture. And you need to adjust your implementation on the CPUs capabilities. There are still a lot of older x64 CPUs around that don't have AVX512 for example.