ashvardanian|2 years ago
It’s mostly coming from using the Arm NEON intrinsics, not much magic. While working on the library, I was shocked to see how under-vectorized LibC is on Arm. There is a lot of improvement potential beyond strings.
Amazon, Microsoft, Nvidia, Ampere, Apple, Qualcomm, and all the other Arm-based CPU vendors should really consider investing more in the ecosystem. The hardware is very capable; they shouldn’t be losing against x86 in so many benchmarks…
menaerus|2 years ago
I'd say that SIMD knowledge, and even more so knowledge of CPU internals, is not that common, and I think utmost performance is not among the highest-priority goals of libc/libc++/libstdc++. The ones who need it will implement it themselves. The ones who don't need it won't even notice.
The implementation and maintenance effort is several times larger than for the usual "good enough" scalar implementation.
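For a concrete picture of what "using the Arm NEON intrinsics" versus a "good enough" scalar loop looks like, here is a minimal sketch of a memchr-like byte search. It assumes an AArch64 target with NEON. It compares 16 bytes per iteration with `vceqq_u8` and checks for any match with `vmaxvq_u8`. Everywhere else it falls back to the plain scalar loop. The name `find_byte` and the overall structure are hypothetical and for illustration only. This is not the library's or any libc's actual code.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar "good enough" baseline: checks one byte per iteration. */
static const char *find_byte_scalar(const char *s, size_t n, char c) {
    for (size_t i = 0; i < n; ++i)
        if (s[i] == c)
            return s + i;
    return NULL;
}

/* Hypothetical helper: returns a pointer to the first `c` in s[0..n), or NULL. */
const char *find_byte(const char *s, size_t n, char c) {
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Broadcast the needle into all 16 lanes of a 128-bit register. */
    const uint8x16_t needle = vdupq_n_u8((uint8_t)c);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        uint8x16_t chunk = vld1q_u8((const uint8_t *)s + i);
        uint8x16_t eq = vceqq_u8(chunk, needle); /* 0xFF in lanes that match */
        if (vmaxvq_u8(eq) != 0) {
            /* At least one lane matched: pinpoint it within these 16 bytes. */
            return find_byte_scalar(s + i, 16, c);
        }
    }
    /* Tail: fewer than 16 bytes remain. */
    return find_byte_scalar(s + i, n - i, c);
#else
    /* No NEON available: use the scalar loop for the whole input. */
    return find_byte_scalar(s, n, c);
#endif
}
```

Even this toy version illustrates the maintenance point from the reply. It has two code paths, a tail loop, and a platform guard that the scalar version does not need, and both paths have to be tested on their respective architectures.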