top | item 21875407


lightcatcher | 6 years ago

I believe there's work to be done for each new CPU architecture (Broadwell, Skylake with AVX-512, Cascade Lake, let alone ARM or other architectures). The kernels need to be retuned for things like L1 cache size, the number of registers per core, and the number of adders per core. So there will likely continue to be frequent work on BLAS implementations until there's some very smart optimizing and profiling compiler (which, I think, is related to what ATLAS does).
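To make the L1 point concrete, here is a rough sketch (not how any real BLAS is written) of a cache-blocked matrix multiply where the tile size is derived from the L1 data cache size. The names `pick_block_size` and `blocked_matmul` are made up for illustration, and the "three square tiles must fit in L1" rule is a simplifying assumption; real kernels also tune for register count, vector width, and the number of execution units.

```python
import math


def pick_block_size(l1_bytes, elem_bytes=8, tiles=3):
    """Hypothetical heuristic: largest b such that `tiles` b-by-b tiles
    of `elem_bytes`-sized elements fit in an L1 cache of `l1_bytes`."""
    return max(1, math.isqrt(l1_bytes // (tiles * elem_bytes)))


def blocked_matmul(a, b, block):
    """Multiply a (n x k) by b (k x m), both lists of lists, visiting
    the operands one block-sized tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                # Work on one tile of a, one of b, and one of c.
                for i in range(i0, min(i0 + block, n)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


# Assumed, illustrative L1 data cache sizes; check the actual part.
small_l1 = pick_block_size(32 * 1024)
large_l1 = pick_block_size(48 * 1024)
```

The same loop nest gets a different tile size as soon as the cache size changes, which is the kind of per-architecture retuning that has to be redone (or automated) for each new chip.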
