(no title)
shihab | 1 month ago
Re (b) I'm curious what that middle ground is. Is there any simple refactor to help GCC to get rid of this `if`? (Note, ISPC did fine here)
(c) Just to be clear, all the codes in benchmark figures (baseline and SIMD) were compiled with fast-math flags.
Regarding (a), one of the points I wanted to get across was that it didn't feel that complicated to program in the end as I had thought. Porting to AVX-512 felt mechanical (hence the success of LLMs in one-shotting the whole thing).
This is a subjective opinion, depends on programmer's experience etc- so I won't dwell on it. I just wish more CPU programmers gave it a try.
Remnant44|1 month ago
I avoided it for a long time because, well, it was so damn ugly and verbose to do simple things. However, in actual practice it's not nearly as painful as it looks, and you get used to it quickly.
physicsguy|1 month ago