berkut | 1 year ago

A 256-item float32 LUT for 8-bit sRGB -> linear conversion is definitely still faster than doing the division live (I re-benchmarked it on Zen4 and Apple M3 last month). However, floating-point division on newer microarchitectures is not as slow as it was on processors from 10 or so years ago, so I can imagine that using a much larger LUT cache is not worth it.
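For anyone unfamiliar with the technique, here is a minimal sketch of such a table, assuming the standard piecewise sRGB transfer function (threshold 0.04045, exponent 2.4). The names `srgb_to_linear`, `SRGB8_TO_LINEAR` and `decode_pixels` are made up for illustration, and this is not the benchmarked code: Python only shows how the table is built and indexed, it says nothing about the timings on Zen4 or M3.

```python
from array import array


def srgb_to_linear(c: float) -> float:
    """Standard piecewise sRGB decoding for a normalised value c in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


# 256 entries, one per possible 8-bit code value.
# array typecode "f" stores each entry as a 4-byte C float on common
# platforms, so the whole table is about 1 KiB.
SRGB8_TO_LINEAR = array("f", (srgb_to_linear(i / 255.0) for i in range(256)))


def decode_pixels(data: bytes) -> array:
    """Convert 8-bit sRGB samples to linear floats with one lookup each."""
    return array("f", (SRGB8_TO_LINEAR[b] for b in data))


if __name__ == "__main__":
    samples = bytes([0, 64, 128, 255])
    print(list(decode_pixels(samples)))
```

The per-sample work is a single indexed load, which is why the small table can beat computing the curve live; a much larger table (say, for 16-bit input) would be many times the size and compete for cache with everything else.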

fp64 | 1 year ago

Does this include vectorized code? I stopped using LUTs for anything “trivial” probably 20 years ago because I rarely see any improvement, in particular one that would noticeably benefit the overall runtime.