timholy | 5 years ago

Check out LoopVectorization: https://github.com/chriselrod/LoopVectorization.jl

From its benchmarks (https://chriselrod.github.io/LoopVectorization.jl/latest/exa...), a 9-line naive matrix multiplication routine in Julia + LV slightly edges out Intel's MKL.

dalke|5 years ago

It looks like that @avx macro just tells Julia that it's okay to use the AVX instructions?

My specific question is, how do I tell Julia that I want to compute the popcount of the intersection of two byte strings of length 256 bytes?

Reference C code is at http://www.dalkescientific.com/writings/diary/archive/2020/1... in byte_intersect_256() and threshold_bin_tanimoto_search(); my blog post shows the important parts and links to the full definition of the C code.
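
To make the operation concrete, here is a minimal portable sketch of "popcount of the intersection" for two 256-byte strings. It is not the byte_intersect_256() from the linked post; the name popcount_intersect_256 is hypothetical, and it assumes GCC or Clang for the __builtin_popcountll builtin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the byte_intersect_256() from the linked post).
   Returns the number of bits set in both 256-byte strings. */
static int popcount_intersect_256(const unsigned char *a, const unsigned char *b) {
    assert(a != NULL && b != NULL);
    int count = 0;
    /* Process 8 bytes at a time: 32 iterations cover all 256 bytes. */
    for (int i = 0; i < 256; i += 8) {
        uint64_t x, y;
        /* memcpy sidesteps alignment and strict-aliasing problems. */
        memcpy(&x, a + i, sizeof x);
        memcpy(&y, b + i, sizeof y);
        /* AND gives the intersection; the builtin counts its set bits.
           Assumes GCC or Clang. */
        count += __builtin_popcountll(x & y);
    }
    return count;
}
```

With GCC or Clang on x86, compiling with something like -O3 -mpopcnt lets the builtin use the hardware POPCNT instruction. On the Julia side, Base provides count_ones for the same per-word bit count.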

eigenspace|5 years ago

@avx does a ton more than just use AVX instructions. It'll reorder and unroll loops when advantageous, swap out some functions for more vectorizable versions of those functions, and apply a few other tricks.

Julia uses AVX instructions by default if your code is amenable to it.