top | item 40395381

(no title)

lunaticd | 1 year ago

author here. I thought there might be a machine instruction for this but wasn't sure, I also didn't know Julia had a count_ones that counted the 1s.

Thanks! With this the timings are even faster. I'll update the post.

discuss

order

lunaticd|1 year ago

julia> @code_typed hamming_distance(Int8(33), Int8(125)) CodeInfo( 1 ─ %1 = Base.xor_int(x1, x2)::Int8 │ %2 = Base.ctpop_int(%1)::Int8 │ %3 = Base.sext_int(Int64, %2)::Int64 │ nothing::Nothing └── return %3 ) => Int64

julia> @code_llvm hamming_distance(Int8(33), Int8(125)) ; Function Signature: hamming_distance(Int8, Int8) ; @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:13 within `hamming_distance` define i64 @julia_hamming_distance_16366(i8 signext %"x1::Int8", i8 signext %"x2::Int8") #0 { top: ; @ /Users/lunaticd/code/tiny-binary-rag/rag.jl:14 within `hamming_distance` ; ┌ @ int.jl:373 within `xor` %0 = xor i8 %"x2::Int8", %"x1::Int8" ; └ ; ┌ @ int.jl:415 within `count_ones` %1 = call i8 @llvm.ctpop.i8(i8 %0) ; │┌ @ int.jl:549 within `rem` %2 = zext i8 %1 to i64 ; └└ ret i64 %2 }

it lowers to the machine instruction now.

I also tried 8 Int64s vs 64 Int8s and it doesn't seem to make a difference when doing the search.

EDIT: apologize for the formatting

shiandow|1 year ago

I think you may need to update the figures in the rest of the article. At some point you mention it should take around 128ns but with the new benchmark that's probably closer to 64*1.25=80ns.

sitkack|1 year ago

I had Opus translate your code to Rust

    fn hamming_distance_u8(x1: u8, x2: u8) -> usize {
        (x1 ^ x2).count_ones() as usize
    }