

icelancer | 1 month ago

> sub 900 not possible though.

I performed an analysis similar to yours and found it very difficult to imagine sub-1000. I think your comment convinced me that it may be possible, though. Interesting.

I'm below the threshold for recruiting but not below Claude at the moment. Not sure where I am going wrong.



amirhirsch | 1 month ago

Here are some other hints: combine hash stages 2 and 3; together they can be done as two muladds and an XOR.
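To make the stage 2 + 3 fusion concrete, here is a minimal sketch. It assumes 32-bit wraparound arithmetic, that stage 2 has the shape `(a + c) + (a << s)`, and that stage 3 has the shape `(a + c) ^ (a << s)`. The constants and shift amounts below are hypothetical placeholders, not the real hash's values, and `muladd` is just a stand-in name for a multiply-add operation.

```python
MASK = 0xFFFFFFFF  # assumption: all hash arithmetic wraps at 32 bits

# Hypothetical placeholder constants and shifts (NOT the real hash's values).
C2, S2 = 0x12345678, 5   # stage 2 assumed shape: (a + C2) + (a << S2)
C3, S3 = 0x9ABCDEF0, 9   # stage 3 assumed shape: (a + C3) ^ (a << S3)


def stages_separate(a):
    """Run the two stages one after the other, as written."""
    a = ((a + C2) + (a << S2)) & MASK
    a = ((a + C3) ^ (a << S3)) & MASK
    return a


def muladd(a, m, c):
    """Stand-in for a multiply-add op: a * m + c, wrapped to 32 bits."""
    return (a * m + c) & MASK


# Stage 2 alone is a * M + C2, because a + (a << S2) == a * (1 + 2**S2).
M = (1 + (1 << S2)) & MASK


def stages_fused(a):
    """Same result using two muladds on the original input plus one XOR."""
    left = muladd(a, M, (C2 + C3) & MASK)                    # (stage2 + C3)
    right = muladd(a, (M << S3) & MASK, (C2 << S3) & MASK)   # (stage2 << S3)
    return left ^ right


# Spot check on a few inputs, including the edge values.
for a in (0, 1, 12345, 0xDEADBEEF, 0xFFFFFFFF):
    assert stages_separate(a) == stages_fused(a)
```

Both muladds read the original `a`, so they do not depend on each other and could in principle be issued together; whether that helps depends on the slots available in your bundles.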

For the first several rounds (when every tree value is in use), combine the stage 5 XOR with the subsequent round’s tree XORs. You can determine even/odd in hash stage 5 starting from a ^ (a>>16) without XORing the constant, so you only need one XOR. This saves you a ton of XORs.
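A rough sketch of why deferring the stage 5 constant works, assuming stage 5 has the shape `(a ^ c) ^ (a >> 16)` and that the next round starts by XORing the hashed value with a tree node value. The constant `C5` and the helper names are hypothetical placeholders.

```python
MASK = 0xFFFFFFFF   # assumption: 32-bit values
C5 = 0x0F1E2D3C     # hypothetical placeholder, NOT the real stage 5 constant


def stage5(a):
    """Stage 5 as written: two XORs."""
    return ((a ^ C5) ^ (a >> 16)) & MASK


def stage5_no_const(a):
    """Stage 5 with the constant left out: one XOR."""
    return (a ^ (a >> 16)) & MASK


def parity_direct(a):
    return stage5(a) & 1


def parity_deferred(a):
    # The low bit of C5 is known ahead of time, so in a real kernel it only
    # decides which branch means "even"; no extra runtime XOR is required.
    return (stage5_no_const(a) & 1) ^ (C5 & 1)


def next_input_direct(a, node):
    """Finish stage 5, then XOR with the tree node value."""
    return stage5(a) ^ node


def next_input_folded(a, node_pre):
    """node_pre is the tree node value XORed with C5 once, up front."""
    return stage5_no_const(a) ^ node_pre


for a in (0, 1, 0x00010000, 0xDEADBEEF, 0xFFFFFFFF):
    for node in (0, 7, 0xCAFEBABE):
        assert parity_direct(a) == parity_deferred(a)
        assert next_input_direct(a, node) == next_input_folded(a, node ^ C5)
```

The saving comes from XOR being associative: `(t ^ C5) ^ node == t ^ (C5 ^ node)`, so the constant can be baked into the tree values once instead of being applied to every value in every round.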

Create separate instruction bundles for the first round, rounds 1-5 (combining the hash stage 5 XOR with the next round’s tree XORs), rounds 6-9 (not every tree node is used anymore), round 10, rounds 11-14, and round 15, then combine them.

You can use add_imm in parallel to load consts. In stage 0 you have to load the tree and the vals first; by later stages, when everything is in scratch, you could use 12 scalar XORs and 6 vector XORs on scratch. Once you vload the vals you can start doing XORs, but you can only advance so far at a time, so I’m starting to work on moving hash stages into different rounds faster, to hide the initial vloads, reach the heavy load section sooner, and spread the load pain.