korbin | 1 year ago
If you carefully place the nonce at the end and use all 55 bytes, you can pre-hash the first ~20 of the 64 rounds of state and the first several rounds of W generation, then base every further iteration off that static value (this is known as a "midstate optimization").
> If you limit your variable portion to a base16 alphabet like A-P
The more nonce bits you decide to use, the less you can statically pre-hash.
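A rough sketch of the idea, assuming the hash in question is SHA-256 (which the 55-byte limit and 64 rounds suggest) and that only the last 3 bytes vary, so the nonce sits entirely in W[13]. The names `precompute` and `finish` are made up for illustration. Under those assumptions rounds 0-12 and schedule words W[16..19] never see the nonce and can be computed once:

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF

def _primes(n):
    out, c = [], 2
    while len(out) < n:
        if all(c % p for p in out):
            out.append(c)
        c += 1
    return out

def _icbrt(n):
    # Integer cube root by binary search.
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# SHA-256 constants derived from their definition (fractional parts of
# cube/square roots of the first primes) rather than typed out by hand.
PRIMES = _primes(64)
K = [_icbrt(p << 96) & MASK for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def s0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def s1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def one_round(st, k, w):
    a, b, c, d, e, f, g, h = st
    t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
          + ((e & f) ^ (~e & g)) + k + w) & MASK
    t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
          + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

def precompute(prefix):
    """Static part: prefix is the 52 fixed bytes, i.e. words W[0..12]."""
    assert len(prefix) == 52
    w = list(struct.unpack(">13I", prefix)) + [0] * 51
    w[14], w[15] = 0, 55 * 8              # 64-bit length of a 55-byte message
    st = tuple(H0)
    for i in range(13):                   # rounds 0..12 only consume W[0..12]
        st = one_round(st, K[i], w[i])
    for i in range(16, 20):               # W[16..19] do not depend on W[13]
        w[i] = (s1(w[i - 2]) + w[i - 7] + s0(w[i - 15]) + w[i - 16]) & MASK
    return st, w

def finish(pre, nonce3):
    """Per-nonce part: nonce3 is the 3 variable bytes at the end."""
    st, w = pre
    w = list(w)
    w[13] = int.from_bytes(nonce3 + b"\x80", "big")   # nonce + padding byte
    for i in range(20, 64):
        w[i] = (s1(w[i - 2]) + w[i - 7] + s0(w[i - 15]) + w[i - 16]) & MASK
    for i in range(13, 64):
        st = one_round(st, K[i], w[i])
    return struct.pack(">8I", *[(x + y) & MASK for x, y in zip(H0, st)])

prefix = b"x" * 52
pre = precompute(prefix)
for nonce in (b"AAA", b"ABP"):
    assert finish(pre, nonce) == hashlib.sha256(prefix + nonce).digest()
```

This only freezes 13 rounds and four schedule words; it does not try to reach the ~20-round figure above. Widening the nonce so it spills into W[12] would push the first variable round earlier, which is the trade-off being described.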
In FPGA, I am using 64-deep, 8-bit-wide memories to do the alphabet expansion. I am guessing in CUDA you could do something similar with `LOP3.LUT`.
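For anyone unsure what "alphabet expansion" means here, a software sketch of a 64-entry, 8-bit-wide table might look like the following. The table contents (the standard base64 alphabet) and the name `expand` are assumptions for illustration, not the actual FPGA design; each 6-bit slice of a counter indexes the table to produce one output byte:

```python
# Hypothetical 64-deep, 8-bit-wide lookup table: 6-bit index -> alphabet byte.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
assert len(ALPHABET) == 64

def expand(counter, n_chars):
    """Turn a counter into n_chars alphabet bytes, 6 bits per character,
    most significant digit first."""
    return bytes(
        ALPHABET[(counter >> (6 * (n_chars - 1 - i))) & 0x3F]
        for i in range(n_chars)
    )

assert expand(0, 3) == b"AAA"
assert expand(1, 3) == b"AAB"
assert expand(64, 3) == b"ABA"
```

In hardware each output character would get its own copy of the table so that all lookups happen in parallel in the same cycle.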