top | item 35154935

(no title)

neilmovva | 3 years ago

Our FHE scheme uses lots of Number Theoretic Transforms (NTTs), which are pretty computationally expensive. NTT is a good candidate for acceleration, and there is quite a bit of interest from the zk community in doing so (https://www.zprize.io/prizes/accelerating-ntt-operations-on-...).

From a hardware perspective, NTT can be done in parallel, but has a fairly large working set of data (~512 MB) with lots of unstructured accesses. This is too big to fit in even the largest CPU L3 caches, so DRAM bandwidth is still relevant. It may be eventually be feasible to build an ASIC with this much on-chip memory, but in the meantime, GPUs do a pretty decent job with their massive HBM bandwidth.

discuss

order

byteware|3 years ago

interesting prize, I wonder why they fix that it has to be radix-2 NTT, using higher radix speeds things up an order of magnitude on GPU (granted I am using a 256 bit field, so it might be more memory bound)