ricksunscreen's comments
ricksunscreen | 3 years ago | on: Google’s fully homomorphic encryption compiler – a primer
If you can live with the limitations of other schemes, FHE can be much faster. On a single core of an M1 MacBook Air, multiplying 2 BFV-encrypted 4096-bit values takes 4ms, and adding them takes only 15us. Additionally, key sizes aren't horrendous (<500kB). One downside of this scheme is that bootstrapping takes minutes, so it isn't really practical. This limits the amount of computation you can do before you exceed your noise budget and the ciphertext decrypts to garbage. The other downside is that arithmetic circuits make some computations far more difficult (e.g. comparisons).
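To make the noise-budget idea concrete, here's a toy model (illustrative only — the constants are made up, and real BFV noise growth depends on the scheme parameters, not a fixed per-op cost):

```python
# Toy model of an FHE "noise budget" (illustrative only -- real BFV noise
# growth depends on the scheme parameters, not these made-up constants).
# A fresh ciphertext starts with a fixed budget; each operation consumes
# some of it, and once the budget hits zero, decryption returns garbage.

MUL_COST = 20   # hypothetical bits of budget consumed per multiplication
ADD_COST = 1    # additions are far cheaper, as in real BFV

def remaining_budget(start_bits, muls, adds):
    """Budget left after a sequence of operations; <= 0 means garbage."""
    return start_bits - muls * MUL_COST - adds * ADD_COST

# With a 100-bit starting budget we can chain a few multiplications safely:
assert remaining_budget(100, muls=4, adds=10) > 0
# ...but one multiplication too many exceeds the budget and decryption fails:
assert remaining_budget(100, muls=6, adds=0) <= 0
```

Without practical bootstrapping to reset the budget, the depth of the circuit you can evaluate is fixed up front by your parameters.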
ricksunscreen | 3 years ago | on: The Rise of Fully Homomorphic Encryption
My impression is that there are many parallels with computing at large, where custom hardware is becoming more and more prevalent. You can run arbitrary C programs under FHE by building a CPU out of binary gates and running on that, but it will run at 1Hz[2] even when emulated on 8 GPUs. So computing fibonacci(5) takes like 16 minutes. Conversely, you could create an arithmetic circuit that does it in like 16us. However, working with circuits is hard, let alone with the additional requirements FHE imposes.
Today, our compiler lets you write Rust code that turns into an arithmetic circuit in the BFV scheme. It also manages parameter selection, which is another annoying part of FHE: choose parameters too small and decryption will fail due to excessive noise, but larger parameters slow down the computation and make ciphertexts larger.
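To illustrate what "turning code into an arithmetic circuit" means, here's a toy circuit representation and evaluator in Python (purely illustrative — this is not Sunscreen's actual IR; an FHE backend would execute each gate homomorphically rather than on plain integers):

```python
# Toy arithmetic-circuit IR (illustrative -- not Sunscreen's actual
# representation). An FHE compiler lowers ordinary code into a DAG of
# add/mul gates over named inputs; the backend then executes each gate
# homomorphically under the chosen BFV parameters.

def evaluate(circuit, inputs):
    """Evaluate a circuit given as a list of (op, a, b) gates.

    Operands are either input names or indices of earlier gates;
    the value of the last gate is the circuit's output."""
    wires = []
    for op, a, b in circuit:
        va = inputs[a] if isinstance(a, str) else wires[a]
        vb = inputs[b] if isinstance(b, str) else wires[b]
        wires.append(va + vb if op == "add" else va * vb)
    return wires[-1]

# x*y + x, written as a flat circuit instead of an expression:
circuit = [
    ("mul", "x", "y"),   # gate 0: x * y
    ("add", 0, "x"),     # gate 1: gate0 + x
]
assert evaluate(circuit, {"x": 3, "y": 5}) == 18
```

Note there are no branches or loops in the circuit — everything is unrolled into gates, which is exactly why parameter selection matters: circuit depth (especially multiplicative depth) dictates how much noise headroom you need.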
Overall, FHE has a ton of promise, but it's currently in the chicken-and-egg phase where there isn't much commercially available because there isn't a market, because there isn't anything available. We're trying to be an egg and grow along with the market. FHE is a big area and there's a ton to explore, like multi-key encryption[3]. FHE is currently nascent, but I believe its future is bright where it can be appropriately used.
[1]: https://docs.sunscreen.tech/
[2]: https://www.usenix.org/conference/usenixsecurity21/presentat...
ricksunscreen | 3 years ago | on: The Rise of Fully Homomorphic Encryption
To give you a sense of performance: today you can multiply 2 encrypted 8192-bit values in BFV with typical (not optimal) scheme parameters in 17ms on a single core of an M1 MacBook Air. This is the most expensive operation by a wide margin. The ciphertexts for these parameters are about 0.5MB and the keys are maybe a meg or two.
The algorithm you want to make fast for most schemes is the Number Theoretic Transform (NTT), which is basically a Fast Fourier Transform (FFT) over a finite field. This algorithm requires only O(n log n) operations, so its computational intensity relative to memory accesses is fairly low. This stands in contrast to something nice like matrix multiplication, where the matrices are O(n^2) in size but require O(n^3)[1] computation. Unfortunately, due to Amdahl's law, you have to make not just the NTT fast, but also all the other boring O(n) operations these schemes need to do.
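Here's a toy NTT to show the idea (illustrative parameters — real FHE schemes use much larger primes and power-of-two sizes in the thousands, plus the O(n log n) butterfly form rather than this naive O(n^2) loop):

```python
# Toy NTT over Z_17 with n = 4 (illustrative parameters -- real FHE
# schemes use much larger primes and transform sizes). The NTT is an FFT
# over a finite field: same structure, but the root of unity is an
# integer mod p rather than a complex exponential.

P = 17   # prime modulus with P - 1 divisible by N
N = 4    # transform size
W = 4    # primitive N-th root of unity mod P (4^2 = -1, 4^4 = 1 mod 17)

def ntt(x, root=W):
    """Naive O(n^2) NTT; production code uses the O(n log n) butterflies."""
    return [sum(x[j] * pow(root, j * k, P) for j in range(N)) % P
            for k in range(N)]

def intt(X):
    inv_n = pow(N, -1, P)    # modular inverse of n
    inv_w = pow(W, -1, P)    # inverse root of unity
    return [(v * inv_n) % P for v in ntt(X, root=inv_w)]

x = [1, 2, 3, 4]
assert intt(ntt(x)) == x     # forward then inverse recovers the input

# Pointwise products in the NTT domain give cyclic convolution, which is
# why the big polynomial multiplications in FHE reduce to NTTs:
y = [5, 6, 7, 8]
prod = intt([a * b % P for a, b in zip(ntt(x), ntt(y))])
```

Each output coefficient touches every input once, so once the transform is fast, you're quickly limited by how fast you can feed it data — hence the memory-bandwidth discussion below.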
If you want to make FHE fast enough to justify an ASIC, you'll have to avoid data movement and basically keep everything in on-chip SRAM. Waiting 400 clock cycles for data is a non-starter. For algorithms with bootstrapping, your bootstrapping key might be 100MB, so you'll probably want a chip with like 512MB of on-chip memory to hold various keys, ciphertexts, etc. You then need to route and fan-out that data as appropriate.
You then also need to pack in a ton of compute units that can quickly do NTTs on-chip but are also versatile enough to do all the other "stuff" you need to do in FHE, which might include multiplication and addition modulo some value, bit decomposition, etc. And you'll probably be doing operations on multiple ciphertexts concurrently as you traverse an arithmetic or binary circuit (FHE's computational model). Figuring out the right mix of what an ALU is and how programmable it needs to be is tricky business.
For larger computations, maybe you stream ciphertexts in and out of DRAM in the background while you're computing other parts of the graph.
Making an FHE accelerator is neither easy nor cheap (easily a $50-100M+ investment), but I think it is possible. My SWAG is that you might be able to turn that 17ms latency into something like 50-100us, but with way more throughput to execute a large circuit graph (or several).
[1]: Strassen algorithm git out of here
ricksunscreen | 3 years ago | on: The Rise of Fully Homomorphic Encryption
Suppose Bob has an array of data he arranges into an mxn matrix, A. This data is not encrypted, but is encoded appropriately. Note that many FHE schemes allow you to compute ciphertext-plaintext operations.
Alice can send him 2 vectors x and y encrypted under her key, where x and y are all zeros except for a single 1. Bob homomorphically computes Ax = b. Since x is all zeros except for a 1 at element i, the operation Ax effectively selects the ith column of A. Bob then computes dot(b, y). Since y is all zeros except for a 1 at element j, the dot product effectively selects the jth element of b. Bob sends the dot product back to Alice, which, thanks to FHE, is still encrypted under her key.
Alice decrypts the result and has looked up the (j,i)th element of A without Bob learning Alice's query or which data was involved in processing her search.
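The selection math can be sketched in plaintext Python (illustrative — in the real protocol these same operations run on ciphertexts, so Bob never sees x, y, or the result):

```python
# Plaintext sketch of the one-hot lookup above (illustrative -- the real
# protocol performs these operations homomorphically on ciphertexts).
# x selects column i of A; y then selects row j of the resulting vector.

def private_lookup(A, x, y):
    """dot(A @ x, y) for one-hot x and y, i.e. A[j][i]."""
    b = [sum(A[r][c] * x[c] for c in range(len(x))) for r in range(len(A))]
    return sum(b_r * y_r for b_r, y_r in zip(b, y))

A = [[10, 11, 12],
     [20, 21, 22]]    # Bob's 2x3 database (plaintext, suitably encoded)
x = [0, 1, 0]         # Alice wants column i = 1
y = [0, 1]            # ...and row j = 1
assert private_lookup(A, x, y) == 21   # A[1][1]
```

Because x and y are encrypted, all Bob observes is a matrix-vector product and a dot product over opaque ciphertexts — the one-hot positions are hidden from him.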
The default program on the Sunscreen[1] compiler playground shows this exact algorithm.
Disclaimer: I am an employee of Sunscreen.
ricksunscreen | 3 years ago | on: The Rise of Fully Homomorphic Encryption