Cryptographers solve decades-old privacy problem

[+] ChrisArchitect|2 years ago|reply

The paper:

Doubly Efficient Private Information Retrieval and Fully Homomorphic RAM Computation from Ring LWE

https://eprint.iacr.org/2022/1703

[+] kelsey9876543|2 years ago|reply

[deleted]

[+] llwj|2 years ago|reply

How efficient is it now? The last time I checked, FHE required minutes of computation and gigabytes of memory to store tiny amounts of data, and since it does not hold IND-CCA, I could not find any use cases.

[+] sangel|2 years ago|reply

Very inefficient. Like wildly so. Specifically if you have a very small database and you preprocess it with their techniques, the resulting database is petabytes in size. But the results are very beautiful.

There are no obvious ways to improve on this work, so it is not a matter of engineering. We really do need another breakthrough result to get us closer to practicality.

[+] jengels_|2 years ago|reply

Pretty efficient! E.g. a recent paper describes a system to do fully private search over the common crawl (360 million web pages) with an end to end latency of 2.7 seconds: https://dl.acm.org/doi/10.1145/3600006.3613134

[+] rhindi|2 years ago|reply

FHE in general is efficient enough for many applications now. You can see some benchmarks here: https://docs.zama.ai/tfhe-rs/getting-started/benchmarks

[+] godelski|2 years ago|reply

Not an answer, but a question that I hope someone can answer. Is the lack of speed because of a lack of optimization or compute? Is this something that could be fixed by an accelerator?

It's often hard to compare things in an accurate way. I mean many current methods might already have hardware specific acceleration and researchers aren't always good at optimization (why should they be?). So is the problem of FHE a bigger lack of just not enough people (specialists) putting in time to optimize it or is it so computationally intensive that we need hardware to get better before it becomes practical? I mean look at llama.cpp and how much faster it is or stable diffusion? Sure, both kinda slow (diffusion's no GAN and SD's not straight diffusion) but very usable for many applications.

[+] _ks3e|2 years ago|reply

Based on my recollection of a conversation with the authors after their STOC talk: the RAM scheme is not efficient enough to be competitive with circuit-based FHE schemes; for problems with low RAM usage, existing circuit-based methods are more efficient, and problems with higher RAM usage are infeasible to run on any FHE method right now.

They were 50/50 on whether or not this technique could be made feasible in the same way that, say, the CKKS scheme is.

[+] catilinam|2 years ago|reply

Isn’t FHE by definition not INDCCA? Weak objection imo

[+] josh2600|2 years ago|reply

I remain unconvinced of the practical applications of solutions like this.

These seem like academic toys.

Pre-processing a dynamic database is a non-starter. No database that is used for any real world use case is static at scale, especially not internet traffic.

I’m not expecting FHE that’s fast in my lifetime, but I’ve been wrong before.

Context: I led a team which built the first open source production-grade implementation of hardware-accelerated PIR which is currently at use at scale.

[+] enkid|2 years ago|reply

I don't think the purpose of this sort of research is to be immediately applicable, it more shows a direction that could be useful in the future. Shor's algorithm has not been used practically, but it's hard to imagine modern cryptography without "post quantum" being an important topic.

[+] kevincox|2 years ago|reply

I think you are right that this paper isn't practically useful. But that is much like saying that making one chip off of a block of granite isn't art. It isn't, but the first chip enables further research until it is practical.

[+] CreRecombinase|2 years ago|reply

That’s certainly not true in scientific computing. For us, dynamic is very much the exception rather than the rule.

[+] karulont|2 years ago|reply

> hardware-accelerated PIR which is currently at use at scale.

Can you tell us more about that?

[+] desdenova|2 years ago|reply

Good to see some people are actually solving this problem, unlike all those startups using it as a marketing buzzword.

[+] captn3m0|2 years ago|reply

The first time I can across PIR was via the 2014 blog post from Signal on how Private Contact Discovery was a hard problem and how it required PIR to be solved. https://signal.org/blog/contact-discovery/

Maybe this will help Signal get to a viable solution in a few years.

[+] hanniabu|2 years ago|reply

I assume you're referring to the blockchain industry, but they've advanced cryptography a lot, specifically with zkproofs.

[+] nixpulvis|2 years ago|reply

> But using their private lookup method as scaffolding, the authors constructed a new scheme which runs computations that are more like the programs we use every day, pulling information covertly without sweeping the whole internet.

Doesn’t Google already accomplish this? Or is the key here that Google doesn’t do the sweeping in a covert way? So search boils down to two problems: building the index and using the index.

Or am I confused?

[+] wolverine876|2 years ago|reply

It understand it's inefficient, but could it be used by a well-resourced organization where confidentiality is extremely high-value and the database is small, such as intelligence, military plans, Apple's product plans, etc.?

[+] narinxas|2 years ago|reply

there are much cheaper ways, chief amongst them is to use paper and pencils

[+] paulpauper|2 years ago|reply

You run Google’s algorithm yourself and secretly pull data from the internet when necessary.

yeah isn't this the same offline viewing? why do you need a new algorithm for that? we've known about this forever. obviously, if you download the database and access it, this is more secure than being online , but way slower. Regarding the library example, this too has easy solutions: checkout many books at once, have a stranger check out the book, etc.

This article seems to do a poor job explaining what exactly this solves, only that it's revolutionary. The metaphors are not helpful.

[+] Out_of_Characte|2 years ago|reply

How I interpreted this is that you download the google search algorithm, then scan their database over the internet with that algorithm. Which would take ages or millenia depending on how much data google has.

You wouldn't need to download the entire database and there would be alot of uncertainty over which search result you'd actually used.

[+] fyokdrigd|2 years ago|reply

so, not private nor efficient nor solution to homomorphic data.

basically a polynomial factor hash of the index keys... basically you will need the entire db plus the larger hashed keys. doesn't help at all implementing any of the outlandish claims.

guess the merit is in proving the poly fact they build on as secure. but that is not a game changer as there are better alternatives and nobody solved the claimed problems with them.

[+] asasidh|2 years ago|reply

Would a solution like mixing address the issue with acceptable tradeoffs ? https://arxiv.org/abs/1905.12264

[+] malaya_zemlya|2 years ago|reply

Wouldn't ChatGPT in homomorphic encryption mode do something like that already? You put in a query - the system processes it in mysterious ways, and returns the answer in constant time.

[+] Uptrenda|2 years ago|reply

When I was researching this problem 10 years ago it was impractical then and the situation seems completely (brace for it) the same. Woah!

[+] unknown|2 years ago|reply

[deleted]

[+] jdefr89|2 years ago|reply

This is super misleading and sort of ambiguous. Are they claiming they eliminated any side channel information that can be used? I am confused here. The problem is anyone can collect various side channel information and use it as Meta data… Fully Homo. Encryption only helps after you’ve established secure links and to do that you will inevitably have to use a less secure link or system that which can be used in and of itself to make inferences about queries… The real issue is we don’t know if the “secure” systems we use that we don’t control actually give us the privacy they claim to…

[+] l33t7332273|2 years ago|reply

You can use differential privacy to protect from most (timing, memory, etc) side channels in distributed analytics.

[+] parentheses|2 years ago|reply

++ it's unclear what data this breakthrough is helping to keep private and how.

[+] unknown|2 years ago|reply

[deleted]

[+] j2kun|2 years ago|reply

It's an exciting time to be working in homomorphic encryption!

[+] noman-land|2 years ago|reply

Homomorphic encryption and zero knowledge proofs are the most exciting technologies for me for the past bunch of years (assuming they work (I'm not qualified enough to know)).

Having third parties compute on encrypted, private data, and return results without being able to know the inputs or outputs is pretty amazing.

[+] unknown|2 years ago|reply

[deleted]

[+] ChrisArchitect|2 years ago|reply

Original article submitted a few times weeks ago:

https://news.ycombinator.com/item?id=38164732

[+] unknown|2 years ago|reply

[deleted]

[+] unknown|2 years ago|reply

[deleted]

[+] I_am_tiberius|2 years ago|reply

As most likely lots of cryptographers read this, I have a question. What's an efficient way to store encrypted data in a database (only specific table columns) and decrypt them on the fly while querying the data? Using postgres with pgp_sym_encrypt(data, key) seems very slow.

132 comments