I say this as a lover of FHE and the wonderful cryptography around it:
While it’s true that FHE schemes continue to get faster, they don’t really have hope of being comparable to plaintext speeds as long as they rely on bootstrapping. For deep, fundamental reasons, bootstrapping isn’t likely to ever be less than ~1000x overhead.
When folks realized they couldn’t speed up bootstrapping much more, they started talking about hardware acceleration, but it’s a tough sell at a time when every last drop of compute is going into LLMs. What $/token cost increase would folks pay for computation under FHE? Unless it’s >1000x, it’s really pretty grim.
For anything like private LLM inference, confidential computing approaches are really the only feasible option. I don’t like trusting hardware, but it’s the best we’ve got!
There is an even more fundamental reason why FHE cannot realistically be used for arbitrary computation: some computations have much larger asymptotic complexity on encrypted data than on plaintext.
A critical example is database search: searching through a database on n elements is normally done in O(log n), but it becomes O(n) when the search key is encrypted. This means that fully homomorphic Google search is fundamentally impractical, although the same cannot be said of fully homomorphic DNN inference.
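One way to see why the encrypted lookup degenerates to O(n): if the server may learn nothing about which record is requested, it has to touch every record. Here is a plaintext Python stand-in for this kind of PIR-style selection (the selector bits would be ciphertexts in a real protocol; all names here are illustrative):

```python
# Plaintext stand-in for an oblivious lookup: the server cannot binary-search,
# because that would reveal which record matched. Instead it combines every
# record with a one-hot selector that the client would send encrypted.

def oblivious_lookup(db, selector):
    """O(n): touches all n records regardless of which one is selected."""
    assert sum(selector) == 1          # one-hot: exactly one position requested
    return sum(rec * bit for rec, bit in zip(db, selector))

db = [42, 7, 99, 13]
selector = [0, 0, 1, 0]                # client wants index 2
print(oblivious_lookup(db, selector))  # -> 99
```

No matter which index the client picks, the server's work is the same full scan, which is exactly the O(n) cost described above.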
Even without bootstrapping, FHE will never be as fast as plaintext computation: the ciphertext is about three orders of magnitude larger than the plaintext it encrypts, which means you need correspondingly more memory bandwidth and more compute. You can’t bridge this gap.
Don't you think there is a market for people who want services that have provable privacy even if it costs 1,000 times more? It's not as big a segment as Dropbox but I imagine it's there.
I get that there is a big LLM hype, but is there really no other application for FHE? For example, trading algorithms (not the high-speed ones) that you could host on random servers knowing your strategy stays private, or something similar?
I think the only thing that could make FHE truly world-changing is if someone figures out how to implement something like multi-party garbled circuits under FHE, where anyone can verify the output of functions over many hidden inputs, since that opens up a realm of provably secure HSMs, voting schemes, etc.
I'd also like to comment on how everything used to be a PCIE expansion card.
Your GPU was, and we also used to have dedicated math coprocessor accelerators. Now most of that expansion-card functionality is handled by general-purpose hardware, which, while cheaper, will never be as good as custom silicon dedicated to a single task.
It's why I advocate for a separate ML/AI card instead of using GPUs. Sure, there is hardware architecture overlap, but you're sacrificing so much because your AI cards are founded on GPU hardware.
I'd argue the only real AI accelerators are something like what goes into modern SXM sockets. That form factor ditches the power issues and opens up more bandwidth. However, only servers have SXM sockets... and those are not cheap.
FHE is an important tool because right now companies can be coerced by governments into breaking encryption for specific targets. FHE removes the need for companies to have a backbone; they can simply shrug and say "We literally do not see the plaintext, ever". They can kinda do this with end-to-end encryption when they're simply the network/carrier, but they currently cannot any time they're processing the plaintext data.
I come from a values basis that privacy is a human right, and that governments should be extremely limited in their retaliatory powers against just and democratic uses of power against them (things like voting, art, media, free speech, etc.).
FHE might allow arbitrary computation, but I use most services because they have some data I want to use: their search index, their knowledge, their database of chemicals, my bank account transactions, whatever.
So unless Google lets me encrypt their entire search index, they can still see my query at the time it interacts with the index, or else they cannot fulfill it.
The other point is incentives: outside of a very few high-trust, high-stakes applications, I don't see why companies would go to the trouble of offering FHE services.
Here's an implementation of a fully private search engine using FHE that allows querying Wikipedia with the server remaining oblivious as to what you're reading: https://spiralwiki.com/
From what I understand, only the sensitive data needs to be encrypted (e.g. your bank transactions). It is still possible to use public unencrypted data in the computation, as the function you want to compute doesn't have to be encrypted.
Exactly what I thought. In the end it really isn't in most big corps' interest not to see your data/query. They need/want to see it, so why would they degrade their ability to do so when they can just say no, and you'll have to rely on their services without FHE? For banking applications, cool; for everyone else, it's debatable whether it will ever be accepted.
You're right about incentives, but wrong about the first part. Private lookups of a plaintext database are possible and have been for a while now (5+ years?). The problem is it often requires some nontrivial preprocessing of the plaintext database, or in the worst case a linear scan of the entire database.
I get the "client side" of this equation; some number of users want to keep their actions/data private enough that they are willing to pay for it.
What I don't think they necessarily appreciate is how expensive that would be, and consequently how few people would sign up.
I'm not even assuming that the compute cost would be higher than currently. Let's leave aside the expected multiples in compute cost - although they won't help.
Assume, for example, a privacy-first Google replacement. What does that cost? (Google's revenue is a good place to start that calculation.) Even if it were, say, $100 a year (hint: it's not), how many users would sign up for that? Some, sure, but a long, long way from a noticeable percentage.
Once we start adding zeros to that number (to cover the additional compute cost) it gets even lower.
While imperfect, things like Tor provide most of the benefit, and cost nothing. As an alternative it's an option.
I'm not saying that HE is useless. I'm saying it'll need to be paid for, and the numbers that will pay to play will be tiny.
An FHE Google today would be incredibly expensive and incredibly slow. No one would pay for it.
The key question I think is how much computing speed will improve in the future. If we assume FHE will take 1000x more time, but hardware also becomes 1000x faster, then the FHE performance will be similar to today's plaintext speed.
Predicting the future is impossible, but as software improves, hardware becomes faster and cheaper every year, and FHE provides the unique value of privacy, it's plausible that at some point it becomes the default (if not in 10 years, maybe in 50).
Today's hardware is many orders of magnitude faster than it was 50 years ago.
There are of course other issues too, like ciphertext size being much larger than plaintext, and the requirement of encrypting whole models or indexes per client on the server side.
FHE is not practical for most things yet, but its Venn diagram of feasible applications will only grow. And I believe there will come a time when that diagram covers search engines and LLMs.
> Internet's "Spy by default" can become "Privacy by default".
I've been building and promoting digital signatures for years. It's bad for people and for market dynamics to have Hacker News or Facebook be the grand arbiter of everyone's identity in a community.
Yet here we are, because it's just that much simpler to build and use it this way, which gets them more users and money, which snowballs until alternatives don't matter.
In the same vein, the idea that FHE is a missing piece many people want is wrong. Everything is still almost all run on trust, and that works well enough that very few use cases want the complexity cost - regardless of operation overhead - to consider FHE.
> I've been building and promoting digital signatures for years.
I agree with this wholeheartedly, and yet I do get the following question a lot: "What's all that nonsense at the end of your emails?" Any explanation is met with eye-rolls and thousand-yard stares. Have you managed to get laypeople on board with any kind of client-side cryptography? How?
I think the opening example involving Google is misleading. When I hear "Google" I think "search the web".
The article is about getting an input encrypted with key k, processing it without decrypting it, and sending back an output that is encrypted with key k, too. Now it looks to me that the whole input must be encrypted with key k. But in the search example, the inputs include a query (which could be encrypted with key k) and a multi-terabyte database of pre-digested information that's Google's whole selling point, and there's no way this database could be encrypted with key k.
In other words this technique can be used when you have the complete control of all the inputs, and are renting the compute power from a remote host.
Not saying it's not interesting, but the reference to Google can be misunderstood.
> Now it looks to me that the whole input must be encrypted with key k. But in the search example, the inputs include a query […] and a multi-terabyte database […]
That’s not the understanding I got from Apple’s CallerID example[0][1]. They don’t seem to be making an encrypted copy of their entire database for each user.
Homomorphically encrypted services don't need a priori knowledge of the encryption key. That's literally the whole point.
Consider the following (very weak) encryption scheme:
m, k ∈ ℤ_p, E(m) = m * k mod p, D(c) = c * k⁻¹ mod p
With this, I can implement a service that receives two cyphertexts and computes their encrypted sum, without knowledge of the key k:
E(x) + E(y) = x * k + y * k mod p = (x + y) * k mod p = E(x + y)
Of course, such a service is not too interesting, but if you could devise an algebraic structure that supported sufficiently complex operations on cyphertexts (and with a stronger encryption), then by composing these operations one could implement arbitrarily complex computations.
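For concreteness, here is that toy scheme in a few lines of Python (deliberately weak, exactly as noted above):

```python
# The toy scheme above: E(m) = m*k mod p, D(c) = c*k⁻¹ mod p.
p = 2**31 - 1              # a prime modulus
k = 123456789              # secret key (any nonzero value mod p)
k_inv = pow(k, -1, p)      # modular inverse (Python 3.8+)

def E(m): return (m * k) % p
def D(c): return (c * k_inv) % p

x, y = 1000, 2345
enc_sum = (E(x) + E(y)) % p    # the server adds ciphertexts, never seeing k
assert D(enc_sum) == x + y     # decrypts to the true sum, 3345
```

The server only ever manipulates `E(x)` and `E(y)`; the additive structure survives because both ciphertexts share the same factor k.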
I don't know, when I hear Google I hear Gmail, Google Docs, and every other service they have for knowing about people. My mom would probably think mostly about search, but then she would not read an article about FHE.
"The implications are big. The entire business model built on harvesting user data could become obsolete. Why send your plaintext when another service can compute on your ciphertext?"
Why do people always do this thing where they think inventing a technology has somehow changed economics? I think the implications are very small. There is value in people's user data and people are very eager to barter that value against cheaper services, we can tell because people continue to vote with their wallets and feet.
You could already encrypt or offer zero-retention policies on large swaths of internet businesses, and every major company has competitors that do, but they exist on the margins because most people don't take that deal.
Yeah, I can totally see companies rushing to implement homomorphically encrypted services that consume 1000000x more compute than necessary, are impossible to debug, and prevent them from analyzing usage data.
E2EE git was invented. I asked the creator whether the server can enforce protected branches or block force pushes. He had no solution for evil clients. Maybe this could lead to an E2EE GitHub?
CipherStash founder here: FHE isn't the only option here. Specialized searchable encryption schemes exist and are much faster than FHE. Different flavours can be combined to create a comprehensive search system which is very close to the performance of plaintext information retrieval. FHE remains an option for generalized computation but can be reserved for small datasets that have been narrowed down using fast searchable encryption.
I don't think FHE is the solution to PIR but it might well form a part of it when combined with more practical approaches.
It's a distraction to try to imagine homomorphic encryption for generic computing or internet needs. At least not for many more generations of Moore's law, and even then.
However, where FHE will already shine is in specific high-value, high-consequence, high-confidentiality applications with relatively low computational complexity. Smart contracts, banking, and potentially medicine have lots of these use cases. And the curve of Moore's law plus software optimizations is now finally starting to bend into the zone of practicality for some of these.
See what Zama https://www.zama.ai/ is doing, both on the hardware as well as the devtools for FHE.
Fully homomorphic encryption is not the future of the private internet; confidential VMs are. CVMs use memory encryption and separation from the host OS. ARM has TEE, AMD has SEV, and Intel has been fumbling around with SGX and TDX for more than a decade.
The idea that these will keep being improved on in speed reminds me of the math problem about average speed:
> An old car needs to go up and down a hill. In the first mile–the ascent–the car can only average 15 miles per hour (mph). The car then goes 1 mile down the hill. How fast must the car go down the hill in order to average 30 mph for the entire 2 mile trip?
Past improvement is no indicator of future possibility, given that each improvement was not re-application of the same solution as before. These are algorithms, not simple physical processes shrinking.
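For reference, the puzzle's intended answer is "it can't be done": averaging 30 mph over 2 miles allows 4 minutes in total, and the 1-mile climb at 15 mph has already spent all 4. A quick exact check:

```python
from fractions import Fraction

ascent_time = Fraction(1, 15)     # hours for the 1-mile climb at 15 mph
total_allowed = Fraction(2, 30)   # hours to cover 2 miles at a 30 mph average
descent_budget = total_allowed - ascent_time

print(descent_budget)             # -> 0: no finite descent speed can rescue the average
```

That zero time budget is the point of the analogy: no amount of downhill speed recovers an average that the first leg has already made unattainable.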
Is the downhill section a cliff? Google informs me terminal velocity of a car is 200-300mph, so to fall a mile at 300mph, the car will need 12 seconds, so let's round up to 15 seconds to account for the time it's accelerating.
To cover the full 2 miles at an average of 30mph, we need to complete the entire journey in 4 minutes, leaving 225 seconds for the ascent.
We know that the old car was averaging 15 miles per hour, but the speedo on an old car is likely inaccurate, and we only need to assume a 6% margin of error for the car to show 15 miles per hour and cover the mile in 225 seconds. You probably couldn't even tell the difference between 15 and 16 on the speed anyway, but let's say that we also fitted out the car with brand new tyres (so the outer circumference will be more than old worn tyres), and it's entirely possible.
So, let's say 240mph. That's the average speed of our mile freefall in 15 seconds.
41 mph, assuming the person asking the question was just really passionate about rounding numbers and/or had just the bare minimum viable measurement tooling available :)))
Assuming speed gets solved as predicted, for an application like search, the provider would have to sync a new database of “vectors” to all clients every time the index updates. On top of that, these DBs are tens if not hundreds of GB huge.
A very basic way of how it works: encryption is basically just a function e(m, k) = c. “m” is your plaintext and “c” is the encrypted data. We call it an encryption function if the output looks random to anyone who does not have the key.
If we could find some kind of function “e” that preserves the underlying structure even when the data is encrypted, you have the outline of a homomorphic system. E.g. if the following holds:
e(2,k) * e(m,k) = e(2m,k)
Here we multiplied our message by 2 even in its encrypted form. The important thing is that every computation must produce something that looks random, but once decrypted it should have preserved the actual computation that happened.
It’s been a while since I did crypto, so Google might be your friend here; but there are situations where e.g. RSA preserves multiplication, making it partially homomorphic.
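Indeed, textbook RSA without padding preserves multiplication. A tiny sketch with toy parameters (insecure, purely to show the algebra):

```python
# Textbook RSA (no padding, toy primes -- illustration only) is
# multiplicatively homomorphic: E(a) * E(b) mod n = (a*b)^e mod n = E(a*b).
p, q = 61, 53
n = p * q                            # 3233
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent

def E(m): return pow(m, e, n)
def D(c): return pow(c, d, n)

a, b = 7, 9
c = (E(a) * E(b)) % n    # multiply the ciphertexts only
assert D(c) == a * b     # decrypts to 63
```

This is exactly why raw RSA is never used without padding in practice: the same structure that makes it partially homomorphic also lets an attacker maul ciphertexts.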
A simple example of partially homomorphic encryption (not fully) would be a system that supports addition or multiplication. You know the public key and the modulus, so you can respect the "wrap around" value and do multiplication on an encrypted number.
Other schemes, I imagine, behave kind of like translating, stretching, or skewing a polynomial or a donut/torus, such that the points/intercepts are still solvable, still unknown to an observer, and still represent the correct mathematical value of the operation.
It just means you treat the []byte value with special rules.
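A concrete instance of an addition-supporting system like the one described above is the Paillier cryptosystem. Below is a minimal sketch with tiny, insecure parameters (purely to show the algebra; real deployments use large moduli and a vetted library):

```python
import math
import random

# Minimal Paillier sketch: multiplying two ciphertexts mod n^2
# adds the underlying plaintexts mod n.
p, q = 17, 19
n = p * q                         # 323 (toy size; real n is thousands of bits)
n2 = n * n
lam = math.lcm(p - 1, q - 1)      # λ = lcm(p-1, q-1)
g = n + 1                         # standard choice of generator

def L(x): return (x - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)

def E(m):
    r = random.randrange(2, n)            # fresh randomness per encryption
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def D(c): return (L(pow(c, lam, n2)) * mu) % n

a, b = 42, 100
assert D((E(a) * E(b)) % n2) == (a + b) % n   # homomorphic addition
```

Note the "wrap around" the comment mentions: sums are only recovered modulo n, so plaintexts must stay below the modulus.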
If I understand correctly, companies like OpenAI could run LLMs without having access to users' new inputs. It seems to me new user data is really useful for further training of the models.
Can they still train the models on encrypted data? If this new data is not usable, why would the companies still want it?
Let's assume they can train the LLMs on encrypted data: what if a large number of users inject crappy data (as seen with the Tay chatbot story)? How can the companies still keep a way to clean the data?
> The only way to protect data is to keep it always encrypted on servers, without the servers having the ability to decrypt.
> If FHE is a possible option, people and institutions will demand it.
I don't think that privacy is a technical problem. To take the article's example, why would Google allow you to search without spying on you? Why would chatgpt discard your training data?
GPG has been around for decades. You can relatively easily add a plug-in to use it on top of Gmail. Sure, the protocol is not perfect, but it could have been made better far more easily than FHE can be improved, since a lot of its clunkiness can be corrected by UX. But people never cared enough that everything they write is read by Google to encrypt it. And since Google loves reading what you write, they'll never introduce something like FHE without overwhelming adoption and demand from others.
As someone who knows basically nothing about cryptography - wouldn't training an LLM to work on encrypted data also make that LLM extremely good at breaking that encryption?
I assume that doesn't happen? Can someone ELI5 please?
Good encryption schemes are designed so that ciphertexts are effectively indistinguishable from random data -- you should not be able to see any pattern in the encrypted text without knowledge of the key and the algorithm.
If your encryption scheme satisfies this, there are no patterns for the LLM to learn: if you only know the ciphertext but not the key, every continuation of the plaintext should be equally likely, so trying to learn the encryption scheme from examples is effectively trying to predict the next lottery numbers.
This is why FHE for ML schemes [1] don't try to make ML models work directly on encrypted data, but rather try to package ML models so they can run inside an FHE context.
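A rough way to see the "no patterns to learn" point is to encrypt a maximally patterned plaintext and look at the bytes. The sketch below uses a SHA-256 counter keystream as a stand-in for a real stream cipher (an illustrative construction, not a vetted cipher):

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Counter-mode keystream built from SHA-256, XORed into the data.
    out, i = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        i += 1
    return bytes(b ^ k for b, k in zip(data, out))

plaintext = b"A" * 4096                 # maximally patterned input
ciphertext = keystream_xor(b"secret key", plaintext)

# 1 distinct byte value in the plaintext vs ~256 in the ciphertext
print(len(set(plaintext)), len(set(ciphertext)))
```

The plaintext is a single repeated byte, yet the ciphertext's byte distribution is essentially uniform: there is no residual structure for a next-token predictor to latch onto.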
From my understanding of cryptography, most schemes are designed under the assumption that _any_ function without access to the secret key has only a negligibly small chance of decoding the correct message (usually O(exp(-key_length))). As LLMs are also just functions, it is extremely unlikely for cryptographic protocols to be broken _unless_ LLMs enable entirely new types of attacks.
> The entire business model built on harvesting user data could become obsolete.
This is far too optimistic. Just because you can build a system that doesn't harvest data, doesn't necessarily mean it's a profitable business model. I'm sure many of us here would be willing to pay for a FHE search engine, for example, but we're a minority.
How do you send a password reset email with this? Eventually your mail server needs the plaintext address in order to send the email, and at that point it can be leaked in a data breach.
It's idealistic to think this could solve data breaches, because businesses knowing who their customers are is such a fundamental concept.
> Privacy awareness of users is increasing. Privacy regulations are increasing.
I beg your unbelievable pardon, but no? This part of the equation is not addressed in the article, but it is by far and away the biggest missing piece for there to be any hope of FHE seeing widespread adoption.
Here's what I don't understand about homomorphic encryption and so struggle to trust in the very concept. If you can process encrypted data and get useful results, then a major part of the purpose of encryption is defeated, right? How am I wrong?
All great until you realize no one is allowed to export things to other regions if it works too well (crypto). Then, besides that, the companies that now literally live off your personal data (most of big tech) won't suddenly drop their main source of income on behalf of the privacy of their users, which, clearly, they care nothing about.
Unless replacement services are offered and adopted en masse (they won't be; you can't market against companies who can throw billions at breaking you), those giants won't give away their main source of revenue...
So even if the technical challenges are overcome, there are human and political challenges which will likely be even harder to crack...
Very cool, although I have some reservations about "... closest vector problem is believed to be NP-hard and even quantum-resistant". "Believed to be" is kind of different from "known to be".
If it makes you feel better, no cryptographic assumptions we use today are known to be NP-hard. Or maybe that makes you feel worse, not sure. But it doesn't really matter because NP-hardness is a statement about worst case inputs and cryptography needs guarantees about average case inputs since keys are generated randomly.
all modern encryption is currently held together by asymmetric encryption that are all based on "believed to be" foundations not "known to be" foundations
I think this should talk about the kinds of applications you can actually build with FHE, because you definitely can't implement most applications (not at a realistic scale, anyway).
It's simple conceptually: you find an encryption method Enc that guarantees `Sum(Enc(x), Enc(y)) = Enc(Sum(x, y))`. That's ultimately all there is to it. Then, you give the server enc_x and enc_y, the server computes the sum, and returns to you enc_sum. You then decrypt the value you got and that's x+y.
Since lots of functions behave in this way in relation to sums and products, you "just" need to find ones that are hard to reverse so they can be used for encryption as well.
Unfortunately this turns out not to work so simply. In reality, they needed to find different functions, FHESum and FHEMultiply, that are much more expensive to compute (1000x more CPU than the equivalent "plaintext" function is a low estimate of the overhead) but that do guarantee the property above.
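The Enc/Sum property being described can be sketched with a trivially insecure one-time-pad variant (illustrative only; the pads are the secret, and the client must track which pads it used):

```python
# A one-time pad mod M is additively homomorphic if the client
# remembers the sum of the pads it applied.
import random

M = 2**32
k1, k2 = random.randrange(M), random.randrange(M)

enc_x = (1000 + k1) % M          # client encrypts and uploads two values
enc_y = (2345 + k2) % M

enc_sum = (enc_x + enc_y) % M    # server adds the ciphertexts blindly

assert (enc_sum - k1 - k2) % M == 3345   # client decrypts the sum
```

The server computes `Sum(Enc(x), Enc(y))` without ever seeing x or y; the hard part, as the comment says, is finding schemes where such operations compose securely for both addition and multiplication.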
I interrupted this fascinating read to say that, "actually", quantum computers are great at multi-dimensional calculation if you find the correct algorithms. It's probably the only thing they will ever be great at. You would want to show that finding such an algorithm is not possible with our current knowledge.
Anyway, making the computer do the calculation is one thing; getting it to spew out the correct data is another... But still, the article (which seems great so far) brushes this off a bit too quickly.
OK, let's stop being delusional here. I'll tell you how this will actually work:
Imagine your device sending Google an encrypted query and getting back the exact results it wanted — without you having any way of knowing what that query was or what result they returned. The technique to do that is called Fully Homomorphic Encryption (FHE).
asah|7 months ago
- FHE for classic key-value stores and simple SQL database tables?
- the author's argument that FHE is experiencing an accelerated Moore's law, and will therefore close the 1000x gap quickly?
Thx!
JumpCrisscross|7 months ago
FHE + AI might be the killer combination, the latter sharing the complexity burden.
[0]: https://machinelearning.apple.com/research/homomorphic-encry...
[1]: https://machinelearning.apple.com/research/wally-search
aitchnyu|7 months ago
https://news.ycombinator.com/item?id=44530927
dandraper|7 months ago
I don't think FHE is the solution to PIR but it might well form a part of it when combined with more practical approaches.
tpurves|7 months ago
However, where FHE can already shine is in specific high-value, high-consequence, high-confidentiality applications that involve relatively low-complexity computations. Smart contracts, banking, and potentially medicine have lots of these use cases. And the curve of Moore's law plus software optimizations is now finally starting to bend into the zone of practicality for some of them.
See what Zama https://www.zama.ai/ is doing, both on the hardware as well as the devtools for FHE.
redleader55|7 months ago
glitchc|7 months ago
udev4096|7 months ago
gblargg|7 months ago
> An old car needs to go up and down a hill. In the first mile–the ascent–the car can only average 15 miles per hour (mph). The car then goes 1 mile down the hill. How fast must the car go down the hill in order to average 30 mph for the entire 2 mile trip?
Past improvement is no indicator of future possibility, given that each improvement was not re-application of the same solution as before. These are algorithms, not simple physical processes shrinking.
ralferoo|7 months ago
To cover the full 2 miles at an average of 30 mph, we need to complete the entire journey in 4 minutes (240 seconds).
We know that the old car was averaging 15 miles per hour, but the speedo on an old car is likely inaccurate, and we only need to assume a ~6% margin of error for the car to show 15 mph while actually doing 16 mph, covering the mile in 225 seconds. You probably couldn't even tell the difference between 15 and 16 on the speedo anyway, but let's say that we also fitted the car with brand-new tyres (so the outer circumference will be larger than old worn tyres), and it's entirely possible.
That leaves 15 seconds for the descent. So, let's say 240 mph: the average speed of our mile of freefall in 15 seconds.
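The arithmetic behind that tongue-in-cheek answer does check out; a quick sketch, assuming the "true" ascent speed of 16 mph claimed above:

```python
# Sanity-check the joke answer: 2 miles at 30 mph average leaves how
# much time for the downhill mile if the ascent was really at 16 mph?
total_s = 2 / 30 * 3600          # 2 miles at 30 mph -> 240 seconds total
ascent_s = 1 / 16 * 3600         # 1 mile at 16 mph  -> 225 seconds
descent_s = total_s - ascent_s   # 15 seconds left for the downhill mile
descent_mph = 1 / (descent_s / 3600)
print(total_s, ascent_s, descent_mph)  # 240.0 225.0 240.0
```

At the stated 15 mph the ascent alone takes the full 240 seconds, which is why the riddle's intended answer is "impossible"; the 6% speedo fudge is what opens the 15-second window.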
perching_aix|7 months ago
dcow|7 months ago
paulrudy|7 months ago
This is fascinating. Could someone ELI5 how computation can work using encrypted data?
And does "computation" apply to ordinary internet transactions like when using a REST API, for example?
dachrillz|7 months ago
If we could find some kind of function "e" that preserves the underlying structure even when the data is encrypted, you have the outline of a homomorphic system. E.g. if the following happens:
e(2,k)*e(m,k) = e(2m,k)
Here we multiplied our message with 2 even in its encrypted form. The important thing is that every computation must produce something that looks random, but once decrypted it should have preserved the actual computation that happened.
It’s been a while since I did crypto, so Google might be your friend here; but there are situations where, e.g., RSA preserves multiplication, making it partially homomorphic.
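Textbook RSA (without padding) is indeed multiplicatively homomorphic: E(a)·E(b) mod n = aᵉ·bᵉ mod n = (a·b)ᵉ mod n = E(a·b mod n). A sketch with toy parameters (real RSA uses ~1024-bit primes, and the padding used in practice deliberately destroys this property):

```python
# Textbook RSA with tiny primes, to show the multiplicative homomorphism.
p, q = 61, 53
n = p * q                           # 3233
e = 17                              # public exponent, coprime to (p-1)(q-1)
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 12
c_prod = (encrypt(a) * encrypt(b)) % n   # multiply ciphertexts only
assert decrypt(c_prod) == (a * b) % n    # decrypts to 84
```

Multiplication alone makes RSA only *partially* homomorphic; full FHE needs both addition and multiplication on ciphertexts, which is where the heavy schemes (and bootstrapping) come in.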
pluto_modadic|7 months ago
other ones I imagine behave kind of like translating, stretching, or skewing a polynomial or a donut/torus, such that the points/intercepts are still solvable, still unknown to an observer, and still represent the correct mathematical value of the operation.
just means you treat the []byte value with special rules
Qision|7 months ago
Let's assume they can train LLMs over encrypted data. What if a large number of users inject some crappy data (as was seen with the Tay chatbot)? How can the companies still keep a way to clean the data?
j2kun|7 months ago
Yes but then the model becomes encrypted.
IMO ML training is not a realistic application for FHE, but things like federated training would be the way to do that privately enough.
charles_f|7 months ago
> If FHE is a possible option, people and institutions will demand it.
I don't think that privacy is a technical problem. To take the article's example, why would Google allow you to search without spying on you? Why would chatgpt discard your training data?
GPG has been around for decades. You can relatively easily add a plug-in to use it on top of Gmail. Surely the protocol is not perfect, but it could have been improved far more easily than FHE can, since a lot of its clunkiness could be corrected by UX. But people never cared enough that everything they write is read by Google to encrypt it. And since Google loves reading what you write, they'll never introduce something like FHE without overwhelming adoption and demand from others.
utf_8x|7 months ago
I assume that doesn't happen? Can someone ELI5 please?
strangecasts|7 months ago
If your encryption scheme satisfies this, there are no patterns for the LLM to learn: if you only know the ciphertext but not the key, every continuation of the plaintext should be equally likely, so trying to learn the encryption scheme from examples is effectively trying to predict the next lottery numbers.
This is why FHE for ML schemes [1] don't try to make ML models work directly on encrypted data, but rather try to package ML models so they can run inside an FHE context.
[1] It's not for language models, but I like Microsoft's CryptoNets - https://www.microsoft.com/en-us/research/wp-content/uploads/... - as a more straightforward example of how FHE for ML looks in practice
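The "every continuation is equally likely" point can be illustrated with the simplest scheme that has this property, a one-time pad (the example plaintexts and key handling below are mine, not from the linked paper):

```python
# One-time pad: under a fresh uniformly random key, every plaintext of
# the same length explains a given ciphertext equally well, so the
# ciphertext carries no learnable pattern at all.
import secrets

def otp(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return bytes(d ^ k for d, k in zip(data, key))

msg1 = b"the launch code is 0000"
msg2 = b"the launch code is 9999"          # closely related plaintexts
key1 = secrets.token_bytes(len(msg1))
key2 = secrets.token_bytes(len(msg2))

ct1, ct2 = otp(msg1, key1), otp(msg2, key2)
assert otp(ct1, key1) == msg1 and otp(ct2, key2) == msg2
# ct1 and ct2 are independent uniform byte strings: nothing about the
# long shared prefix "the launch code is " survives into the ciphertexts.
```

A model trained on pairs like (ct1, ct2) is literally being asked to predict independent uniform randomness, which is the lottery-numbers situation described above.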
mynameismon|7 months ago
4gotunameagain|7 months ago
How do you train a model when the input has no apparent correlation to the output ?
vkaku|7 months ago
The problem is that the internet is, in practice, a centralized system, even though it was designed to be decentralized, and some are fighting to keep it free.
Fight for decentralization instead, it will remove the need for unnecessary security and reduce the compute cost significantly.
Retr0id|7 months ago
This is far too optimistic. Just because you can build a system that doesn't harvest data, doesn't necessarily mean it's a profitable business model. I'm sure many of us here would be willing to pay for a FHE search engine, for example, but we're a minority.
charcircuit|7 months ago
It's idealistic to think this could solve data breaches, because businesses knowing who their customers are is such a fundamental concept.
johnisgood|7 months ago
j2kun|7 months ago
liampulles|7 months ago
I beg your unbelievable pardon, but no? This part of the equation is not addressed in the article, but it is far and away the biggest missing piece for there to be any hope of FHE seeing widespread adoption.
JohnFen|7 months ago
ramchip|7 months ago
sim7c00|7 months ago
unless replacement services are offered and adopted en masse (they won't be; you can't market against companies who can throw billions at breaking you), those giants won't give away their main source of revenue...
so even if technical challenges are overcome, there are more human and political challenges which will likely be even harder to crack...
adamc|7 months ago
j2kun|7 months ago
cwmma|7 months ago
IshKebab|7 months ago
j2kun|7 months ago
zkmon|7 months ago
tsimionescu|7 months ago
Since lots of functions behave in this way in relation to sums and products, you "just" need to find ones that are hard to reverse so they can be used for encryption as well.
Unfortunately, this turns out not to work so simply. In reality, they needed to find different functions, FHESum and FHEMultiply, that are actually much harder to compute (1000x more CPU than the equivalent "plaintext" operation is a low estimate of the overhead) but that guarantee the above.
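A concrete middle ground is the Paillier cryptosystem, which is additively homomorphic only (not full FHE): multiplying ciphertexts decrypts to the sum of the plaintexts, already at visible cost. A textbook sketch with tiny parameters (real deployments use ~2048-bit n; this is an illustration, not a vetted implementation):

```python
# Toy Paillier: Enc(a) * Enc(b) mod n^2 decrypts to (a + b) mod n.
import math
import secrets

p, q = 47, 59
n = p * q
n2 = n * n
g = n + 1                            # standard choice of generator
lam = math.lcm(p - 1, q - 1)         # lambda = lcm(p-1, q-1)
mu = pow(lam, -1, n)                 # with g = n+1, mu = lambda^-1 mod n

def encrypt(m: int) -> int:
    while True:                      # random r must be a unit mod n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    L = (pow(c, lam, n2) - 1) // n   # L(x) = (x - 1) / n
    return (L * mu) % n

a, b = 42, 100
c_sum = (encrypt(a) * encrypt(b)) % n2   # homomorphic addition
assert decrypt(c_sum) == (a + b) % n     # 142
```

Note the asymmetry already visible here: a "plaintext addition" becomes a big-integer multiplication mod n², and encryption itself needs a modular exponentiation. Full FHE, which also supports multiplication of ciphertexts, is far more expensive again.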
baby|7 months ago
VMG|7 months ago
read the article again
DeathArrow|7 months ago
latentsea|7 months ago
orwin|7 months ago
anyway, making the computer do the calculation is one thing, getting it to spew the correct data is another... But still, the article (which seems great at the moment) brushes it off a bit too quickly.
Jgoauh|7 months ago
[deleted]
harvie|7 months ago
Imagine your device sending Google an encrypted query and getting back the exact results it wanted — without Google having any way of knowing what your query was or what results it returned. The technique to do that is called Fully Homomorphic Encryption (FHE).
pluto_modadic|7 months ago