top | item 35262043


guites | 2 years ago

Hey! Glad to see flower getting attention on hn.

I've been working on a project for over a year that uses flower to train cv models on medical data.

One aspect that we see being brought up again and again is how we can prove to our clients that no unnecessary data is being shared over the network.

Do you have any tips on solving that particular problem? I.e. proving that no data apart from model weights are being transferred to the centralized server?

Thanks a lot for the project.

edit: Just to clarify I am aware of differential privacy, I'm talking more on a "how to convince a medical institution that we are not sending its images over the network" level.


cpmp | 2 years ago

If you're concerned about data leakage, it's worth noting that model weights can very easily be used to reconstruct the original data they were trained on, so it could be misleading to claim that user data isn't being shared over the network. To avoid this, you'd need to look into techniques like Secure Aggregation or local differential privacy. Flower does provide some of this, FWIW.
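To make the local-DP idea concrete, here is a minimal sketch of clipping a client's update and adding Gaussian noise before it leaves the device. This is illustrative only: the function name is made up, and a real deployment would calibrate `noise_std` to a formal (epsilon, delta) privacy budget rather than picking it by hand.

```python
import numpy as np

def add_local_dp_noise(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip a client's model update to a maximum L2 norm, then add
    Gaussian noise, so the server never sees the raw update.
    (Illustrative local-DP-style sketch, not a calibrated mechanism.)"""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    if norm > clip_norm:
        # Bound any single client's influence on the aggregate.
        update = update * (clip_norm / norm)
    return update + rng.normal(0.0, noise_std, size=update.shape)

# Example: a flattened weight delta, clipped and noised before upload.
noisy = add_local_dp_noise(np.array([0.5, -2.0, 1.5]))
```

The clipping step is what makes the later noise meaningful: without a bound on the update's norm, no fixed amount of noise can hide an outlier client.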

onethought | 2 years ago

This doesn’t sound right: if they don’t know the structure of the NN, how can they reconstruct the data from the weights alone? (Perhaps the structure is communicated along with the weights?)

tanto | 2 years ago

Hi guites, thank you! That is undoubtedly a relatable concern. We have it on our radar and plan to provide helpful material and presentations to help convince stakeholders. If you are up for a call to share your specific challenges, we could ideate with you.

guites | 2 years ago

Would love to! You can grab my email on my profile. Could you ping me over there? Thanks

danieljanes | 2 years ago

Thanks, glad you like it!

One approach to increase transparency on the client side (and build trust with the organization where the Flower client is deployed) is to integrate a review step that asks someone to confirm the update that gets sent back to the server.
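A review step like that could look something like the sketch below: summarize exactly what is about to leave the client, and only release it if a reviewer approves. Everything here is hypothetical (the function and the `approve` callback are made-up names, not a Flower API); it just shows the shape of the idea.

```python
import numpy as np

def review_update(update, approve):
    """Hypothetical client-side review hook: summarize the outgoing
    update and release it only if `approve(summary)` returns True.
    `approve` could be a CLI prompt or a dashboard button in practice."""
    summary = {
        "num_params": int(update.size),
        "l2_norm": float(np.linalg.norm(update)),
        "num_bytes": int(update.nbytes),
    }
    if not approve(summary):
        raise RuntimeError("update rejected by reviewer; nothing sent")
    return update

# Example: auto-approve only small, bounded updates.
approved = review_update(np.zeros(10), approve=lambda s: s["l2_norm"] < 5.0)
```

The key property is that the summary describes the actual bytes to be transmitted, so the reviewer is confirming the real payload, not a separate report.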

On top of that, you should definitely use differential privacy. To quote Andrew Trask here: "friends don't let friends use FL without DP". Other approaches like Secure Aggregation can also help, depending on what kind of exposure your clients are concerned about.

My general take is that the best way to solve for transparency and trust is to tackle it on multiple layers of the stack.

guites | 2 years ago

A review step sounds like a good idea. Our implementation involves very little interaction on the client side besides setting up the datasets, etc., so maybe a way to log the information sent, for later inspection, would help.
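For that kind of after-the-fact inspection, one minimal approach is an append-only audit log that records a content hash and size for every outgoing payload, so an institution can later verify that only weight-sized arrays (not raw images) were transmitted. This is a sketch under assumed names; `fl_audit.log` is a made-up path, not anything Flower provides.

```python
import hashlib
import json
import time
import numpy as np

def log_outgoing_update(update, log_path="fl_audit.log"):
    """Append an audit record (timestamp, SHA-256, size, shape) for
    the exact bytes about to leave the client. Illustrative sketch."""
    payload = update.tobytes()
    record = {
        "ts": time.time(),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "num_bytes": len(payload),
        "shape": list(update.shape),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Logging a hash rather than the payload itself keeps the audit trail small while still letting an auditor match log entries against captured network traffic.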

I'll be looking into secure aggregation, as I'm not fully aware of how it works. As of now we rely on differential privacy only.

Thanks!
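The core trick behind secure aggregation can be shown in a few lines: each pair of clients agrees on a random mask that one adds and the other subtracts, so the server sees only masked vectors, yet the masks cancel exactly in the sum. This toy sketch skips everything a real protocol needs (key agreement, dropout handling, as in the Bonawitz et al. design); the function name is made up.

```python
import numpy as np

def masked_updates(updates, seed=0):
    """Toy pairwise-masking secure aggregation: individual updates are
    hidden from the server, but the pairwise masks cancel in the sum."""
    rng = np.random.default_rng(seed)
    n = len(updates)
    masked = [u.astype(float) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=updates[0].shape)
            masked[i] += mask  # client i adds the shared mask
            masked[j] -= mask  # client j subtracts the same mask
    return masked

# The server can compute sum(masked) == sum(updates) without ever
# seeing any single client's true update.
```

In a real protocol the pairwise masks are derived from shared secrets (e.g. Diffie-Hellman key agreement) rather than a common seed, which is what prevents the server from recomputing them.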

jorgeili | 2 years ago

What about MPC + DP? Are you planning to integrate any SMPC algorithms into Flower, or have you found limitations that prevent doing so?

I'm trying to apply federated learning to the medical domain too, and I'm trying to define the best "stack" that guarantees privacy and compliance with regulations like the GDPR.