I won't post to say "I haven't looked at the contents of the
file, but it's named 'cat.jpg'" either. I won't even post to
announce that the one hundred millionth file has been stored.
[...] This is because I have no way to obtain that information.
The contents of files [...] is all hidden from me by Tarsnap's
strong client-side encryption.
I agree that cat.jpg was a privacy violation and I do believe they did more than simply look at the filename. However, I'll take the unpopular position that this is within the limits of what one can reasonably expect from a website providing a service as 37signals does. Admins will, and are completely expected to, look at the data - if only to make sure everything is working. Looking at a file called cat.jpg because it's the gazillionth file is pushing the boundary a bit, but I still think this is OK. The moment I opt to use a hosted project management software, I implicitly accept that things like this (and potentially much worse) might happen.
Forgive me cperciva, but to me your post looks just like a giant plug for your own service. Client-side encryption is not warranted for everything, nor is it a reasonable goal for every app that shares data on the web. It's fine that Tarsnap does this, and frankly I would expect the same from a service like, say, DropBox - but it's not a reasonable expectation when it comes to the type of apps 37signals provides.
> I do believe they did more than simply look at the filename.
We'll have to disagree there. I'd be very surprised if they did any more than looking at their log files -- most likely using tail -f -- as the 100 million mark approached.
> Admins will, and are completely expected to, look at the data - if only to make sure everything is working.
How does looking at individual files help to confirm that things are working? Once you're operating at scale, looking at individual files doesn't tell you anything useful; if there's a big problem users will notice it before you do, and if there's a small problem the files you look at probably won't be in the affected set.
> Forgive me cperciva, but to me your post looks just like a giant plug for your own service.
Was I plugging Tarsnap? Sure; I mention it every chance I get on my blog. But I didn't write that post because I wanted to plug Tarsnap; I wrote it because I saw the trust-is-fragile post on HN Daily and felt that revising their privacy policy wasn't the right response. (If I had noticed that post when it was first discussed here, that blog post would probably have been just a comment -- but since I was about 24 hours late to the party I figured that nobody would read a comment I made here.)
I totally disagree that this came off sounding like a plug for Tarsnap. This is one of those perspective-changing points that can only be made by using concrete examples.
That he happens to be an expert in the field of digital privacy and has a way to prove that he is such an expert shouldn't be held against him.
Reading his post was like an "oh shit, he's right" moment for me and using Tarsnap as an example was key in helping me understand it.
> Forgive me cperciva, but to me your post looks just like a giant plug for your own service.
Considering that in the discussion around cat.jpg, many people here were talking about a secure back-up service which encrypts all data at the client side with auditable source-code as if it was an unrealistic, unobtainable goal, I have zero problems with that.
I just wonder why they didn't say it was made up: a joke with no basis in reality, just some poetic license.
Would that be worse than admitting/pretending they actually saw a file called cat.jpg? If there was such a file, it could have been a JPG for a catalog of some kind, etc.
I think they are responding to people's first assumption, that there actually was a file containing an image of a cat. I doubt it; I think it was just an attempt at being funny which backfired, and they felt they had to take responsibility for the perceived breach of trust because any other explanation, even if truthful, would have been seen as a weak excuse.
There are considerable technical hurdles in writing a web application that doesn't store unencrypted data. However, in principle, I don't see any compelling reason an admin should have access to user data like uploaded files. Ensuring an encrypted file is backed up and available for use is no different to doing the same for an unencrypted file.
Unless you are using a service like tarsnap, your admins can and will peek at your data. If you use a service like tarsnap, and you lose your password, your data is deader than disco. Pick one - security, or an admin who can save your account.
And while it's theoretically possible to develop a rich web app without seeing user data, it just doesn't happen. You need realistic data to do testing. The most realistic data you can possibly get is your users' data. Guess what 99.999% of websites use for testing?
If you have sensitive information, use good encryption. Better still, do what the professionals (i.e. the government) do, and leave it on an internal-network-only computer, in a steel-reinforced room. If you're paranoid, lock the hard drives in a safe when you leave the room. And use encryption.
But don't make a fuss when the admin peeks at your data, in a semi-random way. If they are stalking you specifically, or leak any damaging information, that's another matter. But if you just don't trust them, don't give them your data.
> Pick one - security, or an admin who can save your account.
There's a simple way to eat your cake and have it too, though: put a copy of your passwords in a safe-deposit box. Passwords don't strictly have to be private to protect you from would-be attackers—they just have to only be accessible to people who have absolutely no incentive to help any would-be attacker.
I agree completely. I run a service where thousands of files are uploaded a day, each containing a person's location information (GPX/TCX logs from GPS devices). I have to use that data on a regular basis to further improve our ability to process these log files, which are generated by hundreds of separate pieces of software. The ability of my service to process these files requires my intervention semi-regularly. That wouldn't happen if, like some people are suggesting, I had to go to a safe deposit box to decrypt those files.
Humans can't be — and aren't — trusted to follow their stated intentions.
This is why you implement systems that prevent humans from doing wrong (either intentionally or unintentionally).
A commenter named Trevor even pointed out to 37signals, in the comments on their blog post, how this can be done:
Did you know that Oracle provides Database Vault?
What it allows you to do is set it up to prevent
even DBAs from viewing or modifying data.
The idea being, DBAs should be able to “administer” the
database, but should not be allowed to either VIEW or even
MODIFY customer/employee data (e.g. credit card #, SSN,
salary data, etc.)
There is another product Oracle provides which is called
Transparent Data Encryption. What it does is encrypt
your customer data on disk, but then when a database
select is issued, it decrypts the data on the fly
without needing to modify your application code.
Unfortunately, no products like this exist for MySQL.
Given the size of your company now and how much
sensitive customer data you are now storing, it might be
worthwhile for you guys to seriously consider using
Oracle now.
This seems in slightly bad taste, as it feels like a slightly disingenuous jab at 37signals so you can plug your own service. 37signals is serving a completely different market, one that isn't going to peruse their source code. So that market is going to have to trust them to some extent.
Additionally, every service requires some level of trust. How am I to know that the source code you show me is what you're actually using? (obviously client-side encryption services are better in this area). How do I know you won't sell my personal information, or abuse my billing information?
> This seems in slightly bad taste, as it feels like a slightly disingenuous jab at 37signals so you can plug your own service.
I plead guilty to taking advantage of the opportunity to mention my service (although most of my readers are already very much aware of tarsnap), but I would have written the blog post anyway.
> that market is going to have to trust them to some extent.
Sure, but I still think there's a huge gap between "we don't log sensitive information" and "we have a policy which says that we shouldn't look at the data we've logged".
This response is easily within bounds. Tarsnap is opinionated software. So are 37signals' offerings. Let them have a debate. It is for the better education of us all.
Tarsnap's position here is assailable, and we will all benefit from the discussion.
> as it feels like a slightly disingenuous jab at 37signals so you can plug your own service.
How do you feel Colin's point is in any way disingenuous? Do you think he doesn't believe what he says? Because that's the only way I could see it as being "disingenuous."
Personally, I don't think it's disingenuous to opportunistically state what you believe to benefit yourself, assuming you truly do believe it.
This seems to be the MO for this particular site. There is a blog post calling out someone's faults (privacy, security, etc), some basic misinformation and a plug for his service as being better.
I don't mind him wanting to do PR, but it does seem a bit distasteful. This was basically an ad couched in something that was supposed to look like content.
As one of the previous posters said, there are tradeoffs made when using a SaaS service, and it is not possible to run a system like theirs while using strong, opaque client-side encryption. Besides, comparing a backup system to an online file management system is apples to oranges.
When 37signals starts encrypting all their data, their search tool is really gonna suck.
A backup service that just needs to move around opaque blobs can and should encrypt its data; an application that needs to be able to react to the type and contents of the data that is stored, not so much. It seems like cperciva would know this better than anyone, so the post seems pretty disingenuous.
I don't disagree with this at all. But having the perspective that "The answer isn't to prove that they can be trusted; the answer is to ensure that their customers don't need to trust them" is worth keeping in the back of your mind...because I'm sure there are cases when that approach can be taken without breaking features.
Regardless, even if the load were higher, as it used to be before modern hardware, you are still essentially informing your customers that "speed is more important than securing their data" - which is a terrible approach to take.
TL;DR: If you are given the privilege of maintaining a customer's data, it's your obligation and responsibility to do so with the most care possible.
This brings up an interesting point about the benefit of client-side encryption. That's fine if you have a locally running app, but how do you do it with a web app? With some kind of browser plugin, perhaps? Does something like that exist today?
It's possible to do in principle at least, assuming all your users have modern browsers. You could use the JavaScript File API to intercept file uploads and then encrypt the data before it is sent to the server. You could then use XHRs to fetch the encrypted binary data and decrypt it before presenting it to the user. If it was an image, you could use canvas to display the decrypted content.
You'd have to contend with what is probably a large performance hit, and I don't know of any libraries that do this so you'd need to spend a considerable amount of time writing one. I suspect that this approach would only be practical for very simple web applications. For instance, an encrypted image or file hosting web application might be a possibility.
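For what it's worth, here is a minimal sketch of the encrypt-before-upload idea described above. It assumes a runtime that exposes the Web Crypto API as the global `crypto` (current browsers do), and it uses AES-GCM purely as an illustrative choice. The function names are hypothetical, and key storage, upload and download are deliberately left out.

```typescript
// Sketch only: encrypt file bytes in the client so the server stores an opaque blob.
// generateFileKey, encryptForUpload and decryptAfterDownload are hypothetical names.

const IV_BYTES = 12; // 96-bit nonce, the usual size for AES-GCM

async function generateFileKey(): Promise<CryptoKey> {
  // extractable = false: exportKey() will refuse to reveal the raw key.
  return crypto.subtle.generateKey({ name: "AES-GCM", length: 256 }, false, ["encrypt", "decrypt"]);
}

async function encryptForUpload(key: CryptoKey, plaintext: ArrayBuffer): Promise<ArrayBuffer> {
  const iv = crypto.getRandomValues(new Uint8Array(IV_BYTES)); // fresh nonce per file
  const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
  // Store the nonce in front of the ciphertext so the blob is self-describing.
  const out = new Uint8Array(IV_BYTES + ciphertext.byteLength);
  out.set(iv, 0);
  out.set(new Uint8Array(ciphertext), IV_BYTES);
  return out.buffer;
}

async function decryptAfterDownload(key: CryptoKey, blob: ArrayBuffer): Promise<ArrayBuffer> {
  const iv = blob.slice(0, IV_BYTES);
  const ciphertext = blob.slice(IV_BYTES);
  // Rejects if the blob was modified or the wrong key is used.
  return crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
}
```

In a browser the plaintext could come from `await file.arrayBuffer()` on a `File` picked by the user. The server only ever sees the returned blob, which is also why server-side features such as search or thumbnails stop working.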
I'm not convinced that such a strict approach to securing client data is always the best policy. The clients of 37signals are not the same as the clients of tarsnap. I would think that a client of 37signals is the sort that sometimes needs the help of an admin and that often that help would require looking at the client data.
My own company will never store sensitive data with an outside firm like 37signals but that is only because we have a great IT staff. For companies that don't have an IT staff, outsourcing to 37signals makes sense and is probably worth the tradeoff to trust them with data.
I think that the key point in this issue is "trust".
Just as you trust the bank to guard your money, even though many of its employees have access to your current account balance, the convenience of using these kinds of services requires you to trust the organization.
There is a significant difference though: If the bank takes money from your account, you notice it. If the file storage provider makes a copy of your file, you don't notice it. You'll never know how the file leaked.
(sure, the bank could perform other tricks behind your back, like doing bad investments with the money you put in, but hey they'll get bailed out anyway...)
The key idea is the very general principle that you increase security by reducing the scope of resources that you must trust.
cperciva is giving 2 examples: (1) use a service provider that doesn't require your trust. (2) limit the exposure of customer sensitive information to your employees that you must trust to keep it private.
I agree this is a better strategy than simply updating a privacy policy, as far as actual security is concerned.
It's worth noting that banks and similar organizations put safeguards, controls and extensive auditing on the data, which limit the data tourism that any employee can engage in. You trust the organization because the organization knows that humans are fallible and essentially doesn't trust its own workers.
It's hard to design systems that don't keep sensitive data in readable formats. Protecting filenames could be done by encrypting with a salted hash of the user password. Doing this correctly while allowing password changes is really tricky. Can you recommend a good set of guidelines for getting it right?
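One common way to make password changes tractable is key wrapping, which is a different technique from encrypting directly with a password hash: filenames are encrypted under a random data key, and only that data key is encrypted ("wrapped") under a key derived from the password. A rough sketch, assuming the Web Crypto API is available as the global `crypto`; the names, the PBKDF2 iteration count and the AES-KW choice are illustrative assumptions, not a vetted design.

```typescript
// Sketch only. WrappedKey, createWrappedKey, unlock and changePassword are
// hypothetical names; the iteration count is an illustrative assumption.

interface WrappedKey {
  salt: ArrayBuffer;    // random PBKDF2 salt, regenerated on every wrap
  wrapped: ArrayBuffer; // data key encrypted under the password-derived key
}

const PBKDF2_ITERATIONS = 600000;

// Derive a key-encryption key (KEK) from the password; it never touches user data.
async function deriveKek(password: string, salt: ArrayBuffer): Promise<CryptoKey> {
  const material = await crypto.subtle.importKey(
    "raw", new TextEncoder().encode(password), "PBKDF2", false, ["deriveKey"]);
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: PBKDF2_ITERATIONS, hash: "SHA-256" },
    material,
    { name: "AES-KW", length: 256 },
    false,
    ["wrapKey", "unwrapKey"]);
}

async function wrapUnder(password: string, dataKey: CryptoKey): Promise<WrappedKey> {
  const salt = crypto.getRandomValues(new Uint8Array(16)).buffer;
  const kek = await deriveKek(password, salt);
  const wrapped = await crypto.subtle.wrapKey("raw", dataKey, kek, "AES-KW");
  return { salt, wrapped };
}

// Create the random data key that actually encrypts filenames, wrapped for storage.
async function createWrappedKey(password: string): Promise<WrappedKey> {
  const dataKey = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 }, true, ["encrypt", "decrypt"]);
  return wrapUnder(password, dataKey);
}

// Recover the data key; rejects if the password is wrong.
async function unlock(password: string, stored: WrappedKey): Promise<CryptoKey> {
  const kek = await deriveKek(password, stored.salt);
  return crypto.subtle.unwrapKey(
    "raw", stored.wrapped, kek, "AES-KW",
    { name: "AES-GCM", length: 256 }, true, ["encrypt", "decrypt"]);
}

// A password change re-wraps the same data key; encrypted filenames are untouched.
async function changePassword(oldPw: string, newPw: string, stored: WrappedKey): Promise<WrappedKey> {
  const dataKey = await unlock(oldPw, stored);
  return wrapUnder(newPw, dataKey);
}
```

The trade-off raised elsewhere in the thread still applies: if the password is lost and nobody else holds a wrapped copy of the data key, the data is unrecoverable.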
Further down in the comments they try to save themselves by saying "Of course we didn't look at it, it's just that the pic was called cat.jpg".
So, no, I don't believe them at all when they say they never looked at the file. They say--with confidence--it was a picture of a cat. Sorry but going from "cat.jpg" to such a conclusion is IMO quite a leap. It's just three letters, it could be a CAT scan, a screenshot of the Linux `cat` command, three DNA nucleotides, a picture of a tiger, something else named "cat" or something related to but not involving cats.
I don't think that if I saw a filename like that I'd say "It was called 'cat.jpg', so probably a picture of someone's cat," because it could be anything that somebody named "cat.jpg" for any number of reasons, and I wouldn't know for sure until I looked at it.
And even then, just them looking at the filenames is not right. Of course I understand that if it was `company-passwords.xls` or something more sensitive, they wouldn't have said anything. But even before they could judge whether the filename was sensitive or not, there really was no reason for them to be looking at filenames in the first place!
Sure, some admin can always go in as root and look at everything, but you don't need to tie the proverbial cat to the bacon by putting the filenames right up in the face of someone who really has no business looking at them, since they're just collecting statistics.
If you are looking for a service like tarsnap (a client-side encrypted file storage service) you should check out Wuala (not affiliated, just a personal recommendation). It has a nice (cross-platform, Java) GUI client and all the features Dropbox has, plus encryption. It is operated by a Swiss company and therefore is subject to strong privacy laws.
You can also access it through your browser (Java applet) and they have native iOS and Android apps.
Why should I care that the company is Swiss? The point of client-side encryption is that I don't need to worry about who's hosting my data or what jurisdiction they're under.
If rules are made to be broken then by admission I don't think there will ever be a rule that states you should throw out the steering wheel if you're playing chicken.
I completely agree that the systems we build should have the least possible level of permissions required to do their job.
But the temptation to leave a backdoor open to peek once in a while, "just in case," is strong, and a backdoor has its own benefits...
This whole thing seems to be blown completely out of proportion, based entirely on hypothetical and unfair "what-if" scenarios based on the imaginary case where the 100 millionth file was something sensitive.
I'm imagining a group of friends and one of them mentions an interesting book he saw in X's house. The friends are immediately scandalized: what if instead of a book, you saw naked pictures of X's wife? Apparently you'll just blab anything you see, so you can't be trusted in people's houses anymore.
It's a completely innocent disclosure. That it would not have been innocent if the file had been different seems completely irrelevant. Either they would have been discreet in that case, or they would not have, but we can't tell which from this one instance.
This whole SOPA/PIPA thing seems to be blown completely out of proportion, based entirely on hypothetical and unfair "what-if" scenarios based on the imaginary case where the laws are used in ways that they weren't intended.
Before posting this comment, I went and checked the Tarsnap site, including the Security section, the design section and the FAQ and didn't find an answer to this question. My memory, from a past reading of your site, was that you kept keys on your side of the service, so that you could turn them over to Law Enforcement if they showed up. Is this still the case? (because if it is, then you can look at cat.jpg, even if you wouldn't post publicly about it.)
I believe that this has never been the case with Tarsnap -- the keys are stored only locally. (cperciva can delete your encrypted data but not decrypt it.)
37signals could've completely avoided the controversy by just saying "we contacted the user who uploaded the 100 millionth file to tell them that the file they uploaded at 12:34 was the 100 millionth file and they wrote back and said it was a picture of their cat".
[+] [-] Udo|14 years ago|reply
[+] [-] cperciva|14 years ago|reply
[+] [-] latch|14 years ago|reply
[+] [-] scott_s|14 years ago|reply
[+] [-] mc32|14 years ago|reply
[+] [-] weavejester|14 years ago|reply
[+] [-] Dylan16807|14 years ago|reply
[+] [-] nbashaw|14 years ago|reply
[+] [-] wisty|14 years ago|reply
[+] [-] derefr|14 years ago|reply
[+] [-] cullenking|14 years ago|reply
[+] [-] pnathan|14 years ago|reply
[+] [-] alberth|14 years ago|reply
[+] [-] cperciva|14 years ago|reply
[+] [-] ryanwaggoner|14 years ago|reply
[+] [-] cperciva|14 years ago|reply
[+] [-] sunir|14 years ago|reply
[+] [-] scott_s|14 years ago|reply
[+] [-] nbashaw|14 years ago|reply
[+] [-] jarito|14 years ago|reply
[+] [-] daleharvey|14 years ago|reply
[+] [-] latch|14 years ago|reply
[+] [-] alberth|14 years ago|reply
Encryption these days only adds 1-2% extra load.
[+] [-] ragesh|14 years ago|reply
[+] [-] weavejester|14 years ago|reply
[+] [-] mcculley|14 years ago|reply
[+] [-] janus|14 years ago|reply
[+] [-] wladimir|14 years ago|reply
[+] [-] talaketu|14 years ago|reply
[+] [-] huggyface|14 years ago|reply
[+] [-] tlb|14 years ago|reply
[+] [-] bch|14 years ago|reply
[+] [-] mhartl|14 years ago|reply
[+] [-] tripzilch|14 years ago|reply
[+] [-] codesuela|14 years ago|reply
[+] [-] teaspoon|14 years ago|reply
[+] [-] dchest|14 years ago|reply
[+] [-] agentultra|14 years ago|reply
[+] [-] mikeash|14 years ago|reply
[+] [-] pyre|14 years ago|reply
[+] [-] nazar|14 years ago|reply
[+] [-] nirvana|14 years ago|reply
[+] [-] spicyj|14 years ago|reply
[+] [-] evan_|14 years ago|reply
[+] [-] dvdhsu|14 years ago|reply