One caveat comes to mind about the approach this technique supports. Anonymizing your production data and distributing it to developer laptops is something you should think hard about before doing it, and approach very carefully if you do. Sometimes the sensitive information you should be protecting isn't just the users' creds and addresses. Sometimes it's the ways in which they've used your app, the content they've created, and the graph of other accounts with whom they've associated.
A typical anonymized DB dump is likely to share primary key values with production, so an adversary who sees in the dump that user ID X posted something they'd rather keep private can often simply look up that ID on your production server to learn who they are, without any privileged access.
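For what it's worth, here is a minimal sketch of one way to break the shared-primary-key link: swap every production ID for a shuffled surrogate and rewrite the foreign keys to match. The `users`/`posts` shape and the `build_id_map`/`remap_dump` names are made up for illustration, not anyone's actual tooling. It does nothing about content that identifies a user on its own, and a real dump would need the same treatment for every other key (post IDs included).

```python
import secrets


def build_id_map(real_ids):
    """Map each production ID to a shuffled surrogate in 1..n.

    Assumption: the map is thrown away after the dump is written, so
    nobody can translate a surrogate back to the production ID.
    """
    real_ids = list(real_ids)
    surrogates = list(range(1, len(real_ids) + 1))
    secrets.SystemRandom().shuffle(surrogates)
    return dict(zip(real_ids, surrogates))


def remap_dump(users, posts):
    """Return copies of users and posts with user IDs replaced consistently."""
    id_map = build_id_map(u["id"] for u in users)
    new_users = [{**u, "id": id_map[u["id"]]} for u in users]
    # Foreign keys go through the same map, so referential integrity survives.
    new_posts = [{**p, "user_id": id_map[p["user_id"]]} for p in posts]
    return new_users, new_posts
```

A surrogate can still coincide with some production ID by chance; the point is only that it no longer reliably names the same account.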
Generating useful fake data is also hard, of course, and it won't hit your edge cases the way real production data does. Then again, the code you're writing now won't really be exercised by the unwashed masses until you release it, so using production data is mostly a protection against regressions. If you've got sensitive content in your app, you should consider stronger test coverage in lieu of production dumps. Of course, performance regressions can be hard to catch with automated testing (though you can defend against things like N+1 query problems), so YMMV.
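To make the N+1 point concrete, here is a rough sketch of a query-count guard using only the standard library's `sqlite3` module and its `set_trace_callback` hook. The schema and the helper names (`count_selects`, `make_db`, `titles_n_plus_one`, `titles_joined`) are hypothetical; with an ORM you would hook whatever query logging it exposes instead.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_selects(conn):
    """Collect every SELECT statement the connection runs inside the block."""
    seen = []

    def trace(sql):
        if sql.lstrip().upper().startswith("SELECT"):
            seen.append(sql)

    conn.set_trace_callback(trace)
    try:
        yield seen
    finally:
        conn.set_trace_callback(None)  # stop tracing once the block exits


def make_db():
    """Hypothetical two-table schema, just enough to show the pattern."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
        INSERT INTO posts VALUES (1, 1, 'a1'), (2, 2, 'b1'), (3, 2, 'b2'), (4, 3, 'c1');
        """
    )
    return conn


def titles_n_plus_one(conn):
    """One query for the authors, then one more per author."""
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_joined(conn):
    """Same result from a single JOIN."""
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result
```

In a test you would wrap the code under test in `with count_selects(conn) as queries:` and assert that `len(queries)` stays under a fixed budget, so the count can't quietly grow with the number of rows.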
Absolutely, the anonymisation method is an important consideration. You can probably safely drop most user-specific data, like any tracking you've done. Unless the dev is building functionality that actually depends on that data, such as data-mining, they're unlikely to need it.
I'll make another argument in favour of production dumps, though: they give developers a proper feel for how a website functions. For UI-oriented developers like myself, having just enough data for the website to function on a technical level isn't enough. Your search code is going to feel very different with 20 million products vs a few dozen dummy products.
We're just about to embark on moving our uploaded files to S3 and were discussing yesterday how we should structure our dev/production environments. Thanks for the timely article!
jmileham | 12 years ago
andrewingram | 12 years ago
robertfw | 12 years ago
iso8859-1 | 12 years ago