nofinator | 2 years ago

> YP says it’s best for him not to run anything with sudo any more today, handing off the restoring to JN.

Then, in the post-mortem section about the lack of backups:

> LVM snapshots are by default only taken once every 24 hours. YP happened to run one manually about 6 hours prior to the outage

> Regular backups seem to also only be taken once per 24 hours, though YP has not yet been able to figure out where they are stored. According to JN these don’t appear to be working, producing files only a few bytes in size.

I have had (and inevitably will have again) bad days like poor YP. All I can count on is to maintain good habits, like making backups before undergoing production work like YP did.

capableweb | 2 years ago

> like making backups before undergoing production work

The specific part you mention also brings up a really vital part of a backup system: testing that the backups you generate can actually be restored.

I've seen so many companies with untested recovery procedures. Most of the time they just say something like "Of course the built-in backup mechanism works, if it didn't, it wouldn't be much of a backup, would it? Haha", while never having actually tried to recover from it.

Although, to be fair, out of the tens of untested setups I've seen, only once did it have an actual impact because the backups really didn't work. But the morale hit that company took burned the lesson into my brain: test your backups.
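
For what it's worth, even a cheap automated check would have flagged the "files only a few bytes in size" problem from the post-mortem. Here is a minimal sketch, assuming the backup is a tar archive (plain or gzip-compressed). The function name `verify_backup` and the `MIN_BYTES` threshold are made up for illustration and are not anything from the incident. It only proves the archive is readable end to end; it is not a substitute for a real restore drill onto a scratch machine.

```python
import os
import tarfile
import zlib

MIN_BYTES = 1024  # hypothetical threshold; pick one that fits your data


def verify_backup(path, min_bytes=MIN_BYTES):
    """Return (ok, reason) after a size check and a full trial read."""
    size = os.path.getsize(path)
    if size < min_bytes:
        return False, f"backup is only {size} bytes"
    count = 0
    try:
        with tarfile.open(path) as archive:
            for member in archive:
                if not member.isfile():
                    continue
                stream = archive.extractfile(member)
                # Read every byte so truncation or corruption surfaces here.
                while stream.read(1 << 20):
                    pass
                count += 1
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        return False, f"archive unreadable: {exc}"
    if count == 0:
        return False, "archive contains no files"
    return True, f"read {count} files"
```

Run something like this right after each backup job and alert when `ok` is false, so a broken backup is noticed on the day it breaks rather than on the day you need it.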