item 4717713


Supreme | 13 years ago

Data should be backed up to staging nightly anyway. There should also be scripts in place to start this process at an arbitrary point in time and to import the data into the staging server. You do not need to match the hardware if you use cloud hosting since you can scale up whenever you want.
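A minimal sketch of such an "import into staging at any point in time" script, assuming (hypothetically) that the site's data lives in a single SQLite file. A real deployment would use its own database's dump/restore or replication tooling instead. `sync_to_staging` and the file paths are illustrative names, not anything from the original setup.

```python
import sqlite3


def sync_to_staging(live_path: str, staging_path: str) -> None:
    """Replace the staging database with a full copy of live.

    Hypothetical helper: run it nightly from a scheduler, or by hand
    whenever staging needs to be brought up to date.
    """
    live = sqlite3.connect(live_path)
    staging = sqlite3.connect(staging_path)
    try:
        # sqlite3's online backup API; the default pages=-1 copies the
        # whole database in one step, overwriting staging's contents.
        live.backup(staging)
    finally:
        staging.close()
        live.close()
```

Because the same function serves both the nightly job and an on-demand run, the cutover path gets exercised every night rather than only during an emergency.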

Here's where it gets really simple. Resize the staging instance to match live. Put live into maintenance mode and begin the data transfer to staging (with a lot of cloud providers, these two steps can be done in parallel). As soon as the copy finishes, take live down, point the DNS records at staging and wait a few minutes for the change to propagate. Staging is now live, with all of live's data. Problem solved. Total downtime: hardly anything compared to not being prepared. Total data loss: none.
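The cutover sequence above can be sketched as an ordered script. `FakeCloud`, its methods, and `fail_over` are all hypothetical; they stand in for whatever provider API or tooling is actually used and imply no real cloud SDK. The sketch assumes maintenance mode blocks writes (which is what makes "no data loss" hold) and that waiting one DNS TTL is enough for most resolvers to pick up the new record.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical stand-in for a provider API; it only records actions."""
    log: list = field(default_factory=list)
    dns: dict = field(default_factory=dict)

    def resize(self, instance, like):
        self.log.append(f"resize {instance} to match {like}")

    def set_maintenance(self, instance, on):
        self.log.append(f"maintenance {instance} {'on' if on else 'off'}")

    def copy_data(self, src, dst):
        self.log.append(f"copy {src} -> {dst}")

    def stop(self, instance):
        self.log.append(f"stop {instance}")

    def point_dns(self, record, instance):
        self.dns[record] = instance
        self.log.append(f"dns {record} -> {instance}")


def fail_over(cloud, live, staging, record, dns_ttl_seconds, sleep=time.sleep):
    def freeze_and_copy():
        # Block writes on live first, so nothing lands after the copy starts.
        cloud.set_maintenance(live, True)
        cloud.copy_data(live, staging)

    # Resizing staging and copying the data are independent: run them in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        resize = pool.submit(cloud.resize, staging, live)
        copy = pool.submit(freeze_and_copy)
        resize.result()
        copy.result()  # re-raises any exception from the copy

    cloud.stop(live)
    cloud.point_dns(record, staging)
    # Resolvers may keep serving the old record for up to one TTL.
    sleep(dns_ttl_seconds)
```

Usage, with the wait stubbed out: `fail_over(FakeCloud(), "live", "staging", "www.example.com", 300, sleep=print)`. Injecting `sleep` keeps the orchestration testable without actually waiting.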


tinco | 13 years ago

I fully agree that this is how it could, and perhaps should, be done. But you assume they are already on cloud hosting, which they obviously aren't. Of course, that is also a choice that has to be made consciously, especially since Fog Creek has been around a lot longer than the big cloud providers.

You can look to Amazon to see that cloud architecture brings hidden complexity that also increases the risk of downtime, while you relinquish a lot of control over, for example, the latency and bandwidth between your nodes.

What I don't know, by the way, is whether the total cost of ownership is higher for colocation or for cloud hosting.

dbecker | 13 years ago

Why do you think they aren't doing this?

Possible explanations:

1) Their engineers never thought of it

2) They considered it, and it is as simple as you think... but they don't care about uptime.

3) Implementing geographic redundancy is harder than you think given whatever other constraints or environment they face.

4) Some other explanation

#3 seems like the most likely explanation to me.