
Persisting state between AWS EC2 spot instances

109 points | p8donald | 8 years ago | peteris.rocks

77 comments


manigandham|8 years ago

Persistent storage remains a complicated problem. Attaching volumes on the fly with docker volume abstraction works well enough for most cloud workloads, whether on-demand or spot, but it's still easy to run into problems.

This is leading to rapid progress in clustered/distributed filesystems and it's even built into the Linux kernel now with OrangeFS [1]. There are also commercial companies like Avere [2] who make filers that run on object storage with sophisticated caching to provide a fast networked but durable filesystem.

Kubernetes is also changing the game with container-native storage. This seems to be the most promising model for the future, as K8S can take care of orchestrating all the complexities of replicas and stateful containers while storage is just another container-based service using whatever volumes are available to the nodes underneath. Portworx [3] is the leading commercial option today, with Rook and OpenEBS [4] catching up quickly.

1. http://www.orangefs.org

2. http://www.averesystems.com/products/products-overview

3. https://portworx.com

4. https://github.com/openebs/openebs

objectivefs|8 years ago

Using a clustered/distributed filesystem definitely simplifies persisting state between EC2 spot instances. It also makes it easier to scale out the workload when you need more instances accessing the same data. To add to your list: there is also ObjectiveFS [1], which integrates well with AWS (uses S3 for storage, works with IAM roles, etc.) and EC2 spot instances.

[1] https://objectivefs.com

solatic|8 years ago

OP is offering some very dangerous advice.

Twenty years ago, software was hosted on fragile single-node servers with fragile, physical hard disks. Programmers would read and write files directly from and to the disk, and learn the hard way that this left their systems susceptible to corruption in case things crashed in the middle of a write. So behold! People began to use relational databases which offered ACID guarantees and were designed from the ground up to solve that problem.

Now we have a resource (spot instances) whose unreliability is a featured design constraint and OP's advice is to just mount the block storage over the network and everything will be fine?

Here's hoping OP is taking frequent snapshots of their volumes because it sure sounds like data corruption is practically a statistical guarantee if you take OP's advice without considering exactly how state is being saved on that EBS volume.

colechristensen|8 years ago

Your response is fairly ridiculous.

A spot instance interruption isn't a system crash, it's a shutdown signal. Storing your important spot instance data on EBS is recommended by AWS. If your application can't handle a normal system shutdown without losing data, your application is at fault, not your system setup.

>exactly how state is being saved on that EBS volume

Files are written to a filesystem which is cleanly unmounted at shutdown when interruption happens.
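
AWS announces that interruption from the instance itself: a termination timestamp appears in the instance metadata service roughly two minutes before the instance is reclaimed. A minimal poller might look like this (the polling cadence and what you do on `True` are up to your application):

```python
import urllib.error
import urllib.request

# AWS's documented spot interruption endpoint; it returns 404 until
# roughly two minutes before the instance is reclaimed, then a
# timestamp such as 2017-09-18T08:22:00Z.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def termination_imminent(url=TERMINATION_URL, timeout=1.0):
    """Return True once AWS has posted a termination time (HTTP 200)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # 404 (no notice yet) and network errors both land here.
        return False
```

A cron or init job can poll this every few seconds and trigger a clean unmount of the EBS volume when it flips to `True`.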

otterley|8 years ago

Spot instances are shut down cleanly via the usual stop semantics (which includes all the shutdown handlers provided your OS supports them). Assuming your database software supports clean shutdowns via SIGTERM, everything should be fine.
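
As a sketch of what a clean SIGTERM shutdown can look like at the application level (`flush_state` is a placeholder for whatever state your service actually needs to persist):

```python
import signal
import sys

def flush_state():
    # Placeholder: fsync open files, commit transactions, checkpoint, etc.
    pass

def graceful_shutdown(signum, frame):
    # Invoked when the OS stop sequence delivers SIGTERM.
    flush_state()
    sys.exit(0)

# systemd/init delivers SIGTERM to services during a clean shutdown.
signal.signal(signal.SIGTERM, graceful_shutdown)
```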

jen20|8 years ago

This pattern is a lot safer if you use ZFS. Spot instances don't just disappear, though: you get a notification and have a chance to perform shutdown actions, except in the case of hardware failure, which is the same with non-spot instances.

otterley|8 years ago

Even if you don't use spot instances, the technique of using separate EBS volumes to hold state is useful (and well-known). Ordinary on-demand instances can also be terminated prematurely due to hardware failure or other issues, so storing state on a non-root volume should be considered a best current practice for any instance type.

fulafel|8 years ago

There's a mechanism exactly for this purpose in Linux: pivot_root. It's used in the standard boot process to switch from the initrd (initial ramdisk) environment to the real system root.

ec2-spotter classic uses this, but you can also make a pivoting AMI of your favourite Linux distribution.

One thing to watch out for is how to keep the OS's automatic kernel updates working. AMIs are rarely updated, and you're going to have a "damn vulnerable linux" if you don't apply the updates right after booting a new image.

js4all|8 years ago

When you are using Kubernetes, you won't have to deal with this yourself. The cluster will move pods off nodes that are stopped because the spot price is exceeded. Ideally, place nodes at different bid prices; that way there will be a performance hit but no outage. With the new AWS start/stop feature [1], nodes will come up again when the spot price sinks.

1) https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-ec...

yjftsjthsd-h|8 years ago

TLDR: Attach EBS volume and use that to store Docker containers.

I suppose it's a decent solution if you don't want to deal with prefixes.

stonewhite|8 years ago

We normally utilize spot instances with Spotinst + Elastic Beanstalk. Our billing has looked great ever since.

This solution looks good, yet it only applies to single-instance scenarios. I presume this kind of thinking might move forward with EFS + chroot for an actually scalable solution that cannot be run on Elastic Beanstalk.

sciurus|8 years ago

The author goes to great lengths to come up with a way for the software that was running on a terminated spot instance to be relaunched using the same root filesystem on a new spot instance, but they never explain why they need to do exactly this. Maybe they already ran everything in Docker containers on CoreOS, so their solution isn't a big shift, but I strongly suspect they could find a simpler way to save and restore state if they got over this obsession with preserving the root filesystem their software sees.

olegkikin|8 years ago

If you don't care about reliability, why not just get a cheap and powerful VPS? Paying $90/month for that machine is madness. I pay $6/month for 6GB RAM, 4 cores, 50GB disk.

dagw|8 years ago

If you don't care about reliability, why not just get a cheap and powerful VPS?

Personally, because my needs aren't constant. I might need two cores for two months followed by 100 cores for a week.

yjftsjthsd-h|8 years ago

Perhaps integration with other AWS services?

deivid|8 years ago

Where? I'm using Digital Ocean and it'd be way more expensive for that kind of configuration.

blibble|8 years ago

Where are you getting that for $6?

ramanan|8 years ago

Well, one easy way when using Ubuntu-like distributions is to simply place your `/home` folder on a separate (persistent) EBS volume [1].

With a few on-boot scripts to attach-volumes / start-containers, it should be fairly easy to get going as well.

[1] https://engineering.semantics3.com/the-instance-is-dead-long...
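
A minimal sketch of such an on-boot script, assuming the aws CLI is installed with suitable IAM permissions; the volume ID, device name, and mount point below are placeholders:

```python
import subprocess
import time
import urllib.request
from pathlib import Path

VOLUME_ID = "vol-0123456789abcdef0"  # hypothetical persistent volume
DEVICE = "/dev/xvdf"                 # device name requested at attach time
MOUNT_POINT = "/home"

def instance_id():
    """Read this instance's ID from the EC2 metadata service."""
    url = "http://169.254.169.254/latest/meta-data/instance-id"
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read().decode()

def wait_for_device(device, timeout=60, poll=1.0):
    """Poll until the block device node appears (EBS attach is asynchronous)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if Path(device).exists():
            return True
        time.sleep(poll)
    return False

def attach_and_mount():
    """Attach the persistent volume, then mount it over /home."""
    subprocess.run(
        ["aws", "ec2", "attach-volume",
         "--volume-id", VOLUME_ID,
         "--device", DEVICE,
         "--instance-id", instance_id()],
        check=True)
    if not wait_for_device(DEVICE):
        raise RuntimeError(f"{DEVICE} never appeared")
    subprocess.run(["mount", DEVICE, MOUNT_POINT], check=True)

if __name__ == "__main__":
    attach_and_mount()
```

Run it from cloud-init or a systemd unit before any containers that touch `/home` start.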

TrickyRick|8 years ago

This was exactly what I was thinking, why complicate things by replacing the root volume when one can simply mount the disk to any other directory and point the application there?

likelynew|8 years ago

I don't know why all the comments are saying this is a bad idea. For me, one of the things I use EC2 for is deep learning. I just use a spot GPU instance, attach an overlayroot volume and launch a Jupyter notebook in it. Other options like Google Dataflow are not useful to me due to the price and the process of installing packages. I can also think of many other use cases for using a persistent volume for manual tasks.

amq|8 years ago

Wouldn't it be simpler to have the smallest possible instance run an NFS server? This would also have an additional bonus of scalability.

Edit: or use AWS EFS

manigandham|8 years ago

NFS is nice but a single instance can easily become network bound, especially on AWS. It also introduces a single point of failure for that instance, and clustered NFS can be fragile.

otterley|8 years ago

EFS is far more expensive than EBS. Price it out; you'll see.

atmosx|8 years ago

EFS is also slower than EBS; it is not recommended for I/O-intensive workloads.

A positive thing with EFS is that it can be shared across AZs, while EBS needs to be snapshotted and then imported into the other AZ.

raverbashing|8 years ago

Is it just me, or should spot instances deal with work rather than storage, meaning your (stateful) units of work belong in a queue/DB on a non-spot instance?

Attaching and detaching volumes is a good idea but I wouldn't use that to keep state

tuananh|8 years ago

We use k8s at work. I just have to create a PVC, and when a spot instance is terminated along with its containers, a new container will be created and will mount the PVC again automatically.
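
As a rough illustration of that setup, a claim like the following (the name, size, and storage class are placeholders, not from the comment) is all a pod needs to reference:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spot-state          # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2     # assumes an AWS EBS-backed storage class
  resources:
    requests:
      storage: 20Gi
```

A pod mounts it via `persistentVolumeClaim.claimName`, so a replacement pod scheduled on a new node re-attaches the same EBS-backed volume.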

alex_duf|8 years ago

It sounds wrong to try to keep state across two EC2 instances. If you find yourself in that situation, try a bit harder to push your state outside the EC2 instance (DynamoDB, S3, etc.).

You will get a lot of benefit out of it, but you may lose some performance, which is fine in 99% of cases.