Congrats to the team! Ben was one of the most brilliant students in my undergrad class and it's awesome to see him employ his talents in such a big way.
I worked on healthcare.gov last year and it's hard to overstate the potential impact of a tool like DCOS. At one point, we had 2000+ VMs, most manually configured, with no monitoring, and completely different configs between dev, test, and prod (not intentionally). Straightforward operations like migrating half of the database servers from one VLAN to the other took months, small mistakes like changing a database password could result in hours-long outages, and simply getting the data that Mesosphere displays automatically would sometimes take weeks and other times simply be impossible.
Of course, clean devops hygiene would have eliminated much of the pain in the first place, but not every organization has the expertise to do things right. In fact, most don't, and the solution for most organizations is good tooling that automates as much of the system as possible and provides good development discipline for the rest.
Having 2000+ VMs without automation or monitoring in place is pretty unusual, especially for a site like healthcare.gov, which must have had all sorts of HIPAA and data-security requirements.
But it's honestly not hard to hook all those machines up to a system like Chef/Puppet/Saltstack/Ansible and start automating common tasks within a few days.
Migrating databases between networks and rotating passwords would generally be outside the scope of a tool like Mesosphere. Again, much of this can be scripted with existing automation tools, though with most databases there's still nothing straightforward about migrating data to nodes on different VLANs or rotating passwords. If it's a one-time task, I'd recommend hiring a database consulting firm to do the migration or rotation.
I think that having good defaults and enforcing best practices is a good idea, but a lot of this can be achieved with existing tools. IMO, it makes more sense for organizations to automate and orchestrate Mesos deployments via existing, mature DevOps tools like Chef/Puppet/Ansible/Saltstack. It would also be exciting to see deployments working via NixOps/NixOS.
What's nice is that people are thinking about supercomputers again, even if they insist on calling them "the cloud".
First things first, look up Beowulf (http://en.wikipedia.org/wiki/Beowulf_cluster), a suite of tools that implements a multi-machine scheduler and a message-passing interface. What's nice is that if one host is overloaded, it can migrate a process to another. (I'm not sure what performance is like nowadays, though.)
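As a toy sketch of the idea (hostnames and task costs invented here, and this is not Beowulf's actual mechanism): always handing the next task to the least-loaded host is the scheduling effect that process migration approximates.

```python
# Toy illustration of load-aware dispatch: each task goes to the
# currently least-loaded host, mimicking what a cluster scheduler
# (or process migration) achieves across machines.
from heapq import heappush, heappop

def dispatch(tasks, hosts):
    """Assign each (name, cost) task to the currently least-loaded host."""
    heap = [(0.0, h) for h in sorted(hosts)]  # (current load, hostname)
    placement = {}
    for name, cost in tasks:
        load, host = heappop(heap)            # least-loaded host wins
        placement[name] = host
        heappush(heap, (load + cost, host))
    return placement

tasks = [("render", 4.0), ("sim", 2.0), ("comp", 1.0)]
print(dispatch(tasks, ["node1", "node2"]))
```

Real systems also have to account for memory, data locality, and failures, but the core greedy placement loop looks much like this.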
In the world of VFX for movies, we've been dealing with schedulers for years. Programmes like Alfred, Tractor, Qube and Deadline can dispatch tasks and deal with dependencies at massive scale.
The first thing to note is that "DCOS" is really a discrete set of parts: a scheduler, machine state enforcement, network config, storage, and the underlying OS.
With careful planning, the state enforcement tool (Puppet and the like) can take care of all of these parts except the global scheduler.
The beauty of the VFX schedulers is that they understand dependencies really well. (I need x to complete before I run y; I need feature z to run.) A lot of newer schedulers really don't understand this concept well.
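To make the dependency point concrete, here's a minimal sketch (task names invented) of the "x must complete before y" ordering those schedulers enforce, using Python's standard topological sorter:

```python
# Minimal sketch of the dependency handling VFX schedulers do well:
# a task only runs after everything it depends on has completed.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

deps = {
    "comp":   {"render", "sim"},  # comp needs render and sim first
    "render": {"assets"},
    "sim":    {"assets"},
    "assets": set(),
}
order = list(TopologicalSorter(deps).static_order())
print(order)  # assets comes first, comp comes last
```

A production scheduler layers retries, resource limits, and parallel fan-out on top, but the dependency core is exactly this kind of DAG ordering.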
It's really important to understand that Puppet and the like cannot (without heavy engineering) act as a task dispatcher. The big feature of "DCOS" is task distribution.
In Linux terms, it's like comparing the CPU scheduler to chmod. Yes, you can make a program run by chmoding a file to +x, but it's the scheduler that is responsible for making sure the programme has CPU time.
After reading the title I was a bit tempted to call BS since I think Kubernetes should have the rights to "The first Datacenter OS" tagline[1]. However, after reading up on the project details at https://mesosphere.com/learn I can see how Mesosphere came to the conclusion of being a DCOS, if not necessarily the first. Mesosphere goes a bit further than Kubernetes and offers a solution to the storage problem and attempts to address other "userland" concerns by shipping Apache Spark, Cassandra, Kafka, and Hadoop. So maybe it would be more accurate to call this a datacenter distro on top of the Kubernetes kernel?
Regardless, I think the concept of a datacenter OS will be the key to commoditizing IaaS providers and leveling the playing field in terms of features and usability for those who have not given up on the dream of running a "private" cloud.
Why will the DCOS work where others have failed?
Current solutions aimed at taming the datacenter operate at the machine/VM level, which exposes the OS for each machine, and completely punts on the application. Guess who gets to stitch it all back together? A DCOS is designed to manage applications directly, commonly via application containers, which means we can treat the OS running on the underlying machines like firmware and limit our interactions to basic updates and minimal configuration -- think CoreOS.
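To make "manage applications directly" concrete: with Mesos plus the Marathon framework, deploying an app is roughly a matter of POSTing a JSON definition to Marathon's /v2/apps endpoint. The field names below follow Marathon's API, but the app id, image, and resource numbers are invented for illustration.

```python
# Roughly the shape of an application definition you would POST to
# Marathon's /v2/apps endpoint on a Mesos cluster. You describe the
# app and its resources; the scheduler decides which machines run it.
import json

app = {
    "id": "/web",
    "cpus": 0.5,
    "mem": 256,
    "instances": 3,  # the scheduler keeps 3 copies running somewhere
    "container": {
        "type": "DOCKER",
        "docker": {"image": "example/web:1.0"},
    },
}
print(json.dumps(app, indent=2))
```

Note what's absent: no hostnames, no VM configs. That's the machine-as-firmware shift in a nutshell.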
What about PaaS?
That's a topic worthy of a lengthy discussion, but I think it boils down to the lack of control found in most PaaS platforms[2]. In order for a PaaS offering to be successful it must make opinionated decisions about how to deploy and run applications; a bit too inflexible for most people. On the other hand, a DCOS seems to hit the sweet spot between IaaS and PaaS.
So, the commoditization of on-demand and highly scalable virtual computing infrastructures, together with the rising popularity of "containerization" for app and service composition, seems to be creating an "orchestration crisis", or an "orchestration business opportunity," depending on your vantage point.
Are we about to see the emergence of what might be termed "new wave mainframe" computing?
The problem with existing orchestration tools, and tools like Chef, Puppet, etc., is that they're all a bit piecemeal and complicated. What we need is a step up the abstraction hierarchy, and some standardisation.
We'll probably know we've got there when companies no longer have any real idea of how many servers or VMs from different providers they're utilising. They'll just know which applications they're running and how much it's costing them.
There are a lot of components to an operating system. It's not just the technology components; it's the product components and the business components. E.g., does it have an API? Does it have an SDK? Does it have a user interface? Does it have an init system, a cron, a storage system, service discovery? Does it have an ecosystem of third-party developers? I posit that the OS checklist is fairly long and that none of the other systems you mention have the complete OS package.
Mesosphere's stack is in full production at major companies, including one of the largest financial services companies and one of the largest consumer electronics companies. General availability is next year, but paying customers are using it in production today, at very large scale.
The DCOS project looks amazing. The command line interface looks like a heroku toolbelt for your very own servers. Cool server usage visualizations too.
DCOS is an interesting description, as the idea of a data centre (to my tiny mind at least) is made fuzzier by concepts like AWS AZs.
Do people expect that the 'DC' will span AZs, regions even? Or is the separation of these things valuable in some way?
How about the idea of dev vs prod environments? Will the isolation provided be strong enough that we'll happily drop everything onto a single cluster of machines?
[1] I'm sure you can make an argument for Joyent's SmartDataCenter (https://www.joyent.com/private-cloud) as well.
[2] Deis (http://deis.io/overview) attempts to address this issue.
"Mesosphere Announces First Data Center OS And $36M In Funding" - Techcrunch
"Mesosphere’s new data center mother brain will blow your mind" - GigaOM