I am very much a fan of hot-takes, but this one is trash --
> The money was wasted on hype. The same will eventually be said of Docker. I’ve yet to hear a single benefit attributed to Docker that isn’t also true of other VMs, but standard VMs allow the use of standard operating systems that solved all the hard problems decades ago, whereas Docker is struggling to solve those problems today.
Linux containerization (using the word "docker" for everything isn't right either) is an isolation + sandboxing mechanism, NOT a virtual machine. Even if you talk about things like LXC (orchestrated by LXD), that's basically just the addition of the user namespacing feature. A Docker container is not a VM; it is a regular process, isolated with cgroups and namespaces, possibly protected (like any other process) with SELinux/AppArmor/etc.
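To make the "regular process" point concrete, here's a small Linux-only Python sketch (the function name is mine, not from any library): every process already lives in a set of kernel namespaces, visible under /proc, and "containerizing" just means launching a process into fresh ones.

```python
import os

def list_namespaces(pid="self"):
    """Return the kernel namespaces a process belongs to (Linux only)."""
    # Each entry under /proc/<pid>/ns is one namespace the process is in:
    # pid, mnt, net, uts, ipc, user, cgroup, ... Docker's isolation is
    # nothing more than putting a process into fresh copies of these.
    return sorted(os.listdir(f"/proc/{pid}/ns"))

print(list_namespaces())  # e.g. ['cgroup', 'ipc', 'mnt', 'net', 'pid', ...]
```

Run it inside and outside a container and you'll see the same kind of listing either way, because a containerized process is still just a process.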
Containerization is almost objectively a better way of running applications -- there's only one question: do you want your process to be isolated or not? All the other stuff (using Dockerfiles, pulling images, the ease of running languages that require their own interpreters, since you package the filesystem) is on top of this basic value proposition.
An easy way to tell that someone doesn't know what they're talking about when speaking about containerization is if they call it a VM (and don't qualify/note that they're being fast and loose with terminology).
All this said -- I do think Docker will die, and it should die, because Docker is no longer the only game in town for reasonably managing (see: podman, crictl) and running containers (see: containerd/cri-o, or libcontainer, which turned into runc).
[EDIT] - I want to point out that I do not mean that Docker the company or Docker the project will "die" -- they have done amazing things for the community and development as a whole that will literally go down in history as a paradigm shift. What I should have written was that "docker <x>", where x is "image", "container", "registry", etc., should be replaced by "container <x>".
I'm not going to support the general thesis of this article, but I want to address something you said.
You're right that containers are not VMs, but that's really only relevant as technical pedantry.
I think that what the author was trying to say (without really understanding it) was a comparison of containers to VMs as units of software deployment.
I don't think anyone is credibly using containers as a security measure on Linux, because if they think they are, they are in for several large surprises.
Rather, we're seeing the unbundling of software - it used to be that you deployed software to a physical machine with a full OS, then you could deploy it to a virtual machine with a full OS, then you could deploy the process, its dependencies and a minimal OS into a container.
I agree that Docker doesn't have a huge and profitable future ahead of it, because it's providing commodity infrastructure. Rather I think it's interesting to think about what the next level of software deployment decomposition will be, and I'd wager that it's FaaS (ie serverless).
It’s very easy to underestimate how helpful Docker can be when you first start working with it: it’s a black box that uses root to do everything and is a pain to debug. Because of this, it becomes easy to hand-wave it away.
Once you have a properly set up project going and your entire build process is mostly repeatable, the benefits start becoming more obvious. Yes, you can do all the same things to a certain extent in a VM, but it’s really hard to keep that streamlined and up to date. Having a script that sets up your stack in a VM on both Windows and Mac and then runs on Linux is a pretty big maintenance nightmare. A Dockerfile works with a few commands and can be added to your repo.
It’s not without trade-offs, but I think if they can solve the issue of debugging in a better way, then we’ll really see things solidify on this concept.
Harsh. Docker is a nice way to specify how you want your container to work. Sure, release Bocker (a better Docker). But Bocker will just basically be Docker: a simple set of instructions to get a machine running.
That is all we want. A few instructions to get a system up. Devs are sick of setting up machines. Demand is there.
I agree that the industry will waste billions on startups. There's not really a market there for 'Docker' containers, and there never was.
Docker containers and their related counterparts are abstractions. Useful abstractions don't necessarily equate to a new line of business.
The best attributes of containers (IMO) are packaging and distribution. What businesses and operators need is a repeatable, easy way to deploy applications across their infrastructure. Containers are one piece of that story.
The bigger piece, and IMO where the business viability lies, is the orchestration layer. Containers aren't very useful by themselves; you need a way to get your application online. That's where Kubernetes comes in.
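As a sketch of what that orchestration layer actually does -- the app name and image here are hypothetical -- a minimal Kubernetes Deployment just declares desired state and lets the cluster keep it converged:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # hypothetical app name
spec:
  replicas: 3                  # desired state: keep three copies online
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example/web:1.0   # illustrative image, not a real one
          ports:
            - containerPort: 8080
```

The container only packages the process; getting three replicas scheduled, restarted on failure, and reachable behind a service is the part organizations actually pay for.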
You need to understand large organizations and their challenges to see what layer of the containerization stack holds the most value.
Long-term, I see 'linux containers' as we know them going away. The industry is going to move to something like [1]: lightweight, hardware-assisted VM/container hybrids. But, no matter what happens at the containerization layer, the orchestration layer is the piece that adds business value to end-users (eg, not AWS or other hosting providers).
1: https://katacontainers.io/
So, last time I had a web dev job was back in 2016, so the whole container thing kind of passed me by. So I don't know the details- but, on the other hand, I also don't have any baggage about it being "a VM".
But, starting from that kind of "clean slate" state I have to say that if it takes exasperated internet posts, like your helpful comment, to explain why containers are not like VMs and how they are not like VMs... well then maybe they are not that much not like VMs to make them such a big new thing.
That goes for many things. Like, I don't get the difference between Volley and Beach Volley. One is played on the beach. So it's volley? Played on the beach?
> they call it a VM (and don't qualify/note that they're being fast and loose with terminology).
It's only confusing to people who are familiar just with the popularized forms of computer science terms.
Isolation and sandboxing is virtualization. In a container, the applications seem to have an operating system and machine to themselves.
A single Unix process and its address space is also a kind of virtual machine, creating the illusion that the process has a machine all to itself. Thanks to virtual memory, other processes are not even visible; they are in a different namespace of pointers. That concept breaks for multi-process applications: processes are aware of each other through manipulations of shared resources like files, or through effects like not being able to bind a networking port because some other process is tying it up. The next level of virtualization is to have namespaces for resource-related names in the system beyond the address space. As far as just the filesystem goes, we can virtualize with tools like chroot. A group of applications can have their own global /etc configuration, their own version of the C library in /lib, and so on. That's the beginning of "containerization".
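The "a process is already a small virtual machine" idea can be demonstrated with a short Python sketch (the names and values are illustrative): a child process that mutates a global only changes its own private copy, because virtual memory gives each process its own namespace of pointers.

```python
import multiprocessing as mp

# Shared-looking state that is actually private to each process.
state = {"value": "parent"}

def child(q):
    state["value"] = "child"   # mutates only the child's private copy
    q.put(state["value"])      # report what the child sees

if __name__ == "__main__":
    q = mp.Queue()
    p = mp.Process(target=child, args=(q,))
    p.start()
    p.join()
    # The child saw its own mutation; the parent's copy is untouched.
    print(q.get(), state["value"])  # child parent
```

Containers extend exactly this trick from memory addresses to filesystems, PIDs, and network ports.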
Docker famously doesn't isolate very well, as known in infosec circles for years now. If you're unaware, search 'containers don't contain'.
MicroVMs start faster and provide better isolation.
Meanwhile, none of this is relevant unless you're building your own cloud platform, which is a huge waste of time for most companies.
MicroVMs, containers, VMs, zones and bare metal are places to execute code. Serverless makes all those distinctions irrelevant.
Sorry if you spent 2015 getting really into Docker. You bet on the wrong horse. It's OK, this happens in tech.
Edit: if it's unclear, I don't mean Docker itself is the wrong horse, I mean containerisation tools per se are the wrong horse - and a bad place to invest your time unless you work for a cloud provider
Deployment of Docker containers is nice, but deploying a VM as in Vagrant was also fine. I avoided learning anything about Docker for about 2 years because I thought it was just a fad.
However, I would add that for my own personal use, it's invaluable for development work. All that work that you do _before_ your CI or deployment.
1) When I'm working with a collection of tools that I need but that are a complete mess with lots of state (think: compiler tools, LaTeX, things like that), then docker image build, with its incremental way of running each command piece by piece and saving the state after each RUN, is actually a life saver. You follow the steps of some instructions and, of course, as usual, there's one extra step not documented in the manual, so you add that to your Dockerfile. You make a mistake? No big deal, just change the command; the bad state is discarded, and you get to try again. You don't have to run the whole thing all over again. And it's instantaneous.
2) When I have to work with a client's codebase, as a consultant, you'd be surprised how many projects do not have a reproducible build, with Docker or anything else. So I end up building my own Dockerfile. The number of times I've heard "but you just have to run this setup script once" -- well, those scripts never work (why would they? nobody runs them anymore). Especially when it begins with `npm` or `pip` -- almost guaranteed to fail catastrophically, with some g++ compile error, or a segfault, or just a backtrace that means nothing. For example, I recently had to run an `npm` install command and it failed with `npm ERR! write after end`. I re-ran the container again, and again once more, and then it succeeded (https://gist.github.com/chrisdone/ea6e4ba3d8bf2d02f491b4a17f...). npm has a race condition (https://github.com/npm/npm/issues/19989; fixed in the latest version). I wouldn't have been able to confidently share this situation with my client unless I had that reproducibility.
3) It's trivial to share my Dockerfile with anyone else and they can run the same build process. I don't have to share an opaque VM that's hundreds of megs and decide where to put it and how long I want to keep it there, etc.
4) It's a small one; but speed. Spinning up or resuming a VirtualBox machine is just slow. I can run docker containers like scripts, there isn't the same overhead.
5) Popularity is a blessing; the fact that I _can_ share my Dockerfile with someone is a network effect. Like using Git, instead of e.g. darcs or bzr.
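The incremental build in point 1 can be sketched with a minimal Dockerfile (the base image and package names are illustrative, not from the original comment): each RUN is snapshotted as a cached layer, so editing a later step only re-runs that step.

```dockerfile
FROM debian:stable-slim
# This expensive install becomes a cached layer after its first success.
RUN apt-get update && apt-get install -y --no-install-recommends texlive-base
# The "extra step the manual forgot": append it here and rebuild -- only
# this layer re-runs; everything above is served from cache instantly.
RUN mkdir -p /usr/local/texmf
```

A failed RUN discards only its own layer, which is exactly the "make a mistake, change the command, try again" loop described above.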
By the way, you can also do nice container management with systemd and git. There's nothing inherently technologically _new_ about Docker; it's the workflow and network effects; it lets me treat a system's state like a Git repo.
Nix has similar advantages, but I see Docker as one small layer above Nix.
I don't understand how these are comparable. Hadoop solved a hard problem that nobody had. Docker solves a simple problem that everyone has. It would make sense if you're talking about Kubernetes and using it to build hundreds of microservices because it's currently in fashion. Whether you're using Docker, Packer, Ansible or whatever doesn't matter. They are all solutions to the same problem, and saying one is better basically boils down to saying which brand of hammer is better.
> The same will eventually be said of Docker. I’ve yet to hear a single benefit attributed to Docker that isn’t also true of other VMs,
I bought a Raspberry Pi and, using a few commands, installed pre-configured Docker ARM images for 8-9 different media applications that would've taken me days to set up and manage individually. I didn't have to worry about dependencies or compilation. It just worked.
Properly packaged Debian/Raspbian apps are still an “apt install” away. Your use case, which is common, tells me that packaging/distribution may need some love, not that there’s a fundamental difference.
And the convenience does not come free - a random Docker image is almost as bad as a random executable.
Great but how does that help recoup the investment the article mentions?
I’m making no comment on the specifics of the Docker or Hadoop ecosystems as I have no skin in either game but history is full of useful tech that didn’t make money.
Cloudera earned $145m last quarter and grew by 37% over the previous quarter. Other Hadoop startups like Databricks are doing well, and Docker has gone from a two-digit to a three-digit revenue company from 2017-2018. How have billions been wasted when we have successful companies doing well against the toughest competitors ever, i.e. Google, Microsoft and Amazon?
Well, that includes revenue from Hortonworks too. As revenue rises, losses are also rising, to $85.5M. I feel that in a couple of years they will fold or be bought by some big cloud vendor.
You can't really compare VMs with Docker. Managing containers with Docker + Kubernetes is far easier than managing VMs. Docker might be replaced with something else in the future (e.g. rkt), but the basic concept IMHO is here to stay.
Isn't this the way it normally works, though? A bunch of investments don't work out - those are wasted - and for those that do work out, the people who did the investing get more money back.
Or to put it another way: There must have been some few Hadoop investments that worked out, the same will eventually be true of Docker.
> ...but standard VMs allow the use of standard operating systems that solved all the hard problems decades ago, whereas Docker is struggling to solve those problems today.
What are these supposed "hard problems" the author speaks of?
Seriously, who reads this kind of garbage? It's painfully clear to anyone with any experience with Docker that the author hasn't even skimmed the Wikipedia page. Reminds me of the idiots blasting out blog posts about how BITCOIN IS THE FUTURE one month, then BITCOIN IS A SCAM the next.
I have come across quite a few articles that mention that for 99% of 'big' data problems, Hadoop and the like are overkill. Simple tools on a beefy machine are sufficient for the task.
Is that the reality of today?
Personally, I too feel that distributed computing is overkill for most 'big' data problems.
My beef with Hadoop, and other big data tools, is that for pretty much any task other than outlier detection, sampling works just as well, and is cheaper and easier to manage and reason about.
Even Google, the king of big data, will sample your hits on Google Analytics if your site gets too much traffic.
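The sampling point is easy to demonstrate with a toy Python sketch (synthetic data; the distribution and thresholds are chosen for illustration): for aggregate questions like a mean, a 1% uniform sample answers almost as well as a full scan, with error shrinking like 1/sqrt(sample size). Outlier detection, as noted above, is the exception, since sampling misses rare points.

```python
import random

random.seed(42)  # deterministic for the example

# A million synthetic metric readings -- stand-in for a "big data" table.
data = [random.gauss(100, 15) for _ in range(1_000_000)]

full_mean = sum(data) / len(data)
sample_mean = sum(random.sample(data, 10_000)) / 10_000  # 1% sample

# The sample estimate lands within a fraction of a unit of the true mean.
print(round(full_mean, 2), round(sample_mean, 2))
```

Scanning 10,000 rows instead of 1,000,000 is the difference between a laptop script and a cluster job.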
> and number three – well it’s not even worth staying in business because there’s no money to be made,
I guess tell that to the tens of companies doing PaaS, or the other tens doing app monitoring/logging. They're successful companies, far from just "breaking even".
Yup. Tech fashionabilism is akin to the Big4/MBB FUD flavor of the quarter... populism/marketing doesn't a necessity make. Docker, Kubernetes, gulp, Hadoop, mosh, Nix, SmartOS, serverless, cloud, virtualization, [insert tech fashion hype > utility here].
Speaking of Hadoop: my vehicle is parked outside one of the HQ's of another top 10 Hadoop startup. It's one of the most expensive, nearly empty buildings in the highest-rent areas of the Valley. (Money flushing sound here.)
Fun fact: one of the enterprise Hadoop CTOs is a brony.
gerbilly|6 years ago
No, not always. Why?
At work I have a few coworkers pushing hard to dockerize (isolate?) everything.
This makes debugging when things go wrong a lot harder.
I see isolation as one of several qualities a process can have, one that is sometimes valuable enough to be worth the sacrifice.
Isolation is not some absolute quality that is without significant tradeoffs.
shatnersbassoon|6 years ago
"other VMs"? The whole point of Docker is that it's not a VM...
debiandev|6 years ago
Often the applications are packaged by random people on the Internet and do not receive security updates.
There's plenty of evidence showing how bad the problem is, and there's no way around it.
You need the security team of a distribution to backport security fixes into a stable distribution, and a large user community to test them.
Only with this can you run apt-get upgrade without breaking things.
Jonnax|6 years ago
Containerised applications are commonly used. At this point it's a proven technology with clear use cases.
skc|6 years ago
As a result I find it difficult to understand the hype.