item 30767393

Ask HN: Who operates at scale without containers?

601 points | disintegore | 4 years ago

In other words, who runs operations at a scale where distributed systems are absolutely necessary, without using any sort of container runtime or container orchestration tool?

If so, what does their technology stack look like? Are you aware of any good blog posts?

edit : While I do appreciate all the replies, I'd like to know if there are any organizations out there who operate at web scale without relying on the specific practice of shipping software with heaps of dependencies. Whether that be in a container or in a single-use VM. Thank you in advance and sorry for the confusion.

432 comments

[+] smilliken|4 years ago|reply
My company runs without containers. We process petabytes of data monthly, across thousands of CPU cores, with hundreds of different types of data pipelines running continuously, etc etc. Definitely a distributed system with lots of applications and databases.

We use Nix for reproducible builds and deployments. Containers only give reproducible deployments, not builds, so they would be a step down. The reason that's important is that it frees us from troubleshooting "works on my machine" issues, or from someone pushing an update somewhere and breaking our build. That's not important to everyone if they have few dependencies that don't change often, but for an internet company, the trend is accelerating towards bigger and more complex dependency graphs.

Kubernetes has mostly focused on stateless applications so far. That's the easy part! The hard part is managing databases. We don't use Kubernetes; there's little attraction, because it would be addressing something that's already effortless for us to manage.

What works for us is to do the simplest thing that works, then iterate. I remember being really intimidated by all the big data technologies coming out a decade ago, thinking they were so complex that the people building them must know what they were doing! But I'd so often dive in to understand the details and come away disillusioned at how much complexity there was for relatively little benefit. I was in a sort of paralysis over what we'd do after we outgrew PostgreSQL, and never found a good answer. Here we are years later, with a dozen+ PostgreSQL databases, some measuring up to 30 terabytes each, and it's still the best solution for us.

Perhaps I've read too far into the intent of the question, but maybe you can afford to drop the research project into containers and kubernetes, and do something simple that works for now, and get back to focusing on product?

[+] toast0|4 years ago|reply
I worked at WhatsApp. Prior to moving to Facebook infra, we had some jails for specific things, but mostly ran without containers.

Stack looked like:

FreeBSD on bare metal servers (host service provided a base image, our shell script would fetch source, apply patches, install a small handful of dependencies, make world, manage system users, etc)

OTP/BEAM (Erlang) installed via rsync etc from build machine

Application code rsynced and started via Makefile scripts

Not a whole lot else. Lighttpd and php for www. Jails for stud (a tls terminator, popular fork is called hitch) and ffmpeg (until end to end encrypted media made server transcoding unpossible).

No virtualized servers (I ran a freebsd vm on my laptop for dev work, though).
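A push-style deploy like that can be sketched as a couple of generated rsync/ssh command lines. This is only an illustration of the pattern; the host names, paths, and `make restart` target are hypothetical, not WhatsApp's actual tooling:

```python
import shlex

def deploy_commands(hosts, release_dir, restart_cmd="make restart"):
    """Build the rsync + remote-restart command lines for each host.

    Pushes the local build/ directory to every host, then restarts
    the service there. All names here are illustrative.
    """
    cmds = []
    for host in hosts:
        # -a preserves permissions/times, -z compresses,
        # --delete removes files that are gone from the build.
        cmds.append(f"rsync -az --delete build/ {host}:{release_dir}/")
        cmds.append(f"ssh {host} {shlex.quote(f'cd {release_dir} && {restart_cmd}')}")
    return cmds

# Hypothetical fleet:
for cmd in deploy_commands(["chat1.example.net", "chat2.example.net"],
                           "/opt/app/current"):
    print(cmd)
```

The appeal is that every moving part is a tool you already understand (rsync, ssh, make), so there is no extra runtime layer to debug.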

When WA moved to Facebook infra, it made sense to use their deployment methodology for the base system (Linux containers), for organizational reasons. There was no consideration of which methodology was technically superior; both are sufficient. But running a very different methodology inside a system designed for everyone to use one methodology is a recipe for operational headaches and difficulty getting things diagnosed and fixed, since it's so tempting to jump to the conclusion that any problem found on a different setup is because of the difference and not a latent problem. We had enough differences without requiring a different OS.

[+] freeqaz|4 years ago|reply
That's a very cool deployment method to just use rsync like that! Simple and very composable.

And now I feel self-conscious about my pile of AWS and Docker!

[+] wanderr|4 years ago|reply
Grooveshark didn't use any of that. We were very careful about avoiding dependencies where possible and keeping our backend code clean and performant. We supported about 45M MAU at our biggest, with only a handful of physical servers. I'm not aware of any blog posts we made detailing any of this, though. And if you're not familiar with the saga, Grooveshark went under for legal, not technical, reasons. The backend API was powered by nginx, PHP, MySQL, and memcache, with a realtime messaging server built in Go. We used Redis and MongoDB for some niche things and had serious issues with both, which is understandable because they were both immature at the time, but MongoDB's data loss problems were bad enough that I still wouldn't use it today.

That said, I'm using Docker for my current side project. Even if it never runs at scale, I just don't want to have to muck around with system administration, not to mention how nice it is to have dev and prod be identical.

[+] brimble|4 years ago|reply
> That said, I'm using Docker for my current side project. Even if it never runs at scale, I just don't want to have to muck around with system administration, not to mention how nice it is to have dev and prod be identical.

This is why I use docker, at work and for my own stuff. No longer having to give a shit whether the hosting server is LTS or latest-release is wonderful. I barely even have to care which distro it is. Much faster and easier than doing something similar with scripted-configuration VMs, plus the hit to performance is much lower.

[+] turkeywelder|4 years ago|reply
I miss Grooveshark so much - Licensing issues aside it was one of the best UIs for music ever. I'd love to hear more stories about the backend
[+] Aachen|4 years ago|reply
Man I miss Grooveshark still today. Spotify is okay but still a step down. Needing billion-dollar licensing schemes to even get started makes this such a hard market to actually get into and provide a competitively superior experience.
[+] mhitza|4 years ago|reply
What a great service. I'd be curious if you could go into details how the radio feature worked back then, because I found myself receiving worse suggestions when I used similar features in Spotify/Google Play Music.
[+] hexfish|4 years ago|reply
Thanks for giving some insight into this. Grooveshark was absolutely great!
[+] jimbob45|4 years ago|reply
I miss Grooveshark to this day. Thanks for building such an excellent product!
[+] maxk42|4 years ago|reply
Back in 2010 I built and operated MySpace's analytics system on 14 EC2 instances. It handled 30 billion writes per day. Later I was involved in ESPN's streaming service, which handled several million concurrent connections with VMs but no containers. More recently I ran an Alexa top-2k website (45 million visitors per month) off of a single container-free EC2 instance. Then I spent two years working for a streaming company that used k8s + containers and would fall over if it had more than about 60 concurrent connections per EC2 instance. K8s + Docker is much heavier than advertised.
[+] danielrhodes|4 years ago|reply
Docker is far heavier - the overhead is the price of the flexibility and process isolation you get. I imagine that's really useful for certain types of workloads (e.g. an ETL pipeline), but it's crazy inefficient for something single-purpose like a web app.
[+] stickyricky|4 years ago|reply
> Handled 30 billion writes per day.

Writes to what?

[+] tptacek|4 years ago|reply
Ironically, here at Fly.io, we run containers (in single-use VMs) for our customers, but none of our own infrastructure is containerized --- though some of our customer-facing stuff, like the API server, is.

We have a big fleet of machines, mostly in two roles (smaller traffic-routing "edge" hosts that don't run customer VMs, and chonky "worker" hosts that do). All these hosts run `fly-proxy`, a Rust CDN-style proxy server we wrote, and `attache`, a Consul-to-sqlite mirroring server we built in Go. The workers also run our orchestration code, all in Go, and Firecracker (which is Rust). Workers and WireGuard gateways run a Go DNS server we wrote that syncs with Consul. All these machines are linked together in a WireGuard mesh managed in part by Consul.

The servers all link to our logging and metrics stack with Vector and Telegraf; our core metrics stack is another role of chonky machines running VictoriaMetrics.

We build our code with a Buildkite-based CI system and deploy with a mixture of per-project `ctl` scripts and `fcm`, our in-house Ansible-like. Built software generally gets staged on S3 and pulled by those tools.

Happy to answer any questions you have. I think we fit the bill of what you're asking about, even though if you read the label on our offering you'd get the opposite impression.

[+] po_ta_toes|4 years ago|reply
Hi there,

Very interesting read!

I work for a large news org. The team I'm in primarily uses Elixir, which I know the people at fly.io love too!

Why did you decide not to containerise your own infrastructure?

We use some 'chonky' ec2s but are thinking about using containers.

Given that the BEAM has quite a large footprint, do you think it's still a good candidate for containers, or would that introduce too much overhead?

[+] cpach|4 years ago|reply
Intriguing!

This makes me curious: How does one learn to design and build systems like this…?

Also: How do you folks at Fly decide what parts to use “as is” and what parts to build from scratch? Do you have any specific process for making those choices?

[+] q3k|4 years ago|reply
Depends what you mean by 'container runtime' or 'container orchestration tool'...

For example, Google's Borg absolutely uses Linux namespacing for its workloads, and these workloads get scheduled automatically on arbitrary nodes, but this doesn't feel at all like Docker/OCI containers (ie., no whole-filesystem image, no private IP address to bind to, no UID 0, no control over passwd...). Instead, it feels much closer to just getting your binary/package installed and started on a traditional Linux server.

[+] menage|4 years ago|reply
> no whole-filesystem image

At least in the past, almost all jobs ran in their own private filesystem - it was stitched together in userspace via bind mounts rather than having the kernel do it with an overlayfs extracted from layer tar files (since overlayfs didn't exist back then), but the result was fairly similar.

Most jobs didn't actually request any customization so they ended up with a filesystem that looked a lot like the node's filesystem but with most of it mounted read-only. But e.g. for a while anything running Java needed to include in their job definition an overlay that updated glibc to an appropriate version since the stock Google redhat image was really old.
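The effect of that bind-mount stitching can be modeled as a simple map merge, where later mounts shadow earlier ones. This is just a sketch of the semantics described above, not anything resembling Borg's implementation:

```python
def effective_view(base, overlays):
    """Compute the file -> source mapping a job would see after its
    overlays are mounted over the (mostly read-only) base image.
    Later overlays win, mirroring mount order."""
    view = dict(base)
    for overlay in overlays:
        view.update(overlay)
    return view

# Illustrative paths: a stock base image plus a per-job glibc overlay,
# like the Java example mentioned above.
base = {"/lib/libc.so": "base:old-glibc", "/bin/sh": "base:sh"}
java_overlay = {"/lib/libc.so": "overlay:new-glibc"}
print(effective_view(base, [java_overlay]))
```

The result looks a lot like an OCI layered filesystem, which is the point: the layering semantics predate the tar-based image format.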

[+] sitkack|4 years ago|reply
Install debs into a dynamically provisioned container, it will feel similar.
[+] alex_duf|4 years ago|reply
Hey former Guardian employee here.

The Guardian has hundreds of servers running, pretty much all EC2 instances. EC2 images are baked and derived from official images, similarly to the way you bake a docker image.

We built tools before docker became the de facto standard, so we could easily keep the EC2 images up to date. We integrated pretty well with AWS so that the basic constructs of autoscaling and load balancer were well understood by everyone.

The stack is mostly JVM based so the benefits of running docker locally weren't really significant. We've evaluated moving to a docker solution a few times and always reached the conclusion that the cost of doing so wouldn't be worth the benefits.

Now, for a company that starts today, I don't think I'd recommend that; it just so happens that The Guardian invested early in the right tooling, so it's pretty much an exception.

[+] sparsely|4 years ago|reply
> We integrated pretty well with AWS so that the basic constructs of autoscaling and load balancer were well understood by everyone.

I think this is sometimes an underappreciated point. Once you have a team that is familiar with your current, working setup, the benefits of moving away have to be pretty huge for it to be worthwhile.

[+] bscanlan|4 years ago|reply
Intercom is pretty similar. We use EC2 hosts and no containers (other than for development/test environments and some niche third-party software that is distributed as Docker containers). Autoscaling groups are our unit of scalability, pretty much one per workload, and we treat the EC2 hosts as immutable cattle. We do a scheduled AMI build every week and replace every host. We use an internally developed software tool to deploy buildpacks to hosts - buildpacks are pre-Docker technology from Heroku that solves most of the problems containers do.

I wouldn't necessarily recommend building this from scratch today, it was largely put in place around 8 years ago, and there are few compelling reasons for us to switch.

[+] weego|4 years ago|reply
Also an ex-employee. Riff Raff is absolutely still an excellent tool for build and deploy. At the time I was there, it was the initial stack build via handwritten CloudFormation scripts that was the friction and pain point.
[+] speleding|4 years ago|reply
> Now for a company that starts today I don't think I'd recommend that

I think that for a company starting out, AMIs (Amazon Machine Images), which from your description is probably what you're using, are actually a much better way to go than Docker containers, because you get a large part of the orchestration for free with AWS EC2 auto-scaling and health detection, without most of Docker's complexity.

(I would suggest using something like Terraform to set it up in a reproducible way though)

[+] rr808|4 years ago|reply
> Now for a company that starts today I don't think I'd recommend that

Any reason why? It sounds pretty good.

[+] kuon|4 years ago|reply
We use Ansible on bare metal (no VMs) to manage about 200 servers in our basement. We use PXE booting to manage the images. We use a customized Arch Linux image, and we have a few scripts to select which features we'd like. It's "old school" but it's been working fine for nearly 20 years (we used plain scripts before Ansible, so we've always used the "agentless" approach). Our networking stack uses OpenBSD.
[+] yabones|4 years ago|reply
That sounds really interesting. If you have a write-up about how it's built, the decisions that went into it, and the problems you've had to solve, I would absolutely read it!
[+] annoyingnoob|4 years ago|reply
I've used pretty much the same model to easily manage hundreds of servers. PXE boot, install an image, run ansible scripts to configure the specifics.
[+] armcat|4 years ago|reply
Not sure if this counts, but for more than a decade I was at a telecom vendor, working with radio base stations (3G, 4G and 5G). That (to me), is probably one of the most distributed systems on the planet - we worked across several million nodes around the globe. I've been out of the loop for a bit, but I know they now have vRAN, Cloud RAN, etc (basically certain soft-real time functions pulled out of base stations and deployed as VMs or containers). But back then, there was no virtualization being used.

The tech stack was as follows: hardware was either PowerPC or ARM based System-on-Chip variants; we initially used our own in-house real-time OS, but later switched to a just-enough Linux distro; management functions were implemented either in IBM's "real-time" JVM (J9), or in Erlang; radio control plane (basically messages used to authenticate you, setup the connection and establish radio bearers, i.e. "tunnels" for payload) was written in C++. Hard real-time functions (actual scheduling of radio channel elements, digital signal processing, etc) were written in C and assembly.

Really cool thing - we even deployed an XGBoost ML model on these (used for fast frequency reselection, which reduced your time in low coverage). The model was written in C++ (no Python runtime was allowed), and it was completely self-supervised and closed-loop (it would update/fine-tune its parameters during off-peak periods, typically at night).

Back then we were always self-critical, but looking back at it, it was an incredibly performant and robust system. We accounted for every CPU cycle and byte - at one point I could do a walkthrough (from memory) of every single memory allocation during a particular procedure (e.g. a call setup). We could upgrade thousands of these nodes in one maintenance window, with a few seconds of downtime. We always complained about the build system, but looking back at it, you could compile and package everything in a matter of minutes.

Anyway, I think it was a good example of what you can accomplish with good engineering.

[+] mumblemumble|4 years ago|reply
I don't know that I'd say "web scale", in part because I still don't think I know exactly what that means, but I used to work at a place that handled a lot of data, in a distributed manner, in an environment where reliability was critical, without containers.

The gist of their approach was radical uniformity. For the most part, all VMs ran identical images. Developers didn't get to pick dependencies willy-nilly; we had to coordinate closely with ops. (Tangentially, at subsequent employers I've been amazed to see how just a few hours of developers handling things for themselves can save many minutes of talking to ops.) All services and applications were developed and packaged to be xcopy deployable, and they all had to obey some company standards on how their CLI worked, what signals they would respond to and how, stuff like that. That standard interface allowed it all to be orchestrated with a surprisingly small volume - in terms of SLOC, not capability - of homegrown devops infrastructure.
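One concrete piece of such a contract - every service responding to SIGTERM by draining rather than dying - could be sketched as below. The contract itself is hypothetical here; the original company standard isn't public:

```python
import os
import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # Fleet-wide contract (illustrative): on SIGTERM, stop accepting
    # new work, finish in-flight work, then exit cleanly. The
    # orchestrator can then treat every service identically.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the orchestrator asking this service to stop:
os.kill(os.getpid(), signal.SIGTERM)
time.sleep(0.05)  # give the interpreter a moment to run the handler
print("draining" if shutting_down else "running")
```

With every binary obeying the same signal and CLI conventions, the orchestration layer only needs one code path, which is how a small amount of homegrown tooling can cover a large fleet.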

[+] efficax|4 years ago|reply
Back in 2016 at least, Stack Overflow was container-free: https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...

No idea how much has changed since then

[+] 16bytes|4 years ago|reply
That's the first thing that came to mind as well, and Stack Overflow definitely operates at scale, but OP asked for people doing distributed systems. SO is famously monolithic, famously operating from just a handful of machines.
[+] onebot|4 years ago|reply
We use FreeBSD jails and a lightweight in-house orchestration tool written in Rust. We are running hundreds of Ryzen machines with 64 cores. Our costs compared to running the equivalent on Amazon are much lower: we estimate about 6x lower than AWS, and we have far better performance in terms of networking, CPU, and disk write speed.

Jails have been a pleasure to work with! We even dynamically scale resources up and down as needed.

We use bare metal machines on Interserver, but there are quite a few good data centers worth considering.

[+] camtarn|4 years ago|reply
Don't know if they still use it (I suspect so!) but at least as of 2015 Amazon was using a homebrewed deployment service called Apollo, which could spin up a VM from an internally developed Linux image then populate it with all the software and dependencies needed for a single service. It later inspired AWS CodeDeploy which does the same thing.

I remember it being pretty irritating to use, though, since it wasn't particularly easy to get Apollo to deploy to a desktop machine in the same way it would in production, and of course you couldn't isolate yourself from the desktop's installed dependencies in the same way. I'm using Docker nowadays and it definitely feels a lot smoother.

This is a nice writeup: https://www.allthingsdistributed.com/2014/11/apollo-amazon-d...

[+] _16k4|4 years ago|reply
I've always thought of Apollo environments as containers before kernel features for containers existed. With enough environment variables and wrapper scripts taking the name of real binaries to populate stuff like LD_LIBRARY_PATH, Apollo makes a private environment that is only _slightly_ contaminated by the host.
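The wrapper-script trick amounts to prepending the app's own directories to the loader and shell search paths. A rough model of the environment composition (the `/apollo/env/...` layout is illustrative, not Apollo's real one):

```python
def private_env(base_env, app_root):
    """Compose an Apollo-style private environment (a sketch, not the
    real tool): the app's own lib/ and bin/ come first on the search
    paths, so host-installed versions are mostly shadowed."""
    env = dict(base_env)
    prev_ld = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = f"{app_root}/lib" + (f":{prev_ld}" if prev_ld else "")
    env["PATH"] = f"{app_root}/bin:" + env.get("PATH", "")
    return env

env = private_env({"PATH": "/usr/bin:/bin"}, "/apollo/env/MyService")
print(env["PATH"])  # → /apollo/env/MyService/bin:/usr/bin:/bin
```

Because the host's paths are still at the tail of the search order, the environment is only _slightly_ contaminated by the host, exactly as described above.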
[+] mijoharas|4 years ago|reply
Not too much of an update, but they were still using it in 2017.
[+] jake_morrison|4 years ago|reply
AWS has a fine stack for deploying "cloud native" apps on top of EC2 instances.

Build a base AMI using Packer and launch it to an Auto Scaling Group behind a load balancer. Deploy code to the ASG using CodeDeploy. Use RDS for the database.

This is a good match for languages that have good concurrency like Elixir. They benefit from deploying to big machines that have a lot of CPU cores, and keeping a common in-memory cache on the EC2 instance is more efficient than using an external cache like Elasticache. It also works well for resource-hungry systems with poor concurrency like Ruby on Rails. Putting these kinds of apps into big containers is just a waste of money.

Here is a complete example of that architecture using Terraform: https://github.com/cogini/multi-env-deploy

Similarly, bare metal can be really cost-effective. For $115/month, I can get a dedicated server with 24 VCPU cores (2x Intel Hexa-Core Xeon E5-2620 CPU), 64 GB RAM, 4x8 TB SATA, 30 TB traffic (see https://www.leaseweb.com/dedicated-servers#NL). That would be an order of magnitude more expensive on AWS with containers.

[+] Nextgrid|4 years ago|reply
I've been at a company where they weren't (yet) using containers nor K8S.

The build process would just create VM images with the required binaries in there and then deploy that to an autoscaling group.

It worked well, and if you only ever intend to run a single service per machine, then it is the right solution.

[+] dolni|4 years ago|reply
At my workplace we use Docker to run services, but there is no container orchestration like Kubernetes. An AMI bakes in some provisioning logic and the container image. Autoscaling does the rest.

Even without orchestration, I argue containers are useful. They abstract the operating system from the application and allow you to manage each independently. Much more easily than you'd be able to otherwise, anyway.

Plus you can run that image locally using something like docker-compose for an improved developer experience.

[+] tailspin2019|4 years ago|reply
I’m in the process of moving to exactly this approach.

I’ve been trying to pick the right Linux distro to base my images on. Ubuntu Server is the low effort route but a bit big to redeploy constantly.

I’ve also been looking at the possibility of using Alpine Linux which feels like a better fit but a bit more tweaking needed for compatibility across cloud providers.

Unikernels are also interesting but I think that might be going too far in the other direction for my use case.

For others doing this, I’d be interested in what distros you’re using?

[+] cyberge99|4 years ago|reply
I believe tools like nomad and consul shine here.

Using nomad as a job scheduler and deployer allows you to use various modules for jobs: java, shell, ec2, apps (and containers).

I use it in my homelab and it’s great. That said, I don’t use it professionally.

I think Cloudflare is running this stack alongside firecracker for some amazing edge stuff.

[+] abadger9|4 years ago|reply
I have a private consulting company which has delivered some pretty sizable footprints (touching most Fortune 500 companies via integration with a service), and I prefer deploying without containers. In fact, I'll say I hate deploying with containers, which is what I do at my 9-5. I've lost job opportunities at growth startups because someone was a devout follower of containers, and I would rather be honest than use a technology I didn't care for.
[+] jedberg|4 years ago|reply
Netflix was container free or nearly so when I left in 2015, but they were starting to transition then and I think they are now container based.

At the time they would bake full machine images, which is really just a heavyweight way of making a container.

[+] zemo|4 years ago|reply
When I was at Jackbox, we ran the multiplayer servers without containers and handled hundreds of thousands of simultaneous websocket connections (a few thousand per node). The servers were statically compiled Go binaries that took care of their own isolation at the process level and didn't write to disk; they just ran as systemd services. Game servers are inherently stateful - they're more like databases than web application layer servers. For large-audience games I wrote a peering protocol and implemented a handful of CRDT types to replicate the state, so it was a hand-rolled distributed system. Most things were handled with Chef, systemd, Terraform, and AWS autoscaling groups.
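For a flavor of the CRDT approach, here is a generic grow-only counter (the standard textbook G-Counter, not Jackbox's actual types): each node increments only its own slot, and merging takes the per-node max, so replicas converge no matter the order messages arrive in.

```python
class GCounter:
    """Grow-only counter CRDT. Merge is commutative, associative,
    and idempotent, which is what makes replication order-independent."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> count contributed by that node

    def increment(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Take the max per node: replaying an old message changes nothing.
        for node, c in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # both converge to 5
```

Real game state needs richer types (sets, registers), but they all rest on this same merge property.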
[+] pipe_connector|4 years ago|reply
Interesting -- how did you handle redeploys? Given that game servers are stateful (so you'd want to drain servers at their own pace instead of force them down at a specific time), it seems like redeploying a server without machinery to do things like dynamically allocate ports/service discovery for an upstream load balancer would be tricky.
[+] locusofself|4 years ago|reply
I work at Microsoft and we have a lot of big services that run on Windows Server. There is orchestration with a system called Service Fabric that schedules the applications and handles upgrades sort of like Kubernetes does, but for the most part there are no containers involved.