Ask HN: If Kubernetes is the solution, why are there so many DevOps jobs?

[+] jmillikin|3 years ago|reply

  >   1) internal users: mainly developers by providing CI/CD
  >   2) external users: end users
  >
  > Nowadays we call people that do 1) DevOps and people that do
  > 2) SREs (so one could argue that the role of sys admins just
  > got more specialized).

Both are called sysadmins.

SRE is a specialized software engineering role -- you'd hire SREs if you wanted to create something like Kubernetes in-house, or do extensive customization of an existing solution. If you hire an SRE to do sysadmin work, they'll be bored and you'll be drastically overpaying.

DevOps is the idea that there shouldn't be separate "dev" and "ops" organizations, but instead that operational load of running in-house software should be borne primarily by the developers of that software. DevOps can be considered in the same category as Scrum or Agile, a way of organizing the distribution and prioritization of tasks between members of an engineering org.

---

With this in mind, the question could be reframed as: if projects such as Kubernetes are changing the nature of sysadmin work, why has that caused more sysadmin jobs to exist?

I think a general answer is that it's reduced the cost associated with running distributed software, so there are more niches where hiring someone to babysit a few hundred VMs is profitable compared to a team of mainframe operators.

[+] etruong42|3 years ago|reply

> so there are more niches where hiring someone to babysit a few hundred VMs is profitable

This makes a lot of sense. The same thing happened in the past with new technology, such as the electronic spreadsheet:

"since 1980, right around the time the electronic spreadsheet came out, 400,000 bookkeeping and accounting clerk jobs have gone away. But 600,000 accounting jobs have been added."

Episode 606: Spreadsheets!, May 17, 2017, Planet Money

[+] jupp0r|3 years ago|reply

SREs don't normally write Kubernetes alternatives. They are the people who operate/write automation that interacts with/advise teams how to run their software on/ Kubernetes to solve business problems like ensuring availability.

[+] throwaway787544|3 years ago|reply

> but instead that operational load of running in-house software should be borne primarily by the developers of that software

Go back and read a few DevOps books and blogs by the founders of it. We will always need separate disciplines for dev and ops, just like we need mechanical engineers and mechanics/racecar drivers. But we need them to work together and communicate better to solve problems better and not throw dead cats over walls.

You can of course give devs more powerful tools, more access and agency to enable them to develop the software better. Sentry.io is a great example of what is needed; makes everyone's life easier, devs can diagnose issues and fix bugs quickly without anyone in their way. That doesn't require operations work because it's just simplifying and speeding up the triage and debug and fix and test phases. It's the fundamental point of DevOps.

[+] jghn|3 years ago|reply

> DevOps is the idea that there shouldn't be separate "dev" and "ops" organizations,

I agree with this definition of DevOps. However the vast, vast, vaaaast majority of real life uses of the term "DevOps" I've seen are just rebranded sysadmins. Sometimes it at least implies a more engineering approach to their coding. But in these institutions the Devs and Ops are very much separate groups of people, unfortunately.

[+] shawnz|3 years ago|reply

essentially Jevons paradox

[+] binarymax|3 years ago|reply

Kubernetes is a Google scale solution. Lots of teams said “hey if Google does it then it must be good!”…but forgot that they didn’t have the scale. It caught on so much that for whatever reason it’s now the horrendous default. I’ve worked on at least 3 consulting projects that incorporated K8s and it slowed everything down and took way too much time, and we got nothing in return - because those projects only needed several instances, and not dozens or hundreds.

If you need less than 8 instances to do host your product, run far away anytime anyone mentions k8s

[+] nokya|3 years ago|reply

Exactly.

I am consulting with a startup right now that chose to go everything docker/k8s. The CTO is half-shocked/half-depressed by the complexity of our architecture meetings, although he used to be a banking software architect in his previous assignments. Every question I ask ends up in a long 15 minutes monologue by the guy who architected all of it, even the most simple questions. They are soon launching a mobile app (only a mobile app and its corresponding API, not even a website) and they already have more than 60 containers running and talking to each other across three k8s clusters and half of them interact directly with third-parties outside.

Even as I am being paid by the hour, I really feel sad for both the CTO and the developers attending the meeting.

k8s is definitely not for everyone. Google has thousands hardware systems running the same hypervisor, same OS, same container engine and highly specialized stacks of micro-services that need to run by the thousands. And even, I am not sure that k8s would satisfy Google's actual needs tbh.

Ironically, there are some companies that highly benefit from this and they are not necessarily "large" companies. In my case, k8s and devops in general made my life infinitely easier for on-site trainings: those who come with a poorly configured or decade-old laptop can actually enjoy the labs at the same pace than every other attendee.

[+] KaiserPro|3 years ago|reply

> Kubernetes is a Google scale solution

The problem is that its _not_ a google scale solution. Its something that _looks_ like a google scale solution, but is like a movie set compared to the real thing.

for example: https://kubernetes.io/docs/setup/best-practices/cluster-larg...

no more than 5k nodes.

Its extra ordinarily chatty at that scale, which means it;ll cost on inter-vpc traffic. I also strongly suspect that the whole thing is fragile at that size.

Having run a 36k node cluster in 2014, I know that K8s is just not designed for high scale high turnover vaguely complicated job graphs.

I get the allure, but in practice K8s is designed for a specific usecase, and most people don't have that usecase.

for most people you will want either ECS(its good enough, so long as you work around its fucking stupid service scheme) or something similar.

[+] throwaway7865|3 years ago|reply

We’ve moved a small-scale business to Kubernetes and it made our lives much easier.

Anywhere I’ve worked business always prioritizes high availability and close to zero downtime. No one sees a random delivered feature. But if a node fails at night - everybody knows it. Clients first of all.

We’ve achieved it all almost out of the box with EKS. Setup with Fargate nodes was literally a one-liner of eksctl.

Multiple environments are separated with namespaces. Leader elections between replicas are also easy. Lens is a very simple to use k8s IDE.

If you know what you’re doing with Kubernetes (don’t use EC2 for nodes, they fail randomly), it’s a breeze.

[+] jmillikin|3 years ago|reply

Kubernetes can't (currently) scale to Google sizes. It's designed for small- or medium-sized businesses, which might have 50,000 VMs or fewer.

There are entire SaaS industries that could fit into a single Google/Facebook/Amazon datacenter.

[+] systemvoltage|3 years ago|reply

Scaling is oversold and under criticized.

Folks, listen, if StackOverflow can run on this: https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...

So can your doctor's appointment website, your little ML app or Notion clone.

"But...". No. You ain't gonna need it. Do some load testing, prove it to yourself. Now, multiply the load by 100x, reserve AWS resources and you're good to go.

[+] datavirtue|3 years ago|reply

We just moved our single instance web app with 50 users to K8. Not kidding. I totally bailed on that one. It started with moving it to the cloud (meh) and ended up with K8 in the cloud. A few of the guys wanted to pad the resume so we just turned our heads.

[+] dosethree|3 years ago|reply

You don't even end up spending time on Kubernetes, because k8s is just part of the solution, a container scheduler. You have to bring logs, monitoring, a container registry ,as well as a CI system with custom jobs and do integration of everything.

[+] 908B64B197|3 years ago|reply

> Kubernetes is a Google scale solution. Lots of teams said “hey if Google does it then it must be good!”…but forgot that they didn’t have the scale.

It's also a Google engineer caliber solution. Lots of teams said “hey if Google engineers do it then it must be good!”…but forgot that they didn’t have the same in-house talent as Google.

[+] dusanh|3 years ago|reply

Asking as someone who has only tipped his toes into devops lately and is looking to learn K8, what is considered a reasonable "lightweight" alternative to Kubernetes these days?

[+] mancerayder|3 years ago|reply

Threads like the below are why DevOps jobs exist and why Kubernetes infrastructure skills pay so much and why there's such a large demand.

Yes, it's quite complicated.

No, an API to control a managed EKS/GCK cluster + terraform + Jenkins/Azure DevOps/etc. does not mean that magically the developer can 'just deploy' and infrastructure jobs are obsoleted. That's old AWS marketing nonsense predating Kubernetes.

There's a whole maintenance of the CI/CD factory and its ever demanding new requirements around performance, around Infosec requirements, around scale, and around whatever unique business requirements throw a wrench in the operation.

Sticking to ECS I guess is a valid point. What Kubernetes gives you is a more sophisticated highly available environment built for integration (Helm charts and operators and setups that when they work give you more levers to control resources allocations, separations of app environments, etc.)

And as an aside, I've been doing this for 20 years and long before Kubernetes, before Docker, hell, before VMs were used widely in production, I observed the developer mindset: Oh but it's so easy, just do X. Here, let me do it. Fast forward a year of complexity later, you start hiring staff to manage the mess, the insane tech debt the developers made unwittingly, and you realize managing infrastructure is an art and a full time job.

A story that is visible with many startups that suddenly need to make their first DevOps hire, who in turn inherit a vast amount of tech debt and security nightmares.

Get out of here with, it's just API calls. DevOps jobs aren't going away. It's just the DevOps folks doing those API calls now.

[+] mynameisash|3 years ago|reply

Reading the comments here validates my experience. When K8s was pitched as a way to make this all run smoothly, I thought, "Great! I'll write my code, specify what gets deployed and how many times, and it'll Just Work(tm)." I built a service which had one driver node and three workers. Nothing big. It deployed Dask to parallelize some compute. The workload was typically ~30 seconds of burst compute with some pretty minor data transfer between pods. Really straightforward, IMO.

Holy smokes, did that thing blow up. A pod would go down, get stuck in some weird state (I don't recall what anymore), and K8s would spin a new one up. Okay, so it was running, but with ever-increasing zombie pods. Whatever. Then one pod would get in such a bad state that I had to nuke all pods. Fortunately, K8s was always able to re-create them once I deleted them. But I was literally deleting all my pods maybe six or seven times per day in order to keep the service up.

Ultimately, I rewrote the whole thing with a simplified architecture, and I vowed to keep clear of K8s for as long as possible. What a mess.

[+] adra|3 years ago|reply

This can probably be chalked up to youre-doing-it-wrong (sorry) but not knowing your precise scenario, it's hard to know what went wrong. Maybe really old versions misbehaved (only started a few years ago and its been smooth sailing), but I've never seen your problem on any of our stuff and we have dozens of different services on a bunch of languages/frameworks, and none of them just give up for no reason ( though a lot often die for predictable and self-induced reasons).

I think there was some jank on AWS CNI drivers at one point that delayed pod init, but that's probably the most wtf that I've personally bumped into thankfully.

[+] whoopdeepoo|3 years ago|reply

We've had great success running celery applications in k8s, so it's surprising to hear dask was a problem for you. Especially considering dask recommends k8s as a deployment option.

[+] nogbit|3 years ago|reply

Love Dask. Very robust and therefore very easy to get wrong. When you need a longer term solution that uses Dask, it pays to architect things well, in advance vs on the fly in a sandbox.

[+] unknown|3 years ago|reply

[deleted]

[+] jeffwask|3 years ago|reply

First. DevOps is a culture not a job most places have so many DevOps roles because they are doing it wrong.

In the olden days of 10 years ago, most operations teams worked around the clock to service the application. Like every day there would be someone on my team doing something after hours usually multiple. Tools like Kubernettes, Cloud (AWS, GCP, Azure) have added significant complexity but moved operations to more of a 9 to 5 gig. Less and less do I see after hours deployments, weekend migrations, etc. Even alert fatigue goes way down because things are self healing. This is on top of being able to move faster and safer, scale instantly, and everything else.

Operations side used to be a lot of generalists admin types and DBA's. With today's environment, you need a lot more experts. AWS alone has 1 trillion services and 2.4 billion of those are just different ways to deploy containers. So you see a lot more back end roles because it's no longer automate spinning up a couple servers, install some software, deploy, monitor and update. It's a myriad of complex services working together in an ephemeral environment that no one person understands anymore.

[+] fertrevino|3 years ago|reply

The number of places that get the meaning of DevOps wrong is too high. So much that it is often easier to use it wrong in order to express an idea.

[+] etruong42|3 years ago|reply

New technology sometimes creates more work even though it makes the previous work easier. When the electronic spreadsheet was introduced in the 1980s, even though it made accountants more productive, the number of accountants GREW after the electronic spreadsheet was introduced. Sure, one accountant with an electronic spreadsheet could probably do the work of 10 or 100 accountants who didn't have the electronic spreadsheet, but accounting become so efficient that so many more firms wanted accountants.

"since 1980, right around the time the electronic spreadsheet came out, 400,000 bookkeeping and accounting clerk jobs have gone away. But 600,000 accounting jobs have been added." Planet Money, May 17, 2017, Episode 606: Spreadsheets!

[+] devonkim|3 years ago|reply

Kubernetes in a sense is very similar to Linux back in the 2000s - it was nascent technology in a hot market that was still absolutely evolving. The difference now is that everyone knows the battle for the next tier of the platform is where people will be able to sell their value (look at RedHat selling to IBM for the saddled legacy of maintaining an OS as a tough growth proposition). For a while people thought that Hadoop would be the platform but it never grew to serve a big enough group's needs back in 2013-ish and coupled with the headaches of configuration management containerization hit and it's now combined at the intersection of OS, virtualization, CI, and every other thing people run applications on in general. It may be the most disruptive thing to our industry overall since the advent of Linux in this respect (people thought virtualization was it for a while and it's shown to have been minor comparatively).

A lot of this stuff really is trying to address the core problem we've had for a long time that probably won't ever end - "works fine on my computer."

[+] majewsky|3 years ago|reply

In my opinion, the main benefit of Kubernetes for large companies is that it allows for a cleaner separation of roles. It's easier to have a network team that's fully separate from a storage team that's fully separate from a compute team that's fully separate from an application development team because they all work around the API boundaries that Kubernetes defines.

That's valuable because, on the scale of large companies, it's much easier to hire "a network expert" or "a storage expert" or even "a Gatekeeper policy writing expert" than to hire a jack of all trades that can do all of these things reasonably well.

The corollary from this observation is that Kubernetes makes much less sense when you're operating at a start-up scale where you need jacks of all trades anyway. If you have a team of, say, 5 people doing everything from OS level to database to web application at once, you won't gain much from the abstractions that Kubernetes introduces, and the little that you gain will probably be outweighed by the cost of the complexities that lurk behind these abstractions.

[+] moshloop|3 years ago|reply

High Availability, Scalability, Deployments, etc are NOT the goal of Kubernetes, they are features that are not exclusive to Kubernetes, nor is Kubernetes necessarily better at them then others.

The goal of Kubernetes is to improve the portability of people by introducing abstraction layers at the infrastructure layer - These abstractions can seem overly complex, but they are essential to meet the needs of all users (developers, operators, cloud providers, etc)

Before kubernetes in order for a developer to deploy an application they would need to (send email, create terraform/cloudformation, run some commands, create ticket for loadbalancer team, etc) - these steps would rarely be same between companies or even between different teams in the same company.

After kubernetes you write a Deployment spec, and knowing how to write a deployment spec is portable to the next job. Sure there are many tools that introduce opinionated workflows over the essentially verbose configuration of base Kubernetes objects, and yes your next job may not use them, but understanding the building blocks, still make it faster than if every new company / team did everything completely differently.

If you only have a single team/application with limited employee churn - then the benefits may not outweigh the increased complexity.

[+] habitue|3 years ago|reply

The thing you're noticing is the usual thing that happens when new labor saving technology is invented:

1. What people expect: less work needs to be done to get what you had before.

2. What people don't expect: more is expected because what used to be hard is now simple

So while it may have taken a few weeks to set up a pet server before and as a stretch goal you may have made your app resilient to failures with backoff retry loops etc. Now that's a trivial feature of the infrastructure, and you get monitoring with a quick helm deploy. The problems haven't disappeared, you're just operating on a different level of problems now. Now you have to worry about cascading failures, optimizing autoscaling to save money. You are optimizing your node groups to ensure your workloads have enough slack per machine to handle bursts of activity, but not so much slack that most of your capacity is wasted idling.

Meanwhile, your developers are building applications that are more complex because the capabilities are greater. They have worker queues that are designed to run on cheap spot instances. Your CI pipelines now do automatic rollouts, whereas before you used to hold back releases for 3 months because deploying was such a pain.

Fundamentally, what happens when your tools get better is you realize how badly things were being done before and your ambition increases.

[+] rconti|3 years ago|reply

Because everything has gotten bigger and more complicated.

It's like asking "if the computer saves us all so much work, why do we have more people building computers than we ever had building typewriters"?

Something can "save labor" and still consume more labor in aggregate due to growth.

[+] pixl97|3 years ago|reply

Jevons Paradox

[+] jljljl|3 years ago|reply

I think this Kelsey Hightower quote has summarized my experience working with Kubernetes:

> Kubernetes is a platform for building platforms. It's a better place to start; not the endgame.

https://twitter.com/kelseyhightower/status/93525292372179353...

Everywhere I've worked, having developers use and develop Kubernetes directly has been really challenging -- there's a lot of extra concepts, config files, and infrastructure you have to manage to do something basic, so Infra teams spend a lot of resources developing frameworks to reduce developer workloads.

The benefits of Kubernetes for scalability and fault tolerance are definitely worth the cost for growing companies, but it requires a lot of effort, and it's easy to get wrong.

Shameless plug: I recently cofounded https://www.jetpack.io/ to try and build a better platform on Kubernetes. If you're interested in trying it out, you can sign up on our website or email us at `demo [at] jetpack.io`.

[+] zelphirkalt|3 years ago|reply

The short answer is: Because of Kubernetes.

The longer answer is: When you switch to Kubernetes, you are introducing _a lot_ of complexity, which, depending on your actual project, might not be inherent complexity. Yes, you get a shiny tool, but you also get a lot of more things to think about and to manage, to run that cluster, which in turn will require, that you get more devops on board.

Sure, there might be projects out there, where Kubernetes is the right solution, but before you switch to it, have a real long hard thinking about that and definitely explore simpler alternatives. It is not like Kubernetes is the only game in town. It is also not like Google invents any wheels with Kubernetes.

Not everyone is Google or Facebook or whatever. We need to stop adopting solutions just because they get hyped and used at big company. We need to look more at our real needs and avoid introducing unnecessary complexity.

[+] caymanjim|3 years ago|reply

The premise of your question is invalid. Have you ever tried setting up a Kubernetes cluster and deploying apps in it? Kubernetes doesn't save work, it adds work. In return, you get a lot of benefits, but it wasn't designed to reduce human work, nor was it designed to eliminate devops jobs. It was designed for scalability and availability more than anything. Most people using Kubernetes should be using something simpler, but that's a separate problem.

[+] planetafro|3 years ago|reply

I don't know my dude, all 3 major clouds offer "canned" k8s services that you can set up in a ridiculously short amount of time with Terraform and your CI platform of choice.

I agree with some other comments in this thread about a general fervor in the Enterprise space to "modernize" needlessly. This conversation usually lands on the company copying what everyone else is doing or what Gartner tells them to do. Cue "DevOps".

100 percent agree with your comments on something simpler. I can't tell you how many times I've debated with our Analytics teams to just use Docker Compose/Swarm.

[+] bombcar|3 years ago|reply

Because many many companies herd pets using kubernets.

The number of single-server setups with kubernetes thrown in for added complexity and buzzwords I’ve found is way too dang high.

[+] MrBuddyCasino|3 years ago|reply

> "If this old way of doing things is so error-prone, and it's easier to use declarative solutions like Kubernetes, why does the solution seem to need sooo much work that the role of DevOps seems to dominate IT related job boards? Shouldn't Kubernetes reduce the workload and need less men power?"

Because we're living in the stone age of DevOps. Feedback cycles take ages, languages are not typed and error prone, pipelines cannot be tested locally, and the field is evolving rapidly like FE javascript did for many years. Also I have a suspicion that the mindset of the average DevOps person has some resistance to actually using code, instead of yaml monstrosities.

There is light at the tunnel though:

- Pulumi (Terraform but with Code)

- dagger.io (modern CI/CD pipelines)

Or maybe the future is something like ReplIt, where you don't have to care about any of that stuff (AWS Lambdas suck btw).

[+] jkukul|3 years ago|reply

I agree with this 100%. We're in the infancy of DevOps.

Ironically, "DevOps" started as a philosophy that developers should be able to do operations, e.g. deploy, monitor their apps without relying on external people (previously called Sys Admins, etc). Yet, we're at a stage where the "DevOps" role has become the most prevalent one. IMO things have temporarily gotten slightly worse to get much better later.

From the productivity standpoint, it is not acceptable that a Machine Learning engineer or a Full Stack Developer are expected to know Kubernetes. Or that they need to interact with a Kubernetes person/team. It is an obstacle for them to produce value.

Kubernetes is not THE solution. It's just an intermediate step. IMO, in the long run there'll be very few people actually working with technologies like Kubernetes. They'll be building other, simpler tooling on top of it, to be used by developers.

You already named few examples. I can name few more:

  - Railway.app
  - Vercel
  - Render
  - fly.io
  - probably many more under way

[+] eeZah7Ux|3 years ago|reply

dagger.io: "Developed in the open by the creators of Docker"

Hard pass.

[+] rahen|3 years ago|reply

Kubernetes can really help bringing more scalability.

All you need is to rewrite your application (think microservices), reduce cold latency (get rid of anything VM based such as Java, or rewrite in Spring or Quarkus), use asynchronous RPC, and decouple compute and storage.

Then you need an elastic platform, for instance Kubernetes, with all the glue around such as Istio, and Prometheus, and Fluentd, and Grafana, Jaeger, Harbor, Jenkins, maybe Vault and Spinnaker.

Then you can finally have your production finely elastic, which 90% of companies do not need. Microservices are less performant, costlier, and harder to develop than n-tiers applications and monoliths, and way harder to debug. They're just better at handling surges and fast scaling.

If what you want is:

- automated, predictable deployments

- stateless, declarative workloads

- something easy to scale

Then Docker Compose and Terraform is all you need.

If you also need orchestration and containers are your goal, then first try Docker Swarm. If you need to orchestrate various loads and containers are a mean and not a goal, then try Nomad.

Finally, if you will need most resources Kubernetes has to offer (kubectl api-resources), then yes, opt for it. Few companies actually have a need for the whole package, yet they have to support its full operational cost.

Most companies just pile up layers, then add yet a few more (Java VMs on top of containers on top of an orchestrator on top of x86 VMs on top of(...)), and barely notice the miserable efficiency of the whole stack. Well it's using Kubernetes, it's now "modernized".

[+] tapoxi|3 years ago|reply

From my experience, Kubernetes drastically reduces the number of DevOps people required. My current place has a team of 5, compared to a similarly sized, vmware-centric place I worked at a decade ago with a team of 14.

But DevOps means many things because it's not clearly defined, which also makes it difficult to hire for. It's a "jack-of-all-trades" role that people somehow fell into and decided to do instead of more traditional software engineering.

Also, from what I've experienced from our internship program, CS programs are really bad at covering these fundamentals. Students aren't learning such basics as version control, ci/cd, cloud platforms, linux, etc.

[+] oxplot|3 years ago|reply

Someone put it nicely when they said Kubernetes is like an operating system for containers. If you take linux as an analogy, it's clearly a non-trivial investment to learn linux and learn enough to be effective and efficient in it. Further time perhaps needed to achieve the productivity, functionality and performance of what you were used to on Mac or Windows.

Kubernetes definitely achieves this goal well, and in a relatively portable way. But just like any other engineering decision, you should evaluate the trade offs of learning a completely new OS just to get a simple web site up, versus running a nginx instance with bunch of cgi scripts.

[+] rednerrus|3 years ago|reply

Kubernetes is super linux.

[+] jedberg|3 years ago|reply

DevOps is a philosophy, not a job role. It's the idea that developers deploy and operate their own code. An SRE is often someone who helps make that happen, by building the tools necessary for developers to operate their own code.

In a small organization, you can get away with a sysadmin running a Kubernetes cluster to enable that. In a larger org you'll need SREs as well as Operations Engineers to build and maintain the tools you need to enable the engineers.

[+] ransom1538|3 years ago|reply

This an underrated comment right here. I think the entire industry is confused -- but they could have just read this comment.

416 comments