The Ops Identity Crisis

[+] _qc3o|9 years ago|reply

I look forward to the day when every software developer has half a clue about monitoring, logging, high availability, configuration management, orchestration/scheduling, performance tuning, build/deployment pipelines, data management/archiving, security and exploit mitigation, etc.

The full-stack developer, like the serverless/ops-less future, is a pipe dream. Most technology organizations barely even know how to build the software to begin with, let alone figure out the right way to operate it.

[+] snovv_crash|9 years ago|reply

Not everybody develops applications that are hosted. Who do you think writes your hardware drivers, IDEs, display managers, browsers, command line tools?

The list you presented is about as one-sided as saying everybody needs to know how to write shaders or Bluetooth drivers. It is niche to the work you do, and just because somebody isn't well versed in it doesn't mean that they don't have a large body of industry-specific knowledge that doesn't even feature on your radar.

[+] toomuchtodo|9 years ago|reply

The startup scene is rife with overconfidence that you have enough waking hours to know everything you listed in one role.

Regret I have but one upvote for you.

[+] lathiat|9 years ago|reply

As someone in the hosting industry, you should try pairing the average WordPress "developer" with the customer who wants 1,000 people to hit the site and place an order at the same time. Or just run a site at all.

I'm not sure how these guys survive on your average web host. Our team can (and does) do a lot of low-level and language-level debugging to figure out issues. Most cPanel resellers would send you packing. I can't imagine the frustration the end customers often experience bouncing these issues around for weeks.

In other news, I'm clearly highly overqualified!

[+] photonwins|9 years ago|reply

And blatant disregard for performance analysis & capacity planning. Application is slowing down at 5000 QPS? Let's upgrade to 64 core 128GB Server and while we are at it, let's throw in a bunch of SSDs too. </s>

[+] formula1|9 years ago|reply

The world isn't guided by people who learn every facet and mechanism of every subject. It is led by those who see an opportunity and have the humility, audacity and willpower to learn just enough or make just enough to make a successful product. They don't care about standards, quality or best practices.

The day when every developer has a good understanding of distributed systems and devops is when there is a simple way to use it, we start learning programming concepts from a young age, or when every developer graduates from MIT and only big companies are allowed to hire people.

I personally think it's absurd to be on a high horse in the fastest-paced ecosystem out there. We constantly need to look for the newest technology and are competing with highly specialized minds (and eventually AI) that can generally do our work 10x better than we can.

I mean, these last couple of months have been a PR disaster for Docker, which has been seen as a "standard" for years now. Though technology-specific aspects are different from abstract concepts, very few flowcharts actually ACT as production code.

[+] sroussey|9 years ago|reply

There are plenty of us. But there are more of them. Best of luck differentiating between the two. And I am being sincere.

[+] ownagefool|9 years ago|reply

As someone who's worked within both disciplines, I don't actually think it's terribly difficult to learn all those things. The problem is, we're all too busy learning the new shiny without ever really delving deeper down the stack.

I find it a bit sad that such smart people jump on the microservices bandwagon without learning how to prove their application is working or deal with the fact that the network will fail.

[+] jsudhams|9 years ago|reply

In a small setup, yes, the developer knowing this will help, but in large companies I would assume this is what solution architects and CTOs do. If we do 80/20, heck even 20/80, with infra experts and app architects and decide on day one the characteristics and limitations of the software to be developed, then you don't need every dev to know the infra and every infra guy to know code. In my experience of seeing other super support professionals, it is knowledge of how the app and tech work that makes it easy to support. But yeah, I am training people with this kind of knowledge. I would say 1 in 5 is capable, but only 1 in 10 or 20 make it, due to the time it requires to know it all and be able to do ad-hoc work. Many people hate ad-hoc challenges every day. But there are some who take it as day to day.

[+] emmelaich|9 years ago|reply

The only way to make that happen is to make them responsible for service.

It sharpens the mind when they know that.

Unfortunately this takes a culture change that management need to drive.

At the moment the typical developer contract might be only 6 months or a year so they are not invested in the proper running of their code.

[+] FLUX-YOU|9 years ago|reply

>I look forward to the day when every software developer has half a clue about monitoring, logging, high availability, configuration management, orchestration/scheduling, performance tuning, build/deployment pipelines, data management/archiving, security and exploit mitigation, etc.

Just keep piling on job requirements until all developers need 20 years of college to get a junior position. Everyone seems to have written an "X things every developer must know" article.

[+] bbcbasic|9 years ago|reply

When do they learn this stuff? More weekend reading and side projects? Aka free overtime.

[+] sigil|9 years ago|reply

It feels to me like title-centric corporate culture might be a part of the problem here. Are you an SRE? An SWE? Do you Build? Or do you Run? Are you a Software Developer or a Software Engineer? They're interchangeable to the author, but she includes both on the off chance that you the reader are sensitive to some fine splitting / quantum structure...

I for one welcome the blurring boundaries. It feels weird to say, sure, I wrote this code with pathological runtime behavior, but it's your problem now, Ops Person. How do you learn to avoid that mistake if someone else absorbs the pain for you (ie learns your lesson)? Personally I feel a bit cheated when this happens.

At some point in your career, you might get the nagging suspicion there are important lessons to be learned outside of your current role -- lessons that will make you N times better at your current role. If so, heed the call, take the extra responsibility, and level yourself up. Easier to do at small- to medium-sized companies, but not impossible at large companies either. Take the initiative!

[+] pmoriarty|9 years ago|reply

"Nothing is impossible for the man who doesn't have to do it himself." -- A H Weiler

[+] spc476|9 years ago|reply

At work, I'm a developer (of backend call processing stuff). I work closely with QA (one guy right now) to get the call processing back end code tested (answering questions about the product, the SS7 stack (which nobody, not even me, wants to muck with---it's nasty) and the regression tests which I wrote back when I was in QA). When it comes time to deploy an update to call processing into production, I'm there, at 2:00 am along with the ops team and QA in the deployment (to answer questions and check to see if everything is running right). If anything is wrong, anyone can initiate a rollback to the previous version (I've initiated a rollback once).

In our situation, our customers are the various Monopolistic Phone Companies, so there's quite a bit of work that goes into a deployment (some fairly nasty SLAs and what not), so I'm glad there's an ops team to deal with most of that red tape. Yes, it's annoying that I can't deploy as often as I would like, but I understand the reasoning behind it (and there's more in production than just call processing, like billing, provisioning and updates to our smart phone application).

[+] dmourati|9 years ago|reply

I think the article gets this mostly right. I describe Dev and Ops as a continuum and DevOps as the concept that each side needs to get better at doing the other's job.

Noops and/or serverless are both newer terms that are early on the hype cycle. I wouldn't get too bent out of shape about them.

My advice in this regard has been the following.

Devs: start thinking more about running your code in production and solving user needs. "Works on my laptop" is over. There are a whole slew of things that you need to understand to build at scale. Talk to your Ops people. Especially those who have the good sense to show you mutual respect.

Ops: Move up the stack. The days of providing just OS support and load balancing are gone. You have to learn to code. Spend time in IDEs, work in source control, do code testing. Learn from your Dev counterparts. Especially those who have the good sense to learn from you.

[+] perlgeek|9 years ago|reply

This article seems to assume that everything runs in the cloud. If not, Ops still have a big part in operating the self-service infrastructure that Susan talks about.

Another point I'd like to make is that if developers need to take on more operational responsibility, it creates another barrier to entry. As somebody with a family, I don't want to be regularly on-call. And on the technical side, developers already need to know about the problem domain, the programming language, algorithms and data structures, design patterns, testing, version control and so on. Piling more operational knowledge onto this heap seems hardly promising to on-board more developers, and finding good developers outside the big tech hubs is already a pain.

[+] donavanm|9 years ago|reply

I've been a bit perplexed by the industry's obsession with "operations" for the past decade. Constantly striving to decide what it is, who does it, and whether it's shunned or exalted. My current workplace has convinced me that "ops" is simply a friction organizations pay, like "tech debt." How you prioritize and minimize it is a business decision, not a life calling.

I suspect my employer is actually the largest (many many thousands of employees) and oldest (15-20 years) practitioner of the "you deploy and own what you write" method. There are nearly zero people in "operations" roles, compared to tens of thousands of Developers. The "Systems" folks who might be called SE or SRE or PE or DevOps somewhere else sieve into three roles: 1) Specializing in development below the application & (slightly) above hardware/os 2) Developing tooling and systems around infrastructure & distributed systems management 3) Saying #2, but mostly driving manual or ad hoc actions

Groups that naively pursue #3 seem to implode or catch fire after 12-18 months. They can succeed if it's an intentional choice; employees are a resource and not all problems require further investment. Investing in #2 drives down the "ops" cost on other Developers and/or improves returns on infrastructure investment. Category #1 improves medium & long term returns on the software & services that group develops.

A while back the job role was changed to accentuate that the goal is to Build value by Development. The specific flavor of development is less important. My job role is Systems Development Engineer and I work with Software Development Engineers. SDE collaborating with SDE. When I need to go beyond my domain knowledge I might consult an NDE.

In short I agree with the article summary. There will never be a "post ops" world. But getting over the "ops" title obsession feels good to me.

[+] digi_owl|9 years ago|reply

(Web) devs are user facing and sexy, admins stomping the server room aisles are not (or so the reasoning seems to go).

This appears to have been a prevailing notion since the dot-com days, but has acquired new energy since "cloud computing" became a management buzzword.

[+] llama052|9 years ago|reply

I've always seen operations folks in a different light than most developers. Maybe I'm an exception but I've always viewed operations people as having in-depth knowledge of infrastructure on the lower layers (Session layer and lower). They are able to walk through the inner workings of network concepts, operating systems, etc. They can implement and design systems that work in a highly available, scalable way.

Not saying developers CAN'T do this, but a majority of them don't want to, and don't have the first idea on how to. Granted some "10x" developers who have full stack knowledge can do just as well, but let's be honest most people aren't 10x developers, they want to mess with the code stack and that's it.

This reminds me of all the job postings you see online that want you to have

          expert knowledge of C++
          expert in SQL DBA 
          expert in networking 
          expert in web design
          expert in big data

Yeah you might be able to find someone who can do all of the above, but odds are they won't do them all well.

I don't think the expectation should be set that developers should have to manage the systems stack, as well as manage the code.

Personally, I can't imagine having the expectation set that I need to be on call, and work on code commits with deadlines, while debugging networking issues in production.

Even with configuration toolsets and infrastructure moving to "code-to-deploy" solutions, things will still break and unusual things will still happen. Taking developers out of their zone to focus on problems will slow down the entire company.

Ops can always push to 'strive to automate themselves out of their jobs' but I'd argue that this is an endless job which always has outlets that you can continue to strive and build into.

Of course, this is talking from my personal experience, I've never worked in a big company like Uber, so the environment might be entirely different from my own.

[+] donavanm|9 years ago|reply

> Personally, I can't imagine having the expectation set that I need to be on call, and work on code commits with deadlines, while debugging networking issues in production.

This seems to be the meme of "the supporting infrastructure always breaks and causes the pager engagements!" I have seen those orgs and times where networking or facility failures are the leading cause of outages. Long term those are symptoms of chronic underinvestment and tech debt accumulation.

I also see, on a daily basis, a massive business where the absolute leading cause of "outage minutes" is software defect or deficiency. With investment in supporting infrastructure you're paged 10 times for a defect your team "developed" for every one unavoidable dependency failure. Big companies can make those (huge) investments themselves. Small companies can pay someone else for access to theirs. In any case it's always a business decision, not a certainty.

[+] davidgerard|9 years ago|reply

I explain my job to nongeeks as "computer roadie". My job is to make sure everything is in order, devs' job is to get up there and be Eric Clapton.

People can do both (many roadies are really quite capable musicians). But they're fundamentally different mindsets, and expertise in one is not transferable to the other.

[+] toast0|9 years ago|reply

> Personally, I can't imagine having the expectation set that I need to be on call, and work on code commits with deadlines, while debugging networking issues in production.

"sorry about the deadline, I had to make the fine network work" gets you out of making deadlines.

The nice thing about being on call for your own code is that fixing your stuff at all hours is a natural consequence of writing fragile stuff. The not nice thing is that not all of the reasons your stuff breaks are your fault and some issues aren't realistically preventable.

[+] morgante|9 years ago|reply

> Personally, I can't imagine having the expectation set that I need to be on call, and work on code commits with deadlines, while debugging networking issues in production.

Really? As a dev, the vast majority of my projects/roles have required all 3 to some extent. I have been on call for the past 4 years.

Honestly, people really need to admit that the age of the sysadmin (who is not also a strong programmer) is over. I might not have enough deep knowledge to answer network/OS trivia off the top of my head, [0] but I'm perfectly capable of running a production build for my code (with monitoring, configuration management, etc.) and finding appropriate resources in the instances where problems are outside my domain.

[0] https://news.ycombinator.com/item?id=12701272

[+] advisedwang|9 years ago|reply

There is a trade-off between development velocity and reliability. Not just because making more changes often breaks more things, but also in how much investment is made in infrastructure and monitoring. It even shows up in design choices: making a service multi-regional is more effort but results in a more stable system.

A big part of SRE is motivation for reliability. Part of the reason to have SRE is to have engineers motivated by stability, not development velocity. Have somebody who knows the better choice for reliability, even if at the time a decision is made the other way. This necessitates both a separate profession and organisational separation (e.g. for product readiness reviews or for higher level stability decision making).

(NB: I am a Googler that works closely with SRE, but not an SRE myself)

[+] dkarapetyan|9 years ago|reply

I think this is a false dichotomy. If developers are incentivized to make stable software then they'll make stable software but that's not the case. Software engineers that work on products are promoted based on number of features they ship, not how many production outages they don't cause. It's like the senator that lobbies to put bolted doors on plane cockpits before 9/11. That senator will get zero credit for anything. Fundamentally it is harder to measure the effectiveness of preventive measures so most organizations don't and instead settle for number of features shipped.

[+] solipsism|9 years ago|reply

I don't know who writes/says dumb things like "ops is dead" but ideally we wouldn't waste time arguing against such reductive statements. Whatever the zeitgeist, there will always be homogeneous groups (all developers know/do roughly the same things), structurally heterogenous groups, and organically heterogenous groups.

[+] drinchev|9 years ago|reply

I have network knowledge, Linux knowledge, debugging knowledge, some knowledge of logging best practices, and still (as a developer) I think dev-ops needs to be separated into a different role.

Indeed, you don't need a dev-ops person for your MVP or small web-app, but once you can afford it and the users you have require a `no-downtime-deployment` mechanism, database replication or a server cluster, I think it's too much to ask your developers to do that. Anyway, they will have to spend months reading about the latest and greatest practices and in the end will have something half-baked, compared to what a dev-ops engineer can do in the same amount of time.

[+] tristor|9 years ago|reply

The entire conversation around "ops-less" systems ignores that in most cases the Ops folks are stronger engineers than the SWEs. If a company wants to ditch Ops they'll need to pay far more money for SWEs, expect a higher competency in engineering tasks, and good luck ever outsourcing again.

[+] kod|9 years ago|reply

Citation needed. I've personally never seen an org where ops were stronger engineers than dev.

[+] jzelinskie|9 years ago|reply

If you liked this post and want to know the full details to this change in ops, please read the SRE book[0]. It's a great read for both devs and ops and can immediately help you make changes to company policy for the better.

[0]: http://shop.oreilly.com/product/0636920041528.do

[+] pmyjavec|9 years ago|reply

The naming of these technologies and paradigms is unfortunate in my view; it's now made Ops sound like a bad thing. Noops, Serverless?

As a long-time DevOps / Sys Admin / Generalist, I feel disappointed lately because there seemed to be a very brief golden age, which in my opinion was DevOps done right, and that was just getting polished and accepted. Then it feels like, for little good reason except marketing or something, that was just thrown out the window? It was really getting results in my last org; basically, meeting halfway with devs felt like the sweet spot, and now it's going to extremes.

I was really into investing my time into the DevOps / SRE role; now it just feels demotivating. As any good Ops knows, it's a tough job that requires dedication, but is it worth the effort anymore? Will people still want to hire ops? Should I just move into Software Engineering (which I can do) full time?

I think she hit the nail on the head to be honest.

[+] digi_owl|9 years ago|reply

I get the impression the problem is that it's with ops/admins as it is with safety inspectors. When they do their job right, nothing spectacularly fails, and thus management starts wondering why they have this salary expense on the quarterly spreadsheet.

Developers, on the other hand, go hand in hand with marketing, and thus are easily noticed when they do their thing right.

[+] latch|9 years ago|reply

I agree. Quality is improved as developers start to understand, get involved in, and own the operation side of running their system. Ops ought to be an enabling force.

But, I do wish that anytime anyone writes about ops or infrastructure, they put a big headline:

    There's a 99% chance you can just Scale Up and use Bash.

[+] scurvy|9 years ago|reply

When it comes down to brass tacks, developers don't want to be on call. They want to work on the things that they want to work on, when they want to work on them. Look at the proliferation of 20% projects ("keep them happy and let them do what they want 1 day a week rather than what they get paid to do"), and the outright refusal to fix their own bugs. How many developers do you know who would turn their noses up at being placed on a sustained engineering team rather than creating new features? There's a stigma there. You think that stigma won't be there at 3AM Pacific when Europe starts hitting their new feature to the breaking point?

I've been an ops engineer in a "lean startup" where developers were on-call for their services. It didn't work super well because people ignored their phones or put phones on silent at night. As a backstop, they put me (ops engineer) as the fallback secondary notification because they knew I would wake up. Ergo, everyone ignored everything and it all rolled to me. They'd wake up at 7AM, find everything ablaze (because they ignored all my phone calls), then would fix it and go about the rest of their day.

Let's face it though. They probably don't want to be on call. Why force them to do this? There's the concept of ownership and closing the pain loop, but most people really don't understand what it truly means to be on call. "Sorry honey, can't go to the movies tonight I'm on call." "Sorry bro, can't get wasted tonight I'm on call." Only huge organizations with huge dev teams can go through a developer on-call rotation. Most leaner (smaller) companies have 1-2 devs per project, and it's unreasonable to expect that developer to be on-call 24x7.

This stuff works at Uber, Facebook, and Google. But the vast majority of the world isn't Uber, Facebook, and Google.

Also, I'd expect more pay as a developer if the job required on-call shifts. I don't think companies are willing to pay even more than they already are.

[+] throw2016|9 years ago|reply

These cycles keep repeating themselves. Some marketing-driven term gets traction and then people start believing the hype, repeating it as some sacred truth and dismissing experience as grey beards.

A few years down the road when things don't go according to plan some other term gets traction and rinse repeat.

HN especially is guilty of perpetuating hype when one would expect a far greater degree of scrutiny.

When you get to the nitty gritty of scaling, from networking, distributed storage, failover, high availability and security to managing state, those are entire domains of expertise and experience that devops glosses over.

[+] skarap|9 years ago|reply

To rephrase the article: everyone should and can go ops-less, you just need the devs to take over the ops roles.

[+] n72|9 years ago|reply

Anyone know the best place to learn more about the kind of basic ops that a dev should know? I've done deployments, spun up AWS instances, configured load balancers etc., but my problem is that I don't know what I don't know. For example, when starting to debug, I get in there and muddle around, but there may be a far more efficient way of doing it which I just don't know about. I'll watch ops guys use ps, netstat, etc., which are things I don't use, but presume are useful.

[+] jimjimjim|9 years ago|reply

back before the agile dark ages, things like performance, logging/instrumentation, stability guidelines, security guidelines and deployment abilities were able to be specified as non-functional requirements and the product was QA'd for these just as much as the features that were added.

[+] kasey_junk|9 years ago|reply

Blaming agile for this seems odd. I've worked on agile teams where all of those were set as requirements.

[+] pmoriarty|9 years ago|reply

It's gotten a lot easier to learn and practice ops over the years.

No longer do you need access to a university or government lab to get your hands on Unix. Nor do you have to scour obscure corners of university libraries to get your hands on some wizard manual that finally makes sense of some bit of it for you.

Tons of free, quality tutorials are available online, and you can get help on forums and in chat groups. Online book stores are overflowing with books on just about everything you'd want to know.

Unix (or Linux) has become a lot easier to use in many ways, you can practice on VMs, and anyone who wants it can have root on their own machine. The tools have gotten a lot better too (though both the tools and the OS's have increased in complexity, layers, and interaction with other systems). Cloud providers make spinning up machines, network infrastructure, and various services easier than ever.

Computer literacy has become many orders of magnitude more common than it once was, and a lot of devs grow up being admins of their own Linux systems.

In many ways, it's never been easier to learn ops, to some extent. The same could be said for development, with languages, tools and training being far more available than they once were.

That does definitely reduce the need for a dedicated ops team or a dedicated dev team to some degree. But just as in medicine sometimes you need a specialist who's had the training and a lifetime of practice in that speciality, and when a generalist's knowledge is not enough, I think there'll always be roles for ops and roles for devs.

All other things being equal, a dev who mostly does development and dabbles in ops just isn't going to get the level of professional skill in ops as someone who focuses mostly on ops does.. just as someone who focuses mostly on ops and dabbles in development is probably not going to be able to achieve the same level of development skill that someone who does a lot of development day in and day out will.

It's like someone being both a brilliant brain surgeon and a brilliant hand surgeon. They do have something in common: they're both medical specialities that treat the human body and they both require going to medical school, but being great at both is still rare, and if I ever have hand surgery I'd usually prefer to be treated by someone who's done thousands of hand surgeries and specializes in that, not one who's mostly a brain surgeon who's occasionally operated on hands.

Some people are able to straddle both specialities and do an excellent job at both, but those are relatively rare, because the amount of knowledge and experience you need to really master both is still quite large, despite everything. This knowledge also changes quite rapidly, so you have to spend a lot of time keeping up with new languages, frameworks, tools, services, etc. That's a lot to ask for even for one speciality, never mind two.

[+] antod|9 years ago|reply

I agree with your point that it's never been easier to learn. But I personally think that needs to be balanced with the notion that I don't think things have ever been this complex before.

The last 5 years or so have seen an explosion (Cambrian?) in the number of tools and platforms in use, most of which are immature to put it politely.

By the time failure modes of these new tools are well understood and fixed, the market has moved on to the next new hotness.

The knowledge and experience of old grizzled Unix greybeards of the past might've been harder to gain, but it seemed to have served them for much longer before becoming obsolete.

[+] gaius|9 years ago|reply

> No longer do you need access to a university or government lab to get your hands on Unix. Nor do you have to scour obscure corners of university libraries to get your hands on some wizard manual that finally makes sense of some bit of it for you.

It's funny, I can still remember a conversation I had in about 1995 with a colleague. We were certain that with this new Linux thing, *BSD and so on, now that everyone could get their own Unix to play with, the specialized sysadmin and the dedicated C programmer were totally obsolete; everyone would have these skills. This was around the time, remember, that a "real" workstation would cost 20 grand at least, and the compiler would cost as much again...

That obviously didn't happen, 20 years later, so I think we can reasonably conclude that "access to systems" was never actually the problem.

[+] k__|9 years ago|reply

All the ops over 40 I know didn't even see a uni from the inside.

Most IT people here started doing something different and then switched to IT.

The younger the people, the more academic they are.

93 comments