
How a Jenkins Job Broke our Jenkins UI

135 points| FraBle90 | 4 years ago |slack.engineering | reply

149 comments

[+] zug_zug|4 years ago|reply
Since everybody seems to be hating on jenkins so much, I'll speak up. IMO jenkins is one of the most valuable tools at any startup and every good engineer uses either jenkins or something very similar.

There's no comparison between managed/logged/permissioned/distributed jobs that jenkins provides for free vs building an overwrought service or an insufficient crontab. However it's a power-tool and I think a lot of people go in expecting something dead-simple and pretty and are put off by something that they need to invest in learning.

Just for example, in 2 days I built an uptime checker that ran every 60-seconds via jenkins and triggered slack alarms/sms on problem, and it just worked (< 2 days total maintenance), for years. An equivalent service (Pingdom) quoted 10k a year.
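For readers wondering what such a check might look like, here is a minimal sketch, not the commenter's actual script. It assumes a Python script run as a shell build step in a periodically triggered job; the function names, the health-check URL, and the Slack incoming-webhook URL are all hypothetical placeholders.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional


def check_url(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return None if the URL answers with a 2xx status, else a short problem description."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if 200 <= resp.status < 300:
                return None
            return f"unexpected HTTP status {resp.status}"
    except (urllib.error.URLError, OSError) as exc:
        # HTTP errors (4xx/5xx), DNS failures, refused connections and timeouts land here.
        return f"request failed: {exc}"


def post_to_slack(webhook_url: str, text: str) -> None:
    """Send a plain-text message to a Slack incoming webhook (the URL is an assumption)."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10).close()


def run_check(
    url: str,
    probe: Callable[[str], Optional[str]] = check_url,
    notify: Callable[[str], None] = print,
) -> bool:
    """Probe one URL, notify on a problem, and report whether the site looked healthy."""
    problem = probe(url)
    if problem is None:
        return True
    notify(f"DOWN: {url} ({problem})")
    return False
```

A job could call `run_check(...)` with `notify` bound to `post_to_slack` and exit non-zero when it returns `False`, so that the build itself is marked as failed and shows up in the job history.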

[+] codeduck|4 years ago|reply
I've been using jenkins for over 10 years.

There is no finer system for turning quick CI hacks into entrenched technical debt.

Jenkins gives you the freedom to do things without the burden of asking whether you should do those things. Jenkins permits you to munge a whole lot of separate domains into a single, creaky system.

There are better tools available for most aspects of CICD and automated testing these days.

[+] munk-a|4 years ago|reply
I think Jenkins is a very powerful tool but I would strongly disagree that it's quite valuable in a startup environment. Jenkins has struck me as a philosophical sibling of C++ - it offers a plethora of foot guns to the user that can also be used to accomplish rather good things. I would advise Jenkins to companies with projects that don't require agile responses to user requirements where the functionality desired at the end is very clear from the beginning - since the times I see Jenkins fail the most are when the configuration of jobs on it is being updated and reconfigured rapidly.

Jenkins tends to suffer from extremely poor state management and experimenting with things can cause a permanent loss of value to companies if backups are not properly configured - compared with a system like ansible where recipes will grow and be regularly committed to a repository, Jenkins doesn't have a native version control system and as a result strikes me as extremely brittle.

I almost wish the UI for jenkins simply didn't exist and it actually was just a whole mess of config files and shell scripts so that it could be locked down to a much better degree.

[+] clutchdude|4 years ago|reply
As someone who runs a large build cluster at a major corp, Jenkins is clunky and outdated. It's a pain to deploy and keep maintained. I'd rather stub my toe than figure out why some event isn't being properly processed in the bowels of Jenkins.

However, when coupled with Kubernetes, Jenkins is more powerful than almost any other CI tool.

You can orchestrate any iteration of build types or actions while easily providing the underlying resources transparently to your users.

Notice that almost every other CI tool mentioned here is either 3rd party hosted or severely limited in some fashion when compared to Jenkins.

[+] aranelsurion|4 years ago|reply
Seems to me people hating on Jenkins are not hating on the idea of a central automation service/task runner, they hate on Jenkins itself.
[+] jordanbeiber|4 years ago|reply
You could use a tool designed for it and probably be better off, since upgrading jenkins and plugins, as everyone knows... is not always exactly pain free.

Regarding your batch scheduling solution - take hashicorp nomad for a spin: a single go binary and your scheduled job can be declared in 10 lines of HCL. Won’t miss a beat.

[+] orlovs|4 years ago|reply
Don't abuse CI/CD system as workflow automation tool. There are better and much easier alternatives.
[+] orf|4 years ago|reply
Jenkins is endlessly flexible, but at the end of the day its complexity adds such an upfront cost that less flexible tools win outright when the metric you care about is the “ability to get shit done without giving a fuck about groovy sandboxing”.

A system like Gitlab-CI is less flexible than Jenkins but because it makes 99% of the use cases you have for a CI system take 99% less effort it wins hands down.

Combine it with a Kubernetes-based executor and you have a scalable, isolated, reproducible and flexible CI system that requires basically no maintenance and most importantly is about as approachable and understandable as you can possibly get for a CI system. It’s simple shell commands vs AbstractProxyCSPFactoryGroovyBean classes.

[+] bob1029|4 years ago|reply
After a while it got easier for us to write our own code to automate things than to try and piece together a wordpress-like component system.

We still have Jenkins running somewhere, but I can't remember the last time I needed to run a job on it.

[+] noisy_boy|4 years ago|reply
> Combine it with a Kubernetes-based executor and you have a scalable, isolated, reproducible and flexible CI system that requires basically no maintenance and most importantly is about as approachable and understandable as you can possibly get for a CI system

You can also set up a docker image with all the Jenkins related bits installed (including any custom/specific setup) and then that can be integrated with whatever Kubernetes or Docker setup you need. Jenkins also has the concept of slave executors and they can be deployed to any build nodes (fixed or on-demand).

> It’s simple shell commands vs AbstractProxyCSPFactoryGroovyBean classes.

No idea what this is about - you don't need to write Java or Groovy classes to run Jenkins jobs. One of the most common uses of Jenkins is to run steps via scripts (Shell/Python/Perl/whatever).

Sure the UI looks a bit dated but from what I have seen, it hardly needs much maintenance either.

[+] fnord77|4 years ago|reply
we've had great luck with gitlab-ci. Jenkins is just a nightmare.
[+] oblio|4 years ago|reply
The thing is, Gitlab & co. almost all mandate Docker.

You don't use Docker, the list of options narrows down dramatically.

You need to support non-Linux OSes, same thing.

If your needs are limited, yeah, something other than Jenkins is better. If they're not... I don't think there's a better alternative to Jenkins.

[+] schoolornot|4 years ago|reply
Jenkins has always rubbed me the wrong way. The quality of plugins, the dated UI, Groovy. It just feels out of place and bloated every time I use it. Being a Java app doesn't help either. It reminds me of Jira where you have to hack it to death to make it "fit in" to common workflows. I'll take any of the alternatives over it.
[+] radicaldreamer|4 years ago|reply
Despite everything you said being true, it's still one of the few open-source, free solutions in the mobile build space.

I'm wondering if Buildkite or something newer is comparable today, but for a long time, Jenkins was one of the only non-custom ways of building an in-house iOS/Android build system.

[+] lelandbatey|4 years ago|reply
To throw my hat into the ring: I've found the best "runnable jobs" / "digraph of jobs" system I've ever used to be GitLab's built-in CI system. The UI is nice, and it's got tight integration into our existing code workflows, meaning there's a close-but-not-too-close association between code and jobs, so it's easy to say that "the code and the jobs running that code live 'in the same place'" if you want that. It still provides all the pipeline features you'd want if you want to use your CI/jobs as a cornerstone of your infra; features like the ability to schedule pipelines, to trigger pipelines via other pipelines, remote triggering of pipelines via cURL request, pipeline DAGs for a 'make-like' build experience, etc.

I've absolutely loved using Gitlab CI for the last several years, and I highly recommend it.

[+] encoderer|4 years ago|reply
We (Cronitor) are seriously considering diving in here. People already install Cronitor to remotely monitor their background jobs, we have a lot of the UI and data platform challenges solved, and we are building a control plane to allow you to securely invoke your jobs remotely (via polling). I'm not totally convinced we should go this path vs better and deeper monitoring and metrics capabilities, but we're contemplating it because, honestly, Jenkins needs to be replaced.

(Readers, if that sounds interesting, and you have some Kubernetes and Golang experience, we are hiring!)

[+] 2OEH8eoCRo0|4 years ago|reply
What are good alternatives?
[+] Jenk|4 years ago|reply
Jenkins (when it was still just Hudson) was the forerunner for CI. It made a lot of mistakes but only because nothing else was there at the time to learn from, in my humble opinion. The only alternative in the early days was CruiseControl (which became GoCD) and of the two, Jenkins was far better and more advanced.

I made extensive use of Hudson/Jenkins from 2007 to about 2015 and it has flaws, sure, but I know it made some pretty difficult tasks sane for me, and was pretty straightforward building CD pipelines.

I didn't (and still don't) like the decision to adopt groovy but it is better than configuring via UI and I like that it is imperative, at least.

Some of the jobs I built using Jenkins pretty rapidly include full CD from svn push/git commit to production with all the bells, whistles, gates, and stages in-between, to managing failovers, and even my early foray into IAC with rudimentary remote exec scripts and the likes of chef.

I think it became a victim of its own success. It was _flooded_ with contributions that yanked Jenkins in all kinds of directions with no clear owner for direction and maintenance, which led to frankly some horrific (but "working") architecture within and a jumbled mess of extensions vs plugins vs patches and all kinds of horrific UI changes, often all conflicting.

I believe it was Gojko Adzic that wrote up a blog article (that I can't find) about 10 years ago listing some of the truly horrendous code in Jenkins source. Stuff like abstract classes typecasting themselves to derivatives to access members.

Looking back Jenkins was and is clunky, messy, uncertain of its purpose in life. But so was just about everything to do with CI at the time. Would I use Jenkins again in 2021 and beyond? Probably not but it definitely added value to the build technosphere.

[+] unscaled|4 years ago|reply
I think you're talking about this blog post: https://web.archive.org/web/20110410011410/https://gojko.net...

I took a peek at the code and it looks like nothing has improved since then. For instance, the Hudson class is now deprecated, and became an empty shell inheriting from Jenkins - which is still a singleton with a public constructor, only now you have to know that the instances of it created must actually be Hudson instances since they're being downcast to that all over the place... Ouch.

[+] TomBombadildoze|4 years ago|reply
Good gravy, this is a lot to unpack. It's an alarming story from the very beginning, and a cautionary tale of how tempting it is to do everything with Jenkins, even though it's an appropriate tool for absolutely nothing in the Year of our Lord 2021.

> As part of our automation setup, we continuously run integrity jobs to inspect our Jenkins nodes.

Why on earth would you self-host this in Jenkins? This is a monitoring and alerting problem.

> These jobs check system configurations and properties and look to see if any node is failing those checks.

What year is it? We've solved this with immutable infrastructure or system integrity monitoring. Or both.

> The checks automatically mark Jenkins nodes as offline when any of those checks fail and notifies our Mobile Build & Release team via a Slack message.

"Mark" offline? Why not just terminate it? And why do we care if build nodes come and go? These should be cattle, not pets. If they all die at once, that's bad. If they're cycling in and out, that's business as usual.

> When our Jenkins UI stopped working, we noticed two things:

> 1. We had recently upgraded Jenkins and all its plugins to a newer version

Did they just now learn what an awful idea this is? All of this at once, really?

This isn't so much a Jenkins problem (though let's be clear, Jenkins is a problem) as it is a remedial engineering problem. The top takeaways should be "choose appropriate tools for the task at hand" and "don't make reckless decisions with brittle systems".

[+] gwilikers|4 years ago|reply
> "Mark" offline? Why not just terminate it? And why do we care if build nodes come and go? These should be cattle, not pets. If they all die at once, that's bad. If they're cycling in and out, that's business as usual.

Given that they are for mobile builds, there might be some macOS nodes in there for iOS builds. These might be in-house machines they maintain -- or, if they use a cloud provider, there might be costs to just killing and spinning up nodes. For example, for EC2 Mac instances:

> EC2 Mac instances are available for purchase as Dedicated Hosts through On Demand and Savings Plans pricing models. Billing for EC2 Mac instances is per second with a 24-hour minimum allocation period to comply with the Apple macOS Software License Agreement.

[+] hinkley|4 years ago|reply
I think it's a frog boiling problem.

I start with building my code, then deploying it, then verifying the deployment, a few smoke tests, regression tests, pretty soon all of those concepts are crowding in on the brainspace of monitoring.

It's just one more thing, why slow down to learn a new tool and convince people to use it?

These days it's getting easier for me to requisition a machine to run a dev tool on. That hasn't always been the case, and I'm sure it's not the case everywhere.

[+] c7DJTLrn|4 years ago|reply
It's horrifying that Jenkins is still the industry standard. The whole thing is poorly documented and full of cruft and vulnerabilities. But there is nothing out there as flexible to my knowledge.
[+] TheGuyWhoCodes|4 years ago|reply
Most of the flexibility comes from the plugins which are a security risk. Either they are just abandoned, have very old dependencies or just don't sanitize inputs.

You'd think Cloudbees would take over abandoned plugins, integrate into the main code or just remove them from the repository for safety but they just let them rot.

We had one of the plugins we use break after an upgrade because of the dependency hell in Jenkins so we ended up contributing to the plugin to remove the dependency. Thankfully the maintainer was still around to verify our fix and update the plugin repository (we obviously built it locally and tested).

To think that anyone could adopt an abandoned plugin (which could have 1M installs) and just insert some malicious code, with minimal or no oversight, is really scary.

[+] jalk|4 years ago|reply
It's the Nagios of CI tools ;-)
[+] colek42|4 years ago|reply
I have to estimate time and materials for a lot of DevOps contracts. I estimate twice the hours, or more for Jenkins CI work vs GitLab CI even if my engineer is an expert in groovy. The complexity of Jenkins adds a huge amount of risk.
[+] bdefore|4 years ago|reply
Circa 2009, back when it was Hudson (I think?) I once had an idea to rename MyJenkinsProject to !MyJenkinsProject in order to move it to the top of the list alphabetically. When I hit save, the UI explained that this wasn't possible _and that I shouldn't be putting dangerous characters in my project names_. Not to be pushed over so easy, I tried again with a skull and crossbones unicode symbol () in the name. The UI immediately became unresponsive and wouldn't start again until the project was removed.

edit: Interesting HN also stripped out the character: https://www.fontspace.com/unicode/char/2620-skull-and-crossb...

[+] stkdump|4 years ago|reply
Lexicographic ordering puts non-ascii unicode characters after ascii characters
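A quick illustration of that point, as a sketch only: it uses Python's default string comparison, which orders by Unicode code point. Whether the Hudson/Jenkins job list sorts exactly this way (rather than, say, case-insensitively or with a locale collation) is an assumption, and the project names are just the ones from the parent comment plus one made-up extra.

```python
# Python compares strings code point by code point.
names = [
    "MyJenkinsProject",
    "\u2620MyJenkinsProject",  # skull and crossbones, U+2620
    "!MyJenkinsProject",       # exclamation mark, U+0021
    "AnotherProject",          # hypothetical extra entry
]

ordered = sorted(names)
for name in ordered:
    print(f"U+{ord(name[0]):04X}  {name}")
```

The `!` prefix (U+0021) sorts ahead of every ASCII letter, while the skull (U+2620) sorts after all of them, so under this ordering the skull-prefixed project would have ended up at the bottom of the list rather than the top.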
[+] cbushko|4 years ago|reply
I ran Jenkins at 2 companies for probably a total of 13 years.

I will do everything in my power to never use it again as I feel that I wasted so much time fighting it.

Plugins, dependency hell, slow UI and so many other terrible things about it. Having to back the entire thing up on every change because you may never get it back into a usable state if you change something. That even happens if you have scripts in git to set the whole thing up. What a waste of time.

In contrast, I had just as good of a system running in Gitlab in a week and more importantly the other developers are able to pick it up and extend as they wish.

[+] jordanbeiber|4 years ago|reply
We moved to drone after many years of a love/hate, abusive relationship with Jenkins (junkins was the common name).

Drone has this awesome feature where you can have it hook out and receive a pipeline on the fly. We now generate our pipelines in an api and this way we can write the logic in ”something other than groovy” - typescript in our case.

No pipelines required in repo and everything as code without ugly hacks.

Never looked back even once.

[+] Game_Ender|4 years ago|reply
I would simply stay away from Jenkins if you are getting started from scratch. I have used it for months, and the amount of effort you have to pour into it, only to still get scaling and outage issues, is not really acceptable.

In contrast when using Buildkite [0] you get essentially all the power and flexibility of Jenkins, but without the crushing technical debt, inherent flaws and complexity. Benefits I have seen of Buildkite over Jenkins:

- Never have to worry about scaling a Jenkins master again

- Build history lasts forever, so no need to set up a system to save logs

- Everything can be in your repo, so you are always confident of changes and testing is easy

- The UI is easy to understand and you can link directly to failed log lines

- Control your own workers, and easily set up autoscaling without any drama

- Flexible plugin and annotation system allows for extensibility

0 - https://buildkite.com/features

[+] zmmmmm|4 years ago|reply
> It gets executed in a special Groovy sandbox to increase the security posture

Not a jenkins user but I am really curious about what the perceived security issue is here? Why are there all these layers of protection placed on what is presumably internal infrastructure? I can't think of similar protections being applied to other CI systems where you can run arbitrary bash commands and containers (aka: do anything you like). It seems to be one of the most common pain points, but I can't quite understand why it's there in the first place.

[+] yebyen|4 years ago|reply
Because you can embed credentials in a Jenkins installation (and when you use Jenkins, we do.)

It inevitably gets connected to resources which can cost money, and since it's an internal infrastructure system, it will inevitably be connected with resources which contain replicas of private information. "Because why not, it's an internal system which means it's perfectly safe, and we should really be testing with real user data, you know, for realism."

And once you cross that bridge, you have wholly and truly gone completely radioactive. This is the intersection of financial risk/attack surface and private customer data, and execution of unproven code. If we send new Groovy scripts through an approval process, we can at least narrow the risk of accident or intentional disclosure by manually vetting scripts before they are run for the first time after changes, (or effective sandboxing so they cannot escape and access for example, any secrets that were not intended for them.) But, roundly, one can argue there's not much to be done, as it is an internal system, and each control we place in the way can easily be seen as an obstacle to getting the job done straightforwardly.
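To make the "vet before first run after changes" idea concrete, here is a minimal sketch of an approval gate keyed on script content. It is purely illustrative and makes no claim about how Jenkins implements its own script approval; the class and method names are hypothetical.

```python
import hashlib


class ScriptApprovals:
    """Hypothetical approve-before-first-run gate keyed on the exact script text."""

    def __init__(self):
        # SHA-256 fingerprints of scripts a reviewer has signed off on.
        self._approved = set()

    @staticmethod
    def fingerprint(script: str) -> str:
        """Hash the script text so that any edit produces a new fingerprint."""
        return hashlib.sha256(script.encode("utf-8")).hexdigest()

    def approve(self, script: str) -> None:
        """Record that a human has vetted this exact version of the script."""
        self._approved.add(self.fingerprint(script))

    def may_run(self, script: str) -> bool:
        """Allow execution only for script versions that were approved verbatim."""
        return self.fingerprint(script) in self._approved


approvals = ScriptApprovals()
script = "echo build"
print(approvals.may_run(script))   # not yet approved
approvals.approve(script)
print(approvals.may_run(script))   # approved version
print(approvals.may_run(script + " && curl http://example.invalid"))  # changed, needs re-approval
```

Because the gate is keyed on a hash of the content, any change to a script, however small, sends it back through review, which is both the protection and the friction described above.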

Then think of this also: maintaining Jenkins is known to be an operational burden, to say the least. If you have different departments with their own independent need for Jenkins or something like it, neither of these departments is going to want to own the Jenkins instance if it means being generally responsible for uptime and upkeep. You can bet that someone higher up is going to see this as a tremendous opportunity to consolidate and save money. They're going to wind up running on the same environment together, operated by someone who has no idea what either of these departments is really up to.

If you're lucky it's only two tenants, and that shared-service admin department responsible for the Jenkins instance is going to actively pursue the common interest of keeping things secure on behalf of everyone. What's more likely is, shared services platform team only ever hears from their Jenkins customers when something has ceased to function and their Admin access is needed, and for those people it's basically hair-on-fire can't-work-until the admin guy can be reached. After ten or fifteen times and some political pressure, their boss says this is taking too much of our department's time and we're not adding value, so they decide to shirk the admin duties, and give admin credentials to a "designee" from each team.

Now through diffusion of responsibility, nobody is in charge, and everyone is too afraid of breaking anyone else's stuff to ever upgrade even the most basic stuff.

That's right, now you have potentially private customer data, from multiple departments, with access to secrets that may control resources which can be scaled up to cost money, in an environment where code runs for the first time (potentially even code from third party contributors, but not likely if it's an internal instance... right? ...) code running before it has been tested by anyone, in a multi-tenant environment that nobody wants to pay or spend time to maintain, with competing czars that don't really have much incentive to talk to each other, where nobody has enough visibility to safely upgrade anything for fear it will break something and step on somebody else's toes, and if it goes down or stops functioning for basically any possible reason because of any of those teams, the maintenance of business is completely halted until we take care of it; really it's basically even worse than having cats and dogs living in the same house.

Sure, you can implement policy to solve any one of these things individually, but if you write enough policy to make it truly safe, then people will hate you for all the red tape required to work with it, "why can't we do the easy thing which is technically possible" and as you can see, there is so much wrong possible that even the best policy is going to have to concede some things, to make this tremendous expense worthwhile "otherwise why are we even using (footgun) if I can't easily shoot myself in the foot whenever I need to?"

[+] liveoneggs|4 years ago|reply
Jenkins is great because you install it (easy), start it (easy), and anyone (junior to senior) can have it doing useful work within the hour. It is a single place where multiple people can collaborate and self-service. There are special purpose tools for going deep into single aspects of it (scheduled jobs, CI/CD, deployment) but as an all-rounder it's really terrific and the price is right.
[+] Cpoll|4 years ago|reply
I use Jenkins as a glorified cron runner in some contexts. One of the things I hate the most is that it's difficult to define jobs as code (the 'Job DSL' plugin works in most cases, but if you're using certain plugins it's hard or impossible to configure them).

What does everyone else use if self-hosting is a requirement but you don't have an enterprise budget?

[+] Aeolun|4 years ago|reply
Gitlab? Hasn’t done me wrong yet, even if it feels kind of overkill if you need only CI/CD.
[+] lmm|4 years ago|reply
For the cron-like part, Rundeck. My builds are still in Jenkins but it's much easier to maintain when it's just doing builds, not workflow orchestration.
[+] 0xbadcafebee|4 years ago|reply
I feel so bad for them. Jenkins is really a technology anti-pattern. I keep saying I'll do this, but I really need to write a series of blog posts to elucidate all the ways in which Jenkins is just bad for your business. If you can use any alternative, do it, and for Bob's sake, pay for a solution. Stop trying to cobble together some crap with shitty free tools when CI/CD is critical to the velocity, quality, and reliability of your products.
[+] mdeeks|4 years ago|reply
As a counter point, we had major issues with just paying for it via CircleCI. Excessive downtimes, a UI so slow it was nearly unusable, etc. We decided at the time we couldn't bet our company on it and we moved to Jenkins. We have far more control, build time and build costs are lower, uptime is better (All of this at the expense of initial dev time and maintenance of course). Generally we are a SaaS company, but paying for a solution just doesn't always work when you get larger and when there are limited options out there.
[+] aranchelk|4 years ago|reply
Modern Jenkins using Jenkinsfiles and cloud provisioning can work pretty well.

All things being equal, it's my preference to not use Jenkins for new projects, but anti-pattern is IMO a significant overstatement.

[+] nawgz|4 years ago|reply
Can you expand more on how it would qualify as an "anti-pattern"? I agree it is slow, has issues with its built-in coverage and capabilities, and has an oldschool UI; but it is at its fundamental core a pipeline runner. It is a decent pipeline runner even, which when it comes down to it is the core of each other CI product [that I've seen].

So to hear it described as an "anti-pattern", when realistically it seems to BE the pattern - just poorly executed, is a bit unintuitive to me.

[+] coredog64|4 years ago|reply
I'll note that Cloudbees offers commercial support for Jenkins. Having said that, I had a former employer that was a Cloudbees customer and the support we got was typically "Have you tried turning it off and back on?" That drove us back to OSS Jenkins. Although, AIUI, said employer has now moved to Harness.
[+] spondyl|4 years ago|reply
So just to clarify, they rolled out the latest version to production which broke? What's the staging environment for then?
[+] nhoughto|4 years ago|reply
Always surprised when I see people still using jenkins, must be because of history? you wouldn't choose it today..?
[+] geodel|4 years ago|reply
It could just be my bias against this company. But I find their "Engineering blogs" about fixing their poor product / process rather lame. I get this vibe of "How we fix our inconsistent metric generation problem by using AtomicInteger instead of Integer"
[+] 2rsf|4 years ago|reply
I have never heard the word runbook but I am going to borrow it. I can confirm that uncontrolled updates are the number one cause of instability in our Jenkins environment, where plugins seem to be the most vulnerable: to Jenkins core updates, to other plugins, and to their own updates.