Here are 12 Sysadmin/DevOps (they're synonyms now!) challenges, straight from the day job:
1. Get a user to stop logging in as root.
2. Get all users to stop sharing the same login and password for all servers.
3. Get a user to upgrade their app's dependencies to versions newer than 2010.
4. Get a user to use configuration management rather than scp'ing config files from their laptop to the server.
5. Get a user to bake immutable images w/configuration rather than using configuration management.
6. Get a user to switch from Jenkins to GitHub Actions.
7. Get a user to stop keeping one file with all production secrets in S3, and use a secrets vault instead.
8. Convince a user (and management) that you need to buy new servers, because although "we haven't had one go down in years", every one has a faulty power supply, hard drive, network card, RAM, etc., and the hardware is so old you can't find spare parts.
9. Get management to give you the authority to force users to rotate their AWS access keys which are 8 years old.
10. Get a user to stop using the AWS root account's access keys for their application.
11. Get a user to build their application in a container.
12. Get a user to deploy their application without you.
After you complete each one, you get a glass of scotch. Happy Holidays!
GitHub Actions left a bad taste in my mouth after it randomly removed authenticated workers from the pool once they'd been offline for ~5 days.
This was after I'd set up a relatively complex PR workflow (an always-on cheap server starts up a very expensive build server with specific hardware), only to have it break randomly when no PR came in for a few days. There was no indication that this happens, and no workaround from GitHub.
There are better solutions for CI; GitHub's is half-baked.
It really depends on whether the machine is hosting anything that you don't want some users to access. If the machine is single-purpose and any user can already access everything valuable on it (DB with customer data, etc.) or trivially elevate to root (via sudo, docker access, etc.), then it's just pointless extra typing and security theatre.
Is this really the case? Aren't there Unix admins/DBAs anymore? I associate DevOps with what in my time we called "operations" and "development". We had 5 teams or so:
1) Developers, who would architect and write code; 2) Operations, who would deploy, monitor and address customer complaints; 3) Unix (aka SYS) administrators, who would take care of housekeeping of, well, the OS (and web servers/middleware); 4) DBAs, who would monitor and optimize Oracle/Postgres; and 5) Network admins, who would take care of load balancers, routers, switches and firewalls (well, there were 2 security experts for that too).
So I think DevOps would be a mix of 1 & 2, to avoid the daily wars that constantly happened: "THEY did it wrong!"
Can somebody clear this up for me, please!? It seems I was out of it for too long!
I know it's a common view that sysadmin and DevOps are the same these days, but in a current sysadmin role nothing you've mentioned sounds relevant. Let me give you my list:
1. Patch Microsoft Exchange with only a three-hour outage window
2. Train a user to use OneDrive instead of emailing 50 MB files back and forth
3. Set up eight printers for six users. Deal with 9 GB printer drivers.
4. Ask an exec if he would please let you add MFA to his mailbox.
5. Sit there calmly while that exec yells like a WWE wrestler about the ways he plans to ruin you in response
6. Debate the cost of a custom mouse pad for one person across three meetings
7. Deploy any standard Windows app that expects everyone to be an administrator without making everyone an administrator
8. Deploy an app that expects UAC disabled without disabling UAC
9. Debug some finance person's 9000-line Excel function
> 9. Get management to give you the authority to force users to rotate their AWS access keys which are 8 years old.
Saying "keys which are 8 years old" implies you're worried about the keys themselves, which is just wrong. (Their security state depends on monitoring)
You can definitely make a strong argument that the organization needs practice rotating, so I would advise reframing it as an org-survivability-planning challenge and not a key-security issue.
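If the goal is an org that practices rotating rather than one that fears old keys, a scheduled audit that flags keys past a rotation deadline is one way to make the practice routine. A minimal sketch, assuming a 90-day policy (an arbitrary choice) and an inventory of `(key_id, created_at)` pairs you export yourself, for example from the `CreateDate` that IAM reports for each access key. The key IDs and dates below are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumed rotation policy: any access key older than this is due.
MAX_KEY_AGE = timedelta(days=90)

def keys_due_for_rotation(keys, now=None, max_age=MAX_KEY_AGE):
    """Return the IDs of keys created more than max_age ago.

    `keys` is a list of (key_id, created_at) pairs with timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    return [key_id for key_id, created_at in keys if now - created_at > max_age]

if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    now = datetime(2025, 12, 1, tzinfo=timezone.utc)
    inventory = [
        ("AKIA-EXAMPLE-OLD", datetime(2017, 11, 30, tzinfo=timezone.utc)),
        ("AKIA-EXAMPLE-NEW", datetime(2025, 11, 1, tzinfo=timezone.utc)),
    ]
    print(keys_due_for_rotation(inventory, now=now))
```

Running it on a schedule and treating a non-empty result as a ticket turns rotation into a recurring drill instead of a one-off argument with management.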
> Get a user to use configuration management rather than scp'ing config files from their laptop to the server.
Damn, I'm guilty of this one. Though I'm not a real sysadmin/DevOps person; I'm just throwing something together and deploying it on a LAN-only VM for security reasons (I don't trust the type of code I would write).
Q: 3. Get a user to upgrade their app's dependencies to versions newer than 2010.
A: Calculate the average age in years of all dependencies, where each dependency's age is max(most recent version release date, date of most recent CVE on library) - used version release date. Sleep for that many seconds before the app starts.
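Tongue in cheek or not, the formula is concrete enough to sketch. This assumes you populate the hypothetical `Dependency` records yourself (from your lockfile, the package index, and a CVE feed), and it converts days to years with an assumed 365.25-day year.

```python
import time
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional

DAYS_PER_YEAR = 365.25  # assumption: average Julian year

@dataclass
class Dependency:
    name: str
    used_release: date                 # release date of the version you pin
    latest_release: date               # release date of the newest upstream version
    latest_cve: Optional[date] = None  # date of the library's most recent CVE, if any

def staleness_years(dep: Dependency) -> float:
    """max(latest release, latest CVE) - used release, in years (never negative)."""
    newest = dep.latest_release if dep.latest_cve is None else max(dep.latest_release, dep.latest_cve)
    return max(0.0, (newest - dep.used_release).days / DAYS_PER_YEAR)

def startup_penalty_seconds(deps: List[Dependency]) -> float:
    """Average staleness across all dependencies; 0 if there are none."""
    if not deps:
        return 0.0
    return sum(staleness_years(d) for d in deps) / len(deps)

def start_app(deps: List[Dependency], main: Callable[[], None]) -> None:
    # One second of startup delay per average year of neglect.
    time.sleep(startup_penalty_seconds(deps))
    main()
```

A single dependency pinned to a 2010 release whose upstream last shipped in 2025 would add roughly 15 seconds to every start, which is the point.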
A lot of these problems seem pretty solveable, if you're the admin of the machine (or cloud system) and the user isn't.
If you don't want a user to log in as root, disable the root password (or change it to something only you know) and disable root SSH. If you want people to stop sharing the same login and password across all servers, there are several ways to do it, but the most straightforward one seems like it would be to enforce the use of a hardware key (YubiKey or similar) for login. If people aren't using configuration management software and are leaving machines in an inconsistent state, again there are several options, but I'd look into this NixOS project: https://github.com/nix-community/impermanence plus some policy of rebooting the machines regularly.
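In OpenSSH, disabling root SSH means setting `PermitRootLogin no` in `sshd_config`. A minimal audit sketch that checks a config's text for that setting, assuming a single file: it ignores `Include` directives, stops at the first `Match` block (settings after it are conditional), and relies on sshd using the first value it obtains for a keyword. The function name is hypothetical.

```python
import re

# Keyword, then whitespace or "=" (optionally surrounded by whitespace), then the first argument.
_DIRECTIVE = re.compile(r"^([A-Za-z]+)\s*(?:=|\s)\s*(\S+)")

def root_ssh_login_disabled(sshd_config_text: str) -> bool:
    """True if the first global PermitRootLogin value is exactly "no"."""
    for raw in sshd_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        m = _DIRECTIVE.match(line)
        if not m:
            continue
        keyword, value = m.group(1).lower(), m.group(2)  # keywords are case-insensitive
        if keyword == "match":
            break  # conditional settings follow; this sketch does not evaluate them
        if keyword == "permitrootlogin":
            return value == "no"  # first obtained value wins
    # Absent: recent OpenSSH defaults to prohibit-password, which still allows key-based root login.
    return False
```

Feeding it the contents of `/etc/ssh/sshd_config` from each host gives a quick yes/no per machine; `sshd -T` on the host itself is the more authoritative view of the effective configuration.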
If you don't like how users are making use of AWS resources and secrets, then set up AWS permissions to force them to do so the correct way. In general if someone is using a system in a bad or insecure way, then after alerting them with some lead time, deliberately break their workflow and force them to come to you in order to make progress. If the thing you suggest is actually the correct course of action for your organization, then it will be worthwhile.
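For the "one file with all production secrets in S3" case, the "deliberately break their workflow" step could be an explicit Deny on that object, since in IAM an explicit Deny overrides any Allow. A minimal sketch that builds such a policy document; the bucket and object names are hypothetical, and you would attach it to the users or roles that still read the file only after the lead time mentioned above.

```python
import json

def deny_legacy_secrets_policy(bucket: str, key: str) -> dict:
    """IAM policy document that explicitly denies reading or overwriting one S3 object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLegacySecretsFile",
                "Effect": "Deny",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{key}",
            }
        ],
    }

if __name__ == "__main__":
    # Hypothetical names for illustration only.
    print(json.dumps(deny_legacy_secrets_policy("example-bucket", "prod-secrets.env"), indent=2))
```

Once the old path stops working, the secrets vault becomes the path of least resistance, which is exactly the conversation the comment wants to force.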
This is heartening - I'm about to start with the daily challenges today and document my experience and that sort of thing.
Any other suggestions? I have sysadmin experience as a homelabber and at work with a small company as a "tech lead" but have not yet had the chance to do it full time in a larger company. Currently focused on back-filling knowledge gaps and adding certs to support my existing experience.
We used to run a terminal in the browser using https://github.com/yudai/gotty, and the entire dev team remapped their Ctrl+w to Ctrl+`. We did frontend and backend development with this setup for almost 1.5 years. Muscle memory: to this day, I'm always afraid my actual terminal will close if I press Ctrl+w :P
Maybe I'm just extremely dumb, but I can't find how to edit files? Neither `vi` nor `nano` are installed, I don't have internet access to `apt-get update`, and I'm not about to learn `emacs` for this...
EDIT: Ah, ok, `vi` is installed on the server _itself_, just not in the Docker containers. So I guess I'm going to have to `docker cp` them in. Can do o7
Personal advice: don't use the solutions repo. Googling the problem and then digging deep into the solutions will teach you a hell of a lot more. Read the man pages of the commands that turn up on Google, try them with different options, try to find other commands that do almost the same thing, maybe a bit differently... All of this will help you learn a lot more.
The definition I liked best, which I _think_ came from one of the Google SRE books though I'm not certain, was: "SRE is what happens when you consider operations to be a software problem".
Nope, SREs keep applications running on a platform. Lots of metrics, tools to deploy apps in whatever rollout process the company has, etc.
In small companies, sysadmin might be a duty of the SRE team, but they definitely diverge if you have a large on-prem deployment or work with bespoke VMs in the cloud.
Without sharing too many spoilers... I solved the challenge but the check script was unhappy. The curl commands in the script worked fine, the earlier parts of the script failed, i.e. it didn't like how I'd decided to make that work.
This kind of thing annoys me. This is why CTFs are great, where the goal is to get the flag string. Obviously harder to do for sysadmin, but expecting a particular configuration when I managed to make it work without doing things exactly as they wanted is no better than a poorly written exam.
Hello, thanks for the feedback. I just deployed a new image that only checks for the objective, not which Docker network somebody uses.
It is hard to have a checker that eliminates both false positives and false negatives in general, but we always try to minimize false negatives and we failed initially here.
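Checking only the objective can be as simple as asking whether the service answers as required, with no assertions about how it was wired up. A minimal sketch, not the site's actual checker: the function names and the expected status of 200 are assumptions, and `fetch` is injectable so the check itself can be tested without a network.

```python
from typing import Callable
from urllib.request import urlopen

def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Make a real HTTP request and return the status code."""
    with urlopen(url, timeout=timeout) as resp:  # raises HTTPError (an OSError) on 4xx/5xx
        return resp.status

def objective_met(url: str,
                  fetch: Callable[[str], int] = fetch_status,
                  expected_status: int = 200) -> bool:
    """Pass if the service answers as required, however the user set it up.

    Deliberately checks only the observable outcome: nothing about Docker
    network names, file paths, or which tool was used.
    """
    try:
        return fetch(url) == expected_status
    except OSError:  # connection refused, timeouts, DNS and HTTP errors
        return False
```

The trade-off is the one described above: an outcome-only check rarely rejects a valid solution (few false negatives), but it can accept a setup that merely happens to answer correctly.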
Time pressures during Christmas/the holidays mean that the original calendars were becoming too stressful to handle. I've seen several calendars switch to 12 consecutive days or one challenge every 2 days.
For math, the AMC 10 and AMC 12 tests have 25 questions each, some of them quite challenging. Both are high school level math, no calculus. Search "2025 amc 10" for this year's problems and solutions.
We have scenarios running on k8s, both on single VMs (the ones you can see in the scenario list) and on a beta/PoC k8s cluster where we currently run a couple of scenarios as a single pod (a Docker container) or as a full system (the "kubernetes playgrounds", which are kind of hidden while we test them).
Is this what you were wondering? We also still plan to introduce Podman scenarios.
0xbadcafebee|3 months ago
cobertos|3 months ago
jagged-chisel|3 months ago
Oh, good lord why?
n4bz0r|3 months ago
I've notified the authorities and social services.
betaby|3 months ago
daemonologist|3 months ago
Waterluvian|3 months ago
Most are obvious to most people. None are obvious to everybody.
Nextgrid|3 months ago
f1shy|3 months ago
technion|3 months ago
alberth|3 months ago
athrowaway3z|3 months ago
DoctorOW|3 months ago
infogulch|3 months ago
JuniperMesos|3 months ago
AstroJetson|3 months ago
Two pints of ale please!
UltraSane|3 months ago
melvinodsa|3 months ago
alexpotato|3 months ago
Feedback from candidates is that they find it a bit stressful during the actual interview but love the approach once it's completed.
The interview option also makes it trivial: just send it to a candidate via Zoom chat, ask them to share their screen, and it "just works".
Happy to answer questions folks may have about how we use it.
zenoprax|3 months ago
kralos|3 months ago
melvinodsa|3 months ago
tambourine_man|3 months ago
protomikron|3 months ago
fduran|3 months ago
CoolCold|3 months ago
scubbo|2 months ago
Erwyn|3 months ago
gautamsomani|2 months ago
irusensei|3 months ago
phrotoma|3 months ago
oarmstrong|3 months ago
kortilla|3 months ago
teddyh|3 months ago
thatxliner|3 months ago
fduran|3 months ago
I don't know of any other SaaS that gives you a VM with one click without any registration, but we do.
In any case thanks for the feedback, I've put a button on this /advent page for clarity, cheers
fragmede|3 months ago
ofrzeta|3 months ago
fduran|3 months ago
dontdoxxme|3 months ago
fduran|3 months ago
truekonrads|3 months ago
fduran|3 months ago
NooneAtAll3|3 months ago
nstart|3 months ago
aljaz823|3 months ago
swyx|3 months ago
fyltr|3 months ago
rconti|3 months ago
tonyhart7|3 months ago
dubya|3 months ago
udev4096|3 months ago
fduran|3 months ago
john-carter|3 months ago
[deleted]
zhouzhao|3 months ago
[deleted]
rvz|3 months ago
[deleted]
gryfft|3 months ago
mekoka|3 months ago