Hard agree. As LLMs drive the cost of writing code toward zero, the volume of code we produce is going to explode. But the cost of complexity doesn't go down—it actually might go up because we're generating code faster than we can mentally model it.
SRE becomes the most critical layer because it's the only discipline focused on 'does this actually run reliably?' rather than 'did we ship the feature?'. We're moving from a world of 'crafting logic' to 'managing logic flows'.
I dunno, I don't think in practice SRE or DevOps are really different from the people we used to call sysadmins (former sysadmin myself). I think the future of mediocre companies is SREs chasing after LLM fires, but a competitive business would have a much better strategy for building systems. Humans are still by far the most efficient and generalized reasoners, and putting an energy-intensive, brittle AI model in charge of most implementation is setting yourself up to fail.
But how much of current day software complexity is inherent in the problem space vs just bad design and too many (human) chefs in the kitchen? I'm guessing most of it is the latter category.
We might get more software but with less complexity overall, assuming LLMs become good enough.
SREs usually don't know the first thing about whether particular logic within the product is working according to a particular set of business requirements. That's just not their role.
I see it less as SRE and more about defensive backend architecture. When you are dealing with non-deterministic outputs, you can't just monitor for uptime, you have to architect for containment. I've been relying heavily on LangGraph and Celery to manage state, basically treating the LLM as a fuzzy component that needs a rigid wrapper. It feels like we are building state machines where the transitions are probabilistic, so the infrastructure (Redis, queues) has to be much more robust than the code generating the content.
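The "rigid wrapper around a fuzzy component" idea can be sketched without any framework. This is a minimal illustration, not actual LangGraph/Celery code: `call_llm`, the action schema, and the retry budget are all made-up placeholders.

```python
import json

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; a real one returns free-form text."""
    return '{"action": "approve", "confidence": 0.9}'

# The rigid contract: only these transitions are allowed out of this step.
ALLOWED_ACTIONS = {"approve", "reject", "escalate"}

def constrained_step(prompt: str, max_retries: int = 3) -> dict:
    """Treat the LLM as a probabilistic transition: validate its output
    against a fixed schema, retry on nonsense, and fall back to a safe
    default if it never conforms."""
    for _ in range(max_retries):
        try:
            out = json.loads(call_llm(prompt))
        except json.JSONDecodeError:
            continue  # malformed output: retry instead of propagating it
        conf = out.get("confidence")
        if out.get("action") in ALLOWED_ACTIONS and isinstance(conf, (int, float)) and 0.0 <= conf <= 1.0:
            return out  # output satisfies the contract
    return {"action": "escalate", "confidence": 0.0}  # containment default

print(constrained_step("classify this ticket")["action"])
```

The point is that the infrastructure, not the model, decides which outputs are allowed to advance the state machine.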
I think there are two kinds of software-producing organizations:
There's the small shops where you're running some kind of monolith generally open to the Internet, maybe you have a database hooked up to it. These shops do not need dedicated DevOps/SRE. Throw it into a container platform (e.g. AWS ECS/Fargate, GCP Cloud Run, fly.io, the market is broad enough that it's basically getting commoditized), hook up observability/alerting, maybe pay a consultant to review it and make sure you didn't do anything stupid. Then just pay the bill every month, and don't over-think it.
Then you have large shops: the ones running at a scale where the cost premium of container platforms is higher than the salary of an engineer to move you off them; the ones where you have to figure out how to get the systems from different pre-M&A companies to talk to each other; where N development teams sit organizationally far from the sales and legal teams signing SLAs, yet need to be constrained by said SLAs; where some system architected to handle X scale has now been sold at 100X, and you have to figure out which band-aids to throw at the failing system while telling the devs they need to re-architect; where you need to build your Alertmanager routing tree configuration dynamically, because YAML is garbage and the routing rules change based on whether or not SRE decided to return the pager; plus ensuring that devs can self-service new services, plus progressive rollout of new alerts across the organization, etc. At that point, even Alertmanager config needs to be owned by an engineer.
I really can't imagine LLMs replacing SREs in large shops. SREs debugging production outages to find a proximate "root" technical cause is a small fraction of the SRE function.
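To make the Alertmanager point concrete: the routing tree can be generated from code instead of hand-maintained YAML. This is a hedged sketch; the team list, the receiver names, and the `sre_has_pager` flag are invented for the example, and the named receivers would have to be defined elsewhere in the config.

```python
import json

def build_routing_tree(teams, sre_has_pager: bool) -> dict:
    """Generate an Alertmanager routing tree programmatically. Page SRE only
    for critical-tier teams while SRE holds the pager; otherwise route
    alerts back to the owning dev team."""
    child_routes = []
    for team in teams:
        receiver = "sre-pager" if sre_has_pager and team["critical"] else f"{team['name']}-pager"
        child_routes.append({
            "matchers": [f"team = {team['name']}"],
            "receiver": receiver,
        })
    return {
        "route": {
            "receiver": "default-email",  # catch-all for unmatched alerts
            "routes": child_routes,
        }
    }

teams = [{"name": "payments", "critical": True}, {"name": "search", "critical": False}]
# JSON is a subset of YAML, so the stdlib serializer is enough here.
print(json.dumps(build_routing_tree(teams, sre_has_pager=True), indent=2))
```

Flipping one input flag regenerates the whole tree, which is exactly the kind of conditional logic that is miserable to express in static YAML.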
> SREs debugging production outages to find a proximate "root" technical cause is a small fraction of the SRE function.
According to the stated goals of SRE, this is actually not just a small fraction; it's something that shouldn't happen at all.
To be clear, I'm fully aware that it will always be necessary, but whenever it happens, it's because the site reliability engineer (SRE) overlooked something.
Hence, if that's considered a large part of the job, then you're just not an SRE as Google defined that role: https://sre.google/sre-book/table-of-contents/
Very little connection to the blog post we're commenting on, though, at least as far as I can tell.
At least I didn't find any focus on debugging. It put forward that the capability to produce reliable software is what will distinguish companies in the future, and I think this holds up and is in line with the official definition of SRE.
Having worked on Cloud Run/Cloud Functions, I think almost every company that isn't itself a cloud provider could be in category 1, with moderately more featureful implementations that actually competed with K8s.
Kubernetes is a huge problem. IMO it's a shitty prototype that industry ran away with (Google tried to throw a wrench at Docker/AWS when containers and cloud were the hot new things, pretending Kubernetes is basically the same as Borg), then the community calcified around the prototype state, bought all this SaaS, and structured their production environments around it. Now all these SaaS providers and Platform Engineers/DevOps people who make a living off of milking money out of Kubernetes users are guarding their gold mines.
Part of the K8s marketing push was rebranding Infrastructure Engineering = building atop Kubernetes (vs operating at the layers at and beneath it), and K8s leaks abstractions/exposes an enormous configuration surface area, so you just get K8s But More Configuration/Leaks. Also, You Need A Platform, so do Platform Engineering too, for your totally unique use case of connecting git to CI to slackbot/email/2FA to our release scripts.
At my new company we're working on fixing this, but it'll probably be 1-2 more years until we can open source it (mostly because it's not generalized enough yet and I don't want to make the same mistake as Kubernetes; but we will open source it). The problem is mostly multitenancy, better primitives, modeling the whole user story in the platform itself, and getting rid of false dichotomies/bad abstractions regarding scaling and state (including the entire control plane). Also, more official tooling, and you have to put on a dunce cap if YAML gets within 2 network hops of any zone.
In your example, I think
1. you shouldn't have to think about scaling and provisioning at this level of granularity, it should always be at the multitenant zonal level, this is one of the cardinal sins Kubernetes made that Borg handled much better
2. YAML is indeed garbage, but availability reporting and alerting need better official support; it doesn't make sense for every ecommerce shop and bank to build this stuff
3. a huge amount of alerts and configs could actually be expressed in business logic if cloud platforms exposed synchronous/real-time billing with the scaling speed of Cloud Run.
If you think about it, so so so many problems devops teams deal with are literally just
1. We need to be able to handle scaling events
2. We need to control costs
3. Sometimes these conflict and we struggle to translate between the two.
4. Nobody lets me set hard billing limits/enforcement at the platform level.
(I implemented enforcement for something close to this for Run/Appengine/Functions, it truly is a very difficult problem, but I do think it's possible. Real time usage->billing->balance debits was one of the first things we implemented on our platform).
5. For some reason scaling and provisioning are different things (partly because the cloud provider is slow, partly because Kubernetes is single-tenant)
6. Our ops team's job is to translate between business logic and resource logic, and half our alerts are basically asking a human to manually make some cost/scaling analysis or tradeoff, because we can't automate that, because the underlying resource model/platform makes it impossible.
You gotta go under the hood to fix this stuff.
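Points 4 and 6 above can be made concrete with a toy sketch of platform-level billing enforcement. Everything here (the class name, cents-based accounting, per-instance pricing) is hypothetical; the point is the shape of the idea, that scaling admission and cost control become one atomic decision instead of a tradeoff a paged human has to make.

```python
import threading

class HardBillingLimit:
    """Sketch of synchronous billing enforcement: debit a prepaid balance
    before admitting each scale-up, so a hard spend limit is enforced by
    the platform rather than by an alert and a human."""

    def __init__(self, balance_cents: int):
        self.balance_cents = balance_cents
        self._lock = threading.Lock()

    def try_scale_up(self, instances: int, cost_per_instance_cents: int) -> bool:
        cost = instances * cost_per_instance_cents
        with self._lock:  # debit must be atomic with the admission decision
            if cost > self.balance_cents:
                return False  # hard limit: reject the scale-up, don't page anyone
            self.balance_cents -= cost
            return True

acct = HardBillingLimit(balance_cents=1000)
print(acct.try_scale_up(3, 200))   # affordable: admitted, balance debited
print(acct.try_scale_up(10, 200))  # would exceed the remaining balance: rejected
```

A real implementation is vastly harder (distributed counters, metering lag, refunds on scale-down), which is presumably why no cloud provider offers it; but the interface is this simple.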
stackskipton makes a good point about authority. SRE works at Google because SREs can block launches and demand fixes. Without that organizational power, you're just an on-call engineer who also writes tooling.
The article's premise (AI makes code cheap, so operations becomes the differentiator) has some truth to it. But I'd frame it differently: the bottleneck was never really "writing code." It was understanding what to build and keeping it running. AI helps with one of those. Maybe.
> because SREs can block launches and demand fixes
I didn't find that particularly true during my tenure, but obviously Google is huge, so there probably exist teams that actually can afford to behave this way...
If the agent swarm is collectively smarter and better than the SRE, they'll be replaced just like other types of workers. There is no domain that has special protection.
The models are not smarter than us, by far. Have you not run into issues with reasoning and comprehension with them? They get confused, they miss big details, they build complicated code that's ineffective. They don't work well at tasks that require a larger holistic understanding of the problem. The models are weak, brittle reasoners, because they have an indirect and contradictory understanding of the world. We're several breakthroughs and several hardware generations away from having models that are robust reasoners for grounded, non-kind problems.
I agree. In many cases it's probably easier for a developer to become more of a product person than for a product person to become a dev. Even with LLMs you still need to have some technical skills and be able to read code to handle technical tasks effectively.
Of course things might look different when the product is something that requires really deep domain knowledge.
I don't think the two are mutually exclusive! E.g. a T-shaped product engineer on one side and a T-shaped SRE on the other; both will compact what used to be multiple roles/responsibilities into one. The good news (and my prediction) is that the engineering won't be going away as much as the other roles will.
I was an old school SRE before the days of containerization and such. Today, we have one who is a YAML wizard, and I won't even pretend to begin to understand the entire architecture between all the moving pieces (kube, flux, helm, etc.).
That said, Claude has absolutely no problem not only answering questions, but finding bugs and adding new features to it.
In short, I feel they're just as screwed as us devs.
I knew what an SRE was and found the article somewhat interesting, with a slightly novel (throwaway), more realistic take on the "why need Salesforce when you can vibe your own Salesforce" convo.
But not defining what an SRE is feels like a glaring, almost suffocating, omission.
As an SRE I can tell you AI can't do everything. I have done a little software development, and even there AI can't do everything. What we are likely to see is operational engineering becoming the consolidated role between the two. Knows enough about software development and knows enough about site reliability... blamo, operational engineer.
I don’t think LLM context will be able to digest large codebases, and their algorithms are not going to reason like SREs in the coming years. And given the current hype and market, investors are going to pull out, with recessions all over the world, and we will see another AI winter.
Code has become a commodity. Corporate engineering hierarchy will be much flatter in coming years, both horizontally and vertically: one staff engineer will command two senior engineers with two juniors each, orchestrating N agents each.
I think that’s it - this is the end of bootcamp devs. This will act as a great filter and probably decrease the mass influx of bootcamp devs.
Bootcamp devs were always going to be doomed in the job market. They were a symptom of not having enough true classically trained computer science degree holding engineers to hire, so you compromised by looking for anyone that knew how to code well enough. But this problem eventually corrects.
Now, there are way too many computer science grads in a time when code is easy and cheap. Not much to gain from hiring a bootcamp dev over the real deal.
But I would say if you truly enjoy coding and you didn’t get to study CS in a university, a bootcamp is probably a fun experience to go through just for your own enjoyment, not for job seeking purposes. Just don’t pay too much.
As someone who works in an Ops role (SRE/DevOps/sysadmin): SRE is something that only really works at Google, mainly because for devs to do SRE they need the authority to reject code or demand fixes. That means you need a prompt engineer who actually understands the code, and at that point they're back to being a developer.
As for the more dedicated Ops side, it's garbage in, garbage out. I've already had too many outages caused by AI slop being fed into production. Calling all developers SREs won't change the fact that AI can't program right now without experienced people heavily controlling it.
Most devs can't do SRE, in fact the best devs I've met know they can't do SRE (and vice versa). If I may get a bit philosophical, SRE must be conservative by nature and I feel that devs are often innovative by nature. Another argument is that they simply focus on different problems. One sets up an IDE and clicks play, has some ephemeral devcontainer environment that "just works", and the hard part is to craft the software. The other has the software ready and sometimes very few instructions on how to run it, + your typical production issues, security, scaling, etc. The brain of each gets wired differently over time to solve those very different issues effectively.
I manage a team of developers in a low code environment without AI. The junior developer positions require 8 years of experience, which I think is absurd. Everybody has to program on their own, though pair programming for knowledge transfer is super frequent, but the primary skills of concern are operational excellence (including some project management tasks), transmission, and reliability.
From a people perspective that means excellence when working with outside teams and gathering requirements on your own. It also means always knowing the status of your work in all environments, even in production after deployment. If your soft skills are strong and you can independently program work streams that touch multiple external parties you are golden. It seems this is the future.
I'm sorry, nothing personal...but any place that requires 8 years of experience but only gives a title of "junior" is pretty dang close to a sweat shop.
On a different note, I do see what you mention about some op-excellence skills (e.g. project management, requirements gathering, etc.) being areas of concern at my $dayjob. But I kinda always saw them as skills that are valuable in any era, not only in this AI era... though everyone's mileage and environment can certainly vary that expectation. Also, at my $dayjob the business lacks so much funding to pay software vendors fairly and properly that we get what we pay for, so it's often low-quality output. It's not low *code*, because we employ and contract regular, full-code devs, but it certainly often is poor quality. And I wonder, as low-code offerings and opportunities continue to emerge, paired with more solid AI development assistance, whether something like an SRE role can become that much more important, regardless of whether one works in the low-code or low-cost arena.
> All he wanted was to make his job easier and now he's shackled to this stupid system.
What people failed to grasp about low-code/no-code tools (and what I believe the author ultimately says) is that it was never about technical ability. It was about time.
The people who were "supposed" to be the targets of these tools didn't have the time to begin with, let alone the technical experience to round out the rough edges. It's a chore maintaining these types of things.
These tools don't change that equation. I truly believe that we'll see a new golden age of targeted, bespoke software that can now be developed cheaply, instead of small/medium businesses utilizing off-the-shelf, one-size-fits-all solutions.
Operational excellency was always part of the job, regardless of what fancy term described it, be it DevOps, SRE or something else. The future of software engineering is software engineering, with emphasis on engineering.
> And you definitely don't care how a payments network point of sale terminal and your bank talk to each other... Good software is invisible.
> ...
> Are you keeping up with security updates? Will you leak all my data? Do I trust you? Can I rely on you?
IMO, if the answers to those questions matter to you, then you damn well should care how it works. Because even if you aren't sufficiently technically minded to audit the system, having someone be able to describe it to you coherently is an important starting point in building that trust and having reason to believe that security and privacy will work as advertised.
Totally agree. Vibe coding will generate lots of internal AI apps, but turning them into reliable, secure, governed services still requires real engineering, which is exactly why we’re building https://manifest.build. It lets non-technical teams build agentic apps fast through an AI-powered workflow builder, while giving engineering and IT a single platform to add governance, security, and data access, and keep everything production-ready at scale.
In other words, the apps will be trash, and an operations team that doesn't have the time, capability, or mandate to fix them will be constantly scrambling to keep the fires out?
AI will not get much better than what we have today, and what we have today is not enough to totally transform software engineering. It is a little easier to be a software engineer now, but that’s it. You can still fuck everything up.
There were several cheaper-than-programmers options to automate things, Robotic Process Automation probably being the best known, but it never got the expected traction.
Why (IMO)? Senior leaders still like to say: "I run a 500-headcount finance EMEA organization for Siemens", "I am the Chief People Officer of Meta and I lead an org of 1000 smart HR pros". Most of their status is still tied to the org headcount.
Again, there's a cognitive dissonance in play here, where the future of coding is somehow LLMs, but at the same time LLMs would somehow not evolve to handle operations as well, even if we disregard pipe dreams about AGI being just around the corner. Especially when markdown files for AI are essentially glorified runbooks.
If the future of software engineering is SRE, because GenAI is taking care of coding, a similar trend is coming for SRE-type work.
It's called AI SRE, and for now, it's mostly targeted at helping on-call engineers investigate and solve incidents. But of course, these agents can also be used proactively to improve reliability.
True, but you also need to know well the basics of what constitutes good code and how it should scale, versus code that merely works. Too many people rely on LLMs to produce stuff which just about works but gives users a terrible experience as it barely works.
“People don’t buy software, they hire a service” is a bullshit straw man.
That OS on your laptop? Software.
The terminal your SSH runs in? Software.
The browser you’re reading this take in? Software.
The editor you wrote your last 10k LOC in? Software.
The only “service” I buy is email — and even that I run myself. It’s still just software, plus ops.
Yes, running things is hard. Nobody serious disputes that. But pretending this is some new revelation is ahistorical. We used to call this systems engineering, operations, reliability, or just doing your job before SRE needed a brand deck.
And let’s be clear about the direction of value:
Software without SRE still has value.
SRE without software has none.
A binary I can run, copy, fork, and understand beats a perfectly monitored nothing. A CLI tool with zero uptime guarantees still solves problems. A library still ships value. A game still runs. A compiler still compiles.
Ops exists to serve software, not replace it.
Reliability amplifies value — it does not create it.
If “writing code is easy,” why is the world drowning in unreliable, unmaintainable, over-engineered trash with immaculate dashboards and flawless incident postmortems?
People buy software.
They appreciate service when the software becomes infrastructure.
Confusing the two is how you end up worshipping uptime graphs while shipping nothing worth running.
I have a lot of work:
Make the agents work at warp speed.
Prepare specs for next iteration
Hopefully exhaust resources.. for free time.
<rest as much as possible>
Except for the small detail that, as proven by all the people who lost their jobs to factory robots, the number of SREs required is relatively small in proportion to the existing demographics of SWEs.
Also, this doesn't cover most of the jobs, which are actually in consulting, not product development.
IMO SRE works mostly because they exist outside the product engineering organization. They want to help you succeed but if you want to YOLO your launch and move fast and break things they have the option to hand back the pager and find other work. That option is rarely used but the option alone seems to create better than usual incentives.
With vibecoding, I imagine the LLM will get an MCP that allows it to schedule jobs on Kubernetes or whatever IaaS, and a fleet of agents will do the basic troubleshooting and whack-a-mole activities, leaving only the hard problems for human SREs. Before and after AI, the corporate incentive will always be to ship slop unless there is a counterbalancing force keeping the shipping team accountable to higher standards.
It only matters if any of those can promise reliability and either put their own money where their mouth is or convince (and actually get them to pay up) a bigger player to insure them.
Ultimately hardware, software, QA, etc is all about delivering a system that produces certain outputs for certain inputs, with certain penalties if it doesn’t. If you can, great, if you can’t, good luck. Whether you achieve the “can” with human development or LLM is of little concern as long as you can pay out the penalties of “can’t”.
Site Reliability Engineering. It's the role that, among other things, ensures that a service's uptime is optimal. It's the closest thing we have nowadays to the sysadmin role.
What? Maybe OP's future. SWEs are just going to replace QA, and maybe architects, if the industry adopts AI more, but there are a lot of holdouts. There are plenty of projects out there that are 'boring' and will not bother.
Operational excellence will always be needed but part of that is writing good code. If the slop machine has made bad decisions it could be more efficient to rewrite using human expertise and deploy that.
My take (I'm an SRE) is that SRE should work pre-emptively to provide reproducible prod-like environments so that QA can test DEV code closer to real-life conditions. Most prod platforms I've seen are nowhere near that level of automation, which makes it really hard to detect or even reproduce production issues.
And no, as an SRE I won't read DEV code, but I can help my team test it.
> Writing code was always the easy part of this job. The hard part was keeping your code running for the long time.
Spoken like a true SRE. I'm mostly writing code, rather than working on keeping it in production, but I've had websites up since 2006 (hope that counts as long time in this corner of the internet) with very little down time and frankly not much effort.
My experience with SREs was largely that they're glorified SSH: they tell me I'm the programmer and I should know what to type into their shell to debug the problem (despite them SREing those services for years, while I joined two months ago and haven't even seen the particular service). But no I can't have shell access, and yes I should be the one spelling out what needs to be typed in.
belter|1 month ago
And they drive the cost of validating the correctness of such code towards infinity...
tryauuum|1 month ago
what do you mean "progressive rollout of new alerts across the organization"? what kind of alerts?
bronlund|1 month ago
Edit: Or maybe he is fully aware and just need to push some books before it's too late.
joshuaisaact|1 month ago
Look at the 'Product Engineer' roles we are seeing spreading in forward-thinking startups and scaleups.
That's the future of SWE I think. SWEs take on more PM and design responsibilities as part of the existing role.
pjmlp|1 month ago
However, like in automated factories, only a small percentage is required to stay around.
silisili|1 month ago
That said, Claude has absolutely no problem not only answering questions, but finding bugs and adding new features to it.
In short, I feel they're just as screwed as us devs.
adelmotsjr|1 month ago
arionmiles|1 month ago
F7F7F7|1 month ago
But not defining what an SRE is feels like a glaring, almost suffocating, omission.
ares623|1 month ago
Sparkyte|1 month ago
mellosouls|1 month ago
That's what they used to say about software engineering and yet this is becoming less and less obvious as capabilities increase.
There are no hiding places for any of us.
squidbeak|1 month ago
stared|1 month ago
We just created a benchmark on adding distributed logs (OpenTelemetry instrumentation) to small services, around 300 lines of code.
Claude Opus 4.5 succeeded at 29%, GPT 5.2 at 26%, Gemini 3 Pro at 16%.
https://quesma.com/blog/introducing-otel-bench/
chubot|1 month ago
mon_|1 month ago
northfield27|1 month ago
I don’t think LLM context windows will be able to digest large codebases, and their algorithms are not going to reason like SREs in the coming years. And given the current hype and market, investors are going to pull out amid recessions all over the world, and we will see another AI winter.
Code has become a commodity. Corporate engineering hierarchies will be much flatter in the coming years, both horizontally and vertically - one staff engineer will command two senior engineers with two juniors each, orchestrating N agents each.
I think that’s it - this is the end of bootcamp devs. This will act as a great filter and probably stem the mass influx of bootcamp devs.
deadbabe|1 month ago
Now, there are way too many computer science grads in a time when code is easy and cheap. Not much to gain from hiring a bootcamp dev over the real deal.
But I would say if you truly enjoy coding and you didn’t get to study CS in a university, a bootcamp is probably a fun experience to go through just for your own enjoyment, not for job seeking purposes. Just don’t pay too much.
stackskipton|1 month ago
As for more dedicated to Ops side, it's garbage in, garbage out. I've already had too many outages caused by AI Slop being fed into production, calling all Developers = SRE won't change the fact that AI can't program now without massive experienced people controlling it.
bionsystem|1 month ago
austin-cheney|1 month ago
From a people perspective that means excellence when working with outside teams and gathering requirements on your own. It also means always knowing the status of your work in all environments, even in production after deployment. If your soft skills are strong and you can independently manage work streams that touch multiple external parties, you are golden. It seems this is the future.
mxuribe|1 month ago
On a different note, I do see what you mention about some ops excellence skills (e.g. project management, requirements gathering, etc.) being areas of concern at my $dayjob. But I kinda always saw those as skills that are valuable in any era, not just the AI era... though everyone's mileage and environment can certainly vary that expectation. Also, at my $dayjob, the business lacks the funding to pay software vendors fairly and properly, so we get what we pay for... often low-quality output. It's not low *code*, because we employ and contract regular, full-code devs... but it certainly often is poor quality. And as low-code offerings - paired with more solid AI development assistance - continue to emerge, I suppose something like an SRE role can become that much more important, regardless of whether one works in the low-code or low-cost arena.
mexicocitinluez|1 month ago
What people failed to grasp about low-code/no-code tools (and what I believe the author ultimately says) is that it was never about technical ability. It was about time.
The people who were "supposed" to be the targets of these tools didn't have the time to begin with, let alone the technical experience to round out the rough edges. It's a chore maintaining these types of things.
These tools don't change that equation. I truly believe that we'll see a new golden age of targeted, bespoke software that can now be developed cheaply, instead of small/medium businesses settling for off-the-shelf, one-size-fits-all solutions.
ivan_gammel|1 month ago
zahlman|1 month ago
> ...
> Are you keeping up with security updates? Will you leak all my data? Do I trust you? Can I rely on you?
IMO, if the answers to those questions matter to you, then you damn well should care how it works. Because even if you aren't sufficiently technically minded to audit the system, having someone be able to describe it to you coherently is an important starting point in building that trust and having reason to believe that security and privacy will work as advertised.
stosssik|1 month ago
coffeefirst|1 month ago
Sounds... reliable.
chickensong|1 month ago
unknown|1 month ago
[deleted]
deadbabe|1 month ago
AI will not get much better than what we have today, and what we have today is not enough to totally transform software engineering. It is a little easier to be a software engineer now, but that’s it. You can still fuck everything up.
falcor84|1 month ago
Wow, where did this come from?
Just off the top of my head, based on recent research, I'd expect at least the following this year or next:
* Continuous learning via an architectural change like Titans or TTT-E2E.
* Advancement in World Models (many labs focusing on them now)
* Longer-running agentic systems, with Gas Town being a recent proof of concept.
* Advances in computer and browser usage - tons of money being poured into this, and RL with self-play is straightforward
* AI integration into robotics, especially when coupled with world models
alexgotoi|1 month ago
Why (imo)? Senior leaders still like to say: I run a 500-headcount EMEA finance organization for Siemens; I am the Chief People Officer of Meta and I lead an org of 1000 smart HR pros. Most of their status is still tied to org headcount.
unknown|1 month ago
[deleted]
petetnt|1 month ago
sylvainkalache|1 month ago
It's called AI SRE, and for now, it's mostly targeted at helping on-call engineers investigate and solve incidents. But of course, these agents can also be used proactively to improve reliability.
mg794613|1 month ago
joe_91|1 month ago
johndoh42|1 month ago
That OS on your laptop? Software. The terminal your SSH runs in? Software. The browser you’re reading this take in? Software. The editor you wrote your last 10k LOC in? Software.
The only “service” I buy is email — and even that I run myself. It’s still just software, plus ops.
Yes, running things is hard. Nobody serious disputes that. But pretending this is some new revelation is ahistorical. We used to call this systems engineering, operations, reliability, or just doing your job before SRE needed a brand deck.
And let’s be clear about the direction of value:
Software without SRE still has value. SRE without software has none.
A binary I can run, copy, fork, and understand beats a perfectly monitored nothing. A CLI tool with zero uptime guarantees still solves problems. A library still ships value. A game still runs. A compiler still compiles.
Ops exists to serve software, not replace it. Reliability amplifies value — it does not create it.
If “writing code is easy,” why is the world drowning in unreliable, unmaintainable, over-engineered trash with immaculate dashboards and flawless incident postmortems?
People buy software. They appreciate service when the software becomes infrastructure. Confusing the two is how you end up worshipping uptime graphs while shipping nothing worth running.
arbirk|1 month ago
Every 5 hours, 24/7. Rinse, repeat.
nbevans|1 month ago
outside2344|1 month ago
Artoooooor|1 month ago
didip|1 month ago
eschneider|1 month ago
willtemperley|1 month ago
pjmlp|1 month ago
Also this doesn't cover most of the jobs, which are actually in consulting, and not product development.
siliconc0w|1 month ago
With Vibecoding I imagine the LLM will get a MCP that allows them to schedule the jobs on Kubernetes or whatever IaaS and a fleet of agents will do the basic troubleshooting or whackamole type activities, leaving only the hard problems for human SRE. Before and after AI, the corporate incentives will always be to ship slop unless there is a counterbalancing force keeping the shipping team accountable to higher standards.
trkabv|1 month ago
Who probably has never written anything of value in his life and therefore approves the theft of other people's valuable work.
ks2048|1 month ago
almosthere|1 month ago
Nextgrid|1 month ago
Ultimately hardware, software, QA, etc is all about delivering a system that produces certain outputs for certain inputs, with certain penalties if it doesn’t. If you can, great, if you can’t, good luck. Whether you achieve the “can” with human development or LLM is of little concern as long as you can pay out the penalties of “can’t”.
cl0ckt0wer|1 month ago
ozim|1 month ago
ikiris|1 month ago
metasim|1 month ago
netdevphoenix|1 month ago
giancarlostoro|1 month ago
hahahahhaah|1 month ago
pepperball|1 month ago
[deleted]
dionian|1 month ago
bionsystem|1 month ago
And no, as an SRE I won't read DEV code, but I can help my team test it.
VirusNewbie|1 month ago
tasuki|1 month ago
Spoken like a true SRE. I'm mostly writing code rather than working on keeping it in production, but I've had websites up since 2006 (hope that counts as a long time in this corner of the internet) with very little downtime and frankly not much effort.
My experience with SREs was largely that they're glorified SSH: they tell me I'm the programmer and I should know what to type into their shell to debug the problem (despite them SREing those services for years, while I joined two months ago and haven't even seen the particular service). But no I can't have shell access, and yes I should be the one spelling out what needs to be typed in.