The fact that GitHub has been so unstable for so long is absolutely insane to me. I know ops is hard, but this level of consistent outage points to an endemic problem. Is it the legacy Rails/MySQL stack that is the largest culprit, or is there systemic rot in the engineering org?
More likely, it's efforts to migrate away from the previously solid Rails stack to MS's preferred stack.
They've had a long history of this kind of stability issue when migrating or trying to migrate acquisitions from their previous stack to an MS one. This happened with Hotmail (Unix server -> Windows server), LinkedIn (custom cloud -> MS cloud) and others since.
I’ve had hardly any problems. Occasional issues, but rarely have I been impacted to the extent that I notice for more than, say, an hour. Maybe I notice it a couple of times a year.
My internet access at home is more likely the issue when I hit GitHub issues.
Every time there's a GitHub outage of any severity, one of the top comments on HN inevitably suggests that it's probably due to Rails. It's getting pretty tiresome.
Calling it a "legacy rails" stack is incredibly disingenuous as well. It's not like they're running a 5-year-old unsupported version of Rails/MySQL. GitHub runs from the Rails main branch - the latest stable version they possibly can - and they update several times per month.[^1] They're one of the largest known Rails code bases and contributors to the framework. Outside of maybe 37signals and Shopify, they employ more experts in the framework and Ruby itself than any other company.
It's far more likely the issue is elsewhere in their stack. Despite running a Rails monolith, GitHub is still a complex distributed system with many moving parts.
I feel like it's usually configuration changes and infra/platform issues, not code changes, that cause most outages these days. We're all running A/B tests and canary deployments and using feature flags to test actual code changes...
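For readers who haven't worked with them, a percentage rollout behind a feature flag can be sketched roughly like this in Python. The helper name `in_rollout`, the flag name, and the hashing scheme are all assumptions made for illustration; nothing here reflects how GitHub actually implements its flags.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage for a flag.

    Hypothetical helper: hashing flag + user gives each user a stable bucket
    in [0, 100), so the same user always gets the same answer for a flag.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


if __name__ == "__main__":
    # Made-up user IDs and flag name, purely for demonstration.
    users = [f"user-{i}" for i in range(1000)]
    enabled = sum(in_rollout("new-code-path", u, 10) for u in users)
    print(f"{enabled} of {len(users)} users see the new code path")
```

Because the bucket is derived from a hash rather than a random draw, raising `percent` only ever adds users to the rollout, which is what makes a gradual canary-style ramp-up possible.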
> Is it the legacy Rails/MySQL stack that is the largest culprit, or is there systemic rot in the engineering org?
The culprit is change. Infra changes, config changes, new features, system state (OS updates, building new images, rebooting, etc.), even fixing existing bugs: all of these are larger changes to the system than most people think. It's really remarkable at this point that GitHub is as stable as it is, and that is a testament to the GitHub team. It's not "rot", it's just a huge system.
And just as we're about to migrate 4 Kubernetes clusters with a total of ~4k pods. Terraform in GitHub Actions on self-hosted runners and Argo CD are failing.
Oh, that sucks. There will always be those who say it's the price you pay for using GitHub, but locally hosted VCS and CI/CD systems have issues as well.
External dependencies are always a problem, but do you have the capacity and resources required to manage those dependencies internally? Most don't, and will still get a better product/service by using an external service.
That's where I feel like it's actually pretty nice to not have CI tied to your source code. It's probably more expensive to use Travis/Circle but at least you don't have a single point of failure for deploys.
Wouldn't it be wonderful if the most popular version control system was decentralized?
This is achievable, and is the correct solution.
This way your git repo could be located on:
- GitHub
- Your Closet
(...)
- UCLA's supercomputer
- JBOD in Max Planck Institute
(...)
- GitLab
Doing this with a simple file with "[ipfs, github, gitlab]" in it would be revolutionary, especially for data version control, like NN weights or databases that are too large for git and cost too much on other services, as they would be free on IPFS/torrent.
Then no one is fazed by the inevitable failure of various companies.
Can't tell if this comment is sarcastic, but that's exactly what git is: every clone of the repo is independent and acts as a full backup. Likewise, a local repo can be pushed to various remotes; there is no inherent strong server-client coupling (even though it's often used in such a way).
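The mirror-list idea from the comment above can be sketched in a few lines of Python: walk the list in order and use the first mirror that answers. The function `fetch_from_mirrors` and the stand-in `fake_fetch` are hypothetical names for illustration only; a real version would have to shell out to git or talk to IPFS, which this sketch does not attempt.

```python
from typing import Callable, Iterable, Tuple


def fetch_from_mirrors(
    mirrors: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Try each mirror in order and return (mirror, data) from the first that works.

    Hypothetical helper: `fetch` is any callable that returns the data for a
    mirror or raises an exception if that mirror is unavailable.
    """
    errors = []
    for mirror in mirrors:
        try:
            return mirror, fetch(mirror)
        except Exception as exc:  # any failure means: move on to the next mirror
            errors.append(f"{mirror}: {exc}")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))


def fake_fetch(mirror: str) -> bytes:
    """Stand-in fetcher that pretends GitHub is down (illustration only)."""
    if mirror == "github":
        raise ConnectionError("service unavailable")
    return b"repo data from " + mirror.encode("utf-8")


if __name__ == "__main__":
    name, data = fetch_from_mirrors(["github", "gitlab", "ipfs"], fake_fetch)
    print(name, data)
```

With plain git, the equivalent is simply registering several remotes with `git remote add` and pushing to or fetching from whichever one is reachable.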
GitHub is still having regular incidents in 2024, 5 years after the acquisition and even with Microsoft's infrastructure. Something is always going down.
Just expect GitHub to go down at least once every month, as it is that unreliable.
Full stack rewrites are not always the best way. Sometimes you end up with worse. Sometimes you end up with better. If you do go the 'full stack rewrite' route, you'd better have a decent plan in place, because you are about to support 2 code bases for a while.
[+] [-] acedTrex|2 years ago|reply
[+] [-] AlchemistCamp|2 years ago|reply
[+] [-] nijave|2 years ago|reply
[+] [-] duxup|2 years ago|reply
Has it?
[+] [-] izietto|2 years ago|reply
[+] [-] sonicanatidae|2 years ago|reply
Source: <-- OPs
[+] [-] dcchambers|2 years ago|reply
[^1]: https://github.blog/2023-04-06-building-github-with-ruby-and...
[+] [-] indymike|2 years ago|reply
[+] [-] ahmgeek|2 years ago|reply
It's neither Rails nor MySQL; both have proven good for years.
[+] [-] hunkins|2 years ago|reply
Hugs to the GitHub ops team.
[+] [-] efrecon|2 years ago|reply
[+] [-] aaomidi|2 years ago|reply
[+] [-] amelius|2 years ago|reply
[+] [-] sph|2 years ago|reply
We just hope SMTP keeps ticking along somehow or we're fcuked.
[+] [-] djbusby|2 years ago|reply
[+] [-] richardwhiuk|2 years ago|reply
[+] [-] robinhoodexe|2 years ago|reply
[+] [-] mrweasel|2 years ago|reply
[+] [-] snarkyturtle|2 years ago|reply
[+] [-] cupofjoakim|2 years ago|reply
[+] [-] imdsm|2 years ago|reply
[+] [-] pklack|2 years ago|reply
[+] [-] inhumantsar|2 years ago|reply
[+] [-] jdthedisciple|2 years ago|reply
Sounds like all in good order then ...
[+] [-] chaxor|2 years ago|reply
[+] [-] richardwhiuk|2 years ago|reply
[+] [-] blauditore|2 years ago|reply
[+] [-] Bognar|2 years ago|reply
[+] [-] rvz|2 years ago|reply
This certainly has aged well: [0]
[0] https://news.ycombinator.com/item?id=22868406
[+] [-] dartos|2 years ago|reply
People really like avoiding ops
[+] [-] abhinavk|2 years ago|reply
_Maybe it’s time for rewriting it in Rust._
Edit: RIIR was said in jest. I forgot HN doesn’t support markdown.
[+] [-] sumtechguy|2 years ago|reply
edit: fair enough
[+] [-] bombcar|2 years ago|reply
Sorry everyone!
[+] [-] alexnewman|2 years ago|reply
[+] [-] ftkftk|2 years ago|reply
[+] [-] lillecarl|2 years ago|reply
[+] [-] iddan|2 years ago|reply
[+] [-] Narciss|2 years ago|reply