The conspiracy theories about how much infrastructure has been going down this summer: it's probably interns / a state actor / people pushing to get things into production before performance reviews / people being on holiday / or that we've reached peak complexity.
Publicly visible outages on "pro" internet properties are actually a good thing because they help correct the views so commonly held by low-information executives about 5-nines and uptime. The more ammo people have, in aggregate, that "XXX never goes down" is a lie, the better off we will be as an industry.
I kind of wish HN wouldn't show these outages unless they're going on for multiple hours or at least hide them after they're back up. Usually by the time I see them and then check myself the site is already back up. Maybe I just need to read HN more often.
Yeah, there are a lot of outage posts. It does kind of illustrate how many people depend on these core services, though, and how little distributedness or elasticity is built into the system.
Linux repos and ISO download sites have TONS of mirrors, sometimes contributed by companies, but also a lot of universities, government and public institutions.
Why is one of the largest core open source repositories centralized and private? Why are we so dependent on AWS, Google and others for large chunks of Internet services to be accessible?
There is a deeper question here than just the outages themselves: why do they affect so many people?
I find the post-mortems helpful. I don't want to make the same mistakes with my infrastructure, and it gives me something to reference when, for instance, upper management decides they want to migrate to one of these services.
I like those, even when it's been down for one or two hours only, but the title should be edited so that we know the outage is over. I'd still prefer this thread to be replaced by a postmortem but I don't think there's one yet.
Last night I wrote a little spreadsheet for myself showing how much time I spend doing various things. In particular, I for one do NOT need to read HN more often
Events like this make me realize how much I rely on my CI/CD pipeline for working on new features, deploying, etc. I am often too lazy to run E2E tests on my machine when I know that on CI they will take only a few minutes.
I feel guilty because I’ve never installed GitLab’s CI process locally and have only run it for the past few years through git commits. I keep expecting someone to complain, but so far, so good.
I expect similarly poor behavior with GitHub Actions, because it's so convenient to only run things through the service.
It is important, though, that it's possible to run without CI/CD. If it weren't available I would complain, because of the risk of lock-in.
Funny thing, I didn't even notice. In fact, despite being heavily reliant on Git and GitHub, I've encountered maybe one or two issues a year where I couldn't push to GitHub because it was unavailable. In those cases, progress could still be made locally, so my workflow was barely affected. Of course, deployment was impossible, but the downtime was never significant enough to worry about or act on.
Git is designed as a distributed VCS, so an outage of a centralized server should not matter too much. So much for the theory. Using GitHub excessively is the way back to centralization, nowadays handing control over to Microsoft. Everybody who is concerned (probably not too many are) should have a look at alternatives.
Git itself is decentralised, but many (most? (all?)) of the workflows people build around it are not.
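Git itself even ships a hub-free transport for exactly this situation: `git bundle` packs commits into a single file you can send over email, scp, or a USB stick, no server involved. A minimal sketch (the branch names are hypothetical):

```shell
# Sender: pack everything on feature-x that main doesn't have yet.
git bundle create feature-x.bundle main..feature-x

# Recipient: check the file, then fetch from it as if it were a remote,
# landing the branch locally as "alice-feature-x".
git bundle verify feature-x.bundle
git fetch feature-x.bundle feature-x:alice-feature-x
```

The recipient only needs the prerequisite commits (here, `main`) already in their clone; everything else travels in the file.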
Individuals and teams who already have what they need in their local repos can continue to work through an outage of the VCS part of GitHub, but at the points where they need to collaborate (merging each other's changes, issue tracking, etc.) the workflows break down. Yes, you could share changes in a more distributed manner, or work around the outage in other ways, but in reality people will stop and wait for the central repo to be available again. Also, pulling changes from my repo to yours directly to avoid the downed part of the service doesn't solve the issue tracker or CI manager also being down. The clue is in the name: git HUB.
That said, the issue is similar to how people see aviation accidents: a jumbo going down takes out a lot of people, but statistically, across all air journeys, that is far fewer deaths than the equivalent car journeys would cause. I suspect that everyone being a bit inconvenienced for a while when github/gitlab/some-other-centralised-service has issues like today's doesn't add up to as much as the many small individual inconveniences they'd experience over time with their own local instances of, say, gitlab - especially once you include the routine maintenance time, which largely goes away when using github or a managed gitlab instead of self-hosting. The outage just feels bigger than all the little problems because everyone is affected at once.
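On the "share changes in a more distributed manner" point: git will happily treat a colleague's clone as a remote, no hub required. A minimal sketch, with a hypothetical host and path:

```shell
# GitHub is down, but Alice's clone is reachable (ssh, or a shared
# filesystem path works just as well).
git remote add alice ssh://alice@alice-laptop/home/alice/project

# Fetch and merge her branch exactly as you would from the hub.
git fetch alice
git merge alice/feature-x

# When the central remote is back, push the merged result as usual:
#   git push origin main
```

This covers code sharing only; as noted above, it does nothing for the issue tracker or CI.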
There are so many ways we become dependent on GitHub that nobody will shame us for being impacted by a GitHub outage. In that context, is it even useful to protect against outages? It may be cheaper to do nothing.
We're self-hosting Gogs on a Hetzner VM and it's been a great experience. Of course it has fewer features. But it's simple and fast, and I do about 10 minutes of sysadmin per quarter.
If you don't need the "community" features of GitHub, we have had very good results with AWS CodeCommit. I've never experienced an outage, it's easy to manage, and it's very inexpensive.
I see posts like this make the front page of Hacker News every so often, but I don't recall seeing posts about things like mailing list outages for something like the LKML (Linux Kernel Mailing List) or an IRC network like Freenode.
When was the last time ALL freenode servers were offline? I know about occasional netsplits but those do not affect all of their servers and freenode itself is still operational.
Just yesterday I was complaining about the GitHub monoculture in the context of Atlassian sunsetting Mercurial.
Outages like this are normal, but the problem is the lack of good competitors.
The problem is bad design and/or integration of components. VMS clusters over leased lines went years without outages. Record was 17 years at a rail yard. They also did rolling upgrades across both different OS versions and CPU architectures. The methods are public if any competitors want to match or exceed them:
These modern cloud services built on the Linux ecosystem do rolling outages instead. I'm amazed people keep making them dependencies. Maybe their local systems using similar technology went down more often. Maybe this is an improvement. I'd still put mine on an OpenBSD or OpenVMS cluster if it was business-critical with lots of money on the line. I want it staying up, up, up. :)
There is quite a decent list of competitors, all of which have similar or sometimes better offerings. There are also plenty of fully free and open source solutions, should you want to do things yourself.
I think you're right about a general move towards a Git monoculture (as opposed to a GitHub one), but there are dozens of great Git repo hosting/collaboration companies - GitLab, Bitbucket & SourceForge to name a few of the bigger ones
I've been having trouble getting GitHub to send me a verification email (after adding a new email address) for over a week. Anybody else having this problem? I wonder if it's connected.
yjftsjthsd-h | 6 years ago
- Monoculture bad
- Microsoft bad
- Git is distributed, but all the things around it that we really need aren't
- You can set up git to push to multiple remotes automatically
- Nobody is actually using git in distributed mode

Did I forget anything?
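The "push to multiple remotes" setup really is just two commands; a minimal sketch (the URLs are placeholders):

```shell
# Keep "origin" fetching from one host, but make every `git push origin`
# update both mirrors. Note: once any explicit push URL is set, the fetch
# URL is no longer used for pushing - only the listed push URLs are.
git remote set-url --add --push origin git@github.com:example/project.git
git remote set-url --add --push origin git@gitlab.com:example/project.git

# List the configured push URLs to confirm both are set.
git remote get-url --push --all origin
```

If one host is down the push to it fails, but the other mirror still gets the commits, so you always have an up-to-date copy somewhere.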
jolmg | 6 years ago
Can anyone recommend an issue tracker that is distributed with the repo?
maxerickson | 6 years ago
And the meta-meta follow on.
olah_1 | 6 years ago
https://radicle.xyz/
A Radicle project contains a git repository, plus the associated issues and proposals.
^ Neat project
fortran77 | 6 years ago
https://aws.amazon.com/codecommit/
nickpsecurity | 6 years ago
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na...
arghwhat | 6 years ago
The Github monopoly is driven by its users.