top | item 23817794

GitHub was down

173 points | nanddalal | 5 years ago | status.github.com

124 comments

Illniyar|5 years ago
Seems like things started to go down the drain somewhere around February 2020.

https://www.githubstatus.com/uptime/kr09ddfgbfsf?page=2

I wonder what triggered the reliability hit. Actions went GA in November 2019, so it's something else (or possibly a combination of things).

tjomk|5 years ago
Maybe they're migrating their infra to Azure?
kenhwang|5 years ago
COVID-related work-from-home adjustments are my guess.
mindfreeze|5 years ago
I'm trying to remember the last time GitLab went down. I rarely, if ever, saw downtime as bad as this with GitLab. I was looking through their history:

https://status.gitlab.com/pages/history/

They do have some latency and slowness issues, but I couldn't find a whole-system outage like this.

Like one of the other commenters here, I was reminded of the 2017 incident: https://about.gitlab.com/blog/2017/02/10/postmortem-of-datab... They should have improved a lot by now, but I'm still curious why such large and frequent outages are happening to GitHub. Is it due to opening it up to teams with private repos and adding more perks, along with the quarantine and WFH situation?

humaid|5 years ago
That GitLab downtime happened while we had deadlines. Luckily, git isn't a centralised platform, so we merged our changes on a new GitHub repo we created.

Also, the GitLab sluggishness reminds me of their daemon that kills server processes to control memory leaks[1], although that probably isn't the main cause of the platform's slowness.

[1]: https://about.gitlab.com/blog/2015/06/05/how-gitlab-uses-uni...
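For anyone unfamiliar with the idea, a memory-based worker killer can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not GitLab's actual code: the class name `WorkerKillPolicy` and its parameters are assumptions. Each worker draws its own memory and request limits from a range, so a pool of workers does not restart all at once, and it asks to be recycled once it crosses either limit.

```python
import random
from dataclasses import dataclass, field


@dataclass
class WorkerKillPolicy:
    """Hypothetical recycling policy for one web-server worker process.

    The worker is recycled once its memory use or its request count crosses a
    limit. Each worker draws its own limits from [min, max] so that a pool of
    workers does not restart all at the same moment.
    """

    memory_limit_min: int  # bytes
    memory_limit_max: int  # bytes
    max_requests_min: int
    max_requests_max: int
    rng: random.Random = field(default_factory=random.Random)

    def __post_init__(self) -> None:
        # Pick this worker's thresholds once, at startup.
        self.memory_limit = self.rng.randint(self.memory_limit_min, self.memory_limit_max)
        self.max_requests = self.rng.randint(self.max_requests_min, self.max_requests_max)
        self.requests_served = 0

    def after_request(self, rss_bytes: int) -> bool:
        """Record one served request; return True if the worker should exit.

        A supervisor (not shown) would then replace the exiting worker with a
        fresh one, releasing whatever memory had leaked.
        """
        self.requests_served += 1
        return rss_bytes > self.memory_limit or self.requests_served >= self.max_requests
```

Randomizing the thresholds is the design choice that matters here: with identical limits, workers started together would tend to hit them together and restart in a burst.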

rvz|5 years ago
14 days ago they went down [0], and today it's happening again. Twice in less than a month.

Another reminder to self-host with solutions like GitLab or Gitea. [1]

[0] https://news.ycombinator.com/item?id=23675864

[1] https://news.ycombinator.com/item?id=23676072

gregoriol|5 years ago
Could you provide details on how you plan to be more reliable with a self-hosted solution? What kind of architecture would you use? How many people will be involved in maintaining it?
dt3ft|5 years ago
I would choose self-hosting for small- to medium-sized teams any day. I can't fathom why people choose not to self-host at this scale. Your data. Your control. Your network. Your infrastructure. Your responsibility. Are people becoming more afraid of responsibility these days?
trulyrandom|5 years ago
Self-hosting GitHub is an option as well. We've been self-hosting GitHub Enterprise for years, and it has had no downtime other than during scheduled updates (at night, when nobody's at work).
dtech|5 years ago
Self-hosting requires quite a bit of scale to be more reliable. Otherwise you'll most likely still have outages, possibly longer ones, just at different times.
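To put rough numbers on that point, here is a back-of-the-envelope sketch in Python. It assumes failures of redundant servers are independent, which is rarely true for a small self-hosted setup (shared power, shared network, and often one person on call), so treat the redundancy figure as a best case. The function names are hypothetical.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60


def downtime_minutes(availability: float, period_minutes: float = MINUTES_PER_30_DAYS) -> float:
    """Expected minutes of downtime over a period for an availability in [0, 1]."""
    return (1.0 - availability) * period_minutes


def parallel_availability(availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant servers where any one can serve traffic.

    Assumes failures are independent; shared infrastructure or a single
    operator usually makes this untrue in practice.
    """
    return 1.0 - (1.0 - availability) ** replicas
```

With these formulas, a single server at 99.5% availability is down about 216 minutes in a 30-day month, while two truly independent replicas would be down about a minute. Getting anywhere near the second number takes the kind of scale and operational effort the comment is talking about.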
rplnt|5 years ago
You can self-host GitHub too. Or Bitbucket.
thih9|5 years ago
It looks like they need some high-level analysis of why these outages happen so often.

At this point it no longer feels like a series of accidents; it feels like something they need to fix.

haik90|5 years ago
After today's downtime, we're finally discussing (again) switching to our own GitLab.
rplnt|5 years ago
It's important to note that it's the website that is down. Git itself has kept working through all the outages I can remember. Unfortunately, the integrations didn't, at least the last time, I think.
anemic|5 years ago
Protip:

Always have an extra customer, like the flower shop downstairs. Let her borrow your wifi in exchange for some office flowers. Now she is technically your customer.

When your shit goes down and nothing works, you can still write "some of our customers are experiencing issues" on the status page, as the flower shop still has wifi (hopefully).

quyleanh|5 years ago
I still don't understand people who always bring up Microsoft's acquisition. Until the official statement, it isn't Microsoft failure. Don't blame them.
CathedralBorrow|5 years ago
I think you're right in that blaming Microsoft without any evidence is probably a mistake, but I also think "until the official statement, it isn't Microsoft failure" is putting a bit too much faith in corporate PR as a source of truth.
MattGaiser|5 years ago
It’s a pattern. GitHub is now down a lot.
neuronic|5 years ago
Blaming without context or proof, outrage culture, and attacking people who haven't been convicted of anything yet make up the Internet's primary sport.
jaekash|5 years ago
You know, if you use Microsoft products every day, and every day they let you down, and every day you experience the worst, most unintuitive design in the world, and every day you have to deal with their reliability issues, and then MS acquires GitHub, and GitHub starts to behave like everything else MS touches ...

Clearly something about how MS runs is responsible for their past outcomes, why is it a stretch to assume it is responsible for another similar outcome?

It is like saying we don't know the rotation of the earth is why the sun rose this morning because we have not had an official investigation into the matter.

aspectmin|5 years ago
Is there historical data on GitHub uptime/downtime, in CSV or some other format?

I'd love to do some analysis of how things were before vs. after the acquisition (and of trends in availability).

dtech|5 years ago
They have some self-reported data [1]. You could scrape that and transform it.

Eyeballing it, the decline seems to have started between December 2019 and February 2020, and things went rapidly downhill starting in April.

[1] https://www.githubstatus.com/uptime?page=2
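One way to do the before/after comparison, sketched in Python under some assumptions: the scraped incidents have already been turned into (start, end) datetime pairs (the scraping step and that input format are hypothetical), overlapping incidents are merged so they are not double-counted, and any incident straddling the cutoff (for example, the acquisition date) is split between the two periods.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Incident = Tuple[datetime, datetime]  # (start, end) of one outage


def merge_overlaps(incidents: Iterable[Incident]) -> List[Incident]:
    """Merge overlapping or touching incidents so downtime is not double-counted."""
    merged: List[Incident] = []
    for start, end in sorted(incidents):
        if merged and start <= merged[-1][1]:
            # Extend the previous incident instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_downtime(incidents: Iterable[Incident], cutoff: datetime) -> Tuple[timedelta, timedelta]:
    """Total downtime before and after `cutoff`; straddling incidents are split."""
    before = after = timedelta(0)
    for start, end in incidents:
        if end <= cutoff:
            before += end - start
        elif start >= cutoff:
            after += end - start
        else:
            before += cutoff - start
            after += end - cutoff
    return before, after


def availability(downtime: timedelta, window_start: datetime, window_end: datetime) -> float:
    """Fraction of the window [window_start, window_end) during which the service was up."""
    return 1.0 - downtime / (window_end - window_start)
```

Dividing each side's downtime by the length of its window with `availability()` gives comparable uptime figures, keeping in mind that self-reported status data only covers what the provider chose to post.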

mullikine|5 years ago
I have noticed a pattern: when I generate markdown from org-mode with 'text' selected as the highlighting language and push it, GitHub hangs like crazy. I don't think I'm crazy in thinking it might be me. I push to my blog frequently and am starting to notice a correlation.

I export this org-mode block to markdown:

    #+BEGIN_SRC text -n :f "translate-shell -s fr -t en" :async :results verbatim code
      I learned some French so that I can talk to
      you during tennis. I hope I know enough so you
      will not get bored.
    #+END_SRC
When I get a page build failure it's usually my fault for creating .

This is the markdown that was pushed to my blog. The 'Page Build failure' messages take a long time to arrive in my inbox, and I can see that the page build is hanging.

    {{< highlight text "linenos=table, linenostart=1" >}}
    I learned some French so that I can talk to
    you during tennis. I hope I know enough so you
    will not get bored.
    {{< /highlight >}}
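The mapping between the two blocks above is mechanical. A simplified, hypothetical Python converter (not the exporter the commenter used) that turns an org `#+BEGIN_SRC` block into a Hugo `highlight` shortcode, adding line numbers when the `-n` switch is present, might look like this. It ignores header arguments such as `:results` and assumes a language is always given.

```python
import re
import textwrap

# Matches one org-mode source block: the language, any switches/header
# arguments on the #+BEGIN_SRC line, and the body up to #+END_SRC.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+BEGIN_SRC[ \t]+(?P<lang>\S+)(?P<args>[^\n]*)\n"
    r"(?P<body>.*?)"
    r"^[ \t]*#\+END_SRC[ \t]*$",
    re.MULTILINE | re.DOTALL | re.IGNORECASE,
)


def to_hugo_highlight(org_text: str) -> str:
    """Replace every org source block with a Hugo highlight shortcode."""

    def convert(match: re.Match) -> str:
        lang = match.group("lang")
        # Switches such as -n come before the first :header-arg.
        switches = match.group("args").split(":", 1)[0].split()
        opts = ' "linenos=table, linenostart=1"' if "-n" in switches else ""
        # Drop the org indentation and the trailing newline of the body.
        body = textwrap.dedent(match.group("body")).rstrip("\n")
        return f"{{{{< highlight {lang}{opts} >}}}}\n{body}\n{{{{< /highlight >}}}}"

    return SRC_BLOCK.sub(convert, org_text)
```

Splitting the argument string at the first `:` keeps quoted header arguments like `:f "translate-shell -s fr -t en"` from being mistaken for switches.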
gitgud|5 years ago
Can't wait for the post-mortem on this outage. What will the cause be?

- Auto-scaling issue

- DDoS

- DNS error

- Datacenter outage

Any other possible problems?

darkwater|5 years ago
- Kubernetes control plane screw-up
kchoudhu|5 years ago
Second time in what, a week? What is going on at Microsoft?
jaekash|5 years ago
Why do people expect anything better from Microsoft? This is quite on par with the quality they deliver. The only surprising thing is that people are surprised by this.
echelon|5 years ago
Well, there goes my night. I was waiting on a build triggered by GitHub Actions and was wondering what was up.

I guess this is my sign to get some sleep.

Microsoft needs to slow things down and focus on stability. This really isn't good. I need these weekend and late night hours for my side hustle. I already have enough trouble as is, I don't need an injection of additional difficulty. (That's just my frustration; I can't imagine what y'all are all going through.)

They're making some very frustrating choices lately. Their redesign broke READMEs with tables (which now require horizontal scrolling), and they don't seem to care about all the repos they impacted.

Pull it together, Microsoft.

jaekash|5 years ago
Why are you not using an alternative?
onyb|5 years ago
That moment when you submit a very long comment on GitHub, and realise that it is down. :(
zhdc1|5 years ago
I received a 500 error when I went to GitHub, and the first thought on my mind was to check Hacker News.

I wasn't disappointed.

In all fairness, GitHub has been fairly reliable, minus whatever has been going on over the last week.

mundanevoice|5 years ago
Move off Rails now, GitHub; it's clearly not working for you. It took you this far; now move to something that doesn't sh*t the bed twice every month.
brobinson|5 years ago
Rails is (apparently still) a popular target for haters, but for projects at GitHub's scale it's rarely a code-logic or framework-level blunder that takes the service down. It's generally a cascade of failures in things like multiple database systems, auto-scaling, DNS/caching, etc.
noble_pleb|5 years ago
Flask (Python) or even CodeIgniter (PHP) are my frameworks of choice for this sort of thing. They may be a bit old, and organizing a large project could be difficult, but nothing can beat them on performance!
niffydroid|5 years ago
GitHub is probably still more reliable than Bitbucket, which has weekly disruptions (small ones, but you do get a performance impact).
NiceWayToDoIT|5 years ago
On June 29th they also had an outage (lasting 2 hours). Does anyone know what caused it?
dmpetrov|5 years ago
Why does this usually happen on weekends? That's the only time I have for coding :)
svntid|5 years ago
Yet again. I never experienced anywhere close to this amount of downtime before GitHub was swallowed by Micro$oft.