Whew, glad I decided to scroll HN right now. I've been puzzling over why I'm getting "! [remote rejected] master -> master (Internal Server Error)" as well while trying to push and decided to take a break.
It's been like that for at least 6 hours, randomly appearing. I would take a pause and try again and then it would work, but now it's definitely much more persistent.
I'm finding that pushes do go through eventually. This is probably grossly irresponsible, so I don't recommend its use, but I remembered I had this old alias to "push harder" in my ~/.gitconfig:
[alias]
    thrust = "!f() { until git push \"$@\"; do sleep 0.5; done; }; f"
I've done a few pushes so far, and found that it's going through in <10 tries or so.
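The loop shape itself is easy to sandbox outside of git; here's a minimal sketch with a stand-in for a flaky push (flaky_cmd and the marker file are made up for illustration, not part of git):

```shell
#!/bin/sh
# Stand-in for a flaky `git push`: fails until it has been called 3 times.
marker=$(mktemp)
flaky_cmd() {
    echo attempt >> "$marker"
    [ "$(wc -l < "$marker")" -ge 3 ]
}

# Same shape as the "thrust" alias: keep retrying until the command succeeds.
until flaky_cmd; do sleep 0.1; done

attempts=$(wc -l < "$marker")
echo "succeeded after $attempts attempts"
rm -f "$marker"
```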
# Retries a command with backoff.
#
# The retry count is given by ATTEMPTS (default 100); the
# initial backoff timeout is given by TIMEOUT in seconds
# (default 5).
#
# Successive backoffs increase the timeout by ~33%.
#
# Beware of set -e killing your whole script!
function try_till_success {
    local max_attempts=${ATTEMPTS:-100}
    local timeout=${TIMEOUT:-5}
    local attempt=0
    local exitCode=0
    # Note: this must be an arithmetic comparison; `[[ $attempt < $max_attempts ]]`
    # compares lexicographically, so e.g. "2" sorts after "100" and the
    # loop would bail out far too early.
    while (( attempt < max_attempts ))
    do
        "$@"
        exitCode=$?
        if (( exitCode == 0 ))
        then
            break
        fi
        echo "Failure! Retrying in ${timeout}s.." 1>&2
        sleep "$timeout"
        attempt=$(( attempt + 1 ))
        timeout=$(( timeout * 40 / 30 ))
    done
    if (( exitCode != 0 ))
    then
        echo "You've failed me for the last time! ($*)" 1>&2
    fi
    return $exitCode
}
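One detail worth noting: because `timeout * 40 / 30` is integer arithmetic, the growth is slightly coarser than a true 1.33x. Starting from the default of 5 seconds, the first few delays work out to 5, 6, 8, 10, 13, 17:

```shell
#!/bin/bash
# Print the first few backoff timeouts produced by `timeout * 40 / 30`,
# starting from the default TIMEOUT of 5 seconds.
t=5
seq=""
for _ in 1 2 3 4 5 6; do
    seq="$seq$t "
    t=$(( t * 40 / 30 ))
done
echo "$seq"    # 5 6 8 10 13 17
```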
It's fine. Maybe it will force them to finally start paying attention to the quality of their work. If the crap I write for a living misbehaved that frequently, I'd be sweeping the streets by now (or doing some other work that's actually useful to society).
They had massive problems with their main database cluster (MySQL). If you read through their engineering blog, most of the outages were related to their growth and the main database cluster. They moved workloads for some features to different clusters, but that only buys more time. Eventually they'll do proper sharding (by user or org, I guess, not by feature), but that takes time.
I have no idea if this is remotely close to reality, but what if their culture of breaking things and bad uptime is what allowed them to move fast and build a great product in the first place?
This is causing Actions jobs to hang after completing, consuming precious minutes. I don't think I've ever seen a refund when this happens, so I recommend everyone check their jobs and cancel them for now.
These incidents have to hurt Azure's brand value. It's a monster task to run something as big as GitHub, if they ever get it stable it will lend a lot of credibility to Microsoft's cloud skills.
There's not really all that much pointing to an infrastructure-level failure - it's possible, but it's just as likely an application-level failure somewhere in GitHub's code. The API is returning 500s and not 503s, and the failure is relatively quick, so it's not obviously a server outage.
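If you're scripting around the outage, the status code is worth checking before retrying blindly; here's a small sketch (is_retryable is a made-up helper, and exactly which codes count as transient is a judgment call):

```shell
#!/bin/sh
# Decide whether an HTTP status code is worth retrying:
# 5xx gateway/server errors are usually transient, 4xx are not.
is_retryable() {
    case "$1" in
        500|502|503|504) return 0 ;;
        *)               return 1 ;;
    esac
}

# Fetching just the status code with curl would look like:
#   status=$(curl -s -o /dev/null -w '%{http_code}' https://api.github.com)
is_retryable 503 && echo "503: retry"
is_retryable 404 || echo "404: give up"
```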
1) Is GitHub running under Azure's technology stack?
2) Is GitHub under Azure's management (in contrast to the Visual Studio team's)?
I'm not sure about (2), but I'm pretty sure that GitHub doesn't run under Azure at all, considering that GitHub has fully separate networking from MSN's/Azure's (and GitHub's machines do respond to pings, unlike most of Microsoft's machines, which don't).
At least one good thing about GH is that while things break, the status page is updated relatively fast compared to other companies, where all of HN can know about an outage for 1h+ before it's acknowledged.
For example: while Actions are down, branches can be merged without CI tests passing, even for protected branches. This just happened on one of my repos.
One of our systems runs AWS code repository in parallel to Github and builds are triggered from there (but not in us-east-1). Time to migrate the rest of our systems to having that fallback.
It's almost the same time as their incident yesterday too. Although today the scope is wider - yesterday it was Webhooks and Actions. Today core git is broken as well as the APIs.
Yep. I hope they post an AWS-style postmortem… this is kinda ridiculous (although I do empathize as an ops person). Webhooks breaking broke all of our PR bots, bringing development to a standstill yesterday; today everything seems f’d.
Here we go again. GitHub going completely down at least once a month as I said. [0] So nothing has changed. That is excluding the smaller intermittent issues. Let's see if anyone implemented a self-hosted backup or failsafe just in case.
Guess it's time to go play some video games....
https://xkcd.com/303/
>Time for some manual DoS
Eventually they took it down, as their outages were just too frequent.
GitHub has _always_ had terrible uptime. It's a great product - wish something would change but it seems cultural at this point.
Their engineering blog is full of articles about MySQL and the main "mysql1" database cluster, e.g. https://github.blog/2021-09-27-partitioning-githubs-relation...
Now I won't have to know what time it is in California, and whether California is currently on PST, PDT, PTSD, etc.
Only while logged in, it seems.
I don't care that it works "some of the time"! Don't mess with the repos when the repo host is having seemingly random issues.
There’s no way it’s DNS
It was DNS
Oh dear.
[0] https://news.ycombinator.com/item?id=30149071