top | item 30711269

Incident with GitHub Actions, API requests, Codespaces, Git operations, Issues

267 points | naglis | 4 years ago | githubstatus.com | reply

118 comments

[+] Wavelets|4 years ago|reply
Whew, glad I decided to scroll HN right now. I've been puzzling over why I'm getting "! [remote rejected] master -> master (Internal Server Error)" as well while trying to push and decided to take a break.
[+] adelarsq|4 years ago|reply
Time to take some coffee and configure Vim
[+] forgingahead|4 years ago|reply
It's been like that for at least 6 hours, randomly appearing. I would take a pause and try again and then it would work, but now it's definitely much more persistent.

Guess it's time to go play some video games....

https://xkcd.com/303/

[+] dgellow|4 years ago|reply
Yep, same here! Good time to make a new coffee :)
[+] ahmadrosid|4 years ago|reply
Same here, got rejected when pushing: ! [remote rejected] HEAD -> main (Internal Server Error)
[+] distartin|4 years ago|reply
Never really realized that GitHub had this many technical incidents lol
[+] lukeinator42|4 years ago|reply
same here, I was having internet issues yesterday, and now that my internet is working github isn't, haha.
[+] avar|4 years ago|reply
I'm finding that pushes do go through eventually. This is probably grossly irresponsible, so I don't recommend its use, but I remembered I had this old alias to "push harder" in my ~/.gitconfig:

    [alias]
    thrust = "!f() { until git push $@; do sleep 0.5; done; }; f"
I've done a few pushes so far, and found that it's going through in <10 tries or so.
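For anyone curious how the alias behaves, here's a self-contained simulation of the same retry loop (the `flaky` function and marker file are made up to stand in for a failing `git push`):

```shell
# Stand-in for a push that fails twice, then succeeds: it appends a
# line per attempt and exits 0 once there are 3 lines in the marker.
marker=$(mktemp)
flaky() { echo x >> "$marker"; [ "$(wc -l < "$marker")" -ge 3 ]; }

# Same shape as the alias body: retry until the command exits 0.
until flaky; do sleep 0.1; done
attempts=$(wc -l < "$marker")
echo "succeeded after $attempts attempts"
```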
[+] gfunk911|4 years ago|reply

  # Retries a command with backoff.
  #
  # The retry count is given by ATTEMPTS (default 100), the
  # initial backoff timeout is given by TIMEOUT in seconds
  # (default 5.)
  #
  # Successive backoffs increase the timeout by ~33%.
  #
  # Beware of set -e killing your whole script!
  function try_till_success {
    local max_attempts=${ATTEMPTS-100}
    local timeout=${TIMEOUT-5}
    local attempt=0
    local exitCode=0

    # -lt: numeric comparison ("<" inside [[ ]] compares strings)
    while [[ $attempt -lt $max_attempts ]]
    do
      "$@"
      exitCode=$?

      if [[ $exitCode == 0 ]]
      then
        break
      fi

      echo "Failure! Retrying in ${timeout}s..." 1>&2
      sleep $timeout
      attempt=$(( attempt + 1 ))
      timeout=$(( timeout * 40 / 30 ))
    done

    if [[ $exitCode != 0 ]]
    then
      echo "You've failed me for the last time! ($@)" 1>&2
    fi

    return $exitCode
  }
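For reference, here's roughly how it's invoked, using a trimmed copy of the function above (numeric comparisons, shortened timeouts) and a hypothetical `flaky` command that fails twice and then succeeds:

```shell
# Trimmed copy of try_till_success from the comment above, kept
# self-contained so the demo runs offline and quickly.
try_till_success() {
  local max_attempts=${ATTEMPTS-100}
  local timeout=${TIMEOUT-5}
  local attempt=0 exitCode=0
  while [[ $attempt -lt $max_attempts ]]; do
    "$@"; exitCode=$?
    [[ $exitCode -eq 0 ]] && break
    sleep "$timeout"
    attempt=$((attempt + 1))
    timeout=$((timeout * 40 / 30))
  done
  return "$exitCode"
}

# Demo command: fails on the first two calls, succeeds on the third.
n=0
flaky() { n=$((n + 1)); [[ $n -ge 3 ]]; }

ATTEMPTS=5 TIMEOUT=0 try_till_success flaky
echo "done after $n calls"
```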
[+] hackandtrip|4 years ago|reply
Add some kind of exponential backoff to be a good citizen!
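A minimal sketch of what that schedule looks like (doubling with a 60-second cap; the numbers are arbitrary, and real clients usually add random jitter so retries from many machines don't synchronize):

```shell
# Print the delay used after each successive failure:
# double each time, capped at 60 seconds.
delay=1
schedule=""
for attempt in 1 2 3 4 5 6 7 8; do
  schedule="$schedule$delay "
  delay=$((delay * 2))
  if [ "$delay" -gt 60 ]; then delay=60; fi
done
schedule="${schedule% }"   # trim trailing space
echo "$schedule"
```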
[+] totony|4 years ago|reply
>Service degradation

>Time for some manual DoS

[+] doersino|4 years ago|reply
TIL about "until" loops! How neat.
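For anyone else discovering it: `until CMD` is just `while ! CMD`, looping as long as the command exits nonzero. A tiny illustration:

```shell
# "until" runs the body while the test command fails;
# the loop stops the first time the condition exits 0.
i=0
until [ "$i" -ge 3 ]; do
  i=$((i + 1))
done
echo "$i"
```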
[+] svnpenn|4 years ago|reply
half a second? Jesus dude calm down.
[+] mkoubaa|4 years ago|reply
The delay makes me think you should use the German word for thrust
[+] 5e92cb50239222b|4 years ago|reply
It's fine. Maybe it will force them to finally start paying attention to the quality of their work. If crap I'm writing for a living was misbehaving that frequently, I'd be sweeping the streets by now (or doing some other work that's actually useful to society).
[+] everfrustrated|4 years ago|reply
Does anybody else remember when GitHub's outage page used to have little graphs showing downtime?

Eventually they took it down as their outages were just too often.

GitHub has _always_ had terrible uptime. It's a great product - wish something would change but it seems cultural at this point.

[+] 15characterslon|4 years ago|reply
They had massive problems with their main database cluster (MySQL). If you read through their engineering blog, most of the outages were related to their growth and the main database cluster. They moved workloads for some features to different clusters, but that only buys more time. Eventually they'll do proper sharding (by user or org, I guess, not by feature), but that takes time.

Their engineering blog is full of articles about MySQL and the main "mysql1" database cluster, e.g. https://github.blog/2021-09-27-partitioning-githubs-relation...

[+] pythux|4 years ago|reply
I have no idea if this is remotely close to reality, but what if their culture of breaking things and bad uptime is what allowed them to move fast and build a great product in the first place?
[+] intsunny|4 years ago|reply
Whew, outage timestamps in UTC.

Now I won't have to know what time it is in California, and whether California currently has PST, PDT, PTSD, etc.

[+] pdenton|4 years ago|reply
As someone with diagnosed PTSD, I never thought I'd psychologically level with an entire state ;)
[+] omegalulw|4 years ago|reply
To anyone who is reading this and genuinely wants to know: it's PDT, UTC-7.
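Since PDT is a fixed UTC-7 offset, converting a UTC timestamp by hand is just modular arithmetic (the sample hour below is made up for illustration):

```shell
# 18:00 UTC -> Pacific Daylight Time (UTC-7);
# the +24 keeps the result non-negative before the modulo.
utc_hour=18
pdt_hour=$(( (utc_hour - 7 + 24) % 24 ))
echo "${pdt_hour}:00 PDT"
```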
[+] candiddevmike|4 years ago|reply
This is causing actions jobs to hang after completing, consuming precious minutes. I don't think I've ever seen a refund when this happens, so I recommend everyone check their jobs and cancel them for now.
[+] deckard1|4 years ago|reply
Two days they have been down now. GitHub has, by far, the worst uptime of any critical service I've seen, going on multiple years now.
[+] jetpackjoe|4 years ago|reply
The github.com homepage, as well as the API (via `gh`), isn't working for me either.
[+] jetpackjoe|4 years ago|reply
Their status page is reflecting the new outages. Good on GitHub for actually updating that quickly.
[+] niel|4 years ago|reply
> The github.com homepage

Only while logged in, it seems.

[+] arpinum|4 years ago|reply
These incidents have to hurt Azure's brand value. It's a monster task to run something as big as GitHub, if they ever get it stable it will lend a lot of credibility to Microsoft's cloud skills.
[+] ryanbrunner|4 years ago|reply
There's not really all that much pointing to an infrastructure level failure - it's possible, but it's just as likely it's an application-level failure somewhere in Github's code. The API is returning 500s and not 503s and the failure is relatively quick, so it's not obviously a server outage.
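The distinction the parent is drawing, roughly: a 503 typically comes from the load balancer when no healthy backend is available, while a 500 is usually an unhandled error inside the application itself. A sketch (the `classify` helper and the curl line are illustrative, not anything GitHub publishes):

```shell
# Illustrative: grab only the HTTP status code of an endpoint
# (a network call, shown as a comment so the demo stays offline):
#   curl -s -o /dev/null -w '%{http_code}' https://api.github.com
classify() {
  case "$1" in
    503) echo "service unavailable (infra-level)" ;;
    5*)  echo "server error (often application-level)" ;;
    *)   echo "not a 5xx" ;;
  esac
}
classify 500
classify 503
```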
[+] zinekeller|4 years ago|reply
Serious questions:

1) Is GitHub running under Azure's technology stack?

2) Is GitHub under Azure's management (in contrast to Visual Studio's team)?

I'm not sure about (2), but I'm pretty sure that GitHub doesn't run on Azure at all, considering that GitHub's networking is fully separate from MSN's/Azure's (and GitHub's machines do respond to ping, unlike most of Microsoft's machines, which don't).

[+] gtirloni|4 years ago|reply
GitHub is pretty stable. What are you talking about? I doubt most GitHub users know it's on Azure.
[+] jaywalk|4 years ago|reply
I don't consider this a reflection on Azure at all. It's really just a reflection on GitHub under Microsoft's leadership.
[+] jakub_g|4 years ago|reply
At least one good thing about GH is that while things break, the status page is updated relatively fast, compared to other companies where all of HN knows about an outage for an hour or more before it's acknowledged.
[+] bloopernova|4 years ago|reply
And of course my developer teammates are still trying to merge PRs.

I don't care that it works "some of the time"! Don't mess with the repos when the repo host is having seemingly random issues.

[+] fritzo|4 years ago|reply
For example: while actions are down, branches can be merged without ci tests passing, even for protected branches. This just happened on one of my repos.
[+] PeterBarrett|4 years ago|reply
One of our systems runs AWS's code repository (CodeCommit) in parallel to GitHub, and builds are triggered from there (but not in us-east-1). Time to migrate the rest of our systems to having that fallback.
[+] lebski88|4 years ago|reply
It's almost the same time as their incident yesterday too. Although today the scope is wider - yesterday it was Webhooks and Actions. Today core git is broken as well as the APIs.
[+] pm90|4 years ago|reply
Yep. I hope they post an AWS-style postmortem… this is kinda ridiculous (although I do empathize as an ops person). Webhooks breaking broke all of our PR bots, bringing development to a standstill yesterday; today everything seems f'd.
[+] WFHRenaissance|4 years ago|reply
Looks like the drinking started early at GitHub... good on them!
[+] timeimp|4 years ago|reply
It’s not DNS

There’s no way it’s DNS

It was DNS

[+] rvz|4 years ago|reply
Here we go again. GitHub going completely down at least once a month as I said. [0] So nothing has changed. That is excluding the smaller intermittent issues. Let's see if anyone implemented a self-hosted backup or failsafe just in case.

Oh dear.

[0] https://news.ycombinator.com/item?id=30149071

[+] bastardoperator|4 years ago|reply
The entire point of git is that it's decentralized, lol. If I've cloned locally, like millions of people do daily, I have a backup.
[+] can16358p|4 years ago|reply
At some point the GitHub main page 500'ed for me. The problem is probably deep in the core, not something isolated.
[+] lambda_dn|4 years ago|reply
This is why you should have your code on multiple remotes, e.g. Azure DevOps, GitLab, or a self-hosted Git server.
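One way to do that without changing your workflow: give `origin` several push URLs, so a single `git push` mirrors to every host. A sketch in a throwaway repo (the URLs are placeholders, not real remotes):

```shell
# Set up a scratch repo; the remote URLs below are placeholders.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git remote add origin git@github.com:me/repo.git

# --add --push appends push URLs; the fetch URL is left untouched.
# Re-add the original URL first, since the first pushurl replaces it
# for push purposes.
git remote set-url --add --push origin git@github.com:me/repo.git
git remote set-url --add --push origin git@gitlab.com:me/repo.git

git remote get-url --push --all origin
```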