Heroku Seems to Be Down

jameshuntdo|6 days ago

I've been using DigitalOcean's App Platform as an alternative. It's not identical, but it's really close, with git-based deployments, etc. Anyone else, feel free to share your thoughts too.

robertwpearce|12 days ago

A Heroku app of mine went down starting at ~7:43pm Eastern US (I KNOW, I need to migrate it elsewhere), and in a separate, weird "thing is down" instance, some pages in the Epic healthcare app seem to be offline. Also seeing that YouTube issue.

Related comments on reddit:

* https://www.reddit.com/r/Heroku/comments/1r7o1hk/outage/
* https://www.reddit.com/r/youtube/comments/1r7onh1/youtube_we...

ktaraszk|12 days ago

The fact that your database is reachable but the app isn't after restart points to the routing layer being toast. If you can ssh into the dyno or check logs, look for anything related to the router handshake timing out. Usually when deploys work but traffic doesn't route, it's something between the load balancer and your instances.

The correlation with scheduled restarts is interesting though. Makes me wonder if there's a cert validation issue on boot that's causing new instances to fail health checks.
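
If you want to narrow this down from outside the platform, here's a rough sketch (Python, stdlib only) that walks DNS, TCP, TLS, and a bare HTTP request in order and reports which stage fails first. The names `probe` and `classify_failure` are made up for illustration, and the hostname in the usage line is a placeholder, not a real app. It only tells you how far a request gets from your machine. It can't see what happens between the router and the dyno.

```python
import socket
import ssl


def classify_failure(exc: BaseException) -> str:
    """Map a connection-time exception to the stage that most likely failed."""
    # Order matters: all of these are OSError subclasses, so check the
    # most specific types first.
    if isinstance(exc, socket.gaierror):
        return "dns"
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "tls-cert"
    if isinstance(exc, ssl.SSLError):
        return "tls"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "tcp"
    return "unknown"


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the HTTP status line on success, or the name of the failed stage."""
    context = ssl.create_default_context()  # verifies the cert chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                request = (
                    f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
                )
                tls.sendall(request.encode("ascii"))
                status_line = tls.recv(1024).split(b"\r\n", 1)[0]
                return status_line.decode("latin-1") or "empty response"
    except OSError as exc:
        return classify_failure(exc)


if __name__ == "__main__":
    # Placeholder hostname; substitute your own app's domain.
    print(probe("your-app.example.com"))
```

If this prints a 5xx status line, DNS and TLS at the edge are fine and the problem is further in, which would fit the "router can't reach the app" theory. A `tls-cert` result would instead point at the certificate presented to clients.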

mullingitover|12 days ago

Perhaps related: on Downdetector right now I see outages reported in the past few minutes for GCP, Cloudflare, and AWS. YouTube is also down.

Unusual to see all three blowing up at the same time.

jskopek|12 days ago

One of my Heroku apps is down, and seems to have gone down immediately after a scheduled daily restart. The others all work, and I can still access my database directly. Kinda fun trying to reverse engineer what's going on with the limited information we have!

ktaraszk|12 days ago

The status page lag is the worst part. Hard to debug when you're questioning whether it's your config or their infrastructure. At least when multiple providers go down at once you know it's not just you going insane.

BryanBeshore|12 days ago

Nice of Heroku/Salesforce to join the party - an hour late: https://status.salesforce.com/incidents/20003708

aaronmiler|12 days ago

Love the backdating, like it has been there the whole time.

aaronmiler|12 days ago

As the other commenter said, pretty sure it's at the routing layer. Deploys, startups, all working just fine for our apps. But the router can't reach them. Related to the Google CA outage maybe?

Gallows4574|12 days ago

Yeah, we have several applications on Heroku. The only failures are the ones that had platform-initiated restarts after 5pm MST. They likely tried to get a new cert on restart and failed. Don't deploy or do manual restarts right now.

rapnie|12 days ago

Not sure if this is related, but for the past couple of minutes the YouTube front page has been void of videos: just the top menu and a black page, and the recommended videos sidebar is completely empty too.

taylorhughes|12 days ago

No sign on the status page. But two separate / unrelated accounts I know about are fully down right now. Data services / backend workers seem fine, web/routing layer seems to be dead.

ktaraszk|12 days ago

The routing layer being down while workers/data services stay up is such a specific failure mode. Usually means the load balancers or edge routing got corrupted somehow, not the actual compute infrastructure.

If you're serious about migrating off (and not just saying it in the heat of the moment), the main thing is having a plan for the database migration. That's always the painful part. Everything else is just Docker containers that run anywhere.

14 comments