I've been using DigitalOcean's App Platform as an alternative. It's not identical, but it's really close, with git-based deployments, etc. Would be curious to hear others' thoughts too.
A Heroku app of mine went down starting at ~7:43pm Eastern US (I KNOW, I need to migrate it elsewhere), and in a separate, weird "thing is down" instance, some pages in the Epic healthcare app seem to be offline. Also seeing that YouTube issue.
The fact that your database is reachable but the app isn't after restart points to the routing layer being toast. If you can ssh into the dyno or check logs, look for anything related to the router handshake timing out. Usually when deploys work but traffic doesn't route, it's something between the load balancer and your instances.
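For the log-checking suggestion above, here is a minimal Python sketch. It assumes the router lines follow Heroku's logfmt-style `key=value` format (e.g. `at=error code=H12 desc="Request timeout" ... status=503`). The function names are hypothetical helpers, not part of any Heroku tool. The script just tallies router error codes so you can tell whether requests are dying at the router rather than inside your app.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Matches key=value pairs; a value is either double-quoted or a bare token
# (possibly empty, as in "dyno= connect=").
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')


def parse_router_line(line: str) -> Dict[str, str]:
    """Parse the logfmt-style key=value pairs from one router log line."""
    fields: Dict[str, str] = {}
    for m in _PAIR.finditer(line):
        key, quoted, bare = m.group(1), m.group(2), m.group(3)
        # Exactly one of the two value groups participates in each match.
        fields[key] = quoted if quoted is not None else bare
    return fields


def router_errors(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only lines emitted by the router that are flagged at=error."""
    errors = []
    for line in lines:
        if "heroku[router]" not in line:
            continue  # skip app/dyno output; we only care about the router
        fields = parse_router_line(line)
        if fields.get("at") == "error":
            errors.append(fields)
    return errors


def error_code_counts(lines: Iterable[str]) -> Counter:
    """Count router errors by their code (e.g. H12 for request timeouts)."""
    return Counter(e.get("code", "unknown") for e in router_errors(lines))
```

You could feed `error_code_counts` the lines from a saved `heroku logs --tail` session. A pile of router errors while your dynos report as up would fit the routing-layer theory.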
The correlation with scheduled restarts is interesting though. Makes me wonder if there's a cert validation issue on boot that's causing new instances to fail health checks.
One of my Heroku apps is down, and seems to have gone down immediately after a scheduled daily restart. The others all work, and I can still access my database directly. Kinda fun trying to reverse engineer what's going on with the limited information we have!
The status page lag is the worst part. Hard to debug when you're questioning whether it's your config or their infrastructure. At least when multiple providers go down at once you know it's not just you going insane.
As the other commenter said, pretty sure it's at the routing layer. Deploys and startups are all working just fine for our apps, but the router can't reach them. Related to the Google CA outage, maybe?
Yea, we have several applications on Heroku. The only failures are the ones that had platform-initiated restarts after 5pm MST. Likely they tried to get a new cert on restart and failed. Don't deploy or do manual restarts right now.
Not sure if this is related, but for the past couple of minutes the YouTube front page has been void of videos: just the top menu and a black page, and the recommended videos sidebar is completely empty too.
No sign of it on the status page. But two separate, unrelated accounts I know about are fully down right now. Data services and backend workers seem fine; the web/routing layer seems to be dead.
The routing layer being down while workers/data services stay up is such a specific failure mode. Usually means the load balancers or edge routing got corrupted somehow, not the actual compute infrastructure.
If you're serious about migrating off (and not just saying it in the heat of the moment), the main thing is having a plan for the database migration. That's always the painful part. Everything else is just Docker containers that run anywhere.
jameshuntdo|6 days ago
robertwpearce|12 days ago
Related comments on Reddit:
* https://www.reddit.com/r/Heroku/comments/1r7o1hk/outage/
* https://www.reddit.com/r/youtube/comments/1r7onh1/youtube_we...
ktaraszk|12 days ago
mullingitover|12 days ago
Unusual to see all three blowing up at the same time.
jskopek|12 days ago
ktaraszk|12 days ago
BryanBeshore|12 days ago
aaronmiler|12 days ago
aaronmiler|12 days ago
Gallows4574|12 days ago
rapnie|12 days ago
taylorhughes|12 days ago
ktaraszk|12 days ago