Docker Systems Status: Full Service Disruption

345 points | l2dy | 4 months ago | dockerstatus.com

134 comments

tj_591|4 months ago

Hi all, Tushar from Docker here. We’re sorry about the impact our current outage is having on many of you. Yes, this is related to the ongoing AWS incident and we’re working closely with AWS on getting our services restored. We’ll provide regular updates on dockerstatus.com.

We know how critical Docker Hub and our other services are to millions of developers, and we’re sorry for the pain this is causing. Thank you for your patience as we work to resolve this incident. We’ll publish a post-mortem in the next few days, once this incident is fully resolved and we have a remediation plan.

freedomben|4 months ago

Part of me hopes that we find out that Dynamo DB (which sounds like was the root of the cascading failures) is shipped in a Docker image which is hosted on Docker Hub :-D

reader_1000|4 months ago

> We have identified the underlying issue with one of our cloud service providers.

Isn't everyone using multiple cloud providers nowadays? Why are they affected by a single cloud provider's outage?

ic4l|4 months ago

This broke our builds since we rely on several public Docker images, and by default, Docker uses docker.io.

Thankfully, AWS provides a docker.io mirror for those who can't wait:

  FROM public.ecr.aws/docker/library/{image_name}
In the error logs, the issue was mostly related to the authentication endpoint:

https://auth.docker.io → "No server is available to handle this request"

After switching to the AWS mirror, everything built successfully without any issues.
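
For reference, the ECR Public mirror keeps Docker Hub's `library/` naming for official images, so a direct pull looks like this (the image and tag below are just examples):

```sh
# Official Docker Hub images live under docker/library/ on ECR Public
docker pull public.ecr.aws/docker/library/alpine:3.20
```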

CamouflagedKiwi|4 months ago

Mild irony that Docker is down because of the AWS outage, but the AWS mirror repos are still running...

kerblang|4 months ago

Also, docker.io is rate-limited, so if your organization experiences enough growth you will start seeing build failures on a regular basis.

Also, quay.io (another image host, from Red Hat) has been read-only all day today.

If you're going to have docker/container image dependencies, it's best to establish a solid hosting solution instead of riding whatever bus shows up.

anon7000|4 months ago

I manage a large build system and pulling from ECR has been flaking all day

KronisLV|4 months ago

I guess people who are running their own registries like Nexus and build their own container images from a common base image are feeling at least a bit more secure in their choice right now.

Wonder how many builds or redeployments this will break. Personally, nothing against Docker or Docker Hub of course, I find them to be useful.

yandie|4 months ago

It's actually an important practice to have a Docker image cache in the middle. You never know if an upstream image gets randomly purged from Docker Hub, and then your K8s node gets replaced and can't pull the base image for your service.

Just engineering hygiene IMO.
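
A minimal way to put such a cache in the middle, assuming a pull-through mirror is already running at a host like `registry-cache.internal` (placeholder name), is the Docker daemon's mirror setting:

```json
{
  "registry-mirrors": ["https://registry-cache.internal"]
}
```

This goes in `/etc/docker/daemon.json` and only applies to Docker Hub pulls; the daemon needs a restart to pick it up.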

tom1337|4 months ago

We are using base images, but unfortunately some GitHub Actions pull Docker images in their prepare phase. So while my application would build, I cannot deploy it, because the CI/CD depends on Docker Hub and you cannot change where these images are pulled from (so they cannot go through a pull-through cache)…

Sphax|4 months ago

We run Harbor and mirror every base image using its Proxy Cache feature, it's quite nice. We've had this setup for years now and while it works fine, Harbor has some rough edges.
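
With Harbor's Proxy Cache, pulls are routed through a proxy project; assuming a project named `dockerhub-proxy` on a hypothetical `harbor.example.com`, a cached pull of a Hub official image looks like:

```sh
# <harbor-host>/<proxy-project>/<upstream repo path>
docker pull harbor.example.com/dockerhub-proxy/library/nginx:1.27
```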

nusl|4 months ago

Currently unable to do much of anything new in dev/prod environments without manual workarounds. I'd imagine the impact is pretty massive.

Aside: seems Signal is also having issues. Damn.

yread|4 months ago

That is nothing compared to how good i feel about not using containers at all.

jsmeaton|4 months ago

Guess where we host nexus..

frenkel|4 months ago

Only if they get their base images from somewhere else...

phillebaba|4 months ago

Shameless plug but this might be a good time to install Spegel in your Kubernetes clusters if you have critical dependencies on Docker Hub.

https://spegel.dev/

osivertsson|4 months ago

If it really is fully open-source please make that more visible on your landing page.

It is a huge deal if I can start investigating and deploying such a solution as a techie right away, compared to having to go through all the internal hoops for a software purchase.

CaptainOfCoit|4 months ago

There are a couple of alternatives that mirror more than just Docker Hub too; most of them are pretty bloated and enterprisey, but they do what they say on the tin and have saved me more than once. Artifactory, Nexus Repository, Cloudsmith, and ProGet are some of them.

mike-cardwell|4 months ago

This looks good, but we're using GKE and it looks like it only works there with some hacks. Is there a timeline to make it work with GKE properly?

helpfulmandrill|4 months ago

I wonder if this is why I also can't log in to O'Reilly to do some "Docker is down, better find something to do" training...

p0w3n3d|4 months ago

Just install a pull-through proxy that will store all the packages recently used.
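
A minimal sketch of such a proxy, using the open-source CNCF distribution image (`registry:2`), which has pull-through caching built in; clients then point at it via `registry-mirrors`:

```sh
# Run a local pull-through cache for Docker Hub on port 5000
docker run -d --name registry-mirror -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  registry:2
```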

m463|4 months ago

this is by design

docker got requests to allow you to configure a private registry, but they selfishly denied the ability to do that:

https://stackoverflow.com/questions/33054369/how-to-change-t...

Red Hat created the Docker-compatible Podman, which lets you close that hole:

/etc/config/docker:

  BLOCK_REGISTRY='--block-registry=all'
  ADD_REGISTRY='--add-registry=registry.access.redhat.com'
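
Those sysconfig flags are from the older Red Hat docker fork; in current Podman the same policy lives in `/etc/containers/registries.conf` (the mirror host below is a placeholder):

```toml
# /etc/containers/registries.conf
unqualified-search-registries = ["registry.access.redhat.com"]

[[registry]]
prefix = "docker.io"
location = "docker.io"

[[registry.mirror]]
location = "mirror.internal.example.com"
```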

compootr|4 months ago

I still think this is an acceptable footgun (?) to have. The expressiveness of downloading an image tag with a domain included outweighs potential miscommunication issues.

For example, if you're on a team and you have documentation containing commands, but your docker config is outdated, you can accidentally pull from docker's global public registry.

A welcome change IMO would be removing the default global registry entirely, since requiring fully qualified names makes it easier to tell where your image is coming from (but I severely doubt Docker would ever consider this, since the default makes it fractionally easier to use their services).

scuff3d|4 months ago

This is a huge stretch.

Even if you could configure a default registry to point at something besides docker.io, a lot of people (I'd say the vast majority) wouldn't have bothered. So they'd still be in the same spot.

And it's not hard to just tag images. I don't have a single image pulling from docker.io at work. Takes two seconds to slap <company-repo>/ at the front of the image name.
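
Mirroring an upstream image into a private repo is just a pull/tag/push (the registry name below is a placeholder):

```sh
docker pull docker.io/library/redis:7
docker tag docker.io/library/redis:7 registry.example.com/mirror/redis:7
docker push registry.example.com/mirror/redis:7
```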

anon7000|4 months ago

Sadly doesn't help if you were using ECR in us-east-1 as your private registry. :(

darkamaul|4 months ago

For other people impacted, what helped me this morning was to use `ghcr.io`, albeit it is not a one-to-one replacement.

Ex: `docker pull ghcr.io/linuxcontainers/debian-slim:latest`

l2dy|4 months ago

Recovering as of October 20, 2025 09:43 UTC

> [Monitoring] We are seeing error rates recovering across our SaaS services. We continue to monitor as we process our backlog.

dd_xplore|4 months ago

Does it decrease AWS's nine 9s?

speedgoose|4 months ago

The marketing department did the maths and they said no.

jdthedisciple|4 months ago

So thus far today outages are reported from

- AWS

- Vercel

- Atlassian

- Cloudflare

- Docker

- Google (see downdetector)

- Microsoft (see downdetector)

What's going on?

ta1243|4 months ago

Or they all rely on AWS, because over the last 15 years we've built an extremely fragile interconnected global system in the pursuit of profit, austerity, and efficiency.

d4rkp4ttern|4 months ago

Reddit appears to be only semi-operational. Frequent “rate limit” errors and empty pages while just browsing. Not sure if it's related.

throw-10-13|4 months ago

dns outage at aws exposing how overly centralized our infra is

2OEH8eoCRo0|4 months ago

The internet was designed to be fault tolerant and distributed from the beginning and we still ended up with a handful of mega hosts.

jabiko|4 months ago

It's impressive that even though registry-1.docker.io returned 503 errors, they were able to keep the metric "Docker Registry Uptime" at 100%.

lbruder|4 months ago

Well, the server was up, it was just returning HTTP 503...

wolfgangbabad|4 months ago

even reddit throws a lot of 503s when adding/editing comments

throw-10-13|4 months ago

reddit is always going down, that's the least surprising thing about this

sschueller|4 months ago

What are good proxy/mirror solutions to mitigate such issues? Best would be an all-in-one solution that, for example, also handles Node.js packages, Packagist, etc.

bravetraveler|4 months ago

Pulp is a popular project for a 'one-stop shop', I believe. Personally, I've always used project-specific solutions like 'distribution/distribution' from the CNCF for containers. It allows for pull-through caching with relatively little setup work.
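
In distribution/distribution, pull-through caching is a single stanza in the registry's `config.yml`:

```yaml
# config.yml fragment: proxy (cache) pulls from Docker Hub
proxy:
  remoteurl: https://registry-1.docker.io
  # username/password can be added here for authenticated Hub pulls
```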

cloudking|4 months ago

I'm fairly new to Docker. Do folks really rely on public images and registries for production systems? Seems like a brittle strategy.

edoceo|4 months ago

Yes, 1000s of orgs. Larger players might use a pull-through cache, but it's not as common as it should be. Similar issue for other software supply chains (npm, PyPI, etc.)

conradfr|4 months ago

Is there a built-in way to bypass the request to the registry if your base layers are cached?

edoceo|4 months ago

pull: never?
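
In Compose terms that would be `pull_policy` from the Compose spec (the image name below is a placeholder):

```yaml
services:
  app:
    image: registry.example.com/team/app:1.2.3
    # "missing" uses the local image if present; "never" fails instead of pulling
    pull_policy: missing
```

Recent Docker versions also accept a similar `--pull` flag on `docker run` (`always`/`missing`/`never`).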

danvesma|4 months ago

...well this explains a lot about how my morning is going...

gjvc|4 months ago

mirror.gcr.io is your friend
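
mirror.gcr.io is Google's cache of frequently pulled Docker Hub images; it can be dropped into `/etc/docker/daemon.json` as a mirror, falling back to Docker Hub on a cache miss:

```json
{
  "registry-mirrors": ["https://mirror.gcr.io"]
}
```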