The self-service unpause is brilliant. The worst thing about hitting these sorts of limits is that time window when you think you've fixed the problem but you can't check because you're throttled - so there's nothing you can do but wait. Giving literally any affordance so that a human can make progress with a fix removes this huge source of frustration.
My server got its renewals halted. I rolled my own wrapper for certbot. Idk, it's just a blog; I'm not that attached. It hit some rock a few months ago; I just retried and manually installed the cert, and it seems to have perked back up and continued receiving certs. It probably would have been more frustrating with a huge fleet, but it wasn't even worth my time to check the logs and figure out what precisely happened (a cert distributed with a modification time that didn't match the ASN.1 expiry? a transient issuance failure? the same cert issued twice? ...who knows.)
Looking at the relevant limit, "Consecutive Authorization Failures per Hostname per Account"[0], it looks like there's no way to hit that specific limit if you only run once per day.
Ah, to think how many cronjobs are out there running certbot on * * * * *!
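For contrast, a sane schedule is something like the crontab sketch below (certbot's packaged timers typically run twice daily; `certbot renew` is a no-op unless a cert is actually within its renewal window, so this stays far below any rate limit):

```
# Check for renewals twice a day instead of every minute.
# `renew` only re-issues certs that are close to expiry.
0 */12 * * * certbot renew --quiet
```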
Happy to be running Caddy on a growing number of servers instead of renewing certs through certbot. Caddy has really good defaults and does the right thing with TLS certs without much hassle. Fewer moving parts, too.
Sure, and they must have already emailed the person when they failed to get a new cert before their last one expired. But I suspect a lot of people don't use a real email address for LE, since there's no enforcement/verification. Or they might be using one that isn't their main one.
Thanks for all the work that goes into this crucial service!
3% and "3,200 people manually unpaused issuance" both seem much higher than expected to me, and no cause for celebration, especially at this scale.
Are there no better patterns to be exploited to identify 'zombies'? Running experiments with blocking and then unblocking to validate should work here.
I guess this falls into the bucket of: sure we can do that, given sufficient time and resources
Why do you think that this indicates a problem in identifying zombies? The pause may have simply been the reason that someone became aware there was even a problem. The zombie might have persisted, if it hadn't been paused.
A working domain needs one validation every ~60 days, but these zombie domains sound like they're making multiple requests per hour (per the article, even twice daily would take 10 years to hit the limit), which is a massively disproportionate use of resources.
Does the Unpause button have a CAPTCHA? It's only a matter of time before software tries to auto-unpause after a failure... and the cycle repeats. A CAPTCHA on the button should at least discourage software devs from automating the unpause process.
No, I don't think that will happen at large because there's no good reason for it.
If this is the error that you're getting, then hitting unpause won't make the certificate requests start working. You'll just go back to receiving the persistent error messages from before the pause.
What do you gain by automating it? This isn't an error that you'll experience in day-to-day successful operation. It's not an error that recurs after resolution, because one action removes it for years. This lock will only happen if a cert request is consistently broken for a really long time.
Fixing the underlying cause of the cert issuance failures requires human intervention anyway, a human can easily click the button. They also provide first-class support for bulk enablement.
The motivation for automating the button is extremely weak.
cibyr|9 months ago
philjohn|9 months ago
efitz|9 months ago
meltyness|9 months ago
globie|9 months ago
[0]: https://letsencrypt.org/docs/rate-limits/#consecutive-author...
aorth|9 months ago
NicolaiS|9 months ago
Caddy even supports 'ACME profiles' for people who want to follow the latest CA/Browser Forum recommendations / want short-lived certs.
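As a sketch of how little configuration this takes (example.com and the upstream port are placeholders), a complete Caddyfile can be as small as:

```
# Caddy obtains and renews the TLS cert for this site automatically.
example.com {
	reverse_proxy localhost:8080
}
```

Assuming the domain's DNS points at the server and ports 80/443 are reachable, there is no separate renewal job to schedule at all.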
greatgib|9 months ago
Macha|9 months ago
https://letsencrypt.org/2025/01/22/ending-expiration-emails/
xp84|9 months ago
TonyTrapp|9 months ago
undebuggable|9 months ago
smallnix|9 months ago
tux1968|9 months ago
Palomides|9 months ago
wolfgang42|9 months ago
jadbox|9 months ago
RadiozRadioz|8 months ago
tough|8 months ago
saagarjha|9 months ago