dastbe | 5 days ago
For general reliability, you can split the checkers into partitions and use a quorum across partitions to determine the health state of a given destination. This also lets centralized monitoring detect systemic issues caused by bad healthcheck configuration changes (i.e. are healthchecks failing because the service is unhealthy or because of a bad healthchecker?)
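To make the quorum idea concrete, here's a rough sketch in Python. None of the names (`partition_verdict`, `quorum_health`, `dissenting_partitions`) or the majority-vote rule come from any real system; they are assumptions purely for illustration. Each partition votes by simple majority, the destination's state is whatever at least `quorum` partitions agree on, and any partition that disagrees with that result gets flagged as a possibly bad checker.

```python
from collections import Counter
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"


def partition_verdict(results: list[bool]) -> Health:
    """Majority vote inside one checker partition (hypothetical rule).

    An empty or tied partition yields UNKNOWN.
    """
    passes = sum(results)
    fails = len(results) - passes
    if passes > fails:
        return Health.HEALTHY
    if fails > passes:
        return Health.UNHEALTHY
    return Health.UNKNOWN


def quorum_health(reports: dict[str, list[bool]], quorum: int) -> Health:
    """Health of one destination, given per-partition probe results.

    `quorum` should be more than half the number of partitions so that
    HEALTHY and UNHEALTHY cannot both reach it.
    """
    verdicts = Counter(partition_verdict(r) for r in reports.values())
    if verdicts[Health.HEALTHY] >= quorum:
        return Health.HEALTHY
    if verdicts[Health.UNHEALTHY] >= quorum:
        return Health.UNHEALTHY
    return Health.UNKNOWN


def dissenting_partitions(reports: dict[str, list[bool]], quorum: int) -> list[str]:
    """Partitions whose verdict disagrees with the quorum result.

    These are candidates for "the healthchecker is broken, not the service".
    """
    overall = quorum_health(reports, quorum)
    if overall is Health.UNKNOWN:
        return []
    return sorted(
        name for name, r in reports.items() if partition_verdict(r) is not overall
    )


if __name__ == "__main__":
    reports = {
        "partition-a": [True, True, True],
        "partition-b": [True, True, False],
        "partition-c": [False, False, False],  # e.g. got a bad config push
    }
    print(quorum_health(reports, quorum=2))
    print(dissenting_partitions(reports, quorum=2))
```

In this toy setup, a single partition failing everything doesn't flip the destination to unhealthy, and the same data tells monitoring which partition to look at.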
In industry, I personally know AWS has one or two health-check-as-a-service systems that they use internally for LBs and DNS. Uber runs its own health-check-as-a-service system, which it integrates with its managed proxy fleet as well as p2p discovery. IIRC Meta also has a system like this for at least some things? But maybe I'm misremembering.