(no title)
nzach | 10 days ago
I don't really see what problem this solves. If you have proper timeouts and circuit breakers in your service this shouldn't really matter. This solution will save a few hundred requests, but I don't think that really matters. If this is a pain point, it's easier to adjust the circuit-breaker settings (reduce the error rate, increase the window, ...) than to introduce a whole new level of complexity.
> Curious how you handle the recovery side
We have a feature flag provider built in-house. But it doesn't support this use case, so what we did was create a flag where we put the % value we want to bring back, and we handle the logic inside the service. Example: if we want to bring back 6.25% (1/16) of our users, this means we should switch back every user that has an account-id ending in 'a'. For 12.5% (2/16) we want users with an account-id ending in either 'a' or 'b'. This is a pretty hacky solution, but it solves our problem when we need to transition from our fallback to our main flow.
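A minimal sketch of that suffix-based check, not the actual in-house code. It assumes account IDs end in a hexadecimal character (which is what makes each character worth 1/16 of users) and that the flag holds a percentage. The names `BUCKET_ORDER`, `enabled_suffixes` and `use_main_flow` are hypothetical, and so is the ordering after 'b', since the comment only says that 'a' comes first and 'b' second.

```python
# Hypothetical bucket order: the comment only states that 'a' is enabled
# first and 'b' second; the rest of the ordering is an assumption.
BUCKET_ORDER = "abcdef0123456789"


def enabled_suffixes(percent: float) -> str:
    """Return the last-character values that are switched back to the main flow."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Each hex character covers 1/16 (6.25%) of accounts; round down to whole buckets.
    buckets = int(percent * 16 // 100)
    return BUCKET_ORDER[:buckets]


def use_main_flow(account_id: str, percent: float) -> bool:
    """Decide whether this account goes back to the main flow or stays on the fallback."""
    if not account_id:
        return False
    return account_id[-1].lower() in enabled_suffixes(percent)
```

With the flag at 6.25, only IDs ending in 'a' return to the main flow; at 12.5, IDs ending in 'a' or 'b' do. Because the bucket depends only on the account ID, a given user stays on the same side while the percentage is raised.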
rodrigorcs | 10 days ago
Each service discovering failures on its own is not really the main problem my proposal is meant to solve. The thing is that when this is done locally, we lack observability and there is no way to act on them.
> what we did was create a flag where we put the % value we want to bring back
Oh I see, well, that is indeed a good problem to solve. Openfuse does not do that kind of gradual recovery, but it would be possible to add.
Do you think that with that feature, and with Openfuse being self-hosted, it would be something you would give a try? Not trying to sell you anything, just gathering feedback so I can learn from the discussion.
By the way, if you don't mind, how often do you have to run that type of recovery?
nzach | 10 days ago
No, I don't think this is compelling enough to try it at work.
> By the way, if you don't mind, how often do you have to run that type of recovery?
I would say we use this feature once every 3 months.