It’s not a question of being able to reverse. It’s a question of being able to diagnose that one of these changes was even the problem and, if so, which one.
Record changes in git and then git bisect issues, maybe?
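For what it's worth, a minimal sketch of the "record changes" half, assuming you can periodically capture `sysctl -a` style output ("key = value" per line) from each machine. The helper names `parse_sysctl` and `diff_snapshots` are made up for illustration and are not part of bpftune. Committing each snapshot as a sorted text file would give git a history to diff; whether `git bisect` is practical then depends on being able to replay a snapshot onto a test machine, which is assumed here and not shown.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no "="
            values[key.strip()] = value.strip()
    return values


def diff_snapshots(before, after):
    """Return {key: (old, new)} for every key whose value differs.

    A key present in only one snapshot is reported with None on the other side.
    """
    changed = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed
```

This only tells you what changed and when. It does not tell you which change caused a problem.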
Without change capture, solid regression testing, or observability, it seems difficult to manage these changes. I’d like to know how others manage these kinds of changes and readily troubleshoot them without lots of regression testing or observability, if anyone has successes to share.
Suppose you run a fleet of a thousand machines. They all autotune. They are, let's say, serving cached video, or something.
You notice that your aggregate error rate has been drifting upwards since you started using bpftune. It turns out, in reality, there is some complex interaction between the tuning and your routers, or your ToR switches, or whatever - there is feedback that causes oscillations in a tuned value, swinging between too high and too low.
Can you see how this is not a matter of simple deduction and rollbacks?
This scenario is plausible. Autotuning generally has issues with feedback, since the overall system lacks control-theoretic structure. And the premise here is that you use this to tune a large number of machines where individual admin is infeasible.
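To make the oscillation case concrete, here is a rough sketch of one way to flag it, assuming you already log a time series of samples for each tuned value. It counts how often the series flips between rising and falling. The function names and the threshold of 3 are arbitrary choices for illustration, not anything bpftune provides.

```python
def direction_reversals(samples):
    """Count how often the series switches between rising and falling."""
    reversals = 0
    last_sign = 0  # 0 means no direction has been seen yet
    for prev, cur in zip(samples, samples[1:]):
        delta = cur - prev
        if delta == 0:
            continue  # flat steps do not change direction
        sign = 1 if delta > 0 else -1
        if last_sign and sign != last_sign:
            reversals += 1
        last_sign = sign
    return reversals


def looks_oscillating(samples, min_reversals=3):
    """Heuristic: many reversals in a short window suggests oscillation."""
    return direction_reversals(samples) >= min_reversals
```

A value that steadily ramps up gives zero reversals, while one that swings between two settings gives a reversal on almost every sample. This only helps if the tuned values are being recorded in the first place, which is the observability gap raised above.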
yourapostasy|1 year ago
pbhjpbhj|1 year ago
Your issue appears to apply to any system change, although the risk will of course vary.
nehal3m|1 year ago
spenczar5|1 year ago
jstanley|1 year ago