Subsetting is part of load balancing: it means having each client choose a subset of the available servers and load-balance among them. At Google's scale of RPC systems, there would be far too many connections if every client connected to every server.
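A minimal sketch of what shuffle-based subsetting can look like (my own simplified illustration, not the article's or Google's exact algorithm): clients agree on a seed ahead of time, group themselves into rounds, and each client in a round takes a disjoint k-sized slice of that round's shuffle of the backend list.

```python
import random

def deterministic_subset(client_id, num_backends, k, seed="demo"):
    # Clients in the same "round" share one shuffle of the backend list
    # and take disjoint k-sized slices of it, so load stays even across
    # backends without any runtime coordination.
    subsets_per_round = num_backends // k
    round_id, slot = divmod(client_id, subsets_per_round)
    backends = list(range(num_backends))
    random.Random(f"{seed}:{round_id}").shuffle(backends)
    return backends[slot * k : (slot + 1) * k]

# Clients 0-3 form one round and together cover all 12 backends exactly once.
subsets = [deterministic_subset(c, num_backends=12, k=3) for c in range(4)]
print(subsets)
```

Because the shuffle is seeded per round, every client computes the same assignment independently; the churn problems the article discusses show up when num_backends changes.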
I thought this was going to be Google automating shutting down their own products. Might be a good idea. Just start a clock for 18 months, shut down the service automatically, and reward the relevant executive with a bonus.
[+] [-] kyrra|3 years ago|reply
This is a great write-up. As someone who has had to deal with backend subsetting a bunch in the past, this is such an interesting read.
One interesting thing about Google's scale and this architecture: if you use classic firewalls that do connection tracking, you can easily overload the connection-tracking table. With M frontend tasks each holding k connections, the M x k total can balloon given the number of replicas some services have.
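To put some purely hypothetical numbers on that M x k figure:

```python
# Hypothetical figures, purely for illustration.
M = 5_000  # frontend replicas
k = 20     # subset size, i.e. connections held open per frontend

print(M * k)       # tracked connections for a single frontend->backend pair
print(10 * M * k)  # ten such service pairs behind one firewall
```

Even modest per-frontend subsets multiply out to table sizes that a stateful firewall's conntrack limit can hit quickly.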
If you like write-ups like this, there is a great 2016 ACM paper by Google on how it does source control: https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...
[+] [-] bushbaba|3 years ago|reply
[+] [-] kerneis|3 years ago|reply
There are several points in the article where the examples didn't make sense to me at all. Overall an interesting article, but either I'm a bit dense this morning, or it's sloppy in the details and explanations.
For instance, in table 3, it looks like they excluded backend tasks {0, 1} (for frontend tasks {0, 1}) then {2, 3} (for frontend tasks {2, 3}) in the N=10 case, but backend tasks {1, 2} then {3, 4} in the N=11 case. Why the discrepancy? I get that it helps them make the point about task 3 changing subsets, but it's inconsistent with excluding leftovers in a round-robin fashion as presented in the previous paragraph.
Another sentence that I couldn't make sense of is: "If these [tasks 2 and 4] carry over to the subset of the next frontend task, you might get the shuffled backend tasks [7, 2, 0, 8, 9, 1, 4, 5, 3, 6], but you can't assign backend task 2 to the same frontend task." The "same frontend task" as what? Obviously not the one task 2 was already assigned to (the most intuitive reading to me), since task 2 was precisely not assigned and is a leftover. So what does this mean?
[+] [-] joatmon-snoo|3 years ago|reply
Figure 3 is an (arbitrary) example of round-robin subsetting with randomized subset ordering. The point is to demonstrate how bad backend churn is with this algorithm, by inspecting a typical example of the decisions the algorithm makes.
> The "same frontend task" as what? [...] what does this mean?
It's not phrased well, but it's also tricky to communicate. My read is this: given the backend shuffles [9, 1, 3, 0, 8, 6, 5, 7, 2, 4] and [7, 2, 0, 8, 9, 1, 4, 5, 3, 6], if you concatenate these shuffles to choose backend assignments, you end up with the subsets {9, 1, 3, 0}, {8, 6, 5, 7}, {2, 4, 7, 2}, {0, 8, 9, 1}, {4, 5, 3, 6}. The third subset contains backend task 2 twice, so that frontend ends up with only three distinct backends, even though you want it to have four.
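Concretely, using the exact shuffles above:

```python
import itertools

# The two shuffled orderings of backend tasks 0..9 quoted above.
shuffles = [
    [9, 1, 3, 0, 8, 6, 5, 7, 2, 4],
    [7, 2, 0, 8, 9, 1, 4, 5, 3, 6],
]
k = 4  # subset size per frontend task

# Concatenate the shuffles and hand out k backends per frontend in order.
combined = list(itertools.chain.from_iterable(shuffles))
subsets = [combined[i : i + k] for i in range(0, len(combined) - k + 1, k)]
print(subsets)
# [[9, 1, 3, 0], [8, 6, 5, 7], [2, 4, 7, 2], [0, 8, 9, 1], [4, 5, 3, 6]]
# The third slice straddles the boundary between the two shuffles and
# contains task 2 twice, so that frontend reaches only 3 distinct backends.
```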
The rest is reductio ad absurdum: reasoning through the ways you might fix this, and explaining why they in turn don't work. (I believe there's also an implicit assumption that the final algorithm must require no dynamic/runtime coordination, only static before-the-fact coordination among the frontends, i.e. agreement on a hash seed for a given subset and on which hashing strategy the frontends would use.)
[+] [-] flowblok|3 years ago|reply
> For instance, in table 3, it looks like they excluded backend tasks {0,1} (for frontend tasks {0, 1}) then {2,3} (for frontend tasks {2,3}) in the N=10 case, but backend tasks {1,2} then {3,4} in the N=11. Why the discrepancy?
With N = 10, there will be N mod k = 10 mod 4 = 2 leftover tasks, so the round-robin fashion excludes {0, 1} then {2, 3}. However, for N = 11, there will be N mod k = 11 mod 4 = 3 leftover tasks, so the round-robin fashion excludes {0, 1, 2} then {3, 4, 5}.
But as joatmon-snoo correctly said, the more important point is demonstrating how bad backend churn is with this algorithm.
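The exclusion arithmetic above is easy to sketch (assuming the excluded block simply keeps advancing round-robin, wrapping modulo N):

```python
def excluded_rounds(N, k, rounds=3):
    # Each round, the N mod k leftover backend tasks are excluded,
    # continuing round-robin from where the previous round stopped.
    r = N % k
    start = 0
    out = []
    for _ in range(rounds):
        out.append([(start + i) % N for i in range(r)])
        start += r
    return out

print(excluded_rounds(10, 4))  # [[0, 1], [2, 3], [4, 5]]
print(excluded_rounds(11, 4))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Going from N=10 to N=11 changes the leftover count from 2 to 3, which reshuffles every subsequent exclusion round — one source of the churn the figure illustrates.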
[+] [-] kerneis|3 years ago|reply
[+] [-] fh973|3 years ago|reply
Thanks for confirming that this is still the case.
[+] [-] tuananh|3 years ago|reply
[+] [-] kccqzy|3 years ago|reply
[+] [-] detaro|3 years ago|reply
[+] [-] wilg|3 years ago|reply
[+] [-] postsantum|3 years ago|reply
[+] [-] sieabahlpark|3 years ago|reply
[deleted]
[+] [-] w0mbat|3 years ago|reply
[deleted]
[+] [-] nobody9999|3 years ago|reply
>"Wankadia" is an intriguing surname, combining as it does "Arcadia", the ancient Greek idea of a rural utopia, and "Wank".
This[0]:
    CALIFORNIA: From Latin 'calor', meaning "heat" (as in English 'calorie' or
    Spanish 'caliente'); and 'fornia', for "sexual intercourse" or "fornication."
    Hence: Tierra de California, "the land of hot sex."
Should be right up your alley (yes, there's a double entendre there), friend. :)
Sadly, your sense of humor (and mine, for that matter) isn't very popular around here. And more's the pity.
[0] http://www.quotationspage.com/quote/23088.html