Reinventing backend subsetting at Google

[+] kyrra|3 years ago|reply

Googler, options are my own.

This is a great write-up. As someone who has had to deal with backend subsetting a bunch in the past, this is such an interesting read.

One thing interesting about Google's scale and this architecture is that if you use classic firewalls that end up having connection tracking in them, you can easily overload that connection tracking table. M x k connections can balloon that connection count given the number of replicas some services have.

If you like write-ups like this, there is a great 2016 acm paper by Google on how it does source control: https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...

[+] bushbaba|3 years ago|reply

Conntrack tables filling up are also a challenge with k8s at very, very, very large scale.

[+] kerneis|3 years ago|reply

There are several points in the article where the examples didn't make sense at all to me. Overall an interesting article, but I'm either a bit dense this morning, or it's sloppy in the details and explanations

For instance, in table 3, it looks like they excluded backend tasks {0,1} (for frontend tasks {0, 1}) then {2,3} (for frontend tasks {2,3}) in the N=10 case, but backend tasks {1,2} then {3,4} in the N=11. Why the discrepancy? I get that it helps them make the point about task 3 changing subset, but it's inconsistent with excluding left-overs in a round-robin fashion presented in the previous paragraph.

Another sentence that I couldn't make sense of is: "If these [tasks 2 and 4] carry over to the subset of the next frontend task, you might get the shuffled backend tasks [7, 2, 0, 8, 9, 1, 4, 5, 3, 6], but you can't assign backend task 2 to the same frontend task. " The "same frontend task" as what? Obviously note the one task 2 was already assigned to (the most intuitive reading to me), since precisely task 2 was not assigned and is a left-over. But then again, what does this mean?

[+] joatmon-snoo|3 years ago|reply

> in table 3

Figure 3 is an (arbitrary) example of round-robin sunsetting with randomized sunset ordering. The point is to demonstrate how bad backend churn is with this algorithm, by inspecting a normal example of the decisions this algorithm makes.

> The "same frontend task" as what? [...] what does this mean?

It's not phrased great, but it's also tricky to communicate. My read is this: given backend shuffles [9, 1, 3, 0, 8, 6, 5, 7, 2, 4], [7, 2, 0, 8, 9, 1, 4, 5, 3, 6], if you combine these shuffles to choose a backend assignment, you end up with subsets {9, 1, 3, 0}, {8, 6, 5, 7}, {2, 4, 7, 2}, {0, 8, 9, 1}, {4, 5, 3, 6}. That third subset means the third frontend only has a subset of three backends, even though you want it to have four.

The rest is reductio ad absurdum- reasoning through the ways you might fix this, and explaining why they in turn don't work. (I believe there's also an implicit assumption about the requirement that the final algorithm require no dynamic/runtime coordination, only static before-the-fact coordination amongst the front ends, i.e. agreement on a hash seed for a given subset, and say which hashing strategy the front ends would use).

[+] flowblok|3 years ago|reply

> For instance, in table 3, it looks like they excluded backend tasks {0,1} (for frontend tasks {0, 1}) then {2,3} (for frontend tasks {2,3}) in the N=10 case, but backend tasks {1,2} then {3,4} in the N=11. Why the discrepancy?

With N = 10, there will be N mod k = 10 mod 4 = 2 leftover tasks, and so the round-robin fashion excludes {0, 1} then {2, 3}. However for N = 11, there will be N mod k = 11 mod 4 = 3 leftover tasks, so the round-robin fashion excludes {0, 1, 2} then {3, 4, 5}.

But as joatmon-snoo correctly said, the more important point is demonstrating how bad backend churn is with this algorithm.

[+] kerneis|3 years ago|reply

Other typos: Bbackend; missing math formula in "(for a frontend task m, this is )" — should be ⎣m/L⎦ I guess. Proofreading at ACM seems to be lacking.

[+] fh973|3 years ago|reply

> Google services run on Borg, the company's cluster management software.

Thanks for confirming that this is still the case.

[+] tuananh|3 years ago|reply

why do they call it "subsetting" vs "load balancing"? is there difference between the 2 terms?

[+] kccqzy|3 years ago|reply

Subsetting is literally part of load balancing. It literally means to ask the client to choose a subset of the available servers and load balance among them. At Google's scale of RPC systems, there would be way too many connections if every client connects to every server.

[+] detaro|3 years ago|reply

See the linked chapter from the SRE book: https://sre.google/sre-book/load-balancing-datacenter/#limit...

[+] wilg|3 years ago|reply

I thought this was going to be Google automating shutting down their own products. Might be a good idea. Just start a clock for 18 months, shut down the service automatically, and reward the relevant executive with a bonus.

[+] postsantum|3 years ago|reply

Sunsetting-based promotion cycle. Manager dies with the product only to be reborn at the higher step of the ladder

[+] sieabahlpark|3 years ago|reply

[deleted]

[+] w0mbat|3 years ago|reply

[deleted]

[+] nobody9999|3 years ago|reply

>"Wankadia" is an intriguing surname, combining as it does "Arcadia", the ancient Greek idea of a rural utopia and "Wank".

This[0]:

   CALIFORNIA: From Latin 'calor', meaning "heat" (as in English 'calorie' or 
   Spanish 'caliente'); and 'fornia', for "sexual intercourse" or "fornication." 
   Hence: Tierra de California, "the land of hot sex."

Should be right up your alley (yes, there's a double entendre there), friend. :)

Sadly, your sense of humor (and mine, for that matter) isn't very popular around here. And more's the pity.

[0] http://www.quotationspage.com/quote/23088.html

21 comments