cletus | 9 months ago
More than a decade ago Google had to start managing their resource usage in data centers. Every project has a budget. CPU cores, hard disk space, flash storage, hard disk spindles, memory, etc. And these are generally convertible to each other so you can see the relative cost.
Fun fact: even though at the time flash storage was ~20x the cost of hard disk storage, it was often cheaper net because of the spindle bottleneck.
Anyway, all of these things can be turned into software engineer hours, often called "milli-SWEs", meaning a thousandth of the effort of 1 SWE for 1 year. So projects could save on hardware and hire more people, or hire fewer people but get more hardware, within their current budgets.
I don't remember the exact number of CPU cores that amounted to a single SWE, but IIRC it was in the thousands. So if you spend 1 SWE-year working on optimization across your project and you're not saving 5000 CPU cores, it's a net loss.
Some projects were incredibly large and used much more than that so optimization made sense. But so often it didn't, particularly when whatever code you wrote would probably get replaced at some point anyway.
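A rough sketch of that break-even arithmetic, for illustration only. The exchange rate is an assumption: the comment only recalls "in the thousands", and 5000 is the figure its example uses. The function names are hypothetical, not anything Google used.

```python
# Assumed exchange rate: CPU cores whose cost equals one SWE for one year.
# The real figure is not known; 5000 is the number used in the example above.
CORES_PER_SWE_YEAR = 5000


def milli_swes(cores_saved, cores_per_swe_year=CORES_PER_SWE_YEAR):
    """Convert a CPU-core saving into milli-SWEs (thousandths of a SWE-year)."""
    return 1000 * cores_saved / cores_per_swe_year


def net_swe_years(cores_saved, swe_years_spent,
                  cores_per_swe_year=CORES_PER_SWE_YEAR):
    """Net benefit of an optimization project, expressed in SWE-years.

    Positive means the hardware saved is worth more than the engineering
    time spent; negative means the project was a net loss.
    """
    return cores_saved / cores_per_swe_year - swe_years_spent
```

Under that assumed rate, `net_swe_years(5000, 1)` is exactly break-even, and anything saving fewer cores for the same effort comes out negative.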
The other side of this is that there is (IMHO) a general usability problem with the Web in that it simply shouldn't take the resources it does. If you know people who had to or still do data entry for their jobs, you'll know that the mouse is pretty inefficient. The old terminals from 30-40+ years ago that were text-based had some incredibly efficient interfaces at a tiny fraction of the resource usage.
I had expected that at some point the Web would be "solved" in the sense that there'd be a generally expected technology stack and we'd move on to other problems but it simply hasn't happened. There's still a "framework of the week" and we're still doing dumb things like reimplementing scroll bars in user code that don't work right with the mouse wheel.
I don't know how to solve that problem or even if it will ever be "solved".
mike_hearn|9 months ago
Google DID put a ton of effort into two other aspects of performance: latency, and overall machine utilization. Both of these were top-down directives that absorbed a lot of time and attention from thousands of engineers. The salary costs were huge. But, if you're machine constrained you really don't want a lot of cores idling for no reason even if they're individually cheap (because the opportunity cost of waiting on new DC builds is high). And if your usage is very sensitive to latency then it makes sense to shave milliseconds off because of business metrics, not hardware $ savings.
cletus|9 months ago
Likewise there have been many optimization projects and they used to call these out at TGIF. No idea if they still do. One I remember was reducing the UDP health checks for Stubby. Given that every single Google product uses Stubby extensively, even a small (5%? I forget) reduction in UDP traffic amounted to 50,000+ cores, which is (and was) absolutely worth doing.
I wouldn't even put latency in the same category as "performance optimization" because often you decrease latency by increasing resource usage. For example, you may send duplicate RPCs and wait for the fastest to reply. That could mean doubling or tripling the effort.
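To make the duplicate-RPC idea concrete, here is a minimal sketch using Python's `asyncio`. It is not Stubby or any Google API: `fake_rpc`, `hedged_call`, and the latency numbers are hypothetical stand-ins, with a sleep playing the part of a real network call.

```python
import asyncio


async def fake_rpc(replica, latency_s):
    """Hypothetical stand-in for a real RPC: wait, then reply with the replica name."""
    await asyncio.sleep(latency_s)
    return replica


async def hedged_call(backends):
    """Send the same request to every backend and return the fastest reply.

    `backends` maps a replica name to its simulated latency in seconds.
    Every duplicate request costs resources, even though only one reply is used.
    """
    tasks = [asyncio.create_task(fake_rpc(name, latency))
             for name, latency in backends.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Cancel the slower duplicates and wait for the cancellations to settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()
```

Calling `asyncio.run(hedged_call({"a": 0.05, "b": 0.01, "c": 0.08}))` returns the reply from `"b"`, the fastest replica, but all three requests were issued, which is the extra resource usage being described.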
xondono|9 months ago
The evaluation needs to happen in the margins, even if it saves pennies/year on the dollar, it’s best to have those engineers doing that than have them idling.
The problem is that almost no one is doing it, because the way we make these decisions has nothing to do with the economic calculus behind them. Most people just do “what Google does”, which explains a lot of the dysfunction.
bjourne|9 months ago
> The evaluation needs to happen in the margins, even if it saves pennies/year on the dollar, it’s best to have those engineers doing that than have them idling.
That's debatable. Performance optimization almost always leads to increased complexity. Doubled performance can easily cause quadrupled complexity. Then one has to consider whether the maintenance burden is worth the extra performance.
arp242|9 months ago
I think this probably holds true for outfits like Google because 1) on their scale "a core" is much cheaper than average, and 2) their salaries are much higher than average. But for your average business, even large businesses? A lot less so.
I think this is a classic "Facebook/Google/Netflix/etc. are in a class of their own and almost none of their practices will work for you"-type thing.
smikhanov|9 months ago
You can run a thought experiment imagining an alternative universe where human resources were directed towards optimization, and that alternative universe would look nothing like ours. One extra engineer working on optimization means one less engineer working on features. For what exactly? To save some CPU cycles? Don’t make me laugh.
karmakaze|9 months ago