vault_ | 2 years ago
Autoscaling seems like a downstream concern from the techniques being discussed here. Autoscaling tends to have a pretty high latency, so you still need a strategy for being overloaded while that extra capacity comes online. There's also a question of how the autoscaler knows what "load" is and when it's "too high." Just going off of CPU/memory usage probably means you're over-provisioning. Instead, if you have back-pressure or load-shedding built into your system you can use those as signals to the autoscaler.
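A minimal sketch of that idea (names here are illustrative, not from any particular library): a load shedder that rejects work past a queue-depth limit while counting what it sheds, so the current depth and shed ratio can be exported as autoscaling signals instead of CPU/memory.

```python
import threading

class LoadShedder:
    """Admission control with a queue-depth cap.

    Current depth and the shed ratio are exactly the kind of overload
    signals the comment suggests feeding to an autoscaler, rather than
    scaling on CPU/memory. Illustrative sketch, not a real library API.
    """

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self.depth = 0      # in-flight requests right now
        self.shed = 0       # requests rejected due to overload
        self.accepted = 0   # requests admitted
        self._lock = threading.Lock()

    def try_accept(self):
        """Admit a request, or shed it if we're at capacity."""
        with self._lock:
            if self.depth >= self.max_depth:
                self.shed += 1   # export this counter as a scaling signal
                return False
            self.depth += 1
            self.accepted += 1
            return True

    def done(self):
        """Mark one admitted request as finished."""
        with self._lock:
            self.depth -= 1

    def shed_ratio(self):
        """Fraction of recent traffic shed: > 0 means 'scale up now'."""
        with self._lock:
            total = self.shed + self.accepted
            return self.shed / total if total else 0.0
```

An autoscaler watching `shed_ratio()` reacts the moment real work is being refused, well before CPU averages drift upward.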
EdwardDiego|2 years ago
But IMO it's best viewed not as a technique to increase capacity that risks overprovisioning, but as a technique to significantly reduce the overprovisioning you were already likely doing to handle peaks in demand without blowing through delivery expectations (e.g., timeliness, data loss minimisation, etc.)
At an old employer, our load was seasonal over the day. If one instance of an app could handle N req/s, and the daily peak maxed out at 100N req/s, then we had to run 100 instances as a minimum (we usually chucked some extra capacity in there for surprises) even if the mean daily peak was 75N req/s.
And of course, at the times of the day when incoming load was 0.5N req/s, well, we still had 99 instances twiddling their thumbs.
And then there were the days when suddenly we're hitting 200N req/s because Germany made the World Cup quarter-finals, and things are catching fire and services are degraded in a way that customers notice, and it becomes an official Bad Thing That Must Be Explained To The CEO.
So when we reached a point in our system architecture (which took a fair bit of refactoring) where we could use autoscaling, we saved soooo much money, and had far fewer Bad Thing Explanations to do.
We had always been massively overprovisioned for 20 hours of the day, and often still overprovisioned for the other 4, but we weren't overprovisioned enough for black swans; it was the worst of both worlds.
(Although we kept a very close eye on Germany's progress in the football after that first World Cup experience)
You're spot on that
a) to autoscale up effectively we had to minimise the time an instance took to go from cold to hot, so we focused a lot on having shared caches available to quickly populate in-memory caches
b) adding new hardware instances was always going to take longer than adding new app instances, so we had to find some balance in how we overprovisioned hardware capacity to give us breathing room for scaling without wasting too much money and
c) we found significant efficiencies in costs and time to scale by changing the signals used to scale after starting out using CPU/mem.
Also a significant learning curve for our org was realising that we needed to ensure we didn't scale down too aggressively, especially the hardware stuff that scaled down far faster than it scaled up.
We hit situations where we'd scale down after a peak ended, then shortly afterwards another peak arrived, so all the capacity we'd just dynamically removed had to be added back, with the inherent speed issues you mentioned, leaving our service slow and annoying for customers, with minimal savings while capacity was trampolining.
(This, incidentally, can be really problematic in systems where horizontal scaling can introduce a stop-the-world pause across multiple instances of an app.
Anything that uses Kafka and consumer groups is particularly prone to this, as a membership change in the group pauses all members of the CG while partitions are reallocated, although later versions of Kafka with sticky assignors have improved this somewhat. But yeah, it's very critical to stop these kinds of apps from trampolining capacity if you want to keep data timeliness within acceptable bounds.)
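For reference, opting into the incremental flavour of the sticky assignor (the cooperative sticky assignor, available in the Java client since Kafka 2.4) is a one-line consumer config change:

```properties
# Kafka consumer configuration (Java client, Kafka 2.4+).
# The cooperative sticky assignor rebalances incrementally: only partitions
# that actually move are revoked, so a membership change no longer pauses
# every member of the consumer group at once.
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
```

It doesn't eliminate rebalance cost entirely, but it turns the group-wide stop-the-world pause into a much smaller, partition-local one.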
It took a lot of tuning to get all of it right, but when we did, the savings were spectacular.
I think the CTO worked out that it only took six months of the reduced AWS costs to equal the cost of the two years of system refactoring needed to get to that point, and after that, it was all ongoing cream for the shareholders.
And while I get the hate people have for unnecessary usage of K8s (like Kafka, it's a complex solution for complicated problems, and using it unnecessarily is taking on a whole lot of complexity for no gain), it was perfect for our use case: the ability to tune how HPAs scale down and to scale on custom metrics was just brilliant.
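To illustrate those two knobs (the metric name, app name, and numbers below are made up for the example, not our actual config), an `autoscaling/v2` HPA lets you scale on a custom per-pod metric and slow down scale-down via the `behavior` block:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                  # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 5
  maxReplicas: 200
  metrics:
    - type: Pods
      pods:
        metric:
          name: queue_depth     # custom metric, e.g. exposed via a metrics adapter
        target:
          type: AverageValue
          averageValue: "100"   # target average queue depth per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # wait 10 min of low load before shrinking
      policies:
        - type: Percent
          value: 10             # then remove at most 10% of pods per minute
          periodSeconds: 60
```

The `stabilizationWindowSeconds` and percent policy are what guard against the trampolining problem described above: a brief lull between peaks no longer throws away capacity you're about to need again.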
(I wish I could end with "And the company reinvested a significant proportion of the savings into growth and gave us all big fat bonuses for saving so much money", but haha, no. The CFO did try to tell us we'd been unnecessarily wasteful before, and that we should have built the system we created in 2007 like the 2019 version from the start, because apparently a lot of MBA schools have an entrance requirement of psychopathy, and then to graduate you have to swear a blood pact with the cruel and vicious God of Shareholder Value.)