> Autoscaling is configured via CloudWatch alarms on CPU usage:
> Scale-out policy adds workers when CPU > 30%.
> Scale-in policy removes idle workers when CPU < 20%.
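Taken literally, the quoted thresholds form a hysteresis band: no action between 20% and 30% CPU. A minimal sketch of that decision logic (function name hypothetical, thresholds from the quote):

```python
def scaling_action(avg_cpu_percent: float) -> int:
    """Worker-count delta for the quoted policy:
    +1 above the scale-out threshold, -1 below the
    scale-in threshold, 0 inside the 20-30% dead band."""
    if avg_cpu_percent > 30:
        return 1
    if avg_cpu_percent < 20:
        return -1
    return 0
```

In practice the two CloudWatch alarms fire the equivalent scale-out and scale-in policies; the dead band keeps the service from flapping.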
jarboot|3 months ago
Does this handle the case where there are longer-running activities with low CPU usage? Couldn't these be canceled during scale-in?
Temporal would retry them, but it would make some workflow runs take longer, which could be annoying for user-interactive workflows.
Otherwise, I've seen setups that hit the metrics endpoint to query things like `worker_task_slots_available` to scale up, or query pending activities, pending workflows, etc. to scale down per worker.
They can be cancelled if CPU drops below the scale-in threshold.
In my case the activities were CPU-heavy, batch-style, and not client-facing, so I preferred occasional retries and slightly longer runtimes over blowing up the AWS bill. For that workload, CPU-based autoscaling was perfectly fine.
norapap|3 months ago
I originally ran this setup on Temporal Cloud, and pulling detailed worker/queue metrics directly from Cloud can be tricky... you need to expose custom worker metrics yourself, then pipe them into CloudWatch. If you host Temporal yourself, it's easier :)
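Piping a scraped worker metric into CloudWatch mostly amounts to shaping a `PutMetricData` call. A sketch of the payload (the namespace and dimension names are made up), with the actual boto3 call left commented out since it needs credentials:

```python
def to_cloudwatch_payload(metric_name: str, value: float, task_queue: str) -> dict:
    # "Temporal/Workers" and the TaskQueue dimension are hypothetical;
    # use whatever namespace your scaling alarms are configured against.
    return {
        "Namespace": "Temporal/Workers",
        "MetricData": [
            {
                "MetricName": metric_name,
                "Dimensions": [{"Name": "TaskQueue", "Value": task_queue}],
                "Value": value,
                "Unit": "Count",
            }
        ],
    }

payload = to_cloudwatch_payload("worker_task_slots_available", 3.0, "batch-queue")
# import boto3
# boto3.client("cloudwatch").put_metric_data(**payload)
```

A CloudWatch alarm on that custom metric can then drive the same ECS scaling policies the article uses for CPU.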
What's funny is that, in some sense, Temporal replaces a lot of the AWS stack. You don't really need queues, Step Functions, Lambdas, and the rest. I personally think it's a better compute model than the wildly complicated AWS infra. Deploying Temporal on compute primitives is simply better, and lets you stay cloud agnostic.
I sometimes suspect AWS deliberately looks for ways to extract low-overhead tasks into dedicated services for the simple reason that many people will pay for the service without thinking about whether they really need it.
This article is really about hosting Temporal _workers_ in ECS - which is the "easy" part - not running the Temporal service itself. That would be a valuable follow-up!
99.9% sure the entire article was written by Claude or ChatGPT, so you can probably direct that question at the source. Make sure to end your prompt with "no emojis".
norapap|3 months ago
We went with Fargate because it keeps things lean: no servers to manage, no patching, no scaling headaches. It's perfect for our bursty workloads, since we only pay when containers actually run. Plus, autoscaling just works.
In the GitHub repo you can find comments on how to easily switch to EC2 if your workload needs it.