
headlessvictim2 | 4 years ago

Thanks for the suggestion.

How would this work with GPU-bound machine learning models?

The model processing takes > 30 seconds, so wouldn't it still be the bottleneck?


pjgalbraith | 4 years ago

You would still have the same bottleneck, but the API request would return straight away with some sort of correlation ID. The workers that handle the GPU-bound tasks would then pull jobs when they are ready. If you get a lot of jobs, all that happens is the queue fills up, clients wait a bit longer, and they hit the status endpoint a few more times.

Here is an example of what it could look like: https://docs.microsoft.com/en-us/azure/architecture/patterns...
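A minimal in-memory sketch of that flow (Azure calls it the asynchronous request-reply pattern). The names `submit`, `status`, and `gpu_worker` are illustrative, not from the thread, and in production the queue and status store would be a broker and database (e.g. Redis or RabbitMQ) rather than Python objects:

```python
import queue
import threading
import time
import uuid

# In-memory stand-ins for the job queue and status store.
jobs = queue.Queue()
results = {}

def submit(payload):
    """API handler: enqueue the job and return immediately with a correlation ID."""
    job_id = str(uuid.uuid4())
    results[job_id] = {"status": "pending"}
    jobs.put((job_id, payload))
    return job_id

def status(job_id):
    """Status endpoint: the client polls this until the job is done."""
    return results.get(job_id, {"status": "unknown"})

def gpu_worker():
    """Worker loop: pulls jobs when ready; the slow model call is simulated."""
    while True:
        job_id, payload = jobs.get()
        results[job_id] = {"status": "running"}
        time.sleep(0.1)  # stand-in for the >30 s GPU inference
        results[job_id] = {"status": "done", "result": f"prediction for {payload}"}
        jobs.task_done()

threading.Thread(target=gpu_worker, daemon=True).start()

job_id = submit("image-123")
print(status(job_id))  # "pending" or "running" straight away, no blocking
jobs.join()            # a real client would poll status() instead of joining
print(status(job_id))
```

The key property is that `submit` returns in microseconds regardless of how deep the queue is; the GPU throughput limit shows up only as growing queue depth and longer polling, never as blocked API requests.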

headlessvictim2 | 4 years ago

Thanks for the explanation.

Right now, we use ELB (Elastic Load Balancer) to sit in front of multiple GPU instances.

Is this sufficient or do you suggest adding Celery into this architecture?