[+] [-] jedwhite|7 years ago|reply
Not sure why this post is getting flagged to oblivion, but it is technically pretty interesting. GPUs remain expensive, and their use is typically prioritized for training, while inference is run on CPUs. This provides a cost-effective way to attach GPU resources on demand to regular instances, rather than having to run dedicated GPU instances.

[+] [-] thwy12321|7 years ago|reply
How is there not more interest in this? This is huge for anyone trying to bootstrap a machine learning business. If you need a 24/7 prediction service, the cost of full-fledged GPUs can be prohibitive and wasteful. This allows me to distribute my predictions across multiple nodes.
[+] [-] borramakot|7 years ago|reply
https://aws.amazon.com/machine-learning/inferentia/
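The cost tradeoff the commenters describe can be sketched with back-of-the-envelope arithmetic. All hourly rates below are illustrative assumptions, not real published prices; the point is only the shape of the comparison between a dedicated GPU instance running 24/7 and a cheaper CPU instance with a smaller inference accelerator attached on demand:

```python
# Back-of-the-envelope monthly cost comparison for 24/7 inference.
# All hourly rates are ASSUMED for illustration, not actual cloud prices.

HOURS_PER_MONTH = 730  # average hours in a month

dedicated_gpu_rate = 3.06   # assumed $/hr, dedicated GPU instance
cpu_rate = 0.19             # assumed $/hr, general-purpose CPU instance
accelerator_rate = 0.22     # assumed $/hr, attached inference accelerator

# Running a full GPU instance around the clock:
dedicated_monthly = dedicated_gpu_rate * HOURS_PER_MONTH

# Running a CPU instance with a right-sized accelerator attached:
attached_monthly = (cpu_rate + accelerator_rate) * HOURS_PER_MONTH

savings_pct = 100 * (1 - attached_monthly / dedicated_monthly)

print(f"Dedicated GPU instance:     ${dedicated_monthly:,.2f}/month")
print(f"CPU + attached accelerator: ${attached_monthly:,.2f}/month")
print(f"Savings:                    {savings_pct:.0f}%")
```

With these assumed rates the attached-accelerator setup comes out far cheaper for an always-on prediction service, which is the scenario thwy12321 raises; the gap narrows or reverses if the workload actually saturates a full GPU.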