I'm afraid that, unlike proprietary APIs and Petals, this system can't be used for single-batch inference of 175B models at interactive speeds - which is what you actually need to run ChatGPT and other interactive LM apps. See https://news.ycombinator.com/item?id=34874976