top | item 44439167 (no title) cotran2 | 8 months ago The model is compact 1.5B, most GPUs can serve it locally and has <100ms e2e latency. For L40s, its 50ms. discuss order hn newest No comments yet.
No comments yet.