top | item 44439167

(no title)

cotran2 | 8 months ago

The model is compact 1.5B, most GPUs can serve it locally and has <100ms e2e latency. For L40s, its 50ms.

discuss

order

No comments yet.