gwenzek | 1 year ago

It's a bit early to compare directly to TensorRT because we don't have a full-blown equivalent.

Note that our focus is on being platform-agnostic, easy to deploy and integrate, performing well all around, and easy to tweak. We use the same compiler as JAX, so our performance is on par. More generally, though, we believe we can gain on overall "tok/s/$" through shorter startup times, choosing the most efficient hardware available, and making it easy to implement new tricks like multi-token prediction.
