
make3 | 1 month ago

There are a million techniques that make LLM inference cheaper by trading away output quality: using a smaller model, quantizing the model, running speculative decoding with a more permissive rejection threshold, and so on.
