anon373839 | 1 month ago

I have seen ~1,300 tokens/sec of total throughput with Llama 3 8B on a MacBook Pro. So no, you don’t halve the performance. But running batched inference takes more memory, so you have to use shorter contexts than if you weren’t batching.
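To illustrate the memory tradeoff: a rough back-of-the-envelope for the KV cache size, using Llama 3 8B's published config (32 layers, 8 KV heads, head dim 128, fp16). The formula is a standard estimate, not taken from the comment above:

```python
def kv_cache_bytes(batch, ctx, layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    # K and V each store batch * ctx * kv_heads * head_dim values per layer,
    # hence the factor of 2.
    return batch * ctx * layers * 2 * kv_heads * head_dim * dtype_bytes

# One sequence at 8k context:
print(kv_cache_bytes(1, 8192) / 2**30)  # 1.0 GiB
# A batch of 8 at the same context:
print(kv_cache_bytes(8, 8192) / 2**30)  # 8.0 GiB
```

Since the cache grows linearly in both batch size and context length, holding total memory fixed while batching forces proportionally shorter contexts, which matches the tradeoff described above.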