nivekney|2 years ago
On a similar thread, how does it compare to HippoML? Context: https://news.ycombinator.com/item?id=36168666
brucethemoose2|2 years ago
We don't necessarily know... Hippo is closed source for now.
It's comparable to Apache TVM's Vulkan backend in speed on CUDA; see https://github.com/mlc-ai/mlc-llm
But honestly, the biggest advantage of llama.cpp for me is being able to split a model so performantly. My puny 16GB laptop can just barely, but very practically, run LLaMA 30B at almost 3 tokens/s, and do it right now. That is crazy!
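[The "split" here refers to llama.cpp's ability to offload part of a model's layers to the GPU (via its `--n-gpu-layers` option) while keeping the rest in system RAM. A rough back-of-envelope sketch, with approximate numbers of my own rather than figures from the thread, of why a 4-bit-quantized 30B model only just fits on a 16GB machine:]

```python
# Rough sketch (assumed figures, not from the thread): why a 4-bit
# quantized 30B model barely fits on a 16 GB machine, which is what
# makes llama.cpp's CPU/GPU split so useful.
params = 30e9            # LLaMA 30B parameter count, approximate
bits_per_weight = 4.5    # q4_0 effectively stores ~4.5 bits/weight
                         # once block scales are counted (approximate)
model_gb = params * bits_per_weight / 8 / 1e9
print(f"~{model_gb:.1f} GB")  # ~16.9 GB: just over 16 GB of RAM
```

[Offloading even a fraction of the layers to a modest GPU closes that gap, which is consistent with the "just barely, but very practically" experience described above.]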
smiley1437|2 years ago
>> run LLaMA 30B at almost 3 tokens/s
Please tell me your config! I have an i9-10900 with 32GB of RAM that only gets 0.7 tokens/s on a 30B model.