Python and PyTorch both call out to C libraries… I don’t get what he means by “proving LLMs can run without Python and PyTorch” at all. Seems like they don’t understand the basic fundamentals here…
llama.cpp being the best choice doesn't make it popular.
When I got started, I was led to ollama and other local-llm freemium.
I didn't necessarily assume that they weren't C++ (I don't even know), but I do think that, as implied, Python duct-tape solutions are more popular than llama.cpp.
I imagine so regarding GPUs, right? If this is a legitimate project, then doesn’t it provide a proof of concept for the performance constraints that relate to them? Couldn't the environmentally concerned take this as an indicator that the technology can progress without relying on as much energy as is potentially spent now? Shouldn’t researchers in the industry be thinking of ways to prevent the future capabilities of the technology from outrunning the capacity of the infrastructure?
I know very little about AI but these are things that come to mind here for me.
GPUs are more efficient than CPUs for LLM inference, using less energy per token and being cheaper overall. Yes, a single data center GPU draws a lot of power and costs a fortune, but it can also serve a lot more people in the time your CPU or consumer GPU needs to respond to a single prompt.
jdefr89|1 month ago
jasonjmcghee|1 month ago
avadodin|1 month ago
christianqchung|1 month ago
skybrian|1 month ago
kgeist|1 month ago
tolerance|1 month ago
yorwba|1 month ago