cxie | 1 year ago
The real test will be inference latency and throughput on consumer hardware, not just the cherry-picked benchmark graphs they've shared. Anyone run comparative evals against Llama 3.2 3B or Gemma-2 on identical hardware yet?
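For anyone wanting to run that comparison themselves, a minimal harness is enough: feed each model the same prompt on the same box and record wall-clock decode time. The sketch below is a generic timing wrapper, not any model's actual API; `fake_generate` is a placeholder you'd swap for a real inference call (llama.cpp, transformers, etc.).

```python
# Minimal latency/throughput harness for comparing local models.
# benchmark() times any generate(prompt, new_tokens) callable so both
# models see identical inputs on identical hardware.
import time
from statistics import median

def benchmark(generate, prompt, new_tokens, runs=5):
    """Run generate() several times; return median wall-clock latency
    in seconds and decode throughput in tokens/sec."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt, new_tokens)
        times.append(time.perf_counter() - start)
    latency = median(times)
    return latency, new_tokens / latency

# Stand-in for a real model call -- replace with your actual
# inference backend when comparing checkpoints.
def fake_generate(prompt, new_tokens):
    time.sleep(0.01)  # simulate decode work

latency, tps = benchmark(fake_generate, "Explain KV caching.", new_tokens=64)
print(f"median latency: {latency:.3f}s, throughput: {tps:.1f} tok/s")
```

Median over several runs rather than mean keeps one cold-cache or thermal-throttle outlier from skewing the comparison.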
The fully open approach (weights, hyperparams, training code) is refreshing compared to the "open weights only" trend we've been seeing. This is how you actually build a community around your tech stack.
Edge deployment is where this gets interesting - truly open small models running locally on laptops, phones, and embedded devices without phoning home feels like the computing paradigm we should have been pushing for all along, instead of the current API-gated centralization.