tveita|2 years ago
Are there any cool highlights you can share about gemma.cpp? Does it have any technical advantages over llama.cpp? It looks like it introduces its own quantization format; is there a speed or accuracy gain over llama.cpp's 8-bit quantization?
austinvhuang|2 years ago
Besides the Python implementations, we also implemented a standalone C++ implementation that runs locally with just CPU SIMD: https://github.com/google/gemma.cpp