junrushao1994 | 2 years ago

Yeah, we tried out popular solutions that support inference of 4-bit quantized models, like exllama and llama.cpp, among others.
