top | item 39689324


BMSR | 2 years ago

I'm also learning. Models generally get more accurate with more parameters, say 7b (7 billion parameters) vs 8x7b (56 billion), but they also take more time and resources to run at higher parameter counts. TheBloke on Huggingface uploads quantized models, which can run on lower-spec computers with a possible hit to quality; he offers multiple configurations per model depending on what you prefer. Big models can be too heavy and slow, so the sweet spot is probably something like 13b. You can try different gguf models with this program: https://github.com/madprops/meltdown
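To see why quantization lets big models fit on modest hardware, here is a back-of-envelope sketch of weight storage at different precisions. The bits-per-weight figures are approximations (Q4_K_M mixes block sizes, so ~4.5 bits is a rough average), and real GGUF file sizes will differ somewhat:

```python
# Rough memory estimate for model weights at different quantization
# levels. Weights only: runtime overhead, KV cache, etc. add more on
# top. Bits-per-weight values are approximate, not exact GGUF sizes.

def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (decimal GB)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

for label, bits in [("FP16", 16), ("Q8_0", 8), ("Q4_K_M", 4.5)]:
    print(f"7b  @ {label:6}: {weight_gb(7, bits):6.1f} GB")
    print(f"56b @ {label:6}: {weight_gb(56, bits):6.1f} GB")
```

So a 7b model drops from roughly 14 GB at FP16 to around 4 GB at 4-bit, which is the difference between needing a workstation GPU and running on an ordinary laptop.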


lkrubner | 2 years ago

Thank you for this. It's a good start. But with so much on Hugging Face, how does anyone evaluate it all? It's not possible to personally test everything, let alone develop good intuitions about what might be useful in particular situations.

As an analogy, 10 years ago we had a lot of debates on Hacker News about various languages and frameworks: PHP versus Python, Rails versus Django. But where do I go nowadays for similar discussions about all the tooling springing up around the AI, LLM, and NLP space?