top | item 39451236


I assume you are referring to Llama 2? Is there a way to compare models? e.g. what is Llama-7b equivalent to in OpenAI land? Perplexity scores?

Also, does ChatGPT use GPT 4 under the hood or 3.5?


Tiberium|2 years ago

Actually, there have been new model releases after LLaMA 2. For example, for small models Mistral 7B is simply unbeatable, with a lot of good fine-tunes available for it.

Usually people compare models with all the different benchmarks, but of course sometimes models get trained on benchmark datasets, so there's no true way of knowing except if you have a private benchmark or just try the model yourself.

I'd say that Mistral 7B is still short of gpt-3.5-turbo, but Mixtral 8x7B (the Mixture-of-Experts one) is comparable. You can try them all at https://chat.lmsys.org/ (choose Direct Chat, or Arena side-by-side)

ChatGPT is a web frontend - they use multiple models and switch them as they create new ones. Currently, the free ChatGPT version is running 3.5, but if you get ChatGPT Plus, you get (limited by messages/hour) access to 4, which is currently served with their GPT-4-Turbo model.

mark_l_watson|2 years ago

I agree with your comments and want to add re: benchmarks: I don’t pay too much attention to benchmarks, but I have the advantage of now being retired, so I can spend time experimenting with commercial offerings and a variety of local models that I run with Ollama. I spend time building my own, very subjective, views of what different models are good for. One kind of model analysis that I do like is the circle displays on Hugging Face that show how a model benchmarks for different capabilities (word problems, coding, etc.)

tarruda|2 years ago

> Is there a way to compare models?

This is what I like to use for comparing models: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...

It is an Elo rating system based on users voting on LLM answers to real questions.
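For anyone unfamiliar with Elo, here is a minimal sketch of the textbook update applied to a single head-to-head vote. The K-factor of 32 and the 400-point scale are the conventional chess defaults, assumed here for illustration; the leaderboard's actual computation may well differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one vote.

    score_a is 1.0 if A's answer won the vote, 0.0 if B's won, 0.5 for a tie.
    k=32 is an assumed illustrative value, not the leaderboard's setting.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two models start level; a user prefers model A's answer.
    print(elo_update(1000.0, 1000.0, 1.0))
```

The point is that beating a higher-rated model moves your rating more than beating a lower-rated one, which is why a strong 7B fine-tune can climb past older, larger models.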

> what is Llama-7b equivalent to in OpenAI land?

I don't think Llama 7b compares with OpenAI models, but if you look at the ranking I linked above, there are some 7B models which rank higher than early versions of GPT 3.5. Those models are Mistral 7b fine-tunes.

int_19h|2 years ago

Miqu (the leaked large Mistral model) and its finetunes seem to be the most coherent currently, and I'd say they beat GPT-3.5 handily.

There are no models comparable to GPT-4, open source or not. Not even close.

dkarras|2 years ago

No, it's Mistral: Mistral 7b, and Mixtral 8x7b MoE, which is almost on par with (or better than) ChatGPT 3.5. Mistral 7b itself packs a punch as well.

mark_l_watson|2 years ago

Mixtral 8x7b continues to amaze me, even though I have to run it with 3-bit quantization on my Mac (I just have 32 GB of memory). When I run this model on commercial services with 4 or more bits of quantization I definitely notice, subjectively, better results.
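A rough back-of-the-envelope sketch of why the bit width matters on a 32 GB machine. It assumes about 46.7 billion total parameters for Mixtral 8x7B (the figure from Mistral's announcement, not from this thread) and counts only the weights; real quantization formats carry extra per-block metadata, and the KV cache and the OS need memory too, so treat these as lower bounds.

```python
# Assumption: total parameter count for Mixtral 8x7B, per Mistral's announcement.
MIXTRAL_8X7B_PARAMS = 46.7e9


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for bits in (3, 4, 8, 16):
        gb = weight_memory_gb(MIXTRAL_8X7B_PARAMS, bits)
        print(f"{bits:>2}-bit: ~{gb:.1f} GB of weights")
```

By this estimate the weights come to roughly 17.5 GB at 3 bits and 23.4 GB at 4 bits, so 4-bit leaves much less headroom once everything else on a 32 GB machine is accounted for.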

I like to play around with smaller models and regular app code in Common Lisp or Racket, and Mistral 7b is very good for that: mixing and matching old-fashioned coding with the NLP, limited world knowledge, and data manipulation capabilities of LLMs.
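A minimal sketch of that mix-and-match style, in Python rather than Lisp. It assumes an Ollama server running locally on its default port with a model pulled under the name `mistral`; the helper names (`build_request`, `ask`, `summarize_long_lines`) are hypothetical and only for illustration. Ordinary code decides which items need the model, and the model only handles those.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt and return the model's text reply."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


def summarize_long_lines(lines, ask_fn=ask, max_len=80):
    """Plain code filters; the LLM is called only for lines longer than max_len."""
    return [
        ask_fn("mistral", f"Summarize in one short sentence: {line}")
        if len(line) > max_len
        else line
        for line in lines
    ]
```

Passing `ask_fn` as a parameter keeps the regular code testable without a model running, e.g. `summarize_long_lines(notes, ask_fn=lambda m, p: "stub")`.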

guappa|2 years ago

Llama 2 isn't open source.