top | item 39844574

brucethemoose2 | 1 year ago

I would note the actual leading models right now (IMO) are:

- Miqu 70B (General Chat)

- Deepseed 33B (Coding)

- Yi 34B (for chat over 32K context)

And of course, there are finetunes of all these.

And there are some others in the 34B-70B range I have not tried (and some I have tried, like Qwen, which I was not impressed with).

Point being that Llama 70B, Mixtral, and Grok as seen in the charts are not what I would call SOTA (though Mixtral is excellent for its batch-size-1 speed).

jph00|1 year ago

Miqu is a leaked model -- no license is provided to use it. Yi 34B doesn't allow commercial use. Deepseed 33B isn't much good at stuff outside of coding.

So it's fair to say that DBRX is the leading general purpose model that can be used commercially.

ok_dad|1 year ago

Model weights are just constants in a mathematical equation; they aren't copyrightable. It's questionable whether licenses to use them only for certain purposes are even enforceable. No human wrote the weights, so they aren't a work of art/authorship by a human. Just don't use their services; use the weights at home on your own machines so you aren't bound by some TOS.

jMyles|1 year ago

This only applies to projects whose authors seek to comply with the whims of a particular jurisdiction.

Surely there are plenty of project prospects - even commercial in nature - which don't have this limitation.

blueblimp|1 year ago

Qwen1.5-72B-Chat is dominant in the Chatbot Arena leaderboard, though. (Miqu isn't on there due to being bootleg, but Qwen outranks Mistral Medium.)

brucethemoose2|1 year ago

Yeah, I know, hence it's odd that I found it kind of dumb for personal use. More so with the smaller models, which lost an objective benchmark I run to some Mistral finetunes.

And I don't think I was using it wrong. I know, for instance, that the Chinese-language models are finicky about sampling, since I run Yi all the time.
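For readers unfamiliar with what "finicky about sampling" means in practice: it refers to tuning the sampler settings (temperature, top-p, etc.) per model family. As a rough illustration only (the logits and defaults below are made up, not any model's actual settings), here is a minimal pure-Python sketch of temperature plus top-p (nucleus) sampling:

```python
import math
import random

def sample_top_p(logits, temperature=0.9, top_p=0.9, rng=None):
    """Sample one token index using temperature + top-p (nucleus) sampling."""
    rng = rng or random.Random()
    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep the smallest set of highest-probability tokens whose
    # cumulative mass reaches top_p; everything else is discarded.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    kept_mass = sum(probs[i] for i in kept)
    r = rng.random() * kept_mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

# Made-up logits over a 4-token vocabulary, just to exercise the sampler.
token = sample_top_p([2.0, 1.0, 0.5, -1.0])
```

The complaint in the thread amounts to this: the same `temperature`/`top_p` values that work well for one model family can produce repetitive or incoherent output on another, so you have to retune them per model.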

echaozh|1 year ago

It's Deepseek, not Deepseed, just so people can actually find the model.

belter|1 year ago

For all the Model Cards and License notices, I find it interesting there is not much information on the contents of the dataset used for training. Specifically, if it contains data subject to Copyright restrictions. Or did I miss that?

brucethemoose2|1 year ago

Yeah, it's an unspoken but rampant thing in the LLM community. Basically no one respects licenses for training data.

I'd say the majority of instruct tunes, for instance, use OpenAI output (which is against their TOS).

But it's all just research! So who cares? Or at least, that seems to be the mood.