brucethemoose2|1 year ago
- Miqu 70B (General Chat)
- Deepseek 33B (Coding)
- Yi 34B (for chat over 32K context)
And of course, there are finetunes of all these.
And there are some others in the 34B-70B range I have not tried (and some I have, like Qwen, which I was not impressed with).
Point being that Llama 70B, Mixtral, and Grok, as seen in the charts, are not what I would call SOTA (though Mixtral is excellent for batch-size-1 speed).
jph00|1 year ago
So it's fair to say that DBRX is the leading general-purpose model that can be used commercially.
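For anyone who wants to try DBRX directly: a minimal sketch, assuming the public databricks/dbrx-instruct checkpoint on the Hugging Face Hub and its release-era loading flags (trust_remote_code in particular). DBRX is a 132B-parameter MoE, so treat this as illustrative rather than something most single machines can run.

    # Hedged sketch: load the public DBRX Instruct checkpoint and run one
    # generation. The checkpoint id is real; the rest is assumption.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "databricks/dbrx-instruct"

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",       # shard the weights across available GPUs
        trust_remote_code=True,  # required by the release-era model card
    )

    messages = [{"role": "user", "content": "What license is DBRX released under?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))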
jMyles|1 year ago
Surely there are plenty of prospective projects, even commercial ones, that don't have this limitation.
brucethemoose2|1 year ago
And I don't think I was using it wrong. I know, for instance, that the Chinese language models are funny about sampling, since I run Yi all the time.
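(For context, a minimal sketch of what per-model sampler tuning looks like with Hugging Face transformers. The 01-ai/Yi-34B-Chat checkpoint id is real; the specific temperature/top_p/repetition values are illustrative assumptions, not settings endorsed in this thread.)

    # Hedged sketch: a generate() call with sampler knobs dialed for a
    # Yi-family chat model. Checkpoint id is real; values are assumed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "01-ai/Yi-34B-Chat"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "Compare top-p and top-k sampling in two sentences."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Yi chat models are commonly run cooler than Llama-style defaults:
    # lower temperature, tighter nucleus, mild repetition penalty (assumed).
    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.6,
        top_p=0.8,
        repetition_penalty=1.1,
        max_new_tokens=256,
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))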
brucethemoose2|1 year ago
I'd say the majority of instruct tunes, for instance, use OpenAI output (which is against their TOS).
But it's all just research! So who cares! Or at least, that seems to be the mood.