top | item 35243877


lgbr | 2 years ago

It's absolutely fantastic that we have so many runtimes, so quickly, to the point where we have an awesome list.

However, given that the usefulness of chatbots depends more on the model being used than on the runtime, what I would find a lot more useful is a ranking of the various models that are available. Currently I'm having to rely on comments on the internet to find out whether Alpaca 7B or LLaMA 65B is genuinely productive to use. As new models come out, I'd love to know how well each one tells jokes, answers complicated questions, or generates code.
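A per-task scoreboard like the one wished for above could be as simple as this sketch. The model names come from the comment; the scores and task categories are made-up placeholders, not real benchmark numbers:

```python
# Illustrative per-task model scoreboard. Scores are invented
# placeholders (0.0-1.0), NOT actual benchmark results.
from statistics import mean

scores = {
    "Alpaca-7B": {"jokes": 0.4, "qa": 0.5, "code": 0.3},
    "LLaMA-65B": {"jokes": 0.5, "qa": 0.7, "code": 0.5},
}

def rank_models(scores):
    """Rank models by their mean score across task categories."""
    return sorted(scores, key=lambda m: mean(scores[m].values()), reverse=True)

print(rank_models(scores))  # -> ['LLaMA-65B', 'Alpaca-7B']
```

The hard part, of course, is filling in the scores: joke-telling and code generation need either human raters or automated checks per category.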


LASR|2 years ago

We have a whole team of folks just watching for these to come out and then go evaluate them.

Short answer: none of them do as well as the OG Davinci-003. Not even close. Even the 3.5 Turbo models from OpenAI don’t do as well.

We throw some sophisticated prompts at them to attempt chain-of-thought reasoning.
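The comment doesn't share the actual prompts, but a chain-of-thought evaluation harness typically looks something like this sketch. The template, the `Answer:` convention, and the grading are illustrative assumptions, not the team's real suite:

```python
# Minimal sketch of chain-of-thought prompting plus answer grading.
# The prompt template and "Answer:" convention are illustrative only.
import re

def cot_prompt(question: str) -> str:
    """Wrap a question in a simple chain-of-thought template."""
    return (
        f"Question: {question}\n"
        "Let's think step by step, then give the final answer "
        "on a line starting with 'Answer:'."
    )

def extract_answer(completion: str):
    """Pull the final answer line out of a model completion, if present."""
    m = re.search(r"^Answer:\s*(.+)$", completion, re.MULTILINE)
    return m.group(1).strip() if m else None

def grade(completion: str, expected: str) -> bool:
    """Score a completion as correct only if its final answer matches."""
    return extract_answer(completion) == expected

# Example: a hypothetical model completion being graded.
completion = "8 workers dig 8 holes.\nDouble the workers.\nAnswer: 16"
print(grade(completion, "16"))  # -> True
```

Scoring only the extracted final answer, rather than the whole completion, is what lets runs with different reasoning chains be compared on equal terms.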

WinstonSmith84|2 years ago

That's quite a confusing comment. `davinci-003` is from OpenAI, whereas ChatGPT is some sort of variant more "optimized" for chatting. Said differently, GPT-3 or 3.5 is a customized version of `davinci-003`, made for chatting. Please don't ask me about the details, I don't know, but `davinci-003` is not an alternative to ChatGPT.

inciampati|2 years ago

Do you have a citation for that?

simonw|2 years ago

What kind of things have you seen davinci-003 do better than 3.5 turbo?

dr_dshiv|2 years ago

We need open benchmarks, clearly. Know any projects in that space?

dotancohen|2 years ago

Could you expand on this a bit more? What types of prompts? What are your evaluation criteria?

This actually sounds fascinating. Not unlike birdwatching! ))

joenot443|2 years ago

That’s interesting - what about 4?