kir-gadjello | 2 years ago
Have you considered using Google's sparse "Scaling Transformer" architecture as the base? Even at the 3B scale it can generate 3-4x more tokens per FLOP while remaining competitive in perplexity with a dense transformer. I think OpenAI uses a variant of it in their ChatGPT-3.5-Turbo product.
Here is the paper https://arxiv.org/abs/2111.12763 and the implementation https://github.com/google/trax/blob/master/trax/models/resea... if you are interested.
Hope you get to look into this!
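The core trick the comment refers to can be sketched roughly like this: a cheap controller picks one active unit per block of the feed-forward hidden layer, so only a small slice of the weight matrices participates in each token's computation. This is an illustrative NumPy sketch only — the shapes, the low-rank controller, and names like `sparse_ffn`, `W_in`, `W_out` are assumptions for demonstration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
d_model, d_ff, block = 64, 256, 4   # hidden layer split into blocks of 4 units
n_blocks = d_ff // block            # 64 blocks, one active unit each
d_low = 8                           # low-rank bottleneck for the controller

W_in = rng.standard_normal((d_model, d_ff)) * 0.02
W_out = rng.standard_normal((d_ff, d_model)) * 0.02
C1 = rng.standard_normal((d_model, d_low)) * 0.02   # controller, low-rank
C2 = rng.standard_normal((d_low, d_ff)) * 0.02

def sparse_ffn(x):
    # Cheap low-rank controller scores every hidden unit, then a hard
    # argmax selects one active unit per block (as at inference time).
    scores = ((x @ C1) @ C2).reshape(n_blocks, block)
    active = scores.argmax(axis=1)                 # (n_blocks,)
    idx = np.arange(n_blocks) * block + active     # flat column indices

    # Only the selected columns/rows enter the two big matmuls,
    # cutting their FLOPs by roughly the block factor.
    h = np.maximum(x @ W_in[:, idx], 0.0)          # (n_blocks,) activations
    return h @ W_out[idx, :]                       # (d_model,)

x = rng.standard_normal(d_model)
y = sparse_ffn(x)

# Rough FLOP count: dense FFN vs. controller + sparse matmuls.
dense_flops = 2 * d_model * d_ff * 2
sparse_flops = (2 * d_model * d_low + 2 * d_low * d_ff
                + 2 * d_model * n_blocks * 2)
print(y.shape, dense_flops / sparse_flops)
```

With `block = 4` the two main matmuls shrink 4x; the controller overhead eats into that a bit, which is consistent with the "3-4x more tokens per FLOP" figure being a range rather than a fixed multiplier.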
b33j0r | 2 years ago
Like why did we even get excited? This? Great work.
swyx | 2 years ago
Is that a guess, or is there a source? I'm curious to read more.
kir-gadjello | 2 years ago
I have an expanded list of foundational research that is likely to serve as a basis for GPT-4 here in my blog: https://kir-gadjello.github.io/posts/gpt4-some-technical-hyp...
Hope it helps!
chaxor | 2 years ago
kir-gadjello | 2 years ago
>You are free to:
>Share — copy and redistribute the material in any medium or format
>Adapt — remix, transform, and build upon the material
>for any purpose, even commercially.
Compare this to "IF", the latest release from Stability AI's DeepFloyd lab, whose license, in addition to various restrictive clauses, strictly prohibits commercial use: https://github.com/deep-floyd/IF/blob/develop/LICENSE-MODEL
Repl.it's release is as open as it gets these days, in my book.