top | item 35836588

acapybara|2 years ago

I've been following the RedPajama project closely and I must say, it's quite an impressive undertaking. The fact that it's all open-source, and the collaboration between various institutions, is nothing short of amazing. This shows the power of the open-source community in action, with a bunch of smart people coming together to build something truly remarkable.

The 3B model, being super fast and accessible, is a game changer for a lot of us who may not have the latest hardware. I mean, running on an RTX 2070 that was released 5 years ago? That's pretty cool.

As for the 7B model, it's great to see that it's already outperforming the Pythia 7B. The bigger dataset definitely seems to be making a difference here. I'm eager to see how far this project goes, and what kinda improvements we can expect in the coming weeks with the new RedPajama dataset they're working on.

One thing I found interesting is the mention of differences between the LLaMA 7B and their replication. I'd love to learn more about those differences, as it could shed light on what's working well and what could be improved further.

SeanAnderson|2 years ago

Sorry, excuse my ignorance, but why is having access to a 3B model a game changer?

I played with a pirated 7B model a while back. My computer runs a 1080 Ti - so it used to be good but now it's pretty old. The model ran at a reasonable number of tokens/sec, but the quality was just trash compared to what I'd grown used to with ChatGPT. It was a novelty I interacted with for just a single evening.

I truly don't understand the use case for a 3B model with our current technologies.

What are you going to use it for?

examplary_cable|2 years ago

You can ultra fine-tune those models... look at Vicuna 13B: if you know how to prompt it well, you can get it to work as """"well"""" as ChatGPT, running on local hardware. I just got Vicuna 13B on gradio[1] to act as a Japanese kanji personal trainer, and I've only used a simple prompt: "I want you to act as a Japanese Kanji quiz machine. Each time I ask you for the next question, you are to provide one random Japanese kanji from JLPT N5 kanji list and ask for its meaning. You will generate four options, one correct, three wrong. The options will be labeled from A to D. I will reply to you with one letter, corresponding to one of these labels. You will evaluate my each answer based on your last question and tell me if I chose the right option. If I chose the right label, you will congratulate me. Otherwise you will tell me the right answer. Then you will ask me the next question. Avoid simple kanjis, let's go."

[1] https://chat.lmsys.org/
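The same role-play trick can be driven programmatically. A minimal sketch, assuming only the common OpenAI-style list-of-messages format (the structure, not any specific API or server — how you actually send it depends on your local setup):

```python
# Sketch: wrapping a role-play system prompt around each user turn.
# Messages use the widely adopted role/content dict convention; the
# prompt text here is a shortened, illustrative version of the one above.

KANJI_QUIZ_PROMPT = (
    "I want you to act as a Japanese Kanji quiz machine. Each time I ask "
    "for the next question, give one random kanji from the JLPT N5 list "
    "and four labeled options (A to D), only one of them correct."
)

def build_messages(history):
    """Prepend the quiz instructions as the system message on every turn."""
    return [{"role": "system", "content": KANJI_QUIZ_PROMPT}] + list(history)

messages = build_messages([{"role": "user", "content": "Next question, please."}])
```

The point is that the "fine-tuning" here is really just a persistent system prompt resent with every turn — cheap, and enough to turn a general chat model into a single-purpose quiz machine.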

ttt3ts|2 years ago

Fine-tuning, which can easily be done on consumer hardware, can give these models a lot more power for specific applications.

Also, ChatGPT just can't do a lot of things because of their "rules". I was doing question answering about products on Amazon with ChatGPT and it refused to answer any questions about underwear, certain books/videos, etc.
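What makes fine-tuning cheap on consumer hardware is that you don't update the full weights. A from-scratch sketch of the LoRA idea (in practice you'd use the `peft` library; this toy version just shows why so few parameters end up trainable):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = base(x) + (alpha/r) * x @ A.T @ B.T, where only A and B train."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the big pretrained matrix stays frozen
        # A is small random init, B starts at zero, so the adapter is a
        # no-op until training moves it.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
```

For this 512x512 layer the trainable adapter is under 4% of the parameters, which is the whole trick: gradients and optimizer state only exist for A and B, so a single consumer GPU has room for them.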

elorant|2 years ago

Depends on what you want it for. Chatting isn't the only application. For text summarization a model like Vicuna-13b has similar performance to ChatGPT 3.5. Fine-tuned models like the one in this thread might perform way better than the initial ones that leaked from Meta. The important thing is that there's constant progress in this area from the Open Source community and we're about to see amazing things in the future.

barbariangrunge|2 years ago

I'm in the market for a laptop. If I was crazy and wanted to run or train models like these, what kind of resources would I need?

Would the way the M2 MacBooks share memory be an advantage, or would the lack of CUDA support be a killer? Can you do anything with 16 GB, or do you need 128 GB or something like that? How large are the datasets?

I've only used scikit-learn and pandas so far, so I'm not very familiar with neural networks yet.
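For the 16 GB question, a rough weight-only back-of-envelope helps (hypothetical helper function; real inference also needs headroom for activations and the KV cache, and training needs far more):

```python
def weight_memory_gib(n_params_billion, bytes_per_param):
    """Memory to hold just the model weights, in GiB."""
    return n_params_billion * 1e9 * bytes_per_param / 2**30

# Illustrative figures for a 7B model:
fp16 = weight_memory_gib(7, 2)    # ~13 GiB at 16-bit precision
q4 = weight_memory_gib(7, 0.5)    # ~3.3 GiB quantized to 4-bit
```

So a 7B model at fp16 barely squeezes into 16 GB of unified memory, while quantized 4-bit versions (or a 3B model) fit with room to spare — which is part of why small models and quantization get so much attention for laptop-class hardware.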

youssefabdelm|2 years ago

Completely agree. Perhaps they were planning to fine-tune it for something though.

acapybara|2 years ago

Hey SeanAnderson, good question! While parameter count is certainly an important factor in model performance, it's not the only one. The RedPajama project is taking a more nuanced approach to understanding what makes a model perform well, and their focus on smaller models like the 3B is a big part of that.

Sure, you may have played with a 7B model in the past, but that doesn't mean there's no use case for a smaller model like the 3B. In fact, having a performant, smaller model is a game changer for a lot of applications that don't require the massive scale of the larger models. Plus, smaller models are generally faster and more accessible, which is always a plus.

Sunhold|2 years ago

Took me a bit to realize this comment was written by an LLM.

awegio|2 years ago

How did you realize it here? This user has multiple comments in this thread but this one actually sounds more normal than the others.

I find it very uncanny to see comments like this that sound like ChatGPT but are surprisingly relevant to the discussion.