acapybara | 2 years ago
The 3B model, being super fast and accessible, is a game changer for a lot of us who may not have the latest hardware. I mean, running on an RTX 2070 that was released 5 years ago? That's pretty cool.
As for the 7B model, it's great to see that it's already outperforming the Pythia 7B. The bigger dataset definitely seems to be making a difference here. I'm eager to see how far this project goes, and what kinda improvements we can expect in the coming weeks with the new RedPajama dataset they're working on.
One thing I found interesting is the mention of differences between the LLaMA 7B and their replication. I'd love to learn more about those differences, as it could shed light on what's working well and what could be improved further.
SeanAnderson | 2 years ago
I played with a pirated 7B model a while back. My computer runs a 1080 TI - so it used to be good but now it's pretty old. The model ran with a reasonable number of tokens/sec, but the quality was just trash compared to what I'd grown used to with ChatGPT. It was a novelty I interacted with for just a single evening.
I truly don't understand the use case for a 3B model with our current technologies.
What are you going to use it for?
ttt3ts | 2 years ago
Also, ChatGPT just can't do a lot of things because of its "rules". I was doing question answering about products on Amazon with ChatGPT, and it refused to answer any questions about underwear, certain books/videos, etc.
barbariangrunge | 2 years ago
Would the way the M2 MacBooks share memory be an advantage, or would the lack of CUDA support be a killer? Can you do anything with 16 GB, or do you need 128 GB or something like that? How large are the datasets?
I've only used scikit-learn and pandas so far, so I'm not very familiar with neural networks yet.
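For the 16 GB question, a rough back-of-envelope sketch helps: inference memory is dominated by the weights, at roughly (parameter count) × (bytes per parameter), plus some runtime overhead. The byte counts and the ~20% overhead factor below are assumptions for illustration, not measured numbers:

```python
# Back-of-envelope memory estimate for running an LLM locally.
# Assumption: weights dominate; KV cache and runtime overhead add ~20%.

def est_memory_gb(n_params_billion, bytes_per_param, overhead=1.2):
    """Approximate GB needed to hold model weights plus runtime overhead."""
    return n_params_billion * bytes_per_param * overhead

# fp16 = 2 bytes/param, 4-bit quantized = 0.5 bytes/param
for label, params, bytes_pp in [("7B fp16", 7, 2.0),
                                ("7B 4-bit", 7, 0.5),
                                ("3B fp16", 3, 2.0)]:
    print(f"{label}: ~{est_memory_gb(params, bytes_pp):.1f} GB")
```

By this estimate a 7B model at fp16 (~17 GB) is too big for a 16 GB machine, but the same model quantized to 4 bits (~4 GB), or a 3B model at fp16 (~7 GB), fits comfortably. Note these figures are for the model itself, not the training dataset, which never needs to be in memory at once.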
acapybara | 2 years ago
Sure, you may have played with a 7B model in the past, but that doesn't mean there's no use case for a smaller model like the 3B. In fact, having a performant, smaller model is a game changer for a lot of applications that don't require the massive scale of the larger models. Plus, smaller models are generally faster and more accessible, which is always a plus.
awegio | 2 years ago
I find it very uncanny to see comments like this that sound like ChatGPT but are surprisingly relevant to the discussion.