I was able to run LLaMA on my personal machine to do some labeling on my documents, as a test of its capabilities. It was an instruct-tuned 30B-parameter model.
4 example labels, and I had a binary classifier in seconds. Sure, semantic text classifiers were possible for a while, but making it accessible changes everything. Giving anyone who can use a spreadsheet the power of a local LLM (or, basically free LLMs) can make them much, much more productive. A lot of office work is clicking through sheets and doing manual labeling.
It's truly wild what is becoming accessible! Really excited to see the next gen software that the open community comes up with :)
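For illustration, here is roughly what that kind of few-shot labeling looks like in code. This is a sketch: `complete` stands in for whatever local-LLM binding you use (llama.cpp or similar), and the prompt format and label names are made up.

```python
def build_prompt(examples, text):
    """Assemble a few-shot binary-labeling prompt from (document, label) pairs."""
    lines = ["Label each document as RELEVANT or IRRELEVANT."]
    for doc, label in examples:
        lines.append(f"Document: {doc}\nLabel: {label}")
    lines.append(f"Document: {text}\nLabel:")
    return "\n\n".join(lines)

def classify(complete, examples, text):
    """`complete` is any prompt -> completion callable (e.g. a llama.cpp binding)."""
    answer = complete(build_prompt(examples, text)).strip().upper()
    return answer.startswith("RELEVANT")

# Usage with a stand-in model; swap in a real local LLM call.
examples = [("Invoice #123 attached", "RELEVANT"),
            ("Lunch on Friday?", "IRRELEVANT")]
fake_llm = lambda prompt: " RELEVANT"
print(classify(fake_llm, examples, "Q2 invoice summary"))  # True
```

The point is that the "training set" is just four lines of prompt text, which is why this is in reach of anyone who can use a spreadsheet.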
LLMs as general purpose classifiers is a really big deal, especially because you can give them fuzzy instructions. I know people are worried about LLMs and spam, but I think LLMs may provide an opportunity to elevate online discourse by being more efficient at filtering out spam and low quality commentary.
Agree, though plain old Bayesian classifiers have been able to handle some significant portion of that office work for a long time. And not much ever came from it for everyday stuff outside of spam filters.
Maybe both the buzz factor and broader applicability means it's more likely to happen this go around?
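The "plain old Bayesian classifier" mentioned above is small enough to sketch entirely in stdlib Python; this is a minimal multinomial naive Bayes with add-one smoothing, not any particular production filter.

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial naive Bayes text classifier with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, doc):
        def log_prob(c):
            total = sum(self.word_counts[c].values())
            # log prior plus smoothed log likelihood of each word
            score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
            for w in doc.lower().split():
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            return score
        return max(self.classes, key=log_prob)

nb = NaiveBayes().fit(
    ["buy cheap pills now", "meeting at noon", "lunch at noon tomorrow"],
    ["spam", "ham", "ham"],
)
print(nb.predict("cheap pills"))  # spam
```

A few dozen lines, no GPU, and it has handled spam filtering well for decades; the gap is that it needs labeled training data and literal word overlap, which is exactly what the LLM approach relaxes.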
> Sure, semantic text classifiers were possible for a while, but making it accessible changes everything.
Binary classification can actually take you all the way in terms of classification if you are clever with set theory. It's also one of the most traceable & deterministic ways to understand how the natural language is being interpreted at each step.
The amount of performance required to run something like an SVM is laughable compared to what is required to run even baby-tier LLMs. If you can reduce the cost of running models to a <1ms invocation over a few megabytes of black box, you can easily test thousands of these per-user-query. Re-training and iterating is much more enjoyable for these reasons. You also don't need any GPUs for this.
At the end of the day, the quality of your data will be the biggest issue with older techniques. LLMs can band-aid over all sorts of weird things that crop up in the real world and aren't present in the training data. SVMs cannot tolerate requests delivered in the style of Shakespeare (if unexpected). In a well-controlled domain, you would probably be able to get away with much cheaper options that are also more flexible.
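One way to read the set-theory point: each binary classifier, applied to a corpus, induces the set of matching documents, and ordinary set algebra (intersection, union, difference) composes those sets into richer classes whose provenance stays fully traceable. A toy sketch, with trivial keyword predicates standing in for real classifiers:

```python
# Each binary classifier maps text to True/False; applied to a corpus it
# induces a set of matching document ids.  Set algebra then composes those
# sets into richer classes, and every membership decision is traceable.
def matches(classifier, corpus):
    return {doc_id for doc_id, text in corpus.items() if classifier(text)}

corpus = {
    1: "urgent wire transfer request",
    2: "quarterly revenue report",
    3: "urgent quarterly board meeting",
}
is_urgent = lambda t: "urgent" in t
is_finance = lambda t: any(w in t for w in ("wire", "revenue", "invoice"))

urgent = matches(is_urgent, corpus)
finance = matches(is_finance, corpus)
print(urgent & finance)  # urgent finance mail -> {1}
print(urgent - finance)  # urgent but not finance -> {3}
```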
What makes it so much better than normal text classification for me is that it doesn't require tons of training data to accurately classify text. Using it to parse Craigslist posts that I might find interesting showed very promising results, although it's fairly slow on my base M1 machine.
I expect we will see the biggest jump in performance if (when) consumer-grade coprocessors like mobile GPUs start incorporating attention layers as a primitive building block at the hardware level, e.g., with instructions and memory layouts engineered specifically to make ultra-low-precision (say, 4-bit) transformer layers as compute- and memory-efficient as possible on consumer devices. That seems almost inevitable to me.
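For a concrete sense of what 4-bit weights mean, here is a toy symmetric quantizer with one shared scale per group of weights. Real schemes (llama.cpp's q4 formats, for instance) add per-block scales and offsets, so treat this as illustrative only.

```python
def quantize4(weights):
    """Symmetric 4-bit quantization: map floats to ints in [-8, 7] plus one scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize4(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.7, 0.33, 0.02]
q, s = quantize4(w)
approx = dequantize4(q, s)
# each reconstructed weight is within half a quantization step of the original
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, approx))
```

Each weight shrinks from 32 bits to 4, which is why hardware that can multiply-accumulate directly on packed 4-bit values would cut both memory traffic and compute so dramatically.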
I found this to be very liberating: I can finally type whatever I want into the LLM without the possibility of the government knowing what I am writing. Just being able to do that, without the watchful eye of the state monitoring you, is amazing.
Apple should get working on a version of the Neural Engine that is useful for these models, and remove the 3GB size limit [1] to take full advantage of the 'unified' memory architecture. Game changer.
Waste of die space currently (on MacBook at least; I'm sure they find uses for it in the iPhone)
It's not a waste on Mac, it will dynamically switch between GPU and NPU whenever CoreML is called. There are a decent amount of applications that use CoreML.
It appears there is this genre of articles pretending that LLaMA or its RLHF-tuned variants are somehow even close to being an alternative to ChatGPT.
Spending more than a few moments interacting even with the larger instruct-tuned variants of these models quickly dispels that idea. Why do these takes around open-source AI remain so popular? What is the driving force?
I've posted this before, but it seems like this genre is just getting more and more popular - and more and more untethered from any actual metrics of how good these models are.
It's great they got LLMs running on resource-constrained devices, but are they any good? Or, I should ask: with the limited resources they get, what are they good for?
From my experience with llama.cpp and oobabooga's web UI, I can say they are amazing, at least on my gaming PC. I'm absolutely astonished at the speed and quality of LLaMA, Alpaca, Galactica and Vicuna (the >10B-parameter ones).
Make no mistake, it's for tinkerers who do not expect each prompt to be answered in a human-like way.
I see them as tools for testing creativity and thought, and for exploring knowledge.
As I see these things come out, it feels like there's not a lot of discussion on which hardware can run them (other than one of the fancy new Macs). As in, there might be a lot of graphics cards out there that could be used here? Is it still Nvidia-only, or is AMD a possibility? Maybe I'm missing something about how the tech works?
30B LLaMA needs a 3090 or 4090. For 13B I think you can get away with a 3080 or 4080. If you have 64 GB of RAM and a beefy CPU you can run even 65B, but boy is it slow.
13B is pretty meh, but 30B is great, if not quite ChatGPT. But I can ask it why my high school geometry teacher was such a cunt and it will happily discuss the matter without reservation. Very therapeutic.
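Those sizing claims line up with a quick back-of-envelope, assuming roughly 4-bit quantized weights and ignoring KV-cache and activation overhead:

```python
def vram_gb(params_billion, bits_per_weight):
    """Rough weight-only memory footprint in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

for n in (13, 30, 65):
    print(f"{n}B @ 4-bit: ~{vram_gb(n, 4):.1f} GiB")
```

On that estimate, 13B weights come to about 6 GiB and 30B to about 14 GiB, which fits a 24 GiB 3090/4090, while 65B at about 30 GiB spills past any consumer GPU and lands in system RAM, hence the slowness.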
I don't think it will help. Actual friends occasionally send me mail that says "test" from a random account. And spammers do too... There is no way to separate them.
I don't understand why people are so excited to build all this on top of LLaMA, which is closed source and severely license-restricted, and we now know for a fact that Meta is going after users with the legal hammer.
I'm sure if we'd pool resources together we could build a truly open alternative worthy of building on top of.
One simple thing these LLMs cannot do yet: simply point an LLM at a URL and have it start scraping, i.e. follow the hyperlinks and start consuming the content. I am not an AI guy, but I guess this has to do with the context limitations of most models? How did OpenAI train on all the internet data up to 2021? This, I think, will be a most popular feature for LLMs, and I seriously hope it is OSS whenever it comes out.
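The "follow the hyperlinks" half is ordinary crawling rather than a model capability; the hard part is chunking what you fetch to fit the context window. A stdlib sketch of the crawling side (the page HTML here is made up):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute hrefs from a page.  A crawler would queue these,
    fetch each one, and chunk the extracted text to fit the model's
    context window before feeding it in."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url, self.links = base_url, []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

page = '<a href="/jobs">Jobs</a> <a href="https://example.org/about">About</a>'
p = LinkExtractor("https://example.com")
p.feed(page)
print(p.links)  # ['https://example.com/jobs', 'https://example.org/about']
```

Training-time ingestion (the "all internet data till 2021" part) works the same way at scale: the crawler harvests text offline, and the model only ever sees tokenized chunks, never live URLs.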
While many NLP-related Apple ML job listings have been added since this article was written, there were already several recent listings at the time of its writing. I feel that Apple does not focus well on intangible technologies (products that can't be readily carried or worn), given their boutique product-development focus, but I have some hope that they can overcome this bias somewhat and see how far behind they are.
This is to clear up a doubt that seems to be spreading around the internet. The LLaMA model weights weren't "leaked" AFAIK, but rather researchers were explicitly given access to them, isn't that right?
I know the article goes on to speak about something else, but I'm not sure why this claim that the LLaMA model weights were leaked, as in unintentionally made available, is being made.
My understanding is that researchers could ask for access to the weights, but they were then also leaked so that anyone could get them without asking. There is another layer where Facebook seems to accept it on some level (I mean, they don't have a choice anymore anyway): they put a cheeky comment in the open pull request instead of closing it.
The model weights were only shared by FB to people who applied for research access. Github repos containing links to the model weights have been taken down by FB.
LLaMA isn't there and probably never will be, but the possibility of running something equivalent to ChatGPT has certainly made me reconsider my GPU purchases. I wonder whether, in the end, it will be Nvidia's CUDA advantage or AMD's larger amount of memory that ends up being more important when we do get it.
This is wonderful. As hardware and software continue to improve, everything seems to find a way to run on ever smaller devices. I guess your own pocket-AGI is not too far away after all.
By the way, I was thinking of something along the lines of a powerful FPGA with direct access to large quantities of very fast NAND flash, likely many chips in parallel, which would save having to load the model into RAM. It would be able to run directly from NAND flash, which opens up the possibility of using very large models.
Power consumption would not be an issue if it's used sporadically throughout the day; it's not like it needs to run continuously.
There is still the issue of NAND flash read disturb, which I haven't fully looked into yet.
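A software cousin of this idea already exists: memory-mapping the weights file, so the OS pages data in from storage on demand instead of copying the whole model into RAM up front. A stdlib sketch with a made-up flat weight-file format:

```python
import mmap
import os
import struct
import tempfile

# Toy "model file": a flat array of little-endian float32 weights on disk.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<4f", 0.1, 0.2, 0.3, 0.4))

# mmap lets the OS page weights in from storage on demand, so the whole
# model never has to be copied into RAM before inference starts.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    w2 = struct.unpack_from("<f", m, 2 * 4)[0]  # read just the third weight
    print(round(w2, 1))  # 0.3
```

An FPGA talking to raw NAND would be the hardware-level version of the same access pattern: only the weights a layer needs right now ever leave flash.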
The potential amplifying power of that is enormous.
Curious how you run the model then interface with it.
[1] https://github.com/smpanaro/more-ane-transformers/blob/main/...
But I do agree it should be improved!
https://www.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...
https://github.com/imaurer/awesome-decentralized-llm
I have Kevin Kwok's SheepyT running on my iPhone right now - it uses GPT-J, which is an openly licensed LLM by EleutherAI.
https://twitter.com/antimatter15/status/1644456371121954817
'You are a koala who plays with 5-7 year olds, you are friendly natured and curious and like to ask questions'
https://www.youtube.com/watch?v=YRsICbxDEiI