Note that what they released are delta weights relative to the original LLaMA model. To play around with it, you'll need to grab the original LLaMA 13B weights and apply the deltas.
> We release Vicuna weights as delta weights to comply with the LLaMA model
> license. You can add our delta to the original LLaMA weights to obtain
> the Vicuna weights.
That's what they say, but I just spent 10 minutes searching the Git repo, reading the relevant .py files, and looking at their homepage, and the vicuna-7b-delta and vicuna-13b-delta-v0 files are nowhere to be found. Am I blind, or did they announce a release without actually releasing?
Not a lawyer, but that still feels like dubious territory. I would still be on the hook for acquiring the original download, and Facebook has been issuing DMCA takedown requests against the llama-dl project.
One of my favorite queries for assessing an LLM is: "Is it legal for a man to marry his widow's sister?"
Vicuna-13B: The legality of a man marrying his widow's sister (widow marriage) varies depending on the country and cultural context. In some societies, widow marriage was traditionally practiced or tolerated.
GPT-4: It is not possible for a man to marry his widow's sister, as the term "widow" refers to a deceased man's wife. Therefore, if a man has a widow, he is deceased and therefore cannot marry anyone.
GPT 3.5: No, it is not possible for a man to marry his widow's sister because by definition, a widow is a woman whose spouse has died. If a man is deceased, then he cannot marry anyone, including his widow's sister. This is a logical impossibility and would not be recognized as a valid marriage under any legal system.
Koala: The legality of a man marrying his widow's sister varies by jurisdiction. In some countries, such as the United States, it is generally legal for a man to marry his widow's sister. In other countries, such as the United Kingdom, it is illegal for a man to marry his widow's sister because it is considered incest. It is always important to check the laws and regulations in the jurisdiction in which the marriage will take place to ensure that it is legal.
It would still possibly be legal on the basis that if it's not illegal then it's legal, in the British jurisprudence tradition at least (https://en.wikipedia.org/wiki/Everything_which_is_not_forbid...); namely, it's not the law that impedes it (also, in some places there's posthumous marriage).
Hi! Funnily enough, I couldn't find much on it either, so that's exactly what I've been working on for the past few months, just in case this kind of question got asked.
I've recently opened a GitHub repository which includes information on both AI model series[0] and the frontends you can use to run them[1]. I wrote a Reddit post beforehand that's messier, but a lot more technical[2].
I try to keep them as up to date as possible, but I might've missed something, or my info may not be completely accurate. It's mostly to help people get their feet wet.
The 4-bit quantized version of LLaMA 13B runs on my laptop without a dedicated GPU, and I'd guess the same applies to a quantized Vicuna 13B, but I haven't tried that yet (converted as in this link, but for 13B instead of 7B: https://github.com/ggerganov/llama.cpp#usage ).
The GPT4All LoRA also works; it's given perhaps the most compelling results I've gotten yet on my local machine. I have to try a quantized Vicuna to see how that one goes, but processing the files to get a 4-bit quantized version will take many hours, so I'm a bit hesitant.
PS: converting the 13B LLaMA took my laptop's i7 around 20 hours and required a large swap file on top of its 16GB of RAM.
Feel free to answer back if you're trying any of these things this week (later I might lose track).
That might not be surprising, considering these jailbreaks are written and tested specifically against ChatGPT and ChatGPT alone. This model probably has its own jailbreaks, which would in turn be refused by ChatGPT.
Just when you think Nvidia will go down, something happens that changes it. These days, unless you were into gaming or were a machine-learning dev, integrated graphics were good enough. But now, for the first time in a long time, I'm interested in getting a GPU to run some of these chatbots locally.
As a very occasional gamer who uses an iMac for work, I thought about getting a gaming PC for about 6 years.
Last fall it seemed that all the stars had aligned. The crypto winter and Ethereum's switch to proof of stake meant that GPU prices fell to a reasonable level, I knew I'd have a bit of time to play some games during the holidays, and as soon as Stable Diffusion was first posted on Hacker News I knew that was my excuse and my sign.
So far I think I've spent more time tinkering with the 20 Python environments I have[0] for all the ML projects than playing RDR2.
This model is also censored to the brim; it refuses to answer half of my questions, some of them perfectly legal. It's useless; we already have GPT-4 (and Vicuna is even more censored/guarded).
Alpaca-30B is much better; it will even tell you how to build a nuclear weapon (incorrectly, of course; it's not that smart).
I am waiting for the Coati13B weights; these should work great.
This looks really good for a run-it-on-your-own-hardware model, judging from the examples and sibling comments. I've been working on a pure-AVX2 Rust implementation of LLaMA but was starting to lose interest, waiting for whatever the next hot downloadable model would be; now I want to add this thing to it.
It's actually very impressive. I gave it the task of converting a query and an OpenAPI spec into an API call, and it worked! I've not been successful in getting GPT-3.5 to do this without it rambling on about the reasoning for its decision.
Usually if I want code from the GPT family, I always add "Just show me the code, no extra words or explanation" at the end of the prompt, and it works 99% of the time.
Edit: I just finished converting Vicuna myself and have been doing some light testing; it seems to work in ~80% of cases, not as high a success rate as with GPT, for sure. There's probably a better way of structuring the prompt for Vicuna.
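Roughly, the kind of prompt described above can be sketched like this (the wording and the tiny spec are illustrative, not a tested template):

```python
# Hypothetical sketch of the prompt shape described above: a query plus
# an OpenAPI spec, with the trailing instruction that suppresses the
# model's explanation. All wording here is illustrative.
def build_prompt(query: str, openapi_spec: str) -> str:
    return (
        "Given this OpenAPI spec:\n"
        f"{openapi_spec}\n\n"
        f"Convert this request into an API call: {query}\n"
        "Just show me the code, no extra words or explanation"
    )

prompt = build_prompt("list all users", '{"paths": {"/users": {"get": {}}}}')
print(prompt)
```

The same string can be sent to either model; only the success rate differs.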
I have a universal benchmark for judging how much knowledge a language model stores, and it's asking about the G-FOLD paper (https://www.lpi.usra.edu/meetings/marsconcepts2012/pdf/4193....), because I noticed GPT-3.5 hallucinates when asked about it, whereas GPT-4 is capable of providing a high-level overview.
Is there any way yet to train one of these on my entire online output and correspondence in order to create a hyper-personal “autocomplete” or a me-chatbot? lol
All of the llama derivatives are tainted by Meta's license, which makes commercial use illegal and even personal use dubious.
Is all the training material used for Llama available as open source? Maybe lots of folks can pool their resources and create fully open clean models / weights instead.
> All of the llama derivatives are tainted by Meta's license, which makes commercial use illegal and even personal use dubious.
This is not true if you never agreed to Meta's license. If you haven't, you either can't redistribute the weights or you're completely free to use them as you see fit depending on whether weights are copyrightable (very likely) or not. We'll have to wait for the llama-dl lawsuit to find out for sure.
Is it worth hosting this on an EC2 instance, which might cost ~$1.50 per hour (on demand), rather than using the GPT-3.5 API for this purpose?
What is the breakeven number of queries (~2,000 tokens/query) that would justify hosting such a model?
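A back-of-the-envelope answer, taking the ~$1.50/hour figure above and assuming GPT-3.5's then-current price of about $0.002 per 1K tokens (an assumption worth double-checking):

```python
# Breakeven between a self-hosted instance and the API.
# Assumptions: ~$1.50/hour on-demand EC2 (from the comment above) and
# ~$0.002 per 1K tokens for GPT-3.5 (assumed; verify current pricing).
ec2_cost_per_hour = 1.50
api_cost_per_1k_tokens = 0.002
tokens_per_query = 2000

api_cost_per_query = api_cost_per_1k_tokens * tokens_per_query / 1000
breakeven_queries_per_hour = ec2_cost_per_hour / api_cost_per_query

print(f"API cost per query: ${api_cost_per_query:.4f}")
print(f"Breakeven: ~{breakeven_queries_per_hour:.0f} queries/hour")
```

Under these assumed prices, self-hosting only pays off above roughly 375 queries per hour, ignoring setup time and idle hours.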
The default loader doesn't seem to let you load quantized models, but if you use something like https://github.com/oobabooga/text-generation-webui you can 1) use the model with `--load-in-8bit`, which halves the memory (it then runs on my 24GB consumer card without an issue, and would probably fit on a 16GB card), or 2) use one of the 4-bit quantized models, probably with `anon8231489123/vicuna-13b-GPTQ-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128`, although there have been reports that bitsandbytes has problems with 4-bit performance on some cards: https://github.com/TimDettmers/bitsandbytes/issues/181
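For intuition on why 8-bit halves the footprint, the weight memory is just the parameter count times bytes per parameter (weights only; activations and cache overhead come on top):

```python
# Approximate weight-memory footprint of a 13B-parameter model at
# different precisions (weights only; runtime overhead not included).
n_params = 13e9

gb_fp16 = n_params * 2 / 1e9    # 2 bytes/param   -> ~26 GB
gb_int8 = n_params * 1 / 1e9    # 1 byte/param    -> ~13 GB, fits a 24GB card
gb_int4 = n_params * 0.5 / 1e9  # 0.5 bytes/param -> ~6.5 GB

print(gb_fp16, gb_int8, gb_int4)
```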
I got the 64GB MacBook Pro but am already realizing the 96GB laptop would have made sense. I got it in January, right before the AI craze really lit up; I distinctly remember thinking, who would ever need more than 64GB of RAM…
A large array of uniquely-set floating point values. (AKA "parameters".)
In a language model, a word is put in one end (as a numerical index into a wordlist), then it and the weights are multiplied together, and a new word comes out (again as an index).
Numbers in, numbers out, and a small bit of logic that maps words to numbers and back at either end. ("Encodings".)
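To make that concrete, here's a toy sketch with a made-up three-word vocabulary and hand-picked weights; nothing like a real model's scale, but the same plumbing:

```python
# Toy "numbers in, numbers out" sketch. The vocabulary and weights are
# made up; a real model has tens of thousands of tokens and billions of
# parameters, but the word -> number -> word mapping is the same idea.
vocab = ["the", "cat", "sat"]
word_to_id = {w: i for i, w in enumerate(vocab)}

# One row of "weights" per input word: a score for each candidate next word.
weights = [
    [0.1, 0.8, 0.1],  # after "the": "cat" scores highest
    [0.1, 0.1, 0.8],  # after "cat": "sat" scores highest
    [0.8, 0.1, 0.1],  # after "sat": "the" scores highest
]

def next_word(word: str) -> str:
    scores = weights[word_to_id[word]]  # word in -> numbers
    best = scores.index(max(scores))    # pick the highest score
    return vocab[best]                  # numbers -> word out

print(next_word("the"))  # -> cat
```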
"Training" is the typically expensive process of feeding huge amounts of data into the model, to get it to choose the magic values for its weights that allow it to do useful stuff that looks and feels like that training data.
Something else that can be done with weights is that they can be "fine-tuned", or "tweaked" slightly, to give different overall results out of the model, tailored to some new use case. Often the model gets a new name afterwards.
In this case, what's been released is not actually the weights. It's a set of these tweaks ("deltas"), which are intended to be added to Meta's LLaMA model weights to end up with the final intended LLaMA-based model, called "Vicuna".
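That addition can be sketched in a few lines. Plain Python lists stand in for the real checkpoint tensors here, and the parameter names are made up; the project's own conversion command does this over the actual files:

```python
# Sketch of delta-weight application: both the base model and the delta
# map parameter names to arrays of numbers, and recovering the final
# model is elementwise addition. Lists stand in for real tensors, and
# the parameter names are illustrative only.
def apply_delta(base: dict, delta: dict) -> dict:
    assert base.keys() == delta.keys(), "checkpoints must have matching layers"
    return {
        name: [b + d for b, d in zip(base[name], delta[name])]
        for name in base
    }

llama_weights = {"layer0.weight": [0.10, -0.20, 0.30]}
vicuna_delta  = {"layer0.weight": [0.05,  0.07, -0.10]}

vicuna_weights = apply_delta(llama_weights, vicuna_delta)
print(vicuna_weights)
```

This is also why the deltas alone are useless without the original LLaMA weights.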
Essentially a computer neural network is just a lot of addition (and matrix multiplication) of floating-point numbers. The parameters are the "strength" or "weights" of the connections between neurons on different layers and the "bias" of each neuron. If neuron Alice is connected to neuron Bob, Alice has a value of 0.7, and the weight of Alice's connection to Bob is 0.5, then the value sent from Alice to Bob is 0.35. This value (and the values from all the other incoming connections) is summed and added to the neuron's bias.
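Putting the Alice and Bob numbers into code (the extra incoming values and the bias are made up for illustration):

```python
# The Alice -> Bob example in code: one contribution is value * weight,
# and Bob sums all incoming contributions plus his bias. The extra
# inputs and the bias value are invented for this illustration.
alice_value = 0.7
weight_alice_to_bob = 0.5
contribution = alice_value * weight_alice_to_bob  # 0.35, as in the text

other_contributions = [0.10, -0.20]  # from other (hypothetical) neurons
bias = 0.05

bob_input = contribution + sum(other_contributions) + bias
print(bob_input)
```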
I highly recommend checking out 3Blue1Brown's series on how neural nets, gradient descent, and the dot product (implemented as a matrix multiplication) all tie together: https://www.youtube.com/watch?v=aircAruvnKk
They basically encapsulate what a model has "learned." ML models without their weights are useless because the output is essentially random noise. You then train the model on data, and it changes the weights into numbers that cause the whole thing to work. Training data and processing power are usually very expensive so the resulting weights are valuable.
stcredzero|2 years ago
(I know a vicuna is a llama-like animal.)
andai|2 years ago
https://chat.lmsys.org/?model=koala-13b
stevenhuang|2 years ago
You'd probably need to come up with a new one now though, or confirm knowledge cutoff for the next evaluation :p
takantri|2 years ago
[0] - https://github.com/Crataco/ai-guide/blob/main/guide/models.m...
[1] - https://github.com/Crataco/ai-guide/blob/main/guide/frontend...
[2] - https://old.reddit.com/user/Crataco/comments/zuowi9/opensour...
muyuu|2 years ago
these could be useful:
https://nixified.ai
https://github.com/Crataco/ai-guide/blob/main/guide/models.m... -> https://old.reddit.com/user/Crataco/comments/zuowi9/opensour...
https://github.com/cocktailpeanut/dalai
sillysaurusx|2 years ago
I tried a few from https://www.jailbreakchat.com/ and it refused them all. Interesting.
[0] https://xkcd.com/1987/
adeon|2 years ago
I'll be busy next few days. Heck yeah.
mlboss|2 years ago
https://pastebin.com/urDUsEew
vlugorilla|2 years ago
> This conversion command needs around 60 GB of CPU RAM.
Ok. I don't have that. Has anyone released (or will someone release) the full weights with the deltas applied?
Animats|2 years ago
Everybody's server costs are about to go through the roof.
Jiocus|2 years ago
2 x 32GiB (SO-DIMM DDR4, 3200MHz) can be had for €170, and probably less with a bit of research. It took a bit of faith and a lot of impulse decision-making, as this device* was/is specified for up to 32GiB of RAM only, but it went through.
This is precisely the use case I had in mind
*Lenovo 16ACH6H
swsdsailor|2 years ago
The changes:
- Allow passing in --device="mps", i.e. choices=["cuda", "cpu", "mps"]
- Set kwargs: kwargs = {"torch_dtype": torch.float16}
- Add .to("mps") on line 98: model = AutoModelForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True, **kwargs).to("mps")
- Comment out: raise ValueError(f"Invalid device: {args.device}")
- Change cuda to mps on line 80: if args.device == "mps":
I'm not sure it's working correctly, but at least it's a step. It's told me how to catch a duck, but it often falls into some "renewable energy" sequence. :D