item 35344787

Show HN: A fully open-source (Apache 2.0) implementation of LLaMA

158 points | osurits | 3 years ago | github.com

We believe that AI should be fully open source and part of the collective knowledge.

The original LLaMA code is GPL licensed which means any project using it must also be released under GPL.

This "taints" any other code and prevents meaningful academic and commercial use.

Lit-LLaMA solves that for good.

52 comments

[+] adeon|3 years ago|reply
I think implying that GPL is not "fully open source" is a hot take. It's specifically designed to ensure that you and anyone you distribute your code to get the same freedoms. Maybe you don't agree that it's a good license, but that is its intention. The GPL vs. BSD-style license debate is a decades-long argument by now, I guess.

Maybe I'm a naive idealist, but IMO the GPL family of licenses is underrated. You can use them to make sure you aren't working for free for someone who won't share their improvements.

I liked the choice of AGPL for AUTOMATIC1111 Stable Diffusion web UI. (https://github.com/AUTOMATIC1111/stable-diffusion-webui)

Commercial interests are very allergic to the AGPL, which ensures the project stays community-run and that new features and fixes will prioritize ordinary users doing things for fun.

[+] cuuupid|3 years ago|reply
I think OP mischaracterized the issue with the license; it's more that the weights don't fall under the same scope. They're research use only, with no commercial use allowed.
[+] kmeisthax|3 years ago|reply
As far as I'm aware, the GPL/BSD license argument is basically dead now and people just use whatever. In retrospect it seems to have been less an argument about whether copyleft clauses are bad and more about Berkeley not wanting to deal with RMS.

>Commercial interests are very allergic to AGPL which ensures the project stays community-run

Mostly because AGPL is not a Free license unless you take great pains to build license compliance into the program that you ship. If you don't do this, then people who want to modify your code need to first build the license compliance mechanism before they can do anything else. This is not how any other Free license works. And compliance is not always obvious, either. Hector Martin has documented a few different cases of terrible AGPL uses. My favorite is an Ethernet PHY[0], which practically speaking cannot offer AGPL source in the way the license intends. AGPL only works for one particular use case, which is web[1] applications written in an interpreted language that can introspect its own source code. So Perl, PHP, and Python to varying degrees.

Also, let's keep in mind that Stable Diffusion's weights are licensed under a moderate copyleft with a morality clause - CreativeML OpenRAIL-M. Morality clauses are incompatible with all flavors of GPL, and the "program" clause in GPL is vague enough to encompass the model weights. At least, assuming that the model weights are copyrightable, which they might not be. Morality clauses are also non-free, though I'll settle for "don't use this for political disinformation campaigns or porn" over "pony up for our hosted API where we can enforce new morality clauses whenever we like".

If you want a no-corpos license, then don't use a license at all[2]. Non-commercial clauses will also work since they effectively confer no rights[3]. Keep in mind that anyone who can gain sufficient copyright interest in the code can sue, and that AI art tends to be a bottomless well of scenesters. I'd rather not subject ordinary users to legal risk, though.

If you want a "service provider loophole-proof" license, use the OpenWatcom License. It is far less ambiguous and has a reasonable compliance path: if you use the software you have to publish source. Period. It's simple, it does what the AGPL set out to do, and people would use it if it wasn't for Stallman saying this:

> This is not a free software license. It requires you to publish the source code publicly whenever you "Deploy" the covered software, and "Deploy" is defined to include many kinds of private use.

This sounds like a fixable problem: just make the clause only trip on modification, so that if you use a modified version privately you have to publish those modifications, but unchanged software doesn't have to be published. Someone hosting unmodified versions of the software isn't a threat to software freedom, and we consider Freedom Three more violable than Freedom Zero - that's why we tolerate GPL and why AGPL was drafted. But as far as I'm aware such a license does not exist and the few people interested in Extremely Strong Copyleft just use AGPL despite its flaws.

[0] https://social.treehouse.systems/@marcan/110038008055623292

[1] Hector Martin has also posited working around the AGPL's requirement to provide source on network access by putting the web app behind a reverse proxy that hides the source. I am not willing to test this by getting sued by the Mastodon developers.

[2] The various Silly Licenses might work as a sufficient corporate deterrent, inasmuch as a court is willing to disregard them.

[3] Specifically, there is no copyright definition of noncommercial use, and most copyright laws assume that the mere utility of the work in question is inherently commercial. There is no "as long as they aren't making money off of it" license because not having to pay for the work is considered making money off of it.

To be pedantic, Creative Commons -NC does state that filesharing is non-commercial, so that can be interpreted as a "BitTorrent only" license clause.

[+] ipsum2|3 years ago|reply
FYI, there's something fishy going on in this thread. Multiple people from the Lightning AI team, theaniketmaurya (developer advocate at Lightning AI) and rasbt (developer at Lightning AI), are shilling for this post without disclosing their affiliations. The account that submitted this (osurits) also has only two comments, both with the same behavior.

Having interacted with the Lightning AI team in the past, this is unsurprising behavior.

[+] philipkglass|3 years ago|reply
If you suspect vote manipulation, email [email protected]. Dang is good about replying to email and he has server-side logs available for more investigation.
[+] querez|3 years ago|reply
IANAL, but this seems very fishy to me: 1) I don't understand how this isn't a derivative work of the original code, as I very highly doubt you've done a clean room implementation. I doubt this would hold up in court.

2) Doesn't the original FB license also apply to the weights? Just re-implementing the code would not change the license on the weights. So while THE CODE may now be re-licensed, the weights would still fall under the original license.

I'd love if someone with more legal understanding could shed some light on this.

[+] rnosov|3 years ago|reply
1) I've looked at both codebases and this one is definitely a derivative of the nanoGPT. You can compare all three implementations yourself as they are actually surprisingly compact and readable.

2) The issue of whether weights are copyrightable at all has not been settled yet. If they are, the fair use doctrine allows transformative works based on a copyrighted work. The line is a bit blurry, but consider the Cariou v. Prince case[1], where the addition of colour to some black and white photos was considered enough to be transformative. Similarly, fully fine-tuning on current news or adding a visual modality could potentially create a brand new model in the eyes of the law.

[1] https://cyber.harvard.edu/people/tfisher/cx/2013_Cariou.pdf

[+] MacsHeadroom|3 years ago|reply
>I don't understand how this isn't a derivative work of the original code

The original code is Apache 2 licensed. Derivatives are fine and allowed. This retains the same Apache 2 license as Facebook's code.

It's only the model that isn't covered by that permissive Apache 2 license. A model produced by a derivative of the permissively licensed code, or even by the original code itself, is not a derivative of the original non-permissively licensed model produced by the original code, and is non-infringing even if it is a bit-perfect replica.

> Doesn't the original FB license also apply to the weights?

Again, there are different licenses for the code and the model, and neither license actually applies to the weights within the model, only to the actual exact model. If this project produced a bit-for-bit replica of Facebook's model, it would still not infringe on that model's license.

But it doesn't produce a bit-for-bit replica. Even if Facebook were to re-run the same training code on the same hardware, they could not produce the exact same weights as before, since massively parallel matrix multiplications are not deterministic. Benign environmental noise, like microscopic fluctuations in temperature, makes a difference in the outcome.
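A minimal sketch (not from the thread) of the mechanism behind that non-determinism: floating-point addition is not associative, so a parallel reduction that accumulates terms in a different order can produce a different result, and over billions of operations per training step those tiny discrepancies compound.

```python
# Floating-point addition is not associative: summing the same
# three values in two different orders gives different results.
# This is why reordered parallel reductions (e.g. gradient sums
# across GPUs) are not bit-for-bit reproducible.

a = [1e16, 1.0, -1e16]   # order 1: the 1.0 is absorbed and lost
b = [1e16, -1e16, 1.0]   # order 2: the big terms cancel first

sum_a = 0.0
for v in a:
    sum_a += v           # (1e16 + 1.0) rounds back to 1e16, then - 1e16 -> 0.0

sum_b = 0.0
for v in b:
    sum_b += v           # (1e16 - 1e16) = 0.0, then + 1.0 -> 1.0

print(sum_a, sum_b)      # 0.0 1.0
```

Real training runs see far smaller per-operation differences (low-order bits rather than whole units), but the principle is the same: change the reduction order and the bits change.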

[+] mostdataisnice|3 years ago|reply
This is the clearest example of an attention grab I have seen: it does nothing for commercial use of LLaMA unless they provide a version of the weights produced by them and not Facebook (and they don't; they ask you to download them from Facebook's repo).
[+] 2Gkashmiri|3 years ago|reply
Bs.

Prevents meaningful academic.....

How the hell does the AGPL prevent academic use? Commercial use, sure, because the AGPL follows the four freedoms, and commercial users often want to take someone else's work and slap their brand on it without acknowledging the original. That, and the downstream is often closed source for "business reasons", which prevents their users from enjoying the fruits of the first party's licensing.

Where does academia come into it? Are researchers now keeping everything under wraps for "shareholders interests"?

Isn't academia supposed to be open culture from the start without any restrictions so what am I missing or are they mixing two unrelated things?

Also, I think I might be wrong but isn't it merely converting llama into their version? Uh ...

[+] ftxbro|3 years ago|reply
I'm not saying this is how it should be, but a lot of the author lists of published papers on scaling properties of large language models have been employees in research divisions within big tech companies or academics holding dual positions with those companies and with their university.

> Where does academia come into it? Are researchers now keeping everything under wraps for "shareholders interests"? Isn't academia supposed to be open culture from the start without any restrictions so what am I missing or are they mixing two unrelated things?

Yeah academia was never perfect, but it's becoming more and more like you describe. It's been happening for a while and that's a whole other thing.

[+] alexb_|3 years ago|reply
>GPL...prevents meaningful academic and commercial use

WTF are you talking about?

[+] theaniketmaurya|3 years ago|reply
GPL is a copyleft license which requires you to share anything that you build using the original software. This makes it difficult for commercial use.
[+] barefeg|3 years ago|reply
But aren’t the weights still not for commercial use?
[+] Ciantic|3 years ago|reply
That's what I thought too, the source code was not an issue so much as that.

What we need is some sort of "Large Language Model at Home" (like SETI@home was) that could crowdsource the creation of the model which would be free to use.

[+] ficiek|3 years ago|reply
If you hate the GPL so much, then I assume you don't run any GPL-licensed code on your machines. I admire your resolve, because I would think that is pretty hard!
[+] javimh|3 years ago|reply
No, the GPL doesn't prevent meaningful academic or commercial use; rather, it seeks to prevent individuals from taking advantage of free software to limit the freedom of other users. It is important to note that if you live in a free country, there are laws that protect the liberties of all citizens and prevent actions that could restrict those freedoms.
[+] blendergeek|3 years ago|reply
> We believe that AI should be fully open source and part of the collective knowledge.

As do I.

> The original LLaMA code is GPL licensed which means any project using it must also be released under GPL.

Yep. This ensures that AI is "fully open source and part of the collective knowledge."

> This "taints" any other code and prevents meaningful academic and commercial use.

Taints? As in "makes fully open source"? Isn't that the goal?

> Lit-LLaMA solves that for good.

Lit-LLaMA helps people create proprietary closed-source AI instead of the fully open source AI required by Llama. Okay.

[+] nynx|3 years ago|reply
There are already a million ways to run LLaMA. This doesn't change the issue at all, which is that the weights aren't commercially licensed.
[+] theaniketmaurya|3 years ago|reply
Yes, agreed that the weights aren't commercially licensed (yet)! The other ways to run LLaMA use a GPL license, which makes commercial use difficult even if someone trains and uploads the weights publicly.

This could be a step toward that change :)

[+] rasbt|3 years ago|reply
I think some businesses and people are worried about using GPL code in their code bases because that's incompatible with their own licenses.
[+] nl|3 years ago|reply
Just noting that HuggingFace has a Llama code implementation[1]. It's also under an Apache 2 license.

While this seems to be nice code I don't particularly see any reason to use that over HuggingFace transformers, where you can easily swap out alternative implementations.

Also, focusing on the legal restrictions on the Facebook LLaMA code when there are much stronger restrictions on the use of the model seems an odd thing to do. It's true that in some (not all) jurisdictions the model might not be copyrightable, but you'd need a bold legal department to rely on those arguments. It's also moderately likely that an instruction-tuned LLaMA (like Alpaca) would be copyrightable even in those jurisdictions.

TL;DR: Use the HuggingFace transformers library. You can experiment with Llama and switch to truly free models like GPT-J or anything new that arrives very easily.

[1] https://huggingface.co/docs/transformers/main/model_doc/llam...

[+] leke|3 years ago|reply
I'm still confused about this. Does it require you to have a ChatGPT API key for it to work?
[+] yewnork|3 years ago|reply
I see this as a win for the AI community. The key for LLMs is to enable people to train collaboratively and innovate more quickly in this space. Are there any examples or demos available that showcase the capabilities of "lit-llama"?
[+] theaniketmaurya|3 years ago|reply
I am in love with this implementation considering the ability to run on 8 GB VRAM and Apache 2.0 license.
[+] theaniketmaurya|3 years ago|reply
I am curious though how would the model weights work out?
[+] rasbt|3 years ago|reply
I guess that means time to fire up a few GPUs later today and get some weights! We should have a weight exchange platform for that maybe, haha.
[+] A4ET8a8uTh0|3 years ago|reply
You mean like a blockchain? I jest, but only a little.