top | item 36376111

Falcon LLM – A 40B Model

178 points | Risyandi94 | 2 years ago | falconllm.tii.ae

87 comments

[+] LeoPanthera|2 years ago|reply
> to remove machine generated text and adult content

Why are tech companies so puritanical? Adult content is not immoral.

[+] gillesjacobs|2 years ago|reply
It's important for general-purpose use: with generative models there's always a chance of hallucinations. For all uses except the specific adult-flavoured ones, you don't want the response to contain vulgarities. No one wants their company's chatbot to start narrating furry erotica. If trained on adult content, you would need to have a burdensome moderation layer downstream of the LLM.

When you do want the more niche adult-themed LLMs, there are fine-tuning datasets available. Fine-tuning a vanilla open-source LLM for these uses works great. There are active communities around adult-roleplay LLMs on imageboards.

[+] Havoc|2 years ago|reply
Government of Abu Dhabi footed the bill on this one and they have a uhm unique take on:

>Adult content is not immoral.

[+] pmoriarty|2 years ago|reply
"Why are tech companies so puritanical? Adult content is not immoral."

It's all about "optics" and PR. These companies don't want their brands associated with porn. That's why YouTube doesn't allow porn on their site, even though it would be enormously profitable.

[+] comfypotato|2 years ago|reply
Whether or not it’s immoral is opinion-based. It’s probably typically in order to not alienate those who think it is. Effectively a business decision even in the context of an open model.
[+] moffkalast|2 years ago|reply
It was made in the UAE, I'm surprised it doesn't censor far more.
[+] aaronsteers|2 years ago|reply
Although not evil, adult content should be opt-in, with the ability to opt out at a platform level... hence the need for censored models. Imagine a restaurant-booking AI app, built on GPT, that accidentally doubled as a bomb-making tutor or an adult content generator. It's a lawsuit waiting to happen, if nothing else, and it's worth making these use cases harder (if not impossible) to implement in mainstream, commercially available products. Note that for many of these products, age and consent for adult material have not already been established.

So far, the open source ecosystem seems to be doing a good job of providing both censored and uncensored LLMs - and it seems there are valid use cases for both.

Think of this as similar to Falcon LLM launching in both 40B and smaller 7B variants - the LLM will often need to match the use case, and the 7B model is a good example of making the model smaller (and worse) on purpose to reach certain trade-offs.

[+] nashashmi|2 years ago|reply
And you know this because you’re the expert of knowing what is morality? Decades of people telling you it is immoral, yet somehow you come along to say the opposite and hope people believe it? The only justification you can give is an ad hominem attack accusing the other person of not knowing what they are talking about.

Try a little bit of academic knowledge here: https://m.youtube.com/watch?v=wSF82AwSDiU

[+] 0898|2 years ago|reply
TII is based in the United Arab Emirates.
[+] kossTKR|2 years ago|reply
Haven't we seen that too much reinforcement toward censorship worsens the model, just as excluding some data actually makes it worse in all other areas?

Even though it's quite bizarre on its surface - that excluding, for example, works of fiction makes a model worse at programming.

In that case a simple middleman agent that is inaccessible to the user would provide better quality while maintaining censorship, and the policy could even be dynamically and quickly redefined or extended.
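The middleman-agent idea could be sketched roughly as below (all names hypothetical; `generate` is a stand-in for whatever uncensored model the server runs):

```python
# Hypothetical sketch of a "middleman agent": an uncensored model kept
# behind a server-side filter that the user never talks to directly.

BLOCKLIST = {"bomb", "erotica"}  # policy, redefinable at runtime without retraining


def generate(prompt: str) -> str:
    """Stand-in for a call to an uncensored LLM."""
    return f"model output for: {prompt}"


def moderated_generate(prompt: str, blocklist=BLOCKLIST) -> str:
    """Run the model, then apply the moderation policy downstream."""
    reply = generate(prompt)
    # Because the filter sits after the model, the censorship policy can
    # be extended dynamically without further RLHF on the model itself.
    if any(term in reply.lower() for term in blocklist):
        return "I can't help with that."
    return reply


print(moderated_generate("how do I book a table?"))
```

A real deployment would use a classifier rather than a keyword list, but the architecture - full model quality inside, policy enforced at the boundary - is the same.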

[+] elahieh|2 years ago|reply
Is there a guide out there for dummies on how to try a ChatGPT-like instance of this on a VM cheaply? E.g. pay $1 or $2 an hour for a point-and-click experience with the instruct version of this. A Docker image perhaps.

Reading posts on r/LocalLLAMA mostly turns up people's trial-and-error experiences, which is quite hit-or-miss.

[+] lhl|2 years ago|reply
For Falcon specifically, this is easy, it's embedded here: https://huggingface.co/blog/falcon#demo or you can access the demo here: https://huggingface.co/spaces/HuggingFaceH4/falcon-chat

I just tested both and it's pretty zippy (faster than AMD's recent live MI300 demo).

For llama-based models, recently I've been using https://github.com/turboderp/exllama a lot. It has a Dockerfile/docker-compose.yml so it should be pretty easy to get going. llama.cpp is the other easy one: the most recent updates put its CUDA support only about 25% slower, and it's generally a simple `make` with a flag for whichever GPU you want to support, with basically no dependencies.

Also, here's a Colab notebook that lets you run up to 13B quantized models (12G RAM, 80G disk, Tesla T4 16G) for free: https://colab.research.google.com/drive/1QzFsWru1YLnTVK77itW... (for Falcon, replace w/ Koboldcpp or ctransformers)

[+] api|2 years ago|reply
A small cheap VPS won't have the compute or RAM to run these. The best way (and the intent) is to run it locally. A fast box with at least 32GiB of RAM (or VRAM for a GPU) can run many of the models that work with llama.cpp. For this 40B model you will need more like 48GiB of RAM.

Apple Silicon is pretty good for local models due to the unified CPU/GPU memory but a gaming PC is probably the most cost effective option.

If you want to just play around and don’t have a box big enough then temporarily renting one at Hetzner or OVH is pretty cost effective.
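The back-of-envelope arithmetic behind those RAM figures can be checked quickly (a sketch; weights only - real loaders add overhead for the KV cache and activations):

```python
def model_bytes(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GiB for a dense model."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


# Falcon-40B at different precisions (weights only):
print(round(model_bytes(40, 16)))  # fp16: ~75 GiB
print(round(model_bytes(40, 4)))   # 4-bit quantized: ~19 GiB
```

This is why the full-precision model needs a fleet of GPUs while a 4-bit quantization fits in the RAM of a beefy desktop, and why the 48GiB figure above assumes some quantization.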

[+] londons_explore|2 years ago|reply
Try $100/hour for big LLMs... And you're probably going to need a fleet of 16 machines unless you want to quantize it and do inference only.
[+] sbierwagen|2 years ago|reply
Doesn't do so great on the leaderboards: https://tatsu-lab.github.io/alpaca_eval/
[+] londons_explore|2 years ago|reply
I really like the fact that the leaderboards are almost identical when using claude or GPT-4 as evaluators.

If a less powerful model can be a good decider of the better answer between two more powerful models, it opens up a lot of research opportunities into perhaps using these evaluations as part of an automated reinforcement learning process.
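The evaluation scheme those leaderboards use boils down to pairwise judgments aggregated into a win rate. A toy sketch (the `judge` function is a hypothetical stand-in for a call to the weaker evaluator model; here it's just a length heuristic):

```python
def judge(answer_a: str, answer_b: str) -> bool:
    """Stand-in for an evaluator LLM comparing two answers to one prompt.
    Returns True if answer_a is judged better. A real judge would be an
    API call; this toy version simply prefers the longer answer."""
    return len(answer_a) > len(answer_b)


def win_rate(model_answers, baseline_answers):
    """Fraction of prompts where the candidate model beats the baseline."""
    wins = sum(judge(a, b) for a, b in zip(model_answers, baseline_answers))
    return wins / len(model_answers)


print(win_rate(["long detailed answer", "ok"],
               ["short", "a much longer reply"]))  # → 0.5
```

If judgments like these are cheap and consistent across evaluator models, they could indeed serve as a reward signal in an automated reinforcement learning loop.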

[+] mikeravkine|2 years ago|reply
It's a pretty terrible model; I wouldn't use it for anything at all. Vicuna 1.1 outperforms it in all my tests.
[+] brucethemoose2|2 years ago|reply
That leaderboard does not line up with my personal experience with those models at all...
[+] logicchains|2 years ago|reply
Worth noting that according to the initial press release, they're also working on Falcon 180B, which would be the largest (and likely most effective) open source model by far.
[+] bioemerl|2 years ago|reply
No, don't tell me that, I'm going to need more graphics cards now.
[+] jumpCastle|2 years ago|reply
Not by far, there's bloom and opt if you count it.
[+] moffkalast|2 years ago|reply
And also the most unfeasibly impossible to run.
[+] Risyandi94|2 years ago|reply
Falcon LLM is a foundational large language model (LLM) with 40 billion parameters trained on one trillion tokens. TII has now released Falcon LLM – a 40B model.
[+] jrflowers|2 years ago|reply
Has anybody gotten this running on consumer hardware à la llama, or is that not in the cards?
[+] bestcoder69|2 years ago|reply
I've only seen people mention that it runs really slow, even on like A100s.
[+] logicchains|2 years ago|reply
llama.cpp just got Falcon support (not yet merged), so you could run it on just RAM. Not too fast though.
[+] antonmks|2 years ago|reply
When it comes to writing stories, this model is way behind ChatGPT 3.5.
[+] SparkyMcUnicorn|2 years ago|reply
I don't have first hand experience, but I've heard that it performs really well at story writing with some fine tuning.
[+] interlinked|2 years ago|reply
> You will need at least 85-100GB of memory

RAM or VRAM?

[+] kiraaa|2 years ago|reply
You need 94GB; RAM or VRAM, it doesn't matter which.