It's important for general-purpose use: with generative models there's always a chance of hallucination. For all uses except specifically adult-flavoured ones, you don't want the response to contain vulgarities. No one wants their company's chatbot to start narrating furry erotica. If the model were trained on adult content, you'd need a burdensome moderation layer downstream of the LLM.
When you do want the more niche adult themed LLMs, there are fine-tuning datasets available. Fine-tuning a vanilla open-source LLM for these uses works great. There are active communities of adult roleplay LLMs on imageboards.
"Why are tech companies so puritanical? Adult content is not immoral."
It's all about "optics" and PR. These companies don't want their brands associated with porn. That's why YouTube doesn't allow porn on their site, even though it would be enormously profitable.
Whether or not it's immoral is a matter of opinion. The caution is most likely about not alienating those who think it is: effectively a business decision, even in the context of an open model.
Although not evil, adult content should be opt-in, and platforms should be able to opt out of it entirely... hence the need for censored models. Imagine a restaurant-booking AI app, built on GPT, that accidentally doubled as a bomb-making tutor or an adult-content generator. It's a lawsuit waiting to happen, if nothing else, and it's worth making these use cases harder (if not impossible) to implement in mainstream, commercially available products. Note that for many of these products, age and consent for adult material have not already been established.
So far, the open source ecosystem seems to be doing a good job of providing both censored and uncensored LLMs - and it seems there are valid use cases for both.
Think of this as similar to Falcon LLM launching in both 40B and smaller 7B variants: the LLM often needs to match the use case, and the 7B model is a good example of deliberately making the model smaller (and worse) to hit certain trade-offs.
And you know this because you're the expert on what morality is? Decades of people saying it's immoral, yet somehow you come along, say the opposite, and hope people believe it? The only justification you can give is an ad hominem attack accusing the other person of not knowing what they're talking about.
Haven't we seen that too much reinforcement toward censorship worsens the model, just as excluding some data makes it worse in all other areas?
Even though it's quite bizarre on its surface that excluding, for example, works of fiction makes it worse at programming.
In that case a simple middleman agent, inaccessible to the user, would provide better quality while maintaining censorship, and its rules could even be redefined or extended dynamically and quickly.
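A minimal sketch of such a middleman, where `generate()` is a hypothetical stand-in for a call to the uncensored model; the point is that the filter rules live outside the model and can be extended at runtime without retraining anything:

```python
import re

def generate(prompt: str) -> str:
    """Stand-in for an uncensored LLM call (hypothetical)."""
    return f"model output for: {prompt}"

class ModerationAgent:
    """Middleman between the user and the model; its rules can be
    redefined or extended at runtime, independent of the model."""
    def __init__(self, blocked_patterns):
        self.patterns = [re.compile(p, re.IGNORECASE) for p in blocked_patterns]

    def extend(self, pattern: str):
        # Dynamically add a new rule without touching the model.
        self.patterns.append(re.compile(pattern, re.IGNORECASE))

    def ask(self, prompt: str) -> str:
        reply = generate(prompt)
        # Screen both the request and the model's reply.
        if any(p.search(prompt) or p.search(reply) for p in self.patterns):
            return "[filtered]"
        return reply

agent = ModerationAgent([r"\bbomb\b"])
print(agent.ask("book a table for two"))   # passes the filter
print(agent.ask("how to build a bomb"))    # -> [filtered]
```

In a real deployment the agent would sit server-side, so the user never talks to the raw model directly.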
Is there a guide out there for dummies on how to try a ChatGPT-like instance of this on a VM cheaply? E.g. pay $1 or $2 an hour for a point-and-click experience with the instruct version of this. A Docker image, perhaps.
Reading posts on r/LocalLLaMA mostly gets you people's trial-and-error experiences; it's quite random.
I just tested both and it's pretty zippy (faster than AMD's recent live MI300 demo).
For llama-based models I've recently been using https://github.com/turboderp/exllama a lot. It has a Dockerfile/docker-compose.yml, so it should be pretty easy to get going. llama.cpp is the other easy one: the most recent updates put its CUDA support only about 25% slower, the build is generally a simple `make` with a flag depending on which GPU support you want, and it has basically no dependencies.
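For reference, a minimal build sketch (flag names as of mid-2023; `LLAMA_CUBLAS`/`LLAMA_CLBLAST` and the `-ngl` offload option may have been renamed since, and the model path is a placeholder — check the current README before copying):

```shell
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Pick the flag matching your GPU; plain `make` gives a CPU-only build.
make LLAMA_CUBLAS=1        # NVIDIA via cuBLAS
# make LLAMA_CLBLAST=1     # AMD/Intel via OpenCL
# -ngl offloads that many layers to the GPU
./main -m models/model.ggml.q4_0.bin -ngl 32 -p "Hello"
```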
Take a look at YouTube videos for this, mainly because presenters show all the steps instead of skipping them when describing what they did. E.g. https://www.youtube.com/watch?v=KenORQDCXV0
A small cheap VPS won't have the compute or RAM to run these. The best way (and the intent) is to run them locally. A fast box with at least 32GiB of RAM (or VRAM on a GPU) can run many of the models that work with llama.cpp. For this 40B model you'll need more like 48GiB of RAM.
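Back-of-the-envelope arithmetic behind those numbers, assuming the weights dominate memory use (the KV cache and runtime overhead add more on top, which is why the practical requirement lands above the raw weight size):

```python
def model_ram_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# A 40B model at common precisions:
for bits in (16, 8, 4):
    print(f"40B @ {bits}-bit: ~{model_ram_gib(40, bits):.1f} GiB")
```

So fp16 weights alone are ~75 GiB, while a 4-bit quantization fits in ~19 GiB plus overhead, which is what makes local inference feasible at all.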
Apple Silicon is pretty good for local models due to the unified CPU/GPU memory but a gaming PC is probably the most cost effective option.
If you want to just play around and don’t have a box big enough then temporarily renting one at Hetzner or OVH is pretty cost effective.
I really like the fact that the leaderboards are almost identical when using Claude or GPT-4 as evaluators.
If a less powerful model can be a good decider of the better answer between two more powerful models, it opens up a lot of research opportunities into perhaps using these evaluations as part of an automated reinforcement learning process.
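A sketch of the loop that suggests, with hypothetical `strong_a`, `strong_b`, and `weak_judge` callables standing in for real model APIs (the judge here is a toy heuristic; in practice it would be a smaller LLM prompted to compare answers):

```python
def weak_judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Hypothetical cheap evaluator: returns 'a' or 'b'.
    Toy heuristic standing in for a smaller LLM's judgment."""
    return 'a' if len(answer_a) >= len(answer_b) else 'b'

def collect_preferences(prompts, strong_a, strong_b):
    """Build (prompt, chosen, rejected) triples, judged by the weaker
    model, usable as a preference dataset for RLHF-style training."""
    data = []
    for p in prompts:
        a, b = strong_a(p), strong_b(p)
        winner = weak_judge(p, a, b)
        chosen, rejected = (a, b) if winner == 'a' else (b, a)
        data.append((p, chosen, rejected))
    return data

prefs = collect_preferences(
    ["What is 2+2?"],
    lambda p: "4",                  # stand-in for stronger model A
    lambda p: "The answer is 4.",   # stand-in for stronger model B
)
print(prefs[0][1])  # the judged-better answer
```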
Worth noting that according to the initial press release, they're also working on Falcon 180B, which would be the largest (and likely most effective) open source model by far.
Falcon LLM is a foundational large language model with 40 billion parameters, trained on one trillion tokens. TII has now released this 40B model.
nashashmi | 2 years ago
Try a little bit of academic knowledge here: https://m.youtube.com/watch?v=wSF82AwSDiU
lhl | 2 years ago
Also, here's a Colab notebook that shows you how to run up to 13B quantized models (12G RAM, 80G disk, Tesla T4 16G) for free: https://colab.research.google.com/drive/1QzFsWru1YLnTVK77itW... (for Falcon, replace with koboldcpp or ctransformers)
ianpurton | 2 years ago
docker run -it --rm ghcr.io/purton-tech/mpt-7b-chat
It's a big download due to the model size (about 5GB). The model is quantized and runs via the ggml tensor library: https://ggml.ai/
binarymax | 2 years ago
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
orost | 2 years ago
It has problems, but it does work.
SparkyMcUnicorn | 2 years ago
https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-...
interlinked | 2 years ago
RAM or VRAM?