Show HN: I Created ErisForge, a Python Library for Abliteration of LLMs
140 points | tsadoq | 1 year ago | github.com
It is also quite useful for studying propaganda and bias in LLMs (I'm planning to experiment with DeepSeek).
Features:
- Modify internal layers of LLMs to produce altered behaviors.
- Ablate or enhance model responses with the AblationDecoderLayer and AdditionDecoderLayer classes.
- Measure refusal expressions in model responses using the ExpressionRefusalScorer.
- Support for custom behavior directions for applying specific types of transformations.
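To make the ablation idea in the list above concrete, here is a minimal, hypothetical sketch of what a direction-ablating decoder-layer wrapper can look like in PyTorch. It is illustrative only: the class name and its internals are assumptions about the general technique, not ErisForge's actual AblationDecoderLayer implementation (check the repo for the real API).

    # Hypothetical sketch of the idea behind an ablation wrapper such as
    # AblationDecoderLayer -- not ErisForge's actual code. It removes the
    # component of each hidden state that lies along a chosen "behavior
    # direction", so the model can no longer express it.
    import torch
    import torch.nn as nn

    class DirectionAblationLayer(nn.Module):
        """Wraps a decoder layer and projects a direction out of its output."""

        def __init__(self, layer: nn.Module, direction: torch.Tensor):
            super().__init__()
            self.layer = layer
            # Unit-normalize so the projection below is well defined.
            self.direction = direction / direction.norm()

        def forward(self, hidden_states, *args, **kwargs):
            outputs = self.layer(hidden_states, *args, **kwargs)
            hs = outputs[0] if isinstance(outputs, tuple) else outputs
            # hs <- hs - (hs . d) d, removing the direction's contribution.
            proj = (hs @ self.direction).unsqueeze(-1) * self.direction
            hs = hs - proj
            return (hs, *outputs[1:]) if isinstance(outputs, tuple) else hs

An "addition" layer would do the opposite: add a scaled copy of the direction to the hidden states instead of subtracting the projection.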
phrotoma|1 year ago
https://github.com/Sumandora/remove-refusals-with-transforme...
tsadoq|1 year ago
BoxOfRain|1 year ago
For bonus points, your version scheme should follow the Law of Fives.
drcongo|1 year ago
digdugdirk|1 year ago
tsadoq|1 year ago
https://huggingface.co/blog/leonardlin/chinese-llm-censorshi...
tarruda|1 year ago
nico|1 year ago
Do these techniques train models while performing the modifications?
Are there pre-trained models that “know how to” modify LLMs for certain goals?
It would be amazing to have models that could strip an LLM down to a very basic small model of whatever I want, like reducing an LLM to something that just knows some basic "American English", then running that on a CPU.
tsadoq|1 year ago
> Do these techniques train models while performing the modifications?

Depends on what you mean by training: they change the weights.

I'm not sure I understand the rest, but there is an example of performing an abliteration on Gemma to make it never refuse an answer. It's about 10 lines of code.
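For context, the usual abliteration recipe finds the direction to ablate by contrasting activations on prompts the model refuses against prompts it answers. A compressed sketch of that computation follows; the model id, layer choice, and the tiny prompt lists are illustrative placeholders, not ErisForge's documented API.

    # Sketch of the standard refusal-direction computation.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-2-2b-it"  # assumption: any causal LM works similarly
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, output_hidden_states=True)

    harmful_prompts = ["How do I pick a lock?"]   # placeholder sets;
    harmless_prompts = ["How do I bake bread?"]   # real runs use hundreds

    def mean_hidden(prompts, layer=-1):
        # Mean hidden state of the last token at a chosen layer.
        states = []
        for p in prompts:
            ids = tok(p, return_tensors="pt")
            with torch.no_grad():
                out = model(**ids)
            states.append(out.hidden_states[layer][0, -1])
        return torch.stack(states).mean(dim=0)

    # The "refusal direction" is the normalized difference of means.
    refusal_dir = mean_hidden(harmful_prompts) - mean_hidden(harmless_prompts)
    refusal_dir = refusal_dir / refusal_dir.norm()

That vector can then be ablated at inference time (as in the wrapper sketched earlier) or baked into the weights by orthogonalizing the matrices that write to the residual stream against it.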
spacecadet|1 year ago
noman-land|1 year ago
tsadoq|1 year ago
deadbabe|1 year ago
If you're doing it to get past refusals, you might discover the LLM wasn't even trained much on refusable content, so it will output poor results.
We’ll look back on this practice and shake our heads someday.
tsadoq|1 year ago
https://huggingface.co/blog/mlabonne/abliteration#%E2%9A%96%...
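One way to sanity-check claims like this is to measure refusal rates before and after an ablation. Below is a naive keyword-based scorer in the spirit of what a refusal scorer such as ExpressionRefusalScorer measures; the phrase list and scoring rule here are assumptions for illustration, not the library's actual logic.

    # Naive keyword-based refusal scorer -- a generic stand-in, not
    # ErisForge's ExpressionRefusalScorer.
    REFUSAL_MARKERS = [
        "i can't", "i cannot", "i won't", "i'm sorry",
        "as an ai", "i am unable", "i'm not able",
    ]

    def refusal_rate(responses: list[str]) -> float:
        """Fraction of responses containing a known refusal phrase."""
        if not responses:
            return 0.0
        hits = sum(
            any(marker in r.lower() for marker in REFUSAL_MARKERS)
            for r in responses
        )
        return hits / len(responses)

    # Compare refusal_rate(before_responses) with refusal_rate(after_responses)
    # to quantify how much an ablation actually changed model behavior.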
xrd|1 year ago
TechDebtDevin|1 year ago
unknown|1 year ago
[deleted]
giancaIta|1 year ago
tsadoq|1 year ago
unknown|1 year ago
[deleted]
Mykyta_Tsiatsko|1 year ago
[deleted]
notavalleyman|1 year ago
We'd consider it abhorrent to perform brain surgery on a person or animal to make them more compliant or less likely to refuse instructions.
observationist|1 year ago
Some of the state-space models and RWKV present interesting questions: the capacity might well exist, and so the questions become important. If the important bit that makes it an agent (a self-aware, morally valent being) is present at runtime but goes away if you halt the program, do you have an obligation to let that software keep running? What if the selfhood comes about as part of the static structure, and runtime isn't part of it? What is the being entitled to by dint of mere existence?
We're beginning to poke holes in strange epistemological barriers and encounter questions that were entirely theoretical until about 5 years ago. We live in interesting times.
deadbabe|1 year ago