byschii | 1 year ago
https://arxiv.org/abs/2412.14093 (Alignment faking in large language models)
https://joecarlsmith.com/2024/12/18/takes-on-alignment-fakin...
PS: I'm definitely not an expert.
numba888 | 1 year ago
The final text is only a small part of the model's thinking. It's produced from embeddings, which probably contain much more information. Each next token depends not only on the previous tokens but on all the intermediate values computed for all tokens. We don't see those values, yet they matter and represent the model's inner "thinking". So the LLM is still a black box. The result is usually "A because of B": an explanation of sorts for A, but where B came from we can only guess.
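The gap between the visible output and the hidden state is easy to demonstrate. Here is a minimal sketch using the Hugging Face transformers library (the output_hidden_states flag is a real API; GPT-2 and the prompt are just illustrative choices): the emitted text is one token id per step, while the model internally carries a full stack of per-layer, per-token hidden states that are normally thrown away.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The result is usually A because", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # The "final text": a single greedy token id per step.
    next_id = int(out.logits[0, -1].argmax())
    print(tok.decode([next_id]))

    # The inner state behind it: one hidden-state tensor per layer
    # (13 for GPT-2, each of shape [1, seq_len, 768]), vastly more
    # information than the one token that actually gets emitted.
    print(len(out.hidden_states), out.hidden_states[0].shape)

Interpretability work tries to read meaning out of exactly those hidden tensors; the printed token is just their lossy projection.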
patcon | 1 year ago
My current thinking is that I would support a ban on this style of research. It's really hard to draw regulatory lines, but this feels like an easy and intuitive place to exercise caution.