It is fairly well established that neurons in these artificial neural networks are polysemantic: information is represented as directions in the activation space rather than by individual neurons each independently representing a feature (which is why Anthropic is doing things like training sparse autoencoders to disentangle those directions). I haven't read the paper in depth, but it seems to rest on a fundamental misunderstanding of how neurons in ANNs differ from neurons in the brain.
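A toy sketch of the "features as directions" point, with made-up numbers (the feature directions here are illustrative assumptions, not from any paper): two distinct features are stored as non-orthogonal directions in a two-neuron activation space, so each neuron fires for both features, i.e. the neurons are polysemantic.

```python
import numpy as np

# Hypothetical feature directions in a 2-neuron activation space.
# They are not axis-aligned, so no single neuron "owns" a feature.
feature_dirs = np.array([
    [1.0, 0.3],   # assumed direction for feature A
    [0.3, 1.0],   # assumed direction for feature B
])

def activations(feature_strengths):
    """Activations are a superposition of feature directions."""
    return feature_strengths @ feature_dirs

# Feature A alone still lights up *both* neurons:
acts_a = activations(np.array([1.0, 0.0]))   # -> [1.0, 0.3]

# Reading out a feature means projecting onto its direction,
# not inspecting one neuron in isolation:
readout_a = acts_a @ feature_dirs[0]
```

Inspecting `acts_a` neuron by neuron would wrongly suggest both neurons "mean" feature A; the direction, not the neuron, carries the information.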
readthenotes1 | 2 months ago