reminds me of the anthropic's recent work on identifying the neuron sets that correlate to various semantic concepts in Claude: https://news.ycombinator.com/item?id=40429540 "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet"
szvsw|1 year ago
https://openai.com/index/extracting-concepts-from-gpt-4/
https://news.ycombinator.com/item?id=40599749
cabidaher|1 year ago