(no title)
giang_at_glai | 4 days ago
This post shows “concept algebra” on language models: injecting, suppressing, and composing human-understandable concepts at inference time (no retraining, no prompt engineering).
There’s an interactive demo on the post.
Would love feedback on: (1) what steering tasks you’d benchmark, (2) failure cases you’d want to see, (3) whether this kind of compositional control is useful in real products.
anon291|3 days ago
The suppression bit is very powerful. I would like to see a quantification of how often a steered 'normal' language model mentions things you asked it to suppress vs. how often this one does.
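One rough way to quantify this is a "mention rate": the fraction of generations that contain any suppressed term. The sketch below is only an illustration under stated assumptions. The function name `mention_rate`, the term list, and the sample strings are all hypothetical, and plain string matching will miss paraphrases, so a real benchmark would likely need a semantic check as well.

```python
import re

def mention_rate(outputs, suppressed_terms):
    """Fraction of outputs that mention at least one suppressed term.

    Matching is case-insensitive and on whole words/phrases only, so
    paraphrases of a suppressed concept are NOT detected.
    """
    if not outputs or not suppressed_terms:
        return 0.0
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in suppressed_terms) + r")\b",
        re.IGNORECASE,
    )
    hits = sum(1 for text in outputs if pattern.search(text))
    return hits / len(outputs)

# Hypothetical toy samples, not real model outputs.
baseline_outputs = [
    "Paris is known for the Eiffel Tower.",
    "The city has many museums.",
    "Visit the Eiffel Tower at night.",
    "Try the local bakeries.",
]
steered_outputs = [
    "Paris is known for its cafes.",
    "The city has many museums.",
    "Walk along the Seine at night.",
    "Some visitors still ask about the Eiffel Tower.",
]

terms = ["Eiffel Tower"]
baseline_rate = mention_rate(baseline_outputs, terms)
steered_rate = mention_rate(steered_outputs, terms)
print(f"baseline: {baseline_rate:.2f}, steered: {steered_rate:.2f}")
```

Running the same prompts through both setups and comparing the two rates would give the kind of side-by-side number asked for here.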
giang_at_glai|3 days ago
If you have joined our waitlist, we will notify you as soon as it is available.
didgeoridoo|3 days ago
luulinh90s|2 days ago
We haven’t published the concept dictionary yet.
We plan to release it soon, along with other important artifacts.