deepdarkforest | 5 days ago
How granular can the source data attribution get? Down to individual Wikipedia topics, say? Probably not URLs?
Would be interested to see this scale to 30B/70B.
rao-v|5 days ago
Having said that, I worry that you run into Illusion of Conscious issues, where the model changes its attribution from “sandbagging” to “unctuous” when you control its response, because the response is generated outside of the attribution modules (I don’t quite understand how cleanly everything flows through the concept modules and the residual). Either way, this is a sophisticated problem to have. Would love to see if this can be trained to parity with modern 8B models.
giang_at_glai|3 days ago
adebayoj|5 days ago