jbay808 | 1 year ago
As for avoiding certain cases, that could be done to some extent. But remember that the untrained transformer has no preconception of numbers or ordering (it doesn't use the hardware ALU or an integer data type), so there has to be enough data in the training set for it to learn 0<1<2<3<4<5<6, etc.
dartos | 1 year ago
This is the kind of thing I’d want it to generalize.
If I avoid having 2 and 6 in the same unsorted list in the training set, will test-set lists containing both numbers be sorted correctly, and at the same rate as other lists?
My intuition is that, yes, it would. But it’d be nice to see and would be a clear demonstration of the ability to generalize at all.
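A minimal sketch of how that held-out experiment could be set up, assuming integer tokens 0 to 9, fixed-length lists of 8, and a hypothetical `predict` function standing in for the trained model. The names `HELD_OUT`, `build_split`, and `accuracy` are illustrative only, not from any existing codebase. Training lists never contain both 2 and 6; one test set always does, and a control test set never does, so the two accuracies can be compared directly.

```python
import random

# Hypothetical held-out pair: these two values never co-occur in a training list.
HELD_OUT = {2, 6}


def contains_pair(xs, pair=HELD_OUT):
    """True if every value in `pair` appears somewhere in `xs`."""
    return pair.issubset(xs)


def build_split(n_train, n_pair, n_control, length=8, vocab=range(10), seed=0):
    """Build (input, sorted_target) examples for a sorting task.

    train:   lists that never contain both held-out values
    pair:    test lists that always contain both held-out values
    control: test lists without the pair, disjoint from train
    """
    rng = random.Random(seed)
    vocab = list(vocab)
    train, pair, control = [], [], []
    seen = set()  # inputs already used in train, so control stays unseen
    while len(train) < n_train or len(pair) < n_pair or len(control) < n_control:
        xs = [rng.choice(vocab) for _ in range(length)]
        key = tuple(xs)
        example = (xs, sorted(xs))
        if contains_pair(xs):
            if len(pair) < n_pair:
                pair.append(example)
        elif key in seen:
            continue
        elif len(train) < n_train:
            train.append(example)
            seen.add(key)
        elif len(control) < n_control:
            control.append(example)
    return train, pair, control


def accuracy(predict, examples):
    """Fraction of examples where predict(xs) exactly matches the sorted target."""
    if not examples:
        return 0.0
    return sum(predict(xs) == ys for xs, ys in examples) / len(examples)
```

Usage: train a model on `train`, then compare `accuracy(model_predict, pair)` against `accuracy(model_predict, control)`. If the model has generalized the ordering rather than memorized co-occurrences, the two numbers should be close. Python's built-in `sorted` works as a sanity-check "perfect model" and scores 1.0 on both sets.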