I mean that maybe gradient descent yields a passable sorting algorithm, once the weights have been learned to properly describe ordering. Sorting things well may simply be a speciality of transformers, which wouldn’t tell us much about whether they are mentalists or not.
manmal|1 year ago