empath-nirvana | 1 year ago
It sounds pretty obvious to say that the difference is whatever is different, but isn't that literally what both sides of this argument are saying?
edit: I do think the original linked essay is saying something slightly subtler than that: _given_ that everyone is using the same transformer architecture, the exact hyperparameters and fine-tuning matter a lot less than the dataset does.