item 21456391

fermenflo | 6 years ago
I agree, a lot of the code could be improved. But some of what you mentioned is fairly standard, like "Gaussian Error Linear Units" being GELU, w/b for weights/biases, etc.

jsinai | 6 years ago
Not sure how standard that is ...

steve_musk | 6 years ago
They’re very standard ML abbreviations.