This. Exactly this. No sophisticated tokenization. No interesting architecture using attention. And the author is completely clueless about overfitting... and even cross-entropy loss. He could have gotten better results just using a bag-of-words approach.
But this ends up on the front page anyway. Welcome to HN.
objektif|6 years ago
codesushi42|6 years ago
You will overfit an NN trained on only 1000 examples.
Also, a simple train/test split will tell you that. But the author failed to take any time to learn the basics before spewing out this drivel.
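
To make the train/test point concrete, here is a rough stdlib-only sketch. It is not the article's model or data: the 1000 examples are synthetic with random labels, and `Memorizer` is a hypothetical stand-in for any model with enough capacity to memorize its training set. The only point is that the held-out score exposes what the training score hides.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

class Memorizer:
    """Toy model (an assumption for illustration): memorizes every training pair."""
    def fit(self, pairs):
        self.table = {x: y for x, y in pairs}
        labels = [y for _, y in pairs]
        # Fall back to the most common training label for unseen inputs.
        self.default = max(set(labels), key=labels.count)
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

def accuracy(model, pairs):
    return sum(model.predict(x) == y for x, y in pairs) / len(pairs)

# 1000 synthetic examples with random binary labels: there is nothing to learn.
rng = random.Random(42)
data = [(i, rng.randint(0, 1)) for i in range(1000)]

train, test = train_test_split(data)
model = Memorizer().fit(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Training accuracy comes out perfect because the model has simply memorized its inputs, while held-out accuracy sits near chance. A gap like that is the overfitting signal you only see if you hold data out.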