You might want to overlap the chunks in the first pass; otherwise something could get lost at the chunk boundaries. I'm not any sort of expert on this, but it seems like an obvious pitfall given the context length limit.
I really like this idea. It's basically applying the same principles used in image-based nets (i.e. sliding-window convolutional kernels) to text.
prettyStandard|2 years ago
textninja|2 years ago
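For anyone wanting to try the overlapping-chunks suggestion, here is a minimal sketch of a sliding-window splitter. It assumes the text has already been split into tokens (the example uses plain whitespace-separated words, which is only a stand-in for a real model tokenizer), and the function name `chunk_with_overlap` and its `size`/`overlap` parameters are hypothetical. Consecutive windows share `overlap` tokens, so the window advances by a stride of `size - overlap`, much like a convolution kernel's stride.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_with_overlap(tokens: Sequence[T], size: int, overlap: int) -> List[List[T]]:
    """Split `tokens` into windows of at most `size` items.

    Each window shares its last `overlap` items with the start of the next one,
    so content near a boundary appears in two chunks instead of being cut in half.
    (Hypothetical helper; token units are whatever the caller passes in.)
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not tokens:
        return []

    stride = size - overlap  # how far the window moves each step
    chunks: List[List[T]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):  # this window reached the end of the input
            break
        start += stride
    return chunks


if __name__ == "__main__":
    # Whitespace splitting is an assumption; a real pipeline would count model tokens.
    words = "the quick brown fox jumps over the lazy dog again".split()
    for chunk in chunk_with_overlap(words, size=4, overlap=1):
        print(" ".join(chunk))
```

With 10 words, `size=4` and `overlap=1`, this yields three chunks, and the last word of each chunk is repeated as the first word of the next. A larger overlap gives more protection at the boundaries at the cost of more chunks (and more calls) per document.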