marcos here (one of the authors). i know the word "breakthrough" in the title is a "little" ambitious, but i really think we've done something interesting ... we'd like to publish, so this is a way to collect questions/comments! shoot away.
the memory blows up with the length of the encoder sequence. for that reason we truncate the email at ~300 tokens, which in the vast majority of cases is enough to capture the relevant info. other than that we don't get rid of any "garbage" lines. instead, we let the NN (e.g. the attention layer) figure out which lines are irrelevant
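a minimal sketch of the truncation step described above, assuming a token list has already been produced by some tokenizer (the function name, the 300-token cap as a constant, and the helper are illustrative, not the actual implementation):

```python
# Encoder memory grows with sequence length, so the email is cut
# off after a fixed token budget (~300 here, an assumed value).
MAX_ENCODER_TOKENS = 300

def truncate_email(tokens, max_len=MAX_ENCODER_TOKENS):
    """Keep only the first max_len tokens of the tokenized email.

    No "garbage" lines are filtered out beforehand; the attention
    layer is left to learn which tokens are irrelevant.
    """
    return tokens[:max_len]

# Example: a long email is cut down to the budget,
# a short one passes through unchanged.
long_email = ["hi", "team", ","] + ["filler"] * 500
short_email = ["thanks", "!"]
print(len(truncate_email(long_email)))   # 300
print(len(truncate_email(short_email)))  # 2
```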
marjimbel|6 years ago