top | item 20778264

ccccppppp | 6 years ago

Noob question: I have some 1D conv net for financial time series prediction. Could a transformer architecture be better for this task, is it worth a try?


hadsed | 6 years ago

If you think a longer context length might be helpful, consider stacking convolutions to give higher units a bigger receptive field, or try a convolutional LSTM. If that helps and you have a further argument for why an even larger context window would be useful, then perhaps try attention, and in that case a Transformer would be reasonable. But your stacked conv net would be the fastest and most obvious thing that should work (with the caveat that I know nothing else about your data and its characteristics, which is a really big caveat).
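To make the "stacking gives a bigger receptive field" point concrete, here is a small sketch (my own illustration, not from the thread) of how the receptive field of a stack of stride-1 1D convolutions grows, with and without dilation:

```python
def receptive_field(kernel_sizes, dilations=None):
    """Receptive field (in timesteps) of stacked stride-1 1D convolutions.

    Each layer with kernel size k and dilation d extends the receptive
    field of the stack below it by (k - 1) * d timesteps.
    """
    if dilations is None:
        dilations = [1] * len(kernel_sizes)
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Four plain layers with kernel size 3: each unit sees 9 timesteps.
print(receptive_field([3, 3, 3, 3]))                 # 9
# Same depth with doubling dilations: context grows to 31 timesteps.
print(receptive_field([3, 3, 3, 3], [1, 2, 4, 8]))   # 31
```

So depth (or dilation, as in WaveNet-style stacks) buys context cheaply before you need to reach for attention.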

Consider looking at your errors and judging whether they stem from things your current model doesn't do well but that Transformers do, e.g., correlating two positions in a sequence separated by a large number of time steps. Attention is basically a memory module, so if you don't need that it's just a waste of compute resources.
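For reference, the mechanism being described can be sketched in a few lines of NumPy (a minimal single-head self-attention, my own illustration): every timestep computes similarity scores against every other timestep, so information can flow between arbitrarily distant positions in one step, rather than having to propagate layer by layer as in a conv stack.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: each query position takes a
    softmax-weighted average over all value positions."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                       # (T, T) pairwise similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                  # softmax over timesteps
    return w @ v                                        # (T, d) mixed values

T, d = 16, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(T, d))
out = attention(x, x, x)   # self-attention over a length-16 sequence
print(out.shape)           # (16, 8)
```

The (T, T) score matrix is also why attention is expensive: compute and memory grow quadratically with sequence length, which is the "waste of compute" if your errors don't actually call for long-range correlations.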

ccccppppp | 6 years ago

Thanks for the insight, also for mentioning convolutional LSTM, I wasn't aware such a thing existed.

> Attention is basically a memory module, so if you don't need that it's just a waste of compute resources.

But aren't CNNs also like a memory module (e.g., they memorize what leopard skin looks like)? I guess attention is a more sophisticated kind of memory, "more dynamic" so to speak.

Anyway, I'm glad to hear that a transformer architecture isn't totally stupid for my task, I will look up the literature, there seems to be a bit on this matter.