subho406's comments
subho406 | 6 years ago | on: Show HN: OmniNet:- A unified architecture for multi-modal multi-task learning
subho406 | 6 years ago | on: OmniNet: A unified architecture for multi-modal multi-task learning
subho406 | 7 years ago | on: Text Normalization using Memory Augmented Neural Networks
The memory requirements of the DNC are quite high. We used a GTX 1060 for training. Increasing the context window beyond 3 increases the sequence length by a huge amount, causing memory problems. However, we also found that the DNC works quite well even with a small batch size; we used a batch size of 16 for all our experiments. Training with a batch size of 16, a context window of size 3, and 200k steps took 48 hours on a GTX 1060 system.
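Why the sequence length blows up with the context window can be illustrated with a rough sketch. The windowing scheme below (characters of the target token plus its neighbors, joined by separators) is an assumption for illustration, not the paper's actual preprocessing:

```python
# Assumed windowing scheme for illustration, not the paper's code:
# in a character-level normalizer, each example feeds the characters of
# the target token plus `window` tokens of context on each side, so the
# input length grows roughly linearly with the window size.

def input_length(tokens, index, window):
    """Character count of the context window around tokens[index]."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    span = tokens[lo:hi]
    # one separator character between adjacent tokens
    return sum(len(t) for t in span) + (len(span) - 1)

tokens = ["the", "meeting", "is", "on", "2017-12-03", "at", "noon"]
for w in (1, 3, 5):
    print(w, input_length(tokens, 4, w))
# prints: 1 16 / 3 32 / 5 36
```

Since the recurrent model (and the DNC's memory reads/writes) unrolls once per input character, each extra context token multiplies both time and memory per example.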
subho406 | 7 years ago | on: Text Normalization using Memory Augmented Neural Networks
We modified the sentence to say, "An earlier version of the approach used here has secured the 6th position in the Kaggle Russian Text Normalization Challenge by Google's Text Normalization Research Group".
subho406 | 7 years ago | on: Text Normalization using Memory Augmented Neural Networks
On the other hand, as mentioned, when comparing text normalization systems it is more important to look at the exact kinds of errors made by the system, not just the overall accuracy. Our model showed improvement over the baseline model in https://arxiv.org/abs/1611.00068. The DNC improved on certain semiotic classes such as DATE, CARDINAL, and TIME, making zero unacceptable predictions in those classes, whereas the LSTM was susceptible to these kinds of mistakes even when a lot of training data was available. Yes, we do not use internal computation steps; the model simply replaces the standard LSTM in a seq-to-seq model with a DNC. That said, thanks for the suggestion: it would be interesting to see the performance improvements if the number of internal computation steps were increased.
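The swap described above (a DNC in place of the LSTM inside a seq-to-seq model) can be sketched roughly as follows. The memory cell here is a heavily simplified stand-in (single read/write head, content-based addressing only, no temporal linkage or allocation), not the implementation from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

class SimpleMemoryCell:
    """Heavily simplified DNC-style cell: a plain tanh-RNN controller
    reads from and writes to an external memory via content-based
    addressing. One head, no temporal linkage -- an illustration only."""

    def __init__(self, in_dim, hid_dim, mem_slots=8, mem_width=6):
        self.Wx = rng.normal(0, 0.1, (hid_dim, in_dim))
        self.Wh = rng.normal(0, 0.1, (hid_dim, hid_dim))
        self.Wr = rng.normal(0, 0.1, (hid_dim, mem_width))
        self.Wk = rng.normal(0, 0.1, (mem_width, hid_dim))  # addressing key
        self.Wv = rng.normal(0, 0.1, (mem_width, hid_dim))  # write vector
        self.mem_slots, self.mem_width = mem_slots, mem_width

    def init_state(self):
        h = np.zeros(self.Wh.shape[0])
        M = np.zeros((self.mem_slots, self.mem_width))  # external memory
        r = np.zeros(self.mem_width)                    # last read vector
        return h, M, r

    def step(self, x, state):
        h, M, r = state
        # controller update conditions on the previous memory read
        h = np.tanh(self.Wx @ x + self.Wh @ h + self.Wr @ r)
        key = self.Wk @ h
        # content-based addressing: softmax over slot/key similarity
        scores = M @ key
        w = np.exp(scores - scores.max())
        w /= w.sum()
        r = w @ M                          # weighted read
        M = M + np.outer(w, self.Wv @ h)   # additive write (simplified)
        return h, (h, M, r)

# The seq-to-seq change is just which cell the encoder loop calls:
def encode(cell, xs):
    state = cell.init_state()
    for x in xs:
        h, state = cell.step(x, state)
    return h

cell = SimpleMemoryCell(in_dim=4, hid_dim=5)
xs = [rng.normal(size=4) for _ in range(3)]
h = encode(cell, xs)
print(h.shape)  # (5,)
```

The point of the sketch is that the surrounding encoder/decoder loop is unchanged; an LSTM cell and the memory-augmented cell expose the same step interface, so only the recurrent unit is swapped.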