(no title)
albertzeyer | 4 months ago
I also had a very similar bug a while ago, broken gradients due to non-contiguous data for masked_select: https://github.com/pytorch/pytorch/issues/99638
In my case, it was easier to identify: I had another implementation of my loss function before that did not use masked_select. But then I thought I can be clever and use masked_select to take out the non-masked frames and calculate the loss only on those. But it wasn't working. Also, it only happened for some models, not for all. It turns out, it was always happening when the data coming out of the model was non-contiguous.
I think the bugs with non-contiguous data are not so uncommon. I wonder how much of that we still have.
No comments yet.