

albertzeyer | 4 months ago

The bug was with non-contiguous data in tensors.

I also ran into a very similar bug a while ago: broken gradients from masked_select on non-contiguous data (https://github.com/pytorch/pytorch/issues/99638).

In my case, it was easier to identify, because I already had an earlier implementation of my loss function that did not use masked_select. Then I thought I could be clever and use masked_select to pick out only the non-masked frames and compute the loss on those. But it didn't work. It also happened only for some models, not all of them. It turned out that it happened whenever the data coming out of the model was non-contiguous.

I suspect bugs involving non-contiguous data are not that uncommon. I wonder how many of them are still out there.
