Just read the article and it instantly brought back memories of when I spent days trying to fix a broken loss in a PyTorch model. Turned out I had passed the wrong optimizer parameters. I ended up digging all the way from the model to the CUDA kernel. Debugging took longer than training.What’s the trickiest bug you’ve ever run into?
No comments yet.