Curious about the masked diffusion IDM choice. They mention that CTC loss and cross-entropy both underperformed, and I'd love to see ablations on that. The claim that typos were "extremely common" with non-causal cross-entropy is interesting, but it's hand-wavy without numbers.
nee1r|7 days ago