Today’s best LLMs mostly decode autoregressively from left-to-right, which gives great quality but is terribly slow. Diffusion LLM can decode many tokens in parallel thanks to their non-casual, any-order generation, but they must be trained from scratch, or expensively adapted from autoregressive (AR) checkpoints with a mismatched, non-casual diffusion objective; we find this mismatch often hurts quality and breaks many effective KV-cache related serving optimizations. This blog introduces Jacobi Forcing, a new training technique that converts LLMs into native casual parallel decoders. Jacobi forcing keeps the casual AR backbone and addresses the AR-to-diffusion mismatch by training the model to handle noisy future blocks along its own Jacobi decoding trajectories. This yields an AR model which behaves like a diffusion-style decoder—decoding multiple tokens per pass, but still from left to right—with up to 4.5x higher tokens-per-forward and 4x wall-clock speedup on coding and math, while retaining near-AR generation quality.
No comments yet.