top | item 43827323 (no title) mmoskal | 10 months ago Spec decoding only depends on the tokenizer used. It's transfering either the draft token sequence or at most draft logits to the main model. discuss order hn newest jasonjmcghee|10 months ago Could be an lm studio thing, but the qwen3-0.6B model works as a draft model for the qwen3-32B and qwen3-30B-A3B but not the qwen3-235B-A22B model jasonjmcghee|10 months ago I suppose that makes sense, for some reason I was under the impression that the models need to be aligned / have the same tuning or they'd have different probability distributions and would reject the draft model really often.
jasonjmcghee|10 months ago Could be an lm studio thing, but the qwen3-0.6B model works as a draft model for the qwen3-32B and qwen3-30B-A3B but not the qwen3-235B-A22B model
jasonjmcghee|10 months ago I suppose that makes sense, for some reason I was under the impression that the models need to be aligned / have the same tuning or they'd have different probability distributions and would reject the draft model really often.
jasonjmcghee|10 months ago
jasonjmcghee|10 months ago