top | item 47097698

(no title)

ssivark | 8 days ago

> speculative decoding for bread and butter frontier models. The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious

Can we use older (previous generation, smaller) models as a speculative decoder for the current model? I don't know whether the randomness in training (weight init, data ordering, etc) will affect this kind of use. To the extent that these models are learning the "true underlying token distribution" this should be possible, in principle. If that's the case, speculative decoding is an elegant vector to introduce this kind of tech, and the turnaround time is even less of a problem.

discuss

No comments yet.