joha4270 | 10 days ago
> to get the first N tokens sorted, only when the big model and small model diverge do you infer on the big model
suggests there is something I'm unaware of. If you compare the small model's output against the big model's, don't you have to wait for the big model anyway? And then what's the point? I assume I'm missing some detail here, but what?
connorbrinton|10 days ago
More info:
* https://research.google/blog/looking-back-at-speculative-dec...
* https://pytorch.org/blog/hitchhikers-guide-speculative-decod...
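To make the question above concrete, here is a minimal sketch of greedy speculative decoding. It is not code from either linked post. The names `speculative_step`, `draft_model` and `target_model` are hypothetical, and the two "models" are toy functions that map a token prefix to the next token. The point it tries to show: you do wait for the big model, but one big-model pass checks several drafted tokens at once, so the number of big-model passes can be much smaller than the number of tokens produced.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical stand-in for a model: greedy next token for a given prefix.
Model = Callable[[List[Token]], Token]


def target_model(prefix: List[Token]) -> Token:
    """Toy 'big' model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def draft_model(prefix: List[Token]) -> Token:
    """Toy 'small' model: agrees with the big one except when the answer is 5."""
    nxt = (prefix[-1] + 1) % 10
    return 0 if nxt == 5 else nxt


def speculative_step(prefix: List[Token], draft: Model, target: Model, k: int) -> List[Token]:
    # 1. The small model drafts k tokens one after another (cheap).
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2. The big model scores every drafted position. With a real transformer
    #    this is assumed to be ONE batched forward pass; the loop only simulates it.
    verified = [target(prefix + proposal[:i]) for i in range(k + 1)]

    # 3. Keep drafted tokens while they match, then append the big model's own
    #    token at the first mismatch (or its extra token if everything matched).
    accepted: List[Token] = []
    for i in range(k):
        if proposal[i] != verified[i]:
            break
        accepted.append(proposal[i])
    accepted.append(verified[len(accepted)])
    return accepted


def generate(prefix: List[Token], draft: Model, target: Model,
             n_tokens: int, k: int = 4) -> Tuple[List[Token], int]:
    out = list(prefix)
    target_passes = 0  # counts simulated big-model passes
    while len(out) - len(prefix) < n_tokens:
        out += speculative_step(out, draft, target, k)
        target_passes += 1
    return out[:len(prefix) + n_tokens], target_passes


def greedy(prefix: List[Token], target: Model, n_tokens: int) -> List[Token]:
    """Baseline: one big-model pass per generated token."""
    out = list(prefix)
    for _ in range(n_tokens):
        out.append(target(out))
    return out


tokens, passes = generate([2], draft_model, target_model, n_tokens=20)
```

In this toy setup the output is identical to running the big model alone, and every step yields at least one token, so the worst case is roughly the baseline plus the drafting overhead. Real implementations compare probability distributions rather than exact tokens, which this sketch leaves out.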
sails|10 days ago
https://research.google/blog/speculative-cascades-a-hybrid-a...
speedping|10 days ago
vanviegen|10 days ago
cma|10 days ago
ml_basics|10 days ago