(no title)
jncraton | 1 year ago
> ANPD dynamically generates draft outputs via an adaptive N-gram module using real-time statistics, after which the drafts are verified by the LLM. This characteristic is exactly the difference between ANPD and the previous speculative decoding methods.
ANPD does provide a more general-purpose solution to drafting that does not require training, loading, and running draft LLMs.
MacsHeadroom|1 year ago
eshoyuan|1 year ago