top | item 37125589

(no title)

1wheel | 2 years ago

discuss

remilouf|2 years ago

LQML (and guidance https://github.com/guidance-ai/guidance) are much more inefficient. They loop over the entire vocabulary at each step, we only do it once at initialization.

potatoman22|2 years ago

Does looping over the vocabulary add much overhead to the tok/s? I imagine they're just checking if the input is in a set, and usually there's only ~30k tokens. That's somewhat intensive, but inference on the neural net feels like it'd take longer.