top | item 35491250

marcfisc | 2 years ago

Author here. Happy to answer any questions you have.

mike_hearn | 2 years ago

What's the cost like? I looked at doing something similar but if you want to use the better trained OpenAI models it doesn't seem so easy to control things on a per token level without racking up large bills. Every time you stop the model so you can impose logit biases on the next set of tokens you have to restart the inference process from scratch, so cost ends up being multiplicative in the number of options. Also the four stop options limit seemed like a pain.
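To make the cost argument concrete, here is a rough back-of-the-envelope model (the `complete`-style helper names are hypothetical, not any real API): a restart-based approach re-sends the entire growing prompt once per generated token, so billed prompt tokens grow quadratically in output length and multiply with the number of constrained options per step.

```python
# Hypothetical cost model for restart-based token-level control: each
# generated token requires a fresh request carrying the full prompt plus
# everything generated so far, multiplied by the options tried per step.

def restart_based_cost(prompt_tokens: int, output_tokens: int,
                       options_per_step: int = 1) -> int:
    """Total billed tokens when every token is produced by stopping the
    model, applying a logit bias, and restarting inference from scratch."""
    billed = 0
    for step in range(output_tokens):
        # each restart re-sends the original prompt plus all prior output,
        # and pays that price once per candidate option being compared
        billed += (prompt_tokens + step + 1) * options_per_step
    return billed

def single_request_cost(prompt_tokens: int, output_tokens: int) -> int:
    """Baseline: one unconstrained request, billed once."""
    return prompt_tokens + output_tokens
```

For a 100-token prompt and 50 output tokens, `restart_based_cost(100, 50)` is 6275 billed tokens versus 150 for `single_request_cost(100, 50)`, and trying 3 options at each step triples it again.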

For stuff like llama.cpp, where you can control the inference loop directly, this sort of thing can make sense, albeit maybe more as an API than as a programming language. But for OpenAI, where you can't interact with the loop as it runs, it feels like it'd get expensive really fast. I guess for research that doesn't matter?
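When you do own the inference loop, constraints are essentially free: you mask disallowed tokens' logits before sampling each step, with no restarts at all. A minimal sketch of that idea (the token dictionary stands in for a real forward pass; this is not llama.cpp's actual API):

```python
import math

# Sketch of in-loop constrained decoding: given the model's logits for the
# next token, set every token the constraint forbids to -inf before picking,
# so the constraint adds no extra requests or recomputation.

def sample_constrained(logits: dict[str, float], allowed: set[str]) -> str:
    """Greedy pick among allowed tokens (real loops sample the softmax)."""
    masked = {tok: (lp if tok in allowed else -math.inf)
              for tok, lp in logits.items()}
    return max(masked, key=masked.get)
```

For example, with logits favoring "no" but a constraint allowing only {"yes", "maybe"}, the mask makes the loop emit "yes" at zero additional cost.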

lbeurerkellner | 2 years ago

(Another LMQL author here)

Cost is definitely a dimension we are considering (research has limited funding, after all :)), especially with the OpenAI API. Lock-step token-level control is difficult to implement against such a limited API. As a solution, we implement speculative execution, which lets us validate constraints lazily against the generated output while still failing early when necessary. This means we don't re-query the API for each token (very expensive), but instead work in segments of contiguous tokens and backtrack where necessary.
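The speculative strategy described above can be sketched roughly as follows (a simplified illustration, not LMQL's actual implementation; `request_segment` and `is_valid_prefix` are hypothetical callbacks): request a whole segment in one API call, check the constraint lazily token by token, and cut back to the last valid prefix only when a check fails.

```python
# Sketch of speculative constraint checking: generate optimistically in
# segments, validate lazily, backtrack on the first violating token.

def speculative_decode(request_segment, is_valid_prefix, max_rounds=10):
    """request_segment(output) -> next batch of tokens given output so far;
    is_valid_prefix(tokens) -> whether the constraint still holds.
    Returns the longest constraint-satisfying output found."""
    output = []
    for _ in range(max_rounds):
        segment = request_segment(output)  # one API call per segment,
        if not segment:                    # not one per token
            break
        for tok in segment:
            if is_valid_prefix(output + [tok]):
                output.append(tok)         # token passes: keep speculating
            else:
                return output              # fail early: backtrack here
    return output
```

The win is that a constraint violated at, say, token 40 of a 50-token segment still costs only one request, instead of forty individually validated ones.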

This is still more expensive than doing it all in one request, but that is an inherent limitation of the OpenAI API, not of LMQL. On the upside, you gain more control, scripting, and constraints, even with OpenAI models.

Ideally, some program representation of a scripted prompt, such as an LMQL query, could be sent over to the inference service and executed there with full model access. That way, model vendors would not have to expose their models fully (e.g. to protect against distillation), but API users would gain a lot more control and efficiency. Alternatively, of course, better open-source models with full access to logits are the ultimate solution, which is also the context in which LMQL was initially conceived.