(no title)
renchuw | 2 years ago
Evaluation refers to the phase after training where you check whether the trained model is any good.
Usually the flow goes training -> evaluation -> deployment (what you called inference). This project is aimed at evaluation. Evaluation can be slow (it might even be slower than training if you're finetuning on a small domain-specific subset)!
So there are [quite](https://github.com/microsoft/promptbench) [a](https://github.com/confident-ai/deepeval) [few](https://github.com/openai/evals) [frameworks](https://github.com/EleutherAI/lm-evaluation-harness) working on evaluation; however, all of them are quite slow, because LLMs are slow if you don't have infinite money. [This](https://github.com/open-compass/opencompass) one tries to speed things up by parallelizing across multiple machines, but none of them takes advantage of the fact that many evaluation queries might be similar: they all evaluate every given query. And that's where this project might come in handy.
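A minimal sketch of the "many queries are similar" idea: pick a small representative subset of evaluation queries by greedy farthest-point selection over their embedding vectors, so near-duplicates only get evaluated once. The embeddings and distance metric here are toy stand-ins, not this project's actual method.

```python
# Toy sketch: choose k representative queries from a pool by greedy
# farthest-point selection over (hypothetical) embedding vectors.
import math

def dist(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_representatives(embeddings, k):
    """Start from the first point, then repeatedly add the point
    farthest from the already-chosen set."""
    chosen = [0]
    while len(chosen) < k:
        best, best_d = None, -1.0
        for i in range(len(embeddings)):
            if i in chosen:
                continue
            d = min(dist(embeddings[i], embeddings[j]) for j in chosen)
            if d > best_d:
                best, best_d = i, d
        chosen.append(best)
    return chosen

# Five queries; indices 0/1 are near-duplicates, as are 3/4.
emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [9.0, 0.1], [9.1, 0.0]]
subset = select_representatives(emb, 3)
print(subset)  # one index from each "cluster": [0, 4, 2]
```

In a real setting the vectors would come from a sentence-embedding model, and you would run the LLM only on the selected indices.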
observationist | 2 years ago
I know what evaluation is, and inference, and training. Deployment means to deploy: to put a model in production. It does not mean inference. Inference means to input a prompt into a model and get the next token, or tokens as the case may be. Training and inference are closely related, since during training, inference is run and the error, i.e. the difference between the prediction and the target, is backpropagated, etc.
Evaluation is running inference over a suite of tests and comparing the outcomes to some target ideal. An evaluation on the MMLU dataset lets you run inference on zero- and few-shot prompts to test the knowledge and function acquisition of your model, for example.
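The "inference over a suite of tests" loop can be sketched in a few lines: run a model on multiple-choice questions and score its answers against the targets. The questions and the `model` function here are hypothetical stand-ins for a real test set and a real LLM call.

```python
# Minimal evaluation loop: inference over a test suite, scored against targets.
suite = [
    {"prompt": "2 + 2 = ?  A) 3  B) 4", "target": "B"},
    {"prompt": "Capital of France?  A) Paris  B) Rome", "target": "A"},
]

def model(prompt: str) -> str:
    # Stand-in for real inference; a real harness would call the LLM here.
    return "B" if "2 + 2" in prompt else "A"

correct = sum(model(item["prompt"]) == item["target"] for item in suite)
accuracy = correct / len(suite)
print(f"accuracy = {accuracy:.2f}")  # accuracy = 1.00
```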
So is your code using Bayesian Optimization to select a subset of a corpus, like a small chunk of the MMLU dataset, that is representative of the whole, so you can test on that subset instead of the whole thing?
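Whatever selection method is used (Bayesian optimization or otherwise), one way to sanity-check a chosen subset is to compare its accuracy against the full-corpus accuracy. A toy illustration with synthetic per-question results (the data here is fabricated purely to show the comparison):

```python
# Sanity check: does accuracy on a subset track accuracy on the full corpus?
import random

random.seed(0)
# Hypothetical per-question outcomes on the full corpus (True = correct).
full_results = [random.random() < 0.7 for _ in range(1000)]
full_acc = sum(full_results) / len(full_results)

# Random subset as a baseline; a well-chosen representative subset should
# track full_acc at least this closely, with far fewer model calls.
subset_idx = random.sample(range(1000), 100)
subset_acc = sum(full_results[i] for i in subset_idx) / 100
print(abs(full_acc - subset_acc))  # small gap between subset and full accuracy
```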