I tried it both ways; with individual prompts and prompts in bulk. I ran both tests the same way. There's a tradeoff in writing a legible/interesting blog post and relating step-by-step the way the evaluation was ran! Appreciate you reading and the feedback :)
No comments yet.