About having different models challenge each other, I haven't seen anything useful yet, but I understand where you are going. Might be a future direction.
I have the following paper in mind: Self-Taught Evaluators (https://arxiv.org/pdf/2408.02666) by Meta. It is interesting because they get big improvements from an LLM checking and improving its own solutions. WDYT? I don't know if you could generate a PR using AI with, say, Claude, and then check the quality with ChatGPT or Gemini. I would be interested to know whether that would provide more quality and trust, or the opposite.
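To make the idea concrete, here is a minimal sketch of that generate-then-cross-review loop. The two model-calling functions are placeholders (not real SDK calls); in practice you would wire them to the Anthropic and OpenAI/Gemini APIs, and the scoring scheme is just an assumption for illustration:

```python
def generate_patch(task: str) -> str:
    """Placeholder for the generator model (e.g. Claude) producing a patch."""
    return f"PATCH for: {task}"

def review_patch(patch: str) -> dict:
    """Placeholder for a *different* reviewer model (e.g. ChatGPT or Gemini)
    scoring the patch and suggesting improvements."""
    score = 0.5  # a real reviewer would grade correctness, style, tests, etc.
    return {"score": score, "feedback": "add tests"}

def cross_model_pr(task: str, threshold: float = 0.8, max_rounds: int = 3) -> str:
    """Generate a patch, have the second model critique it, and revise
    until the reviewer's score clears the threshold or rounds run out."""
    patch = generate_patch(task)
    for _ in range(max_rounds):
        verdict = review_patch(patch)
        if verdict["score"] >= threshold:
            break
        # Feed the reviewer's feedback back into the generator.
        patch = generate_patch(f"{task} (revise: {verdict['feedback']})")
    return patch

print(cross_model_pr("fix null-pointer bug in parser"))
```

Whether this actually adds trust probably hinges on the reviewer model catching different failure modes than the generator, rather than sharing the same blind spots.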
anAiguy|1 year ago