Sorry, I probably phrased the question poorly. My question is more along the lines of "when you already scored e.g. OpenAI's o3 on ARC AGI 2 how did you guarantee OpenAI can't just look at its server logs to see question set 4"?
1. We had a no-data retention agreement with them. We were assured by the highest level of their company + security division that the box our test was run on would be wiped after testing
2. We only tested o3 against the semi-private set. We didn't test it with the private eval.
zamadatix|11 months ago
gkamradt|11 months ago
1. We had a no-data retention agreement with them. We were assured by the highest level of their company + security division that the box our test was run on would be wiped after testing
2. We only tested o3 against the semi-private set. We didn't test it with the private eval.