top | item 43466626

(no title)

gkamradt | 11 months ago

#4 (private test set) doesn't get used for any public model testing. It is only used on the Kaggle leaderboard where no internet access is allowed.

discuss

order

zamadatix|11 months ago

Sorry, I probably phrased the question poorly. My question is more along the lines of "when you already scored e.g. OpenAI's o3 on ARC AGI 2 how did you guarantee OpenAI can't just look at its server logs to see question set 4"?

gkamradt|11 months ago

Ah yes, two things

1. We had a no-data retention agreement with them. We were assured by the highest level of their company + security division that the box our test was run on would be wiped after testing

2. We only tested o3 against the semi-private set. We didn't test it with the private eval.