top | item 43465360

(no title)

gkamradt | 11 months ago

We have a few sets:

1. Public Train - 1,000 tasks that are public 2. Public Eval - 120 tasks that are public

So for those two we don't have protections.

3. Semi Private Eval - 120 tasks that are exposed to 3rd parties. We sign data agreements where we can, but we understand this is exposed and not 100% secure. It's a risk we are open to in order to keep testing velocity. In theory it is very difficulty to secure this 100%. The cost to create a new semi-private test set is lower than the effort needed to secure it 100%.

4. Private Eval - Only on Kaggle, not exposed to any 3rd parties at all. Very few people have access to this. Our trust vectors are with Kaggle and the internal team only.

discuss

order

zamadatix|11 months ago

What prevents everything in 4 from becoming a part of 3 the first time the test set is run on a proprietary model, do you require competitors like OpenAI provide models Kaggle can self host for the test?

gkamradt|11 months ago

#4 (private test set) doesn't get used for any public model testing. It is only used on the Kaggle leaderboard where no internet access is allowed.