top | item 46926447

(no title)

data_maan | 22 days ago

> these are problems of some practical interest, not just performative/competitive maths.

FrontierMath did this a year ago. Where is the novelty here?

> a solution is known, but is guaranteed to not be in the training set for any AI.

Wrong, as the questions were poses to commercial AI models and they can solve them.

This paper violates basic benchmarking principles.

discuss

offnominal|22 days ago

> Wrong, as the questions were poses to commercial AI models and they can solve them.

Why does this matter? As far as I can tell, because the solution is not known this only affects the time constant (i.e. the problems were known for longer than a week). It doesn't seem that I should care about that.

data_maan|22 days ago

Because the companies have the data and can solve them -- so providing the question to a company with the necessary manpower, one cannot guarantee anymore that the solution is not known, and not contained in the training sample.