Here is the big question: should it be equal to or better than every single person? If we assume that every healthy person is 'generally intelligent', then any one healthy person is probably the benchmark. Not every person can do the tasks that other people do routinely, so we probably shouldn't demand that of AGI either, at least not from a single model. But it makes sense to require that a specialized model can be created (trained or fine-tuned) for every task humans can do.