mufasachan | 2 years ago
I agree that the author of the tweet rather underestimates the potential share of OCR'ed content in OpenAI's training data. In late August, Meta released Nougat[1], an OCR model. Its performance is wild and the model is open source.
I find it hard to believe that OpenAI is not putting effort into getting more training data from OCR'ed content. I also find it hard to believe that OpenAI waited for a Meta paper before having a performant internal OCR model.