This is the most accurate and clear-eyed take I've seen on GPTs so far. They might be useful, but they're not magic; they're meant to strengthen OpenAI's moat, making it harder for people and companies to walk away from the platform when future competition arrives.
visarga|2 years ago
Let's do a rough estimate: if they have 100M users and each generates 10K tokens a month, that's 1T tokens per month. In a year they've generated 12T tokens, which is very close to the reported GPT-4 training set size of 13T. Looks like they can produce serious data this way. They don't even need to train directly on it; they could rewrite it into high-quality training examples without copyright or PII risks, because LLMs are great at rewriting and rewording, and MS has already shown that synthetic data can outperform raw data.
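The arithmetic above can be checked with a quick back-of-envelope script (the user count, per-user token rate, and 13T training-set figure are the commenter's assumptions, not measured numbers):

```python
# Back-of-envelope check of the token estimate (all inputs are assumptions).
users = 100_000_000                  # assumed monthly active users
tokens_per_user_per_month = 10_000   # assumed tokens generated per user per month

monthly_tokens = users * tokens_per_user_per_month  # 1T tokens/month
yearly_tokens = monthly_tokens * 12                 # 12T tokens/year

gpt4_training_tokens = 13_000_000_000_000           # ~13T, the figure cited above

print(f"monthly: {monthly_tokens:,} tokens")
print(f"yearly:  {yearly_tokens:,} tokens")
print(f"yearly vs GPT-4 training set: {yearly_tokens / gpt4_training_tokens:.2f}")
```

Under these assumptions a year of chat logs lands within a few percent of the cited training-set size, which is the comment's point.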
Google missed the start, and it doesn't have the human-AI chat logs OpenAI is sitting on. So it's trying to pull off the same trick without a human in the loop, hence the announcements that Gemini will use techniques from AlphaZero. They're teaching models through feedback too, just self-generated rather than human-generated.