
lelanthran | 1 day ago

Look, I'm upvoting your posts in this thread because you make some good points, but I'm not really convinced that a) synthetic data will result in good models, nor that b) quality synthetic data can be generated by labs outside of those orgs that have a ton of user-info.

Look, I'm upvoting your posts in this thread because you make some good points, but I'm not really convinced that a) synthetic data will result in good models, or that b) quality synthetic data can be generated by labs outside of those orgs that have a ton of user info.

This is why I say that OpenAI has no moat. Even if synthetic data (however it is generated) is 90% of training data, there are still only two possibilities:

1. Orgs like Google, Microsoft and Amazon have a ton of user-data with which to produce synthetic data (after all, it's not produced out of thin air).

or

2. You don't need a ton of real data to seed the synthetic generation.

In the first case, yes, that looks like a moat, but not for OpenAI; more like one for Google et al.

In the second case, what's to stop an upstart from producing their own synthetic training data?

In either case, companies who provide only tokens (OpenAI, Anthropic, etc) don't have a moat. The moat is still the same as it was in the 90s - companies deeply embedded into users' workflows.

Like I said, I struggle to think of even a few successful moats that were purely technological. The moat is always something else.
