top | item 42843985


fairity | 1 year ago

I think it's worth double clicking here. Why did Google have significantly better search results for a long time?

1) There was a data flywheel effect, wherein Google was able to improve search results by analyzing the vast amount of user activity on its site.

2) There were real economies of scale in managing the cost of data centers and servers

3) Their advertising business model benefited from network effects, wherein advertisers don't want to bother giving money to a search engine with a much smaller user base. This profitability funded R&D that competitors couldn't match.

There are probably more that I'm missing, but I think the primary takeaway is that Google's scale, in and of itself, led to a better product.
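The flywheel in point 1 can be sketched as a toy click-log re-ranker. Everything here (the log shape, the smoothing, the function names) is my own illustrative assumption, not Google's actual pipeline — the point is just that more traffic means sharper click-through estimates, hence better rankings:

```python
from collections import defaultdict

clicks = defaultdict(int)       # (query, url) -> times clicked
impressions = defaultdict(int)  # (query, url) -> times shown

def record(query, shown_urls, clicked_url):
    """Log one search session: which results were shown and which was clicked."""
    for url in shown_urls:
        impressions[(query, url)] += 1
    if clicked_url is not None:
        clicks[(query, clicked_url)] += 1

def rerank(query, candidate_urls):
    """Order candidates by smoothed click-through rate."""
    def ctr(url):
        key = (query, url)
        return (clicks[key] + 1) / (impressions[key] + 2)  # Laplace smoothing
    return sorted(candidate_urls, key=ctr, reverse=True)

# Simulate users consistently preferring result "B" for one query.
for _ in range(100):
    record("flywheel", ["A", "B", "C"], "B")

print(rerank("flywheel", ["A", "B", "C"]))  # "B" rises to the top
```

The flywheel property is in the smoothing: with few sessions every result looks alike, but the estimates sharpen as traffic grows, so the player with the most users gets the best rankings.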

Can the same be said for OpenAI? I can't think of any strong economies of scale or network effects for them, but maybe I'm missing something. Put another way, how does OpenAI's product or business model get significantly better as more people use their service?


nyrikki|1 year ago

You are forgetting a bit. I worked in some of the large datacenters where both Google and Yahoo had cages.

1) Google copied the hotmail model of strapping commodity PC components to cheap boards and building software to deal with complexity.

2) Yahoo had a much larger cage, filled with very, very expensive and large DEC machines, with one poor guy sitting at a desk in there almost full time rebooting the systems, etc. ... I hope he has any hearing left today.

3) Right before the .com crash, I was in a cage next to Google's, racking dozens of brand-new Netra T1s, which were pretty slow and expensive... that company I was working for died in the crash.

Look at Google's web page:

https://www.webdesignmuseum.org/gallery/google-1999

Compare that to Yahoo:

https://www.webdesignmuseum.org/gallery/yahoo-in-1999

Or Excite, the company they originally tried to sell Google to:

https://www.webdesignmuseum.org/gallery/excite-2001

Google grew to be profitable because they controlled costs, invested in software vs service contracts and enterprise gear, had a simple non-intrusive text based ad model etc...

Most of what you mention above came well after that model of user focus and thrift allowed them to scale, and is survivorship bias. Internal incentives that directed capital expenditures toward meeting the mission, rather than covering people's backs, were absolutely related to their survival.

Even though it was a metasearch, my personal preference was SavvySearch, until it was bought and killed, or whatever that story was.

OpenAI is far more like Yahoo than Google.

WalterBright|1 year ago

> I hope he has any hearing left today

I opted for a fanless graphics board, for just that reason.

rayval|1 year ago

In theory, the more people use the product, the more OpenAI learns about what they are asking and what they do after the first result, and the better it can align its model to deliver better results.

A similar dynamic occurred in the early days of search engines.

visarga|1 year ago

I call it the experience flywheel. Humans come with problems, the AI assistant generates some ideas, the human tries them out and comes back to iterate. The model gets feedback on its prior ideas. So you could say the AI tested an idea in the real world, using a human. This happens many times over across 300M users at OpenAI. They put a trillion tokens into human brains, and as many into their logs. The influence is bidirectional: people adapt to the model, and the model adapts to us. But that is in theory.

In practice, I have never heard OpenAI mention how they use chat logs to improve the model. They are either afraid to say so, for privacy reasons, or want to keep it secret for technical advantage. But just think about the billions of sessions per month. A large number of them contain extensive problem solving. So the LLMs can collect experience and use it to improve problem solving. This makes them a flywheel of human experience.
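One plausible shape for that log-driven loop is turning rated chat sessions into preference pairs for later fine-tuning (RLHF-style). The log format, field names, and ratings below are entirely my own assumptions for illustration — nothing here is OpenAI's actual pipeline:

```python
# Hypothetical chat logs: each entry is one rated response to a prompt.
chat_logs = [
    {"prompt": "fix my sql query", "response": "Use a JOIN ...", "rating": 1},
    {"prompt": "fix my sql query", "response": "Try rebooting.", "rating": -1},
    {"prompt": "name my cat",      "response": "Turing",         "rating": 1},
]

def preference_pairs(logs):
    """Group sessions by prompt; pair each liked response with a disliked one."""
    by_prompt = {}
    for entry in logs:
        by_prompt.setdefault(entry["prompt"], []).append(entry)
    pairs = []
    for prompt, entries in by_prompt.items():
        liked = [e["response"] for e in entries if e["rating"] > 0]
        disliked = [e["response"] for e in entries if e["rating"] < 0]
        for good in liked:
            for bad in disliked:
                pairs.append({"prompt": prompt, "chosen": good, "rejected": bad})
    return pairs

print(preference_pairs(chat_logs))
# "fix my sql query" yields one chosen/rejected pair;
# "name my cat" has no negative example, so it yields none.
```

The flywheel claim, in this framing, is that the volume of such pairs scales with usage, so the dominant provider accumulates the largest training signal.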

aurareturn|1 year ago

They have more data on what people want from models?

Their SOTA models can generate better synthetic data for the next training run - leading to a flywheel effect?