top | item 42009503


cjf101 | 1 year ago

If the current iteration of search engines is producing garbage results (due to an influx of garbage plus SEO gaming of their ranking systems), and LLMs are producing inaccurate results with no clear method proposed to correct them, why wouldn't combining the two systems also produce garbage?

The problem I see with search is that the input is deeply hostile to what the consumers of search want. If the LLMs are specifically tuned to filter out that hostility, maybe I can see this going somewhere, but I suspect that just starts another arms race that the garbage producers are likely to win.


hatthew|1 year ago

Search engines tend to produce neutral garbage, not harmful garbage (i.e., small tidbits of data lost in an ocean of SEO fluff, rather than completely incorrect facts). LLMs tend to be inaccurate because, in the absence of knowledge supplied by the user, they will sometimes make it up. It's plausible to imagine that the two will cover each other's weaknesses: the search engine produces an ocean of mostly-useless data, and the LLM finds the small amount of useful data and interprets it into an answer to your question.
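The "ocean of data in, small useful answer out" pipeline being described is essentially retrieval-augmented generation. A minimal sketch of the idea, where `search`, `rank_snippets`, and `llm_answer` are hypothetical stand-ins for a real search API, a relevance filter, and an LLM call:

```python
# Toy retrieval-augmented generation (RAG) pipeline. All three functions
# are invented stand-ins for illustration, not a real API.

def search(query):
    # Pretend search results: mostly SEO fluff, a little signal.
    return [
        "Top 10 BEST oven temps you won't believe!",
        "Buy our cookbook now!!!",
        "Preheat the oven to 230 C (450 F) for a crisp pizza base.",
    ]

def rank_snippets(query, snippets):
    # Stand-in relevance filter: rank snippets by word overlap with the
    # query and drop those with no overlap at all.
    words = set(query.lower().split())
    overlap = lambda s: len(words & set(s.lower().split()))
    return sorted((s for s in snippets if overlap(s) > 0),
                  key=overlap, reverse=True)

def llm_answer(query, context):
    # Stand-in for an LLM call that answers only from retrieved context.
    if not context:
        return "No grounded answer found."
    return f"Based on retrieved context: {context[0]}"

query = "what oven temperature for pizza"
print(llm_answer(query, rank_snippets(query, search(query))))
# → Based on retrieved context: Preheat the oven to 230 C (450 F) for a crisp pizza base.
```

The sketch also shows where the scheme fails: if the retrieval step returns only fluff, the "LLM" can only ground its answer in fluff.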

lolinder|1 year ago

The problem I see with this "cover for each other" theory is that, as it stands, having a good search engine is a prerequisite for good outputs from RAG. If your search engine doesn't turn up something useful in the top 10 (which most search engines currently don't for many types of queries), then your LLM will just be summarizing the garbage that was turned up.

Currently I do find that Perplexity works substantially better than Google for finding what I need, but it remains to be seen whether it can stay useful as a larger and larger portion of online content becomes AI-generated garbage.

DrammBA|1 year ago

> Search engines tend to produce neutral garbage, not harmful garbage (i.e. small tidbits of data between an ocean of SEO fluff, rather than completely incorrect facts)

Wasn't Google's AI surfacing results about making pizza with glue and eating rocks? How is that not harmful garbage?

eviks|1 year ago

It's not plausible to imagine that such a perfect complement exists.

faizshah|1 year ago

You just described the value proposition of RAG.

lottin|1 year ago

Maybe it's just me but I have no interest in having a computer algorithm interpret data for me. That's a job that I want to do myself.

fulafel|1 year ago

The garbage-ness of search results isn't binary; the right question is: can LLMs improve the quality of search results? But sure, it won't end the cat-and-mouse game.

cjf101|1 year ago

I think that's the right broad question, though LLMs' properties mean that in some cases they will either make the results worse or present wrong answers more confidently. That prompts another question: what do we mean by "quality" of results, given that current LLM interfaces tend to present results quite differently from traditional search?

startupsfail|1 year ago

The question is what the business model is and who pays for it; that determines how much advertising you get. It's not clear whether OpenAI could compete in ad-supported search. So maybe OpenAI is trying to do the basic research, outcompete the Bing research group at Microsoft, and then serve as an engine for Bing. Alternatively, they could just be improving the ability of LLMs to do search, targeting future uses in agentic applications.

kevin_thibedeau|1 year ago

> it won't end the cat and mouse game.

There is no way to SEO the entire corpus of human knowledge. ChatGPT is very good for gleaning facts that are hard to surface in today's garbage search engines.

shellfishgene|1 year ago

If I can pretty quickly tell that a site is SEO spam, the LLM should be able to as well, no? Of course that would just start a new round in the SEO arms race, but it could work for a while.
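A crude illustration of the kind of filtering being suggested. A real system would use a learned classifier rather than a word list; the `HYPE` set and threshold here are invented purely for illustration:

```python
# Toy SEO-spam heuristic: flag text with a high ratio of hype words.
# The word list and threshold are made up for illustration and are not
# from any real ranking system.

HYPE = {"best", "top", "ultimate", "amazing", "unbelievable", "buy", "now"}

def looks_like_seo_spam(text, threshold=0.2):
    words = text.lower().split()
    if not words:
        return True
    hype = sum(w.strip("!.,") in HYPE for w in words)
    return hype / len(words) > threshold

print(looks_like_seo_spam("BEST ultimate TOP 10 amazing deals buy now!"))
# → True
print(looks_like_seo_spam("Preheat the oven to 230 C for a crisp base."))
# → False
```

This also shows why it becomes an arms race: as soon as the filter's signals are known, spam can be rewritten to dodge them.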

sangnoir|1 year ago

> If I can pretty quickly tell a site is SEO spam, so should the LLM, no?

Why would you assume that?

mplewis|1 year ago

The LLM is not a human and cannot distinguish between spam and high quality content.

valval|1 year ago

I’d be more cynical still and ask: where is correct information found in the first place? Humans of all shapes and sizes have biases. Much research is faulty, fabricated, or not reproducible. Missing information often tells a greater story than what's present.

We don’t have a way of finding objective information, so why would we be able to train a model to do so?

realusername|1 year ago

Right now I basically can't find anything; the bar isn't "objective information" but "somewhat useful information". Google's search quality has become so bad that we're past the objective-vs-subjective debate already. I'd be happy enough to get non-spam results.

qudat|1 year ago

Try perplexity and then come back and tell us how you feel