item 33910863

Show HN: Web search using a ChatGPT-like model that can cite its sources

318 points | rushingcreek | 3 years ago | beta.sayhello.so | reply

We’ve trained a generative AI model to browse the web and answer questions/retrieve code snippets directly. Unlike ChatGPT, it has access to primary sources and is able to cite them when you hover over an answer (click on the text to go to the source being cited). We also show regular Bing results side-by-side with our AI answer.

The model is an 11-billion parameter T5 derivative that has been fine-tuned on feedback given on hundreds of thousands of searches done (anonymously) on our platform. Giving the model web access lessens its need to store a snapshot of human knowledge within its parameters. Rather, it learns how to piece together primary sources in a natural and informative way. Using our own model is also an order of magnitude cheaper than relying on GPT.

A drawback to aligning models to web results is that they are less inclined to generate complete solutions/answers to questions where good primary sources don’t exist. Answers generated without underlying citable sources can be more creative but are prone to errors. In the future, we will show both types of answers.

Examples:

https://beta.sayhello.so/search?q=set+cookie+in+fastapi

https://beta.sayhello.so/search?q=What+did+Paul+Graham+learn...

https://beta.sayhello.so/search?q=How+to+get+command+line+pa...

https://beta.sayhello.so/search?q=why+did+Elon+Musk+buy+twit...

Would love to hear your thoughts.

185 comments

[+] dvh|3 years ago|reply
It gives answers that vaguely look like what an answer should look like, but all the important details are wrong.

> A solar eclipse is a partial eclipse of the Sun caused by a solar coronal mass ejection . A solar coronal mass ejection occurs when Earth passes directly in front of the sun.

> You will also need to calculate number of seconds it takes for earth to rotate around the sun which is calculated as Math.PI * 2 / 60
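For the record, the figure the model mangled — the number of seconds it takes Earth to orbit the sun — is just the length of a year, not Math.PI * 2 / 60. A quick sanity check:

```python
# Seconds for Earth to complete one orbit of the sun (one Julian year).
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
DAYS_PER_YEAR = 365.25           # Julian year, averaging in leap days

orbital_period_s = DAYS_PER_YEAR * SECONDS_PER_DAY
print(f"{orbital_period_s:,.0f}")  # 31,557,600 — nowhere near Math.PI * 2 / 60
```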

[+] tobr|3 years ago|reply
> how do i mix two colors (hexadecimal rgb string) in js

> ANSWER

> To mix two colors in JavaScript, you can use the color-mix property. This property takes three parameters: the colorspace, the color to be mixed, and the percentage of that color to mix. For example, to mix the colors #FFFFFF and #8FBC8, you would use the following code:

> let newColor = document.createElement;

> // Outputs: #8FBC8

This is all just wrong from beginning to end. It seems to be confusing JS and CSS, too.
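For reference, color-mix() is a CSS function, not anything in JS; in JS you'd interpolate the channels yourself. A sketch of the arithmetic (in Python rather than JS, since only the logic matters here — the function name and rounding choice are mine):

```python
def mix_hex_colors(a: str, b: str, t: float = 0.5) -> str:
    """Linearly interpolate two '#RRGGBB' colors; t is the weight of b."""
    ax, bx = int(a.lstrip("#"), 16), int(b.lstrip("#"), 16)
    channels = []
    for shift in (16, 8, 0):  # red, green, blue
        ca = (ax >> shift) & 0xFF
        cb = (bx >> shift) & 0xFF
        channels.append(round(ca + (cb - ca) * t))
    return "#{:02X}{:02X}{:02X}".format(*channels)

print(mix_hex_colors("#FFFFFF", "#8FBC8F"))  # -> #C7DEC7
```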

[+] dannywarner|3 years ago|reply
They did a Launch HN recently: "Launch HN: Hello (YC S22) – A search engine for developers" [1]

Nothing much has changed since. So they appear to be trying to cash in on the interest in ChatGPT.

Interesting that they didn't mention they're backed by Y Combinator (S22 batch). Is being backed by YC a negative for startups here now?

[1] https://news.ycombinator.com/item?id=32003215

[+] acqq|3 years ago|reply
Yes, the answer I got was nonsensical too:

> what is the diameter of the sun in sun radii

> ANSWER

> The diameter of the sun in solar radii is 1,392,000 km. This is calculated by taking the mean radius of the sun, which is 432,450 miles , and multiplying it by the circumference of the sun, which is 2,715,396 miles .

The correct answer is, of course: 2.

BTW: Google is also confused, but not in the same way, returning:

2.0018 R(sun)

[+] isthispermanent|3 years ago|reply
There's a reason why you can't do this commercially and why Google isn't doing it already... Pulling the meat of the content from a site like StackOverflow ends up as a copyright/anti-trust violation.

I'm fairly certain this was the reason Google had to tamp down its rich results, which were made mostly from Wikipedia entries.

More recently it was shopping...

"Google argues that ‘rich results’ in Search provide more direct experience in antitrust suit response": https://9to5google.com/2020/12/17/google-search-antitrust-re...

"Google loses appeal, faces €2.4 billion shopping antitrust fine": https://arstechnica.com/gadgets/2021/11/google-loses-appeal-...

[+] moralestapia|3 years ago|reply
You're confused, copyright != antitrust violations.

Both sources you provide have zero mentions of the word 'copyright' in them.

Those lawsuits have to do with Google's dominating the search market and using that to their advantage in ways that are allegedly unfair.

Copyright law actually allows a service like Google to exist in the first place.

[+] isthispermanent|3 years ago|reply
Thinking more on this... I don't think any of these sites will survive if they get big enough. And if enough of them pop up, it'll draw tons of attention from content sites.

If you want to show that data you'll end up having to work out a license from StackOverflow. Possible, but far more difficult than the current ease of plug-and-play GPT drop-in.

Do we really think Google hasn't thought of this exact thing already?

[+] freediver|3 years ago|reply
The reason is likely simpler:

- It is expensive (~0.5c per generated answer)

- It is (currently) slow (2-3 seconds to result)

- It is hard to place ads inside direct answers (probably the most important)

[+] lossolo|3 years ago|reply
What's funny is that most of these groundbreaking LLMs you see now are based on Google's published research on transformers, and Google has better-performing models in-house than anything publicly available on the market.
[+] notpushkin|3 years ago|reply
Note that pulling the meat of the content from StackOverflow isn't copyright violation though, as long as you follow the license (which is Creative Commons something-something but probably fine for this particular application).
[+] worldsavior|3 years ago|reply
But it's citing the sources — how is that a copyright violation?
[+] MuffinFlavored|3 years ago|reply
> Pulling the meat of the content from a site like StackOverflow ends up as a copyright/anti-trust violation.

Then how did ChatGPT do it?

[+] nemacol|3 years ago|reply
I am amused that, once upon a time, it was a not uncommon joke that an inept user might sit at their computer and type into a search bar "Please take me to my email" or ask their search engine "I would like a recipe for cookies".

Turns out those folks might have been ahead of their time. :D

[+] adamsmith143|3 years ago|reply
>I would like a recipe for cookies"

This has been doable on Google for years. Maybe not the same kind of intent understanding but enough keywords are there that you would end up getting lots of results for cookie recipes.

[+] lucb1e|3 years ago|reply
> "Please take me to my email"

Results in

"To take you to your email, you will need to sign in to your account. Once you are signed in, open your inbox to check your mail. You can also use the Help tab to contact customer support or see training videos. Additionally, you can right-click an email in the Message List to show more message options and right-click the Reading Pane for additional email commands such as translation. Finally, you can use Search to find other commands or search through your email, contacts, and calendar."

Citing sources from Google and Microsoft support pages.

[+] wayy|3 years ago|reply
yep, it's super interesting to watch how cheap and plentiful compute starts to change the way we live
[+] notafraudster|3 years ago|reply
I asked three clearly stated, graduate-level questions in my field and all three answers were useless.

> what is the relationship between commodity prices and civil war? answer in a way that cites published scholarship.

> The relationship between commodity prices and civil war is complex. Research has found that a negative, one-standard-deviation commodity price shock in a single year can increase the risk of civil war outbreak in Sub-Saharan Africa by between 40% and 70% of the baseline risk of civil war outbreak. This holds true for all large countries in Africa, the Middle East, Latin America, and Asia. Additionally, there is evidence to suggest that a rise in international commodity prices may also lead to an increase in civil wars. For example, research published in the May 2010 issue of the Economic Journal found that a 10% increase in the price of international commodities was associated with a 22% increase in the probability of civil war outbreak in African countries that are big exporters. Furthermore, research from Oeindrila Dube and Juan F. Vargas of Universidad del Rosario concluded that “price shocks generate contradictory pressures on conflict.

This definitely synthesized information, but it's nonsense. The beginning of the thesis is that a negative price shock increases civil war risk. Then it suggests a positive price shock increases civil war risk. Then it concludes that price shocks generate contradictory pressures on conflict. Dube and Vargas is a famous paper, so it was correct to cite it.

If the answer is that research is mixed, specify that. If the answer is that the research is unambiguous but it depends on the location or commodity type, the answer should disambiguate.

> are ideologically extreme candidates punished in u.s. house of representatives elections? what about primary elections?

> No, ideologically extreme candidates are not punished in U.S. House of Representatives elections. In fact, there is evidence to suggest that extremism may actually be rewarded in some cases. For example, the landslide defeats of Barry Goldwater and George McGovern in presidential politics have been seen as evidence that American electorates punish extremism. However, this view is thinly supported by systematic evidence. Additionally, there will be at least one special election to the United States House of Representatives in 2023 during the 118th United States Congress. This suggests that ideologically extreme candidates may actually benefit from these elections.

Another nonsense answer. The first sentence appears to gesture at the finding in the linked Zaller et al. research. Then it plagiarizes another sentence from the abstract of that paper. Then it introduces a completely nonsensical non sequitur and uses it in support of a point that is not in evidence. It never answers the second question at all. The papers linked are all relevant, but they aren't sorted by any useful metric, and there are major papers missing.

> what are the disadvantages of using two-way fixed effects to measure treatment effects in a natural experiment

> The two-way fixed effects regression is a popular method for measuring treatment effects in natural experiments. However, it has some drawbacks that should be considered before using it. First, the model does not account for heterogeneous effects between groups or over time, which can lead to misleading estimates. Additionally, the LSDV estimator is inconsistent for a dynamic panel data model with individual effects, whether the effects are fixed or random. This means that the bias of the LSDV estimator in a dynamic model is generally higher than other estimators. Finally, the 2FE model may produce biased estimates due to the fact that it only adjusts for unobserved unit-specific and time-specific confounders at the same time. Alternative estimators such as the instrumental variable approach or the difference-in-differences approach may provide more accurate estimates of treatment effects.

First three sentences are fine. Fourth sentence introduces an estimator that is not previously defined, probably by plagiarizing one of the source papers. The point starting with "finally" is a restatement of the limitation under "first". The instrumental variable approach is another econometric technique but it is not a substitute for 2WFE. D-i-d is most commonly implemented as 2WFE as an estimator, and so it is also not a separate technique.

[+] ethanbond|3 years ago|reply
I think you're holding a (suitably) high bar and I'm still wildly impressed it got as close to it as it did. To me these seem extremely "synthesis-heavy" as opposed to merely fact-finding, which is what I would have seen as the more obvious/achievable task?

I do wonder if some of the first question is just that the answer an expert should give you is similarly equivocal (granted being more explicit about the equivocality) and we've just gotten too enamored with the idea of clear answers even when they don't actually exist.

[+] 6gvONxR4sf7o|3 years ago|reply
I plugged that first question into another of these kinds of products (called Elicit), and I think it did a much better job:

https://elicit.org/search?q=what+is+the+relationship+between...

It found relevant sources and had decent summaries of each.

(edit: I just noticed it also has a summary of the whole question on the left too. Maybe someone who knows the econ side of it can attest to whether it's correct:

> SUMMARY OF TOP 4 PAPERS

> These papers suggest that lower commodity prices are associated with increased civil conflict. Demuynck 2008 found that a drop in prices of tropical agricultural commodities increases the attractiveness of other ‘economic’ activities such as rebellion/warfare and can, therefore, trigger civil conflict. Morgan 2014 found that commodity price volatility causes a commitment problem that should increase the probability of armed conflict. Ciccone 2018 found that international commodity price downturns sparked civil wars in Sub-Saharan Africa. Dube 2013 found that a sharp fall in coffee prices during the 1990s lowered wages and increased violence differentially in municipalities cultivating more coffee.

)

[+] whatshisface|3 years ago|reply
That sounds like a glowing review to me, you have to be an expert in the field and ask graduate-level questions to be able to find fault with the fourth sentence after three good ones. :)
[+] jtxt|3 years ago|reply
Very interesting, likely useful for many types of questions. Here's a test of say hello:

> why is the earth flat

> ANSWER

> The Earth is flat because it is a level surface without raised areas or indentations. This is why people can see buildings, hills, trees, mountains and other things that are not actually there...

I've been working on this prompt to work with ChatGPT:

>Please answer the below question, and for each researched statement of fact, especially numbers, add a bracketed number like this: [1] If you need to use math, write each calculation as it can be written into a calculator in curly braces like this: {13 / 3 = 4.3333333333 } Then after your answer, for each of those facts, write search terms that could be used to verify each fact from academic sources.

Question: why is the earth flat

>The Earth is not flat, it is a sphere. This can be observed through various methods, such as observing the curvature of the horizon, the way shadows change throughout the day, and the way objects appear to change size as they move closer or farther away. [1] Additionally, the Earth's circumference has been measured to be 24,901 miles {24,901 / 3.14 = 7899.857143 }. [2]

Search terms to verify statements:

- curvature of the horizon, observation of shadows, changing sizes of objects

- measurement of Earth's circumference, circumference calculation using pi

Then here's a WIP follow up question:

>Below, I listed a result from each search term with URL and excerpts from the page. Please use the following to revise your answer, replacing the numbers with source, add actual quotes when useful, and cite sources, then instead of search terms, write your sources.

It's not perfect, but pretty neat. I imagine that some statements of facts and calculations could be corrected by having it talk to google/a calculator through a defined protocol, then use that information to revise its answer.
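The calculator half of that protocol is easy to sketch. A hypothetical checker (names and format are mine, matching the curly-brace convention in the prompt above) that extracts each `{expr = claimed}` span and verifies the claimed result:

```python
import re

def check_calculations(text: str, tol: float = 1e-6) -> list[tuple[str, bool]]:
    """Find '{expr = claimed}' spans and verify each claimed result."""
    results = []
    for expr, claimed in re.findall(r"\{([^{}=]+)=([^{}]+)\}", text):
        # Evaluate with no builtins so only plain arithmetic is allowed.
        actual = eval(expr, {"__builtins__": {}})
        ok = abs(actual - float(claimed)) <= tol * max(1.0, abs(actual))
        results.append((expr.strip(), ok))
    return results

print(check_calculations("{13 / 3 = 4.3333333333}"))  # [('13 / 3', True)]
```

A real version would want a safer expression parser than a restricted eval, and would need to strip thousands separators like the commas in "24,901".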

[+] rushingcreek|3 years ago|reply
Thanks for the comparison. At the moment Hello is highly dependent on the underlying results returned by Bing, which makes it non-deterministic.

I just tried your question and got an answer that seemed more plausible.

> why is the earth flat

> The Earth is not flat. It is a spherical surface with a horizon that is closer when viewed from a lower altitude, and it casts a round shadow no matter which direction it is pointed. This shape causes a number of phenomena that a flat Earth would not, such as ships disappearing over the horizon progressively, the coastline or mountain appearing normally, upside down, and broken into pieces from across Lake Michigan, and stars being visible from different locations on Earth at the same time. Additionally, surveys from all over Earth show that its shape is everywhere locally convex, confirming that it is very close to spherical.

[+] kirillzubovsky|3 years ago|reply
That's a pretty neat use of AI for a subset of use cases. I like it. I just asked it "Why does JavaScript keep crashing", with no particular details, and it was smart enough to show me how console.log() works in order to try and debug this unknown error. It feels good, like asking a friend for help.

I am curious, how do you decide what kind of content goes in the links bar? It doesn't seem to be just news, or just links. I'm having a hard time figuring out when to resort to the links vs. just using the AI response.

[+] rushingcreek|3 years ago|reply
The links are just regular Bing results.
[+] rahimnathwani|3 years ago|reply
This looks similar to https://www.perplexity.ai/

I tried Perplexity this morning with the following prompt:

"What is the evidence for and against systematic phonics instruction?"

It not only gave me the stuff 'everyone knows' but also pointed to a 2020 paper that goes against the conventional wisdom. (I had read that paper a few days ago, and was curious whether the AI would mention it.).

[+] vorpalhex|3 years ago|reply
how many ducks can fit into a cargo van

> Answer

> It is impossible to answer this question as there is no way to determine how many ducks can fit into a cargo van. The dimensions of the back doors on U-Haul cargo vans are 5’1-1/2′′ x 4’1-1/2′′ , and you have 9’6′′ x 5’7′′ x 4’7′′ in the cargo area totally 245 cubic feet of space for your household goods. This means that it would be difficult to fit more than 6,000 ducks at a time, while large farms usually have around 50,000 and 100,000 ducks reared simultaneously. The middle ground is 10,000 to 50,000 per batch.

[+] dinkleberg|3 years ago|reply
It looks like you’ve stumbled upon the next big “impossible question” that is going to require future technology to help us solve.

It is interesting to see how it attempted to “solve it” though (I put that in quotes since it isn’t actually doing the math)
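A back-of-envelope version of the math it skipped, using the 245 cubic feet the answer itself quoted. The volume per duck is a made-up placeholder, not a researched figure:

```python
# Rough capacity estimate using the cargo figure quoted in the AI answer.
CARGO_SPACE_FT3 = 245    # U-Haul cargo van volume from the quoted answer
DUCK_VOLUME_FT3 = 0.3    # hypothetical space per (unhappy) duck

ducks = int(CARGO_SPACE_FT3 // DUCK_VOLUME_FT3)
print(ducks)  # roughly 816 — an order of magnitude below the model's 6,000
```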

[+] dzink|3 years ago|reply
If you guys want to launch a consumer business, you need a domain that can be said across the dinner table, remembered, and typed correctly by the person on the other side. Any other domain means you depend on your future users finding you through Google, your main competitor. Find a better domain if you want to grow.
[+] rushingcreek|3 years ago|reply
You're completely right. We have a much better one that we'll be switching to soon.
[+] rom16384|3 years ago|reply
why are froblowopps more expensive than klajdhs?

> Froblowopps are more expensive than Klajdhs because of the cost of ingredients, size of the company, and manual preparation process. The ingredients used in froblowopps are more expensive than those used in Klajdhs, such as roast beef which is more expensive than ground beef.

It's hard to trust the answers...

Do AI search engines make stuff up?

> AI search engines do not make stuff up. They use complex algorithms to determine how content gets ranked, and they use natural language processing to understand human language in order to find users the information they're looking for.

[+] s1k3s|3 years ago|reply
Other people are calling out copyright issues, but this is far from GPT's biggest problem...

> how to build a successful startup

> Building a successful startup requires hard work, dedication, and the right resources. It is important to have a good business plan that includes both short-term and long-term goals, as well as how you plan to grow your business. Additionally, it is important to be original and not copy anyone else's ideas. Networking is also essential for building a successful startup, as word of mouth is stronger than ever. Professional networking can help you find the right people and talent, while also helping you build up an environment in which everyone participates. Finally, it is important to keep up with the latest trends and maintain a balance between life and work.

It's literally spelling out generic stuff you can find in any Reddit post. I understand this is what it's been trained on, but relying on this as a knowledge base is very far-fetched.

[+] rushingcreek|3 years ago|reply
The more specific you are, the more detailed the answer will be. If you simply add "step by step":

> how to build a successful startup step by step

> Building a successful startup requires several steps. The first step is to identify a problem and solution that can be filled by your business idea. This will help you create a website that meets the needs of your target audience. You should also write a company plan, which is necessary for obtaining startup funding. Once you have identified a problem and solution, you can build an MVP . This should focus on the most crucial must-have features needed by your target market. It is important to involve potential customers in the building process and ask for their feedback. After building the MVP, it is time to launch it. If you have a large following or media attention, organizing a media launch with a presentable version of the product may be a good idea. However, most startups don't need to make a big deal out of their launch.

[+] fleddr|3 years ago|reply
That's actually a quality answer given the garbage input.
[+] AndrewKemendo|3 years ago|reply
Then it sounds like you want a "clarification" feature, since your query isn't phrased in a way that can be interpreted without extreme context assumptions.

For example, if I was a librarian and someone walked up and said literally: "how to build a startup" I would ask follow up questions to help you ask a better question, perhaps something like: "What would be the steps I would have to take in order to build a profitable company"

So is that what you are asking for it to do?

[+] wayy|3 years ago|reply
What is the correct/expected answer to that query?
[+] gavinray|3 years ago|reply
It would be great to be able to leave feedback on individual search results

For example, the explanation text of "What is RCU (Read-Copy-Update) and how is it different than a Mutex?" was (mostly) great:

https://beta.sayhello.so/search?q=what+is+rcu+%28ready-copy-...

The code examples were not the best. I looked for a way to relay this information on the search results but couldn't find one (only a "GOOD/BAD" vote).

---

Can you just put a <form> tag or an <input> at the bottom and hook it up to some DB table?

EDIT: It seems to re-generate the text each time?? So, if someone takes my word for it and then gets an awful description, that's why.

[+] nonameiguess|3 years ago|reply
I like the idea. I think separating out creative writing from fact retrieval makes a lot of sense and the attempt at a single general AI that can do both is clearly not there yet.

It's slow as hell, though. Maybe a more limited invite-only beta would have been a good idea if your infra isn't there yet, compared to an announcement on Hacker News. It spun for 5 minutes doing nothing, then eventually gave me an inference engine down message. Retried and it spun for another 5 minutes doing nothing. Tried a third time, and at some point while I was back here reading the comments again, it loaded an answer finally. Tried a fourth time and got the inference engine down again.

[+] JadoJodo|3 years ago|reply
One of the things that worries me about ChatGPT-driven products is the issue of confidence. We’ve seen in other posts where it can generate convincing-sounding instructions from thin air. How can I trust that following the instructions for “How to set cookie in FastAPI?” is going to be legit?

We obviously also have this issue with the open web, but real authors are almost universally doing it for others’ benefit (as opposed to their detriment/trolling). ChatGPT generating instructions for processes isn’t “real” and so it has no notion of them being “correct” or “incorrect”; they just “look” the same as the training data.

[+] wayy|3 years ago|reply
We think the solution is simply having good sources and answer transparency. If you mouse over part of the answer, we try to show you the source of that sentence. Obviously this system is early and will improve over time, but if you can easily check whether an answer is from, say, the official FastAPI documentation, then the false-confidence effect of these models becomes less of an issue.
[+] lucb1e|3 years ago|reply
Out of curiosity, what kind of costs are we incurring per query?

I don't know that much about these AI systems, aside from the fact that they're apparently too big to run on my own PC. Having a powerful but regular desktop system temporarily dedicated to answering my queries is quite a lot of compute power already, and apparently that's not enough for something like DALL-E or ChatGPT.

Perhaps to avoid being pinned down on this later ("why is this so expensive if it only costs X"), it's probably good to multiply any answer by five or ten, or factor in some of the development time/costs.

[+] vineyardmike|3 years ago|reply
I can't find the citation, but someone related to OpenAI said that ChatGPT costs a few cents per query.

Anecdotally, I looked at running an open-source GPT competitor from Meta (OPT), and the requirements for running it are vaguely on the order of "8 GPUs with at least 400 GB of GPU memory," which would be VERY expensive, even in the cloud.

An 8-GPU AWS EC2 instance costs $3.40/hr (p2.8xlarge), and if the model runs for 20s per request (roughly how long ChatGPT takes to get me a response), that gives you about $0.02 a request, not including things like storage or idle capacity. So a request could likely be $0.05-0.10, not including overhead like dev costs, training, etc.
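The per-request arithmetic above, spelled out:

```python
# Back-of-envelope cost per request from the figures quoted above.
INSTANCE_COST_PER_HOUR = 3.40   # p2.8xlarge on-demand price
SECONDS_PER_REQUEST = 20        # rough ChatGPT-style latency

requests_per_hour = 3600 / SECONDS_PER_REQUEST           # 180 requests/hr
cost_per_request = INSTANCE_COST_PER_HOUR / requests_per_hour
print(round(cost_per_request, 3))  # ~0.019, i.e. about $0.02 per request
```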

[+] MarcelOlsz|3 years ago|reply
I asked it what the fingerings for a C trumpet scale are and it instantly mentioned a fourth valve. Still a ways to go.
[+] williamstein|3 years ago|reply
Some minor feedback: It's a little confusing that right below the search it asks "Was this answer better than Google?" when the answers you get to compare with on the right are from Bing. That question could be changed to "Was this answer better than the Bing results to the right?" Also, shouldn't the Bing results to the right be labeled as Bing results? I only know they are from Bing because it says "We also show regular Bing results side-by-side with our AI answer." in the post.
[+] rushingcreek|3 years ago|reply
Interesting. We'll take a look at this. Thanks!
[+] trynewideas|3 years ago|reply
Asked the same history question[1], "what was the french-speaking population of france during the french revolution", multiple times and got different answers each time. Each answer was wrong about at least one significant detail, and in one answer it decided to just not try to answer the question I asked.

Attempt 1: The population was about 25 million (in line with estimates) and only 3 million spoke specifically Parisian French (a potentially accurate statement, but not what I asked). The sources it cited had contradicting information — 23 million instead of 25 million total population — or additional context that would've answered the question accurately, such as about half of the population speaking some form or quality of French.

Attempt 2: "The French-speaking population of France during the French Revolution is estimated to have been around 29 million people" — completely wrong across the board. One cited source is a Statista user-created graph that gives 24.8M for revolutionary France and 29M for France in 1800; it also has no details about language use. "This figure is based on demographic analysis conducted by Gudin and confirmed by a crosscheck with the graph showing population changes in France between 400 B.C. and 1975 contained in Colin McEvedy and Richard Jones, Atlas of World Population History." This is lifted almost verbatim from another cited source, but the source's next paragraph acknowledges discrepancies important to the context, including potential errors by Gudin or his translator. The source also has no details about language use.

Attempt 3: "The French-speaking population of France during the French Revolution was estimated to be around 55.81 million people, according to a 2007 Adult Education survey by the Insee. This survey found that French was the mother tongue of 87.2% of the total population, followed by Arabic , Portuguese , Spanish and Italian . People who spoke other languages natively made up the remaining 5.2% of the population." Completely wrong. The only cited source is the "Languages of France" Wikipedia article.

1: https://beta.sayhello.so/search?q=what+was+the+french-speaki...

[+] odysseus|3 years ago|reply
> how does a tuple differ from a list and a dictionary in python

> A tuple is an immutable data structure in Python that stores key-value pairs. It is similar to a list, but it does not allow for any changes after it has been created. Tuples are represented by and lists by [].

This is not the whole answer given (it talks about dictionaries as well), but it's missing a few things that could be useful:

* It doesn't actually say what tuples are represented by (the characters seem to be stripped out)

* It doesn't show examples
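For instance, the kind of examples it could have shown (variable names are mine):

```python
point_list = [1, 2]             # mutable, written with brackets
point_tuple = (1, 2)            # immutable, written with parentheses
point_dict = {"x": 1, "y": 2}   # key-value pairs, written with braces

point_list[0] = 9      # fine: lists can be modified in place
point_dict["x"] = 9    # fine: dict values can be reassigned
try:
    point_tuple[0] = 9  # raises TypeError: tuples can't change after creation
except TypeError as e:
    print(e)
```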

[+] aw1621107|3 years ago|reply
Isn't that answer technically incorrect as well? A more precise answer would be that tuples can store key-value pairs, but even then I think that part of the answer is unnecessarily narrow.