Show HN: Phind.com – Generative AI search engine for developers
292 points | rushingcreek | 3 years ago | phind.com
Today we're launching phind.com, a developer-focused search engine that uses generative AI to browse the web and answer technical questions, complete with code examples and detailed explanations. It's version 1.0 of what was previously known as Hello (beta.sayhello.so) and has been completely reworked to be more accurate and reliable.
Because it's connected to the internet, Phind is always up-to-date and has access to docs, issues, and bugs that ChatGPT hasn't seen. Like ChatGPT, you can ask followup questions. Phind is smart enough to perform a new search and join it with the existing conversation context. We're merging the best of ChatGPT with the best of Google.
You're probably wondering how it's different from the new Bing. For one, we don't dumb down a user's query the way that the new Bing does. We feed your question into the model exactly as it was asked, and are laser-focused on providing developers the most detailed and comprehensive explanations to code-related questions. Secondly, we've focused the model on providing answers instead of chatbot small talk. This is one of the major improvements we've made since exiting beta.
Phind has the creative abilities to generate code, write essays, and even compose some poems/raps but isn't interested in having a conversation for conversation's sake. It should refuse to state its own opinion and rather provide a comprehensive summary of what it found online. When it isn't sure, it's designed to say so. It's not perfect yet, and misinterprets answers ~5% of the time. An example of Phind's adversarial question answering ability is https://phind.com/search?q=why+is+replacing+NaCL+with+NaCN+i....
ChatGPT became useful by learning to generate answers it thinks humans will find helpful, via a technique called Reinforcement Learning from Human Feedback (RLHF). In RLHF, a model generates multiple candidate answers for a given question and a human rates which one is better. The comparison data is then fed back into the model through an algorithm such as PPO. To improve answer quality, we're deploying RLAIF, an improvement over RLHF where the AI itself, rather than humans, generates the comparison data. Generative LLMs have already reached the point where they can review the quality of their own answers as well as or better than an average human rater tasked with annotating data for RLHF.
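To make the comparison-data step above concrete, here is a minimal sketch in Python. It is not Phind's actual pipeline: `judge_score` stands in for whatever LLM grader produces the ratings, `toy_judge` is a hypothetical placeholder for it, and `pairwise_loss` is the Bradley-Terry style objective commonly used to fit a reward model on such pairs. The PPO step itself is not shown.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List


@dataclass
class Comparison:
    prompt: str
    chosen: str
    rejected: str


def build_comparisons(
    prompt: str,
    candidates: List[str],
    judge_score: Callable[[str, str], float],
) -> List[Comparison]:
    """Turn per-candidate judge scores into pairwise preference records."""
    scored = [(judge_score(prompt, c), c) for c in candidates]
    pairs = []
    for (score_a, a), (score_b, b) in combinations(scored, 2):
        if score_a == score_b:
            continue  # a tie carries no preference signal
        chosen, rejected = (a, b) if score_a > score_b else (b, a)
        pairs.append(Comparison(prompt, chosen, rejected))
    return pairs


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)), written as log(1 + exp(-margin))."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))


def toy_judge(prompt: str, answer: str) -> float:
    # Hypothetical stand-in for an LLM grader: counts mentions of one keyword.
    return float(answer.lower().count("mutex"))
```

In RLHF the scores would come from human raters; in RLAIF the same records are produced by a model acting as the judge.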
We still have a long way to go, but Phind is state-of-the-art at answering complex technical questions and writing intricate guides all while citing its sources. We'd love to hear your feedback.
Examples:
https://phind.com/search?q=How+to+set+up+a+CI%2FCD+pipeline+...
https://phind.com/search?q=how+to+debug+pthread+race+conditi...
https://phind.com/search?q=example+of+a+c%2B%2B+semaphore
https://phind.com/search?q=What+is+the+best+way+to+deploy+a+...
https://phind.com/search?q=show+me+when+to+use+defaultdicts+...
Discord: https://discord.gg/qHj8pwYCNg
[+] [-] jfmc|3 years ago|reply
Theorem equiv_pq_qp : forall (p q : Prop), (p -> q) <-> (q -> p).
Proof.
  intros p q. split.
  - intros p_imp_q q_imp_p. apply q_imp_p. apply p_imp_q. assumption.
  - intros q_imp_p p_imp_q. apply p_imp_q. apply q_imp_p. assumption.
Qed.
... together with a lengthy and convincing explanation in natural language.
Sophists would be delighted by these mechanized post-truth AI systems.
[+] [-] andrepd|3 years ago|reply
[+] [-] dmje|3 years ago|reply
I've been actively using Kagi[0] for the past few months and what really works brilliantly there is the notion of "lenses", AKA "per-site" search filters. So you can say - "boost reddit.com, suppress bullshitspamsite.com" and set this up for specific contexts - PHP dev, recipes, etc.
I see that Phind has some notion of this, so will have a play - but it's just this relatively simple "limit to X sites" option that I think makes a big difference in the usefulness of search for me.
[0] https://kagi.com/
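The "boost this site, suppress that one" idea above can be sketched as a re-ranking pass over scored results. This is only an illustration, not how Kagi or Phind implement it; `apply_lens`, the multiplier values, and the convention that a multiplier of 0 drops a site are all assumptions.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def apply_lens(results, lens):
    """Re-rank (url, base_score) pairs using per-domain multipliers.

    lens maps a domain to a multiplier: above 1 boosts, below 1 demotes,
    and 0 removes the site entirely. Unlisted domains keep their score.
    """
    reranked = []
    for url, score in results:
        weight = lens.get(domain_of(url), 1.0)
        if weight == 0:
            continue  # suppressed site
        reranked.append((url, score * weight))
    return sorted(reranked, key=lambda item: item[1], reverse=True)
```

A separate `lens` dict per context (PHP dev, recipes, and so on) would give the per-context behaviour described above.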
[+] [-] thefourthchime|3 years ago|reply
"Best California style burrito in Austin".
Nearly every engine shows me burrito shops in California, and some shove Reddit links to the top. Google was the only decent response. Phind's response is what I would expect from an assistant who spent 5-10 minutes researching this on the web. Great work!
(now add maps to those results!)
[+] [-] rushingcreek|3 years ago|reply
[+] [-] joenot443|3 years ago|reply
Absolutely fantastic stuff, I’m excited to add this to my tool-belt. There’s a specific feeling of knowing that an answer to your question is very simple and exists somewhere on SO, but the mental effort of sifting pages of answers seems unappealing. It seems like Phind is well suited to do this job for you!
[+] [-] LrnByTeach|3 years ago|reply
That simple act reduces your cognitive load. As a programmer, your mind is two steps ahead: you know this trivial solution conceptually, but you do not want to spend the mental cycles looking for the quick code, so that your mind can jump to the next step in the flow.
[+] [-] rushingcreek|3 years ago|reply
[+] [-] kxrm|3 years ago|reply
https://phind.com/search?q=How+should+I+filter+a+dictionary+...
https://phind.com/search?q=How+would+I+use+php+to+encode+wit...
https://phind.com/search?q=How+do+I+get+my+data+from+a+datab...
For more complicated prompts it misses the mark a bit but this may be going outside of the intended use case.
https://phind.com/search?q=Create+a+python+class+that+can+pa...
It would be nice if it annotated the projects it suggested in the response. The above query talks about a project called "mov" in the "provided code" but I do not see any code provided.
[+] [-] rushingcreek|3 years ago|reply
Running it again, I got a code example:
> Create a python class that can parse MP4 headers
> To create a Python class that can parse MP4 headers, one can use the pymp4 library available on GitHub.
The Box class from this library can be used to build and parse MP4 headers.
The following code shows an example of using the Box class to build an MP4 header and then parse it:
from pymp4.parser import Box
from io import BytesIO

header = Box.build(dict(
    type=b"ftyp",
    major_brand="iso5",
    minor_version=1,
    compatible_brands=["iso5", "avc1"]))
print(header)

parsed_header = Box.parse(header)
print(parsed_header)
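For comparison, the original prompt asked for a class rather than a library call. A rough standard-library sketch of one might look like the following. It reads only top-level box headers (a 4-byte big-endian size followed by a 4-byte type, with a 64-bit size following when the size field is 1) and does not decode payloads; the names `BoxHeader`, `Mp4HeaderParser`, and `parse_bytes` are made up for this illustration.

```python
import struct
from dataclasses import dataclass
from io import BytesIO
from typing import BinaryIO, Iterator, List


@dataclass
class BoxHeader:
    box_type: str
    size: int         # total box size in bytes, header included; 0 means "to end of file"
    header_size: int  # 8 normally, 16 when a 64-bit size is present


class Mp4HeaderParser:
    """Iterates over top-level box headers in a seekable binary stream."""

    def __init__(self, stream: BinaryIO):
        self.stream = stream

    def __iter__(self) -> Iterator[BoxHeader]:
        while True:
            start = self.stream.tell()
            raw = self.stream.read(8)
            if len(raw) < 8:
                return  # end of stream, or too few bytes left for a header
            size, box_type = struct.unpack(">I4s", raw)
            header_size = 8
            if size == 1:
                # A size field of 1 means a 64-bit size follows the type.
                ext = self.stream.read(8)
                if len(ext) < 8:
                    return
                (size,) = struct.unpack(">Q", ext)
                header_size = 16
            yield BoxHeader(box_type.decode("latin-1"), size, header_size)
            if size == 0 or size < header_size:
                return  # box runs to end of file, or the size is corrupt
            self.stream.seek(start + size)  # skip the payload


def parse_bytes(data: bytes) -> List[BoxHeader]:
    """Convenience wrapper for in-memory data."""
    return list(Mp4HeaderParser(BytesIO(data)))
```

For a file on disk, the same class can be given a handle opened in binary mode, e.g. `Mp4HeaderParser(open(path, "rb"))`.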
[+] [-] raajg|3 years ago|reply
The performance could be improved. I'm having to wait several seconds before the summary is created.
[+] [-] raajg|3 years ago|reply
https://phind.com/search?q=why+is+funcref+not+working+in+God...
[+] [-] rushingcreek|3 years ago|reply
[+] [-] attentive|3 years ago|reply
[+] [-] personjerry|3 years ago|reply
Code snippets are fine, but I'd actually rather just use ChatGPT to generate code snippets and ideas, and have it explain to me details and other options in a conversational way.
Thus in general this approach, like most other AI search options, doesn't excite me.
At a high level, I think AI search is terribly hard to solve. You're fighting against existing user patterns heavily tuned to the nuanced behaviours of existing search. So you have to find a specific use case that Google sucks at (and while StackOverflow's fall from grace somewhat lends an opportunity, I think it's not sufficient for most developers to start looking for a different solution) and that you're 10x better at.
But 10x better means your speed has to be just as fast, your answers have to be significantly better, and crucially you have to be nearly 100% consistent, which is brutally hard for most AI approaches.
At that point I wouldn't even market yourself as search. In this context maybe something like "AI-powered software development assistant".
[+] [-] rushingcreek|3 years ago|reply
For example, "how to debug pthread race conditions in c++" is asking for a guide where no single comprehensive source exists. Phind combines information from Stack Overflow and other websites to generate a more comprehensive guide: https://phind.com/search?q=how+to+debug+pthread+race+conditi....
Generating guides that aren't simply addressed by docs or individual Stack Overflow answers is where Phind really shines.
[+] [-] userbinator|3 years ago|reply
One of the things that comes to mind is searching for exact error messages or codes, but for that I want something more like grep rather than AI.
[+] [-] dteiml|3 years ago|reply
[+] [-] ncr100|3 years ago|reply
Cool. Conclusion is that I need to get better at writing queries:
I wanted an overview of Python APIs to choose from to play sound files, and importantly, I wanted to know how / if each API offered a non-blocking parameter. In my use-case I am playing sounds, after detecting some elements during video processing, and I find it blocks my realtime video stream UI.
My experience refining my query - I started with one query, and settled on an increased redundancy of my query:
[NOTE: I typo'd asynchronously as "asynchonously", unintentionally]
FIRST = how do i asynchonously play sounds in python (https://phind.com/search?q=how+do+i+asynchonously+play+sound...)
Result from PHIND for FIRST was promising enough for me to refine my query. I felt like the beginning portion of the answer it provided was useful, and the subsequent portion was not. It listed various Python APIs for playing sounds, but only for the first did it supply concurrency details.
SECOND TRY = how do i asynchonously play sounds in python, and what are the parameters for synchronous or blocking playback (https://phind.com/search?q=how+do+i+asynchonously+play+sound...)
Result for the SECOND was more loquacious, however it supplied more detail, and satisfyingly more concurrency configuration instructions for e.g. the "playsound" Python API.
So overall, kudos! And, I will use this product in the future.
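For the use case described above (playing a sound without stalling a realtime video loop), one library-independent approach is to push the blocking call onto a background thread. This is a minimal sketch under that assumption; `play_blocking` is a hypothetical stand-in for whichever blocking playback function is actually in use.

```python
import threading
import time


def play_blocking(path):
    # Hypothetical stand-in for a blocking playback call from a sound library.
    time.sleep(0.05)


def play_async(path, play_fn=play_blocking):
    """Start playback on a daemon thread and return immediately.

    The returned Thread can be joined later if the caller needs to wait.
    """
    worker = threading.Thread(target=play_fn, args=(path,), daemon=True)
    worker.start()
    return worker
```

The video loop would call `play_async("ding.wav")` on detection and carry on rendering frames; whether a given sound library also offers its own non-blocking flag is worth checking in its docs.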
[+] [-] rushingcreek|3 years ago|reply
[+] [-] johnfn|3 years ago|reply
> In TypeScript, how can I have a component restrict the types of its children to only certain types of components? I would like a static type error.
The correct answer is "this is impossible". Every AI I've tried this on hallucinates some nonsense code that doesn't actually work. Sadly Phind is the same. It says "here's an example of how to do this" with a code sample, and then links to a StackOverflow post saying it's not possible :)
[+] [-] tluyben2|3 years ago|reply
I know quite a lot of people who, if not allowed a Google search and given only VS Code to try something, would take a stab at your TypeScript question and not be able to tell you that it is impossible.
Asking GPT to be a search engine is currently much the same as asking a human: I vaguely remember dates from WWII history, so when you ask me in a pub quiz, I will confidently tell you something and then tell you where to look up the actual answer, which might be slightly or completely different, depending on whether I matched the right event with the right date and whether I remember the day and month.
[+] [-] rvz|3 years ago|reply
This once again proves that it is more untrustworthy than a normal search engine like Google. The AI hype around LLMs is going to subside very quickly, given that the promises made by ChatGPT, Bing AI, Bard and now Phind are collapsing right in front of us when tested.
The truth is, almost no one here would trust their output, so each output now needs an extra review by a human. Hence, there is no point in using it as a search engine or to generate anything factual, given that it hallucinates nonsense like that and can easily be tricked into outputting incorrect answers, making it very unreliable.
[+] [-] bryanrasmussen|3 years ago|reply
[+] [-] ncr100|3 years ago|reply
Q: Is this general usability issue being discussed more, due to the growth of new product teams working with generative AIs?
Some discussion: I am drawn to wonder if there is an innate value to include permalinks (or semi-permanent) ... to save each query's results. Or perhaps do something better which I've not thought of yet.
The issue in more detail (I'm naive on the subject, so forgive the redundancy): Results aren't highly deterministic. Randomness and date-sensitive models seem to generate, probabilistically, different results for the same query. This matters especially for generative AI products that exist as webapps, where hitting 'reload' could provide inferior results and simultaneously eliminate superior prior results.
Non-determinism is a 'surprise' quality which can enhance and degrade the value of a generative AI tool, in my experience when using this category of tools. It's fun (endorphin rush) to spin the Roulette wheel and query again. And it's painful when the prior results ranked More Useful in one's mind.
Ideally for usability's sake, these products would include some affordance to the user, to manage that endorphin / pain cycle.
[Note, A complication, here with PHIND being the context. PHIND's HTML renderer itself is being iterated on .. this is a product-launch after all. So, results also look different not for data/algorithm reasons.]
This could be realized as a web-browser feature: an enhanced History. The current "History" model of browsers is a trivial response to the REQUEST/RESPONSE model of HTTP. The HTTP spec (the last time I read it was in 2005) leaned towards everything-is-deterministic except for time.
Example: I input this query https://phind.com/search?q=linq+query+to+filter+every+other+... and it listed one code snippet. I re-queried and received a different snippet. Honestly, I like both results. As of this moment I'm writing, I can only generate one of the two results.
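One way to picture the "save each query's results" idea above is a small history store that gives every generated answer its own snapshot ID, so re-running a query adds a version instead of replacing the previous one. This is purely a sketch; `ResultHistory` and its methods are hypothetical names, and a real product would persist the data rather than keep it in memory.

```python
import hashlib
import json
import time


class ResultHistory:
    """Keeps every generated answer for a query, addressable by snapshot ID."""

    def __init__(self):
        self._snapshots = {}  # snapshot id -> record
        self._by_query = {}   # query -> list of snapshot ids, oldest first

    def save(self, query, answer, created_at=None):
        """Store one generated answer and return its snapshot ID."""
        created_at = time.time() if created_at is None else created_at
        payload = json.dumps([query, answer, created_at]).encode("utf-8")
        snapshot_id = hashlib.sha256(payload).hexdigest()[:12]
        self._snapshots[snapshot_id] = {
            "query": query,
            "answer": answer,
            "created_at": created_at,
        }
        self._by_query.setdefault(query, []).append(snapshot_id)
        return snapshot_id

    def get(self, snapshot_id):
        """Look up a stored answer, e.g. from a permalink that embeds the ID."""
        return self._snapshots[snapshot_id]

    def versions(self, query):
        """All snapshot IDs recorded for a query, oldest first."""
        return list(self._by_query.get(query, []))
```

With something like this behind a permalink, both snippets from the LINQ example would stay reachable after a reload.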
[+] [-] Mariehane|3 years ago|reply
[+] [-] taneq|3 years ago|reply
> Based on the given context, there is no clear answer to the question "what's the best build system for Visual Studio Code on Windows". However, there are some relevant information that can be helpful in understanding the build system options available for Windows and Visual Studio Code.
> [... basic rundown on build systems and a list of the common suspects ...]
I'm impressed.
[+] [-] exodust|3 years ago|reply
I prefer search engines to do their job and provide quality links, not lengthy passages about the subject I'm searching for. For simple answers it's fine, but a search engine pretending to know all about a subject is cringey: it scrapes the web and mashes together what it finds.
Each website has its own way to present things, and in what order. When hoovered up, taken elsewhere and squeezed out through AI's front end, it comes out a tangled mess.
Just now I needed a refresher on the HTML table scope attribute, so I used Phind to look up html table scope. It returned a wall of text about tables. Not inaccurate, but a wall of text. It was easier to click on the w3schools links where it's nicely laid out with clear examples of scope, with clear information about related attributes and use.
[+] [-] najarvg|3 years ago|reply
[+] [-] djhn|3 years ago|reply
[+] [-] telman17|3 years ago|reply
[+] [-] rushingcreek|3 years ago|reply
[+] [-] NathanFlurry|3 years ago|reply
Naturally, the first thing I searched for was something I’ve put on the internet that is not well indexed by Google.
When I search “how to find a lobby with rivet.gg” (that’s our most basic functionality), it doesn’t come up with an answer. However, it does spit back a decent summary of relevant features we provide.
What’s odd – it always continues on to talk about an unrelated company. The first time, it started talking about top.gg as a way to add Fortnite-style bots to your game. (They’re actually a Discord bot directory.) The second time, it mentioned a completely random Reddit called Guilty Gear Strive.
Can you elaborate on how it finds these related topics?
[+] [-] rushingcreek|3 years ago|reply
In the case of “how to find a lobby with rivet.gg”, it finds your docs and looks at https://docs.rivet.gg/docs/concepts/matchmaker/ but fails to extract how to put a player in a lobby. We'll take a look at this case in more detail. Our goal is to be great at parsing most/all docs.
[+] [-] hooande|3 years ago|reply
I am curious about whether it hallucinates API methods that don't exist, like ChatGPT does. I haven't seen that yet, but the underlying concept of an LLM is the same.
[+] [-] rushingcreek|3 years ago|reply
[+] [-] RobMurray|3 years ago|reply
There is a small accessibility problem with the way it interacts with screen readers. It speaks duplicates of the text for some reason. I am using NVDA.
[+] [-] dstala|3 years ago|reply
(a.) Generate a short summary to start with & expand using something like `show more` (b.) Include reference images if you can.
[+] [-] rushingcreek|3 years ago|reply