Ask HN: Is there a search engine which understands regular expressions?
I know http://www.google.com/codesearch allows them, but it's a special purpose search engine for searching through source code. I'm looking for a general purpose search engine, like Google/Bing, that can search through all the usual document types using regexes.
[+] [-] stevelosh|14 years ago|reply
Example: putting the following into the Google search box:
Finds any page with foo, bar and baz, even though I tried to tell it that the asterisk and arrow were important.
https://encrypted.google.com/search?hl=en&q=%22*foo%22+%...
[+] [-] pbhjpbhj|14 years ago|reply
For example, I was searching for Groupon info in a particular market and Google returned results that mention "group" without telling me that it was doing so. It's really, really annoying to find that barely any of the returned results contain or relate to the word you're searching for.
I tried quoting groupon, but that doesn't work; you have to negate the search thusly: 'groupon -"group on"'. Why don't they give you the option to remove their guessed results (like "did you mean"), or simply limit related/thematic results to searches using the ~ modifier? Grrr.
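For anyone scripting this workaround, here is a minimal sketch of how the negated query could be turned into a search URL. The function name `google_search_url` and the base URL are my own assumptions for illustration; only the `q` parameter appears in the URL quoted earlier in the thread.

```python
from urllib.parse import urlencode

def google_search_url(query, base="https://www.google.com/search"):
    """Build a search URL; `base` is an assumed endpoint, not an official API."""
    # urlencode escapes the double quotes as %22 and turns spaces into '+'.
    return base + "?" + urlencode({"q": query})

# The negation trick from the comment above: keep "groupon", drop "group on".
print(google_search_url('groupon -"group on"'))
# prints https://www.google.com/search?q=groupon+-%22group+on%22
```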
[+] [-] pittsburgh|14 years ago|reply
Since matching regular expressions against documents is so slow compared with performing an indexed search, it's difficult to think of a way to make that scale for a dataset as large as the public web. There's also the problem of having to protect against regex denial of service attacks: http://en.m.wikipedia.org/wiki/ReDoS
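To make the scaling point concrete, here is a toy sketch (not how any real search engine is built) contrasting a word index with a regex scan. The corpus `DOCS` and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; a real web index would hold billions of pages.
DOCS = {
    1: "groupon deals in pittsburgh",
    2: "group on a hiking trip",
    3: "regex search engines are rare",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def indexed_search(index, word):
    """A single dictionary lookup; no document text is read at query time."""
    return sorted(index.get(word.lower(), set()))

def regex_search(docs, pattern):
    """Runs the pattern against every document, so cost grows with the corpus."""
    compiled = re.compile(pattern)
    return sorted(doc_id for doc_id, text in docs.items() if compiled.search(text))

INDEX = build_index(DOCS)
print(indexed_search(INDEX, "groupon"))     # prints [1]
print(regex_search(DOCS, r"group\s?on\b"))  # prints [1, 2]
```

The regex version is more expressive, but it has to touch every document, which is exactly the part that is hard to scale.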
I've been able to (very partially) make up for the lack of regex support by taking advantage of Google's operators and wildcards:
http://www.googleguide.com/wildcard_operator.html
http://www.googleguide.com/advanced_operators.html
Some examples:
This is hardly a replacement for regex, but it's the best I've been able to come up with.
[+] [-] smoove|14 years ago|reply
Also, if you let a user search for any regex, it would be really easy to overload the server by entering very complex regexes.
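One way a server might defend itself is to reject suspicious patterns before running them. The sketch below is a crude heuristic under my own assumptions: the length limit and the name `looks_risky` are hypothetical, and the check only catches the classic nested-quantifier shape such as `(a+)+`. It will miss other expensive patterns and may flag some harmless ones.

```python
import re

MAX_PATTERN_LENGTH = 200  # hypothetical limit, chosen only for illustration

# Crude heuristic: a parenthesised group that contains + or * and is itself
# followed by + or *, e.g. (a+)+ or (x*)*.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*]")

def looks_risky(pattern):
    """Return True if the pattern should be refused before it is ever run."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        return True
    return NESTED_QUANTIFIER.search(pattern) is not None

print(looks_risky(r"(a+)+$"))    # prints True
print(looks_risky(r"foo.*bar"))  # prints False
```

A matching engine that guarantees linear-time evaluation would be a more robust answer than pattern screening, at the cost of dropping features like backreferences.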
[+] [-] gnosis|14 years ago|reply
I'm not sure that allowing regexes would put an undue burden on the search engines. But if it ever becomes an issue, the search engine could easily deal with the problem by simply slowing down the search if it contains a regex.
I'd happily wait 2x, 5x, or even 10x as long for my query to complete if I could use a regex. For some important queries for which non-regex searches are inadequate, I'd even be willing to wait hours or days, since the alternative would be not being able to perform the search at all (or returning so many false positives as to be useless).
[+] [-] pbhjpbhj|14 years ago|reply
Before I adopted Google I used Teoma, AllTheWeb, Magellan/Excite and probably some others so it was possibly one of them? Anyone recall such a thing?
Edit: Looks like http://www.searchlores.org/main.htm#exalead (Exalead, private beta) is doing regular expression search.
[+] [-] motochristo|14 years ago|reply
[+] [-] gnosis|14 years ago|reply
Those are just predefined custom searches, a feature built in to Opera (my browser of choice), and probably other browsers as well.
Unfortunately, custom searches are still limited to using whatever syntax the search engines they query use, so if that search engine does not support regexes, using a custom search (or "bang search") won't help.
They can still be a valuable search tool, but not what I'm looking for.