
Ask HN: Is there a search engine which understands regular expressions?

20 points| gnosis | 14 years ago | reply

Are there any general purpose search engines which allow searches to be made using regular expressions?

I know http://www.google.com/codesearch allows them, but it's a special purpose search engine for searching through source code. I'm looking for a general purpose search engine, like Google/Bing, that can search through all the usual document types using regexes.

20 comments

[+] stevelosh|14 years ago|reply
Regexes would be great, but I'd settle for a "raw" mode where the search engine just searches for the exact string.

Example: putting the following into the Google search box:

    "*foo" "bar->baz"

Finds any page with foo, bar and baz, even though I tried to tell it that the asterisk and arrow were important.

https://encrypted.google.com/search?hl=en&q=%22*foo%22+%...
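To make the difference concrete, here is a minimal sketch of the two behaviours, assuming the engine compares word tokens only and throws punctuation away. The function names and sample documents are made up for illustration; this is not how Google actually works internally.

```python
import re

def tokenized_match(query_phrases, document):
    """Mimic a punctuation-insensitive engine: compare word tokens only."""
    doc_words = set(re.findall(r"\w+", document.lower()))
    query_words = {
        word
        for phrase in query_phrases
        for word in re.findall(r"\w+", phrase.lower())
    }
    return query_words <= doc_words

def raw_match(query_phrases, document):
    """'Raw' mode: every phrase must appear as an exact substring."""
    return all(phrase in document for phrase in query_phrases)

# Hypothetical documents, used only to show the difference.
doc_a = "foo is unrelated to bar and baz"
doc_b = "deref *foo then read bar->baz"
phrases = ["*foo", "bar->baz"]

if __name__ == "__main__":
    for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
        print(name, "tokenized:", tokenized_match(phrases, doc),
              "raw:", raw_match(phrases, doc))
```

The tokenized matcher accepts both documents, while the raw matcher accepts only the one that really contains the asterisk and the arrow.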

[+] pbhjpbhj|14 years ago|reply
Google is getting less and less usable for search for me because of something akin to what you describe.

For example, I was searching for Groupon info in a particular market and Google returned results that mention "group" without telling me it was doing so. It's really, really annoying to find that barely any of the returned results contain or relate to the word you're searching for.

I tried quoting groupon, but this doesn't work; you have to negate the search like so: 'groupon -"group on"'. Why don't they give you the option to remove their guessed results (like "did you mean"), or simply limit related/thematic results to queries that use the ~ modifier? Grrr.

[+] pittsburgh|14 years ago|reply
I've also tried to find a search engine that supports regex and have come up empty. I hope somebody on this thread pleasantly surprises me, but I now doubt one exists.

Since matching a regular expression against every document is so slow compared with performing an indexed search, it's difficult to think of a way to make that scale for a dataset as large as the public web. There's also the problem of having to protect against regex denial of service attacks: http://en.m.wikipedia.org/wiki/ReDoS
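For anyone who hasn't seen ReDoS in action, the sketch below times a classic nested-quantifier pattern in Python's backtracking `re` module. The pattern and input sizes are my own picks for illustration. On a backtracking engine the time for the failing match typically grows very quickly as the input gets longer, but actual numbers depend on the machine and the engine.

```python
import re
import time

# Nested quantifiers: a backtracking engine may try many ways of splitting
# the run of 'a's between the inner and outer '+' before giving up.
EVIL = re.compile(r"^(a+)+$")

def time_failed_match(n):
    """Return (match_result, seconds) for n 'a's followed by '!', which forces failure."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = EVIL.match(text)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    # Kept small on purpose; larger n can take a very long time.
    for n in (10, 14, 18, 20):
        result, seconds = time_failed_match(n)
        print(f"n={n:2d} matched={result is not None} seconds={seconds:.4f}")
```

A public search box that accepted patterns like this would need some kind of time or complexity limit per query.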

I've been able to (very partially) make up for the lack of regex support by taking advantage of Google's operators and wildcards:

http://www.googleguide.com/wildcard_operator.html

http://www.googleguide.com/advanced_operators.html

Some examples:

   "solar|lunar eclipse 1700..1800"

   "William * Clinton"

   Columbus -Ohio -Georgia -Christopher

This is hardly a replacement for regex, but it's the best I've been able to come up with.
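For comparison, this is roughly what those three queries would look like as real regexes in Python. The mapping is my own reading of the operators, not anything Google documents: I'm assuming `*` stands for exactly one word (Google's wildcard may match more), that `1700..1800` means a four-digit year in that range, and that `-word` means the word appears nowhere in the text.

```python
import re

# "solar|lunar eclipse 1700..1800": either adjective, then a year from 1700 to 1800.
ECLIPSE = re.compile(r"\b(?:solar|lunar) eclipse (?:17\d\d|1800)\b", re.IGNORECASE)

# "William * Clinton": assumed here to mean exactly one word in the middle.
WILDCARD = re.compile(r"\bWilliam \w+ Clinton\b")

# Columbus -Ohio -Georgia -Christopher: Columbus present, the other words absent.
EXCLUDE = re.compile(
    r"^(?!.*\b(?:Ohio|Georgia|Christopher)\b).*\bColumbus\b",
    re.IGNORECASE | re.DOTALL,
)

if __name__ == "__main__":
    print(bool(ECLIPSE.search("a lunar eclipse 1750 record")))
    print(bool(WILDCARD.search("William Jefferson Clinton")))
    print(bool(EXCLUDE.search("Columbus Day parade")))
```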

[+] smoove|14 years ago|reply
I guess the main problem is that you really can't build an index for regexes; you would need to apply the search regex "live" to all the documents the search engine knows, which will not scale at all.

Also, if you let a user search for any regex, it would be really easy to overload the server by entering very complex regexes.
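A toy sketch of the scaling argument, assuming a plain word-level inverted index. The three documents and all function names are invented for the example. The indexed lookup reads no document text at query time, while the regex search has to examine every document.

```python
import re
from collections import defaultdict

# Hypothetical three-document corpus.
DOCS = {
    1: "total solar eclipse of 1715",
    2: "lunar eclipse tables",
    3: "regular expressions in search engines",
}

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def indexed_search(index, word):
    """One dictionary lookup; no document text is read at query time."""
    return index.get(word.lower(), set())

def regex_scan(docs, pattern):
    """Apply the regex 'live' to every document; returns (hits, docs_examined)."""
    compiled = re.compile(pattern)
    hits = {doc_id for doc_id, text in docs.items() if compiled.search(text)}
    return hits, len(docs)

INDEX = build_index(DOCS)

if __name__ == "__main__":
    print(indexed_search(INDEX, "eclipse"))
    print(regex_scan(DOCS, r"(solar|lunar) eclipse"))
```

With three documents the difference is invisible; with billions, the scan's cost grows with the size of the corpus for every single query.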

[+] gnosis|14 years ago|reply
Only a small minority of users even know what regexes are, and fewer still use them.

I'm not sure that allowing regexes would put an undue burden on the search engines. But if it ever becomes an issue, the search engine could easily deal with the problem by simply slowing down the search if it contains a regex.

I'd happily wait 2x, 5x, or even 10x as long for my query to complete if I could use a regex. For some important queries for which non-regex searches are inadequate, I'd even be willing to wait hours or days, since the alternative would be not being able to perform the search at all (or returning so many false positives as to be useless).

[+] pbhjpbhj|14 years ago|reply
I've a vague recollection of using some limited subset of regular expressions in about 1999-2000. However, I have a very bad memory and could be confusing it with some specialist tech databases I used to access.

Before I adopted Google I used Teoma, AllTheWeb, Magellan/Excite and probably some others so it was possibly one of them? Anyone recall such a thing?

Edit: Looks like http://www.searchlores.org/main.htm#exalead (Exalead, private beta) is doing regular expression search.

[+] motochristo|14 years ago|reply
http://duckduckgo.com/ might help. I know it utilizes the bang syntax.

[+] gnosis|14 years ago|reply
Thanks, but that's really not the same at all.

Those are just predefined custom searches, a feature built in to Opera (my browser of choice), and probably other browsers as well.

Unfortunately, custom searches are still limited to using whatever syntax the search engines they query use, so if that search engine does not support regexes, using a custom search (or "bang search") won't help.

They can still be a valuable search tool, but not what I'm looking for.
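To illustrate why custom searches don't help here, this is a rough sketch of what such a shortcut amounts to: a URL template with the query dropped in. The shortcut names and example.com-style URLs are hypothetical, not real DuckDuckGo bangs or Opera settings.

```python
from urllib.parse import quote_plus

# Hypothetical shortcut table; the keys and URL templates are illustrative only.
CUSTOM_SEARCHES = {
    "!a": "https://search-a.example/find?q={query}",
    "!b": "https://search-b.example/search?text={query}",
}

def expand(user_input):
    """Turn '!a some query' into the target engine's URL.

    The query is only URL-encoded and forwarded; nothing here interprets it.
    """
    shortcut, _, query = user_input.partition(" ")
    template = CUSTOM_SEARCHES[shortcut]
    return template.format(query=quote_plus(query))

if __name__ == "__main__":
    print(expand("!a foo.*bar"))
```

The regex reaches the target engine as ordinary text, so whether `foo.*bar` means anything is entirely up to that engine.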