mumrah | 11 years ago

Lucene has had this capability for a while now. A regexp query (as defined by the RegExp class) is compiled into an automaton so it can be matched quickly against the inverted index, which is itself represented as an automaton. Automata and FSTs are used extensively by Lucene these days. Out of curiosity, I played around with Lucene's RegExp class when it first came out (or rather, when I first learned about it). It provides a really interesting way to build regular expressions: https://gist.github.com/mumrah/6104234

Javadoc: https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/...

Source: https://github.com/apache/lucene-solr/blob/trunk/lucene/core...
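
To make the "regexp compiled into an automaton" idea concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class name, the hand-built DFA for the pattern ab*c, and the sample terms are all made up for illustration. It also just scans a term list one term at a time, which is a simplification of how an automaton would be matched against a real index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy, not Lucene code: a hand-built DFA for the regexp "ab*c". */
public class TermDfaSketch {
    // States: 0 = start, 1 = seen 'a' (loops on 'b'), 2 = accept, -1 = dead.
    public static int step(int state, char c) {
        switch (state) {
            case 0:
                return c == 'a' ? 1 : -1;
            case 1:
                return c == 'b' ? 1 : (c == 'c' ? 2 : -1);
            default:
                return -1; // nothing may follow the final 'c'
        }
    }

    /** Runs the DFA over one term; one table lookup per character, no backtracking. */
    public static boolean accepts(String term) {
        int state = 0;
        for (int i = 0; i < term.length() && state != -1; i++) {
            state = step(state, term.charAt(i));
        }
        return state == 2;
    }

    /** Keeps only the terms the DFA accepts (assumed stand-in for an index's term list). */
    public static List<String> matchingTerms(List<String> terms) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (accepts(term)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [abbc, ac]
        System.out.println(matchingTerms(List.of("abbc", "abd", "ac", "b", "cab")));
    }
}
```

The point of the sketch is only that, once the pattern is a DFA, checking a term is a single left-to-right pass with a fixed amount of work per character.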

chrisconroy | 11 years ago

Related: the dk.brics.automaton library (http://www.brics.dk/automaton/) is excellent.

Its API is quite different from java.util.regex, but it lets you work with automata as first-class objects instead of operating only on regexes. Being able to compute intersections, unions, shortest matching examples, etc. from multiple automata can be really useful.
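
For anyone wondering what "intersection" and "shortest example" mean for automata, here is a rough, self-contained sketch in plain Java. It is not the dk.brics.automaton API: the Dfa class, the two example automata, and the method names are all hypothetical, and the alphabet is assumed to be just {a, b}. Intersection is done with the textbook product construction, and the shortest accepted string is found with a breadth-first search.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical toy, not the dk.brics API: DFAs over the alphabet {a, b}. */
public class DfaIntersectionSketch {
    /** Complete DFA: next[state][symbol], symbol 0 = 'a', 1 = 'b'; state 0 is the start. */
    public static final class Dfa {
        final int[][] next;
        final boolean[] accepting;

        Dfa(int[][] next, boolean[] accepting) {
            this.next = next;
            this.accepting = accepting;
        }

        /** Assumes the input contains only 'a' and 'b'. */
        public boolean run(String s) {
            int state = 0;
            for (char c : s.toCharArray()) {
                state = next[state][c - 'a'];
            }
            return accepting[state];
        }
    }

    /** Strings starting with 'a': 0 = start, 1 = ok, 2 = dead. */
    public static Dfa startsWithA() {
        return new Dfa(new int[][] {{1, 2}, {1, 1}, {2, 2}},
                       new boolean[] {false, true, false});
    }

    /** Strings ending with 'b': 0 = last char was not 'b', 1 = last char was 'b'. */
    public static Dfa endsWithB() {
        return new Dfa(new int[][] {{0, 1}, {0, 1}},
                       new boolean[] {false, true});
    }

    /** Product construction: state (i, j) is encoded as i * m + j. */
    public static Dfa intersection(Dfa x, Dfa y) {
        int n = x.next.length, m = y.next.length;
        int[][] next = new int[n * m][2];
        boolean[] accepting = new boolean[n * m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                int id = i * m + j;
                for (int sym = 0; sym < 2; sym++) {
                    next[id][sym] = x.next[i][sym] * m + y.next[j][sym];
                }
                accepting[id] = x.accepting[i] && y.accepting[j];
            }
        }
        return new Dfa(next, accepting);
    }

    /** BFS from the start state; returns null if no string is accepted. */
    public static String shortestExample(Dfa d) {
        String[] path = new String[d.next.length];
        path[0] = "";
        if (d.accepting[0]) {
            return "";
        }
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        while (!queue.isEmpty()) {
            int s = queue.remove();
            for (int sym = 0; sym < 2; sym++) {
                int t = d.next[s][sym];
                if (path[t] == null) {
                    path[t] = path[s] + (char) ('a' + sym);
                    if (d.accepting[t]) {
                        return path[t];
                    }
                    queue.add(t);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Dfa both = intersection(startsWithA(), endsWithB());
        System.out.println(both.run("aab"));       // prints true
        System.out.println(shortestExample(both)); // prints ab
    }
}
```

Treating the automaton as a value is what makes this possible: the combined language can be built and inspected without ever writing a single regex for "starts with a and ends with b".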