I once wrote a website (http://regexone.com) to help people learn regular expressions using practical examples -- maybe you would like to give that a try and see if it helps you in understanding the different regexes a bit more?
Yes, the intention was to help solve those cases where you are staring at your screen because you don't know where the match went wrong.
Seconded - it's superb. Having the example button for new users of regexes or people who haven't used one in a while is an excellent addition, as are the diagrams.
I understand RegEx, but am (almost) completely unable to read it. For me, this site is perfect.
No. While you are correct that a DFA is far superior for parsing this specific subset of javascript regex, that in no way makes it ideal for debugging purposes.
Sorry for the delayed response. Spent all day yesterday responding to feedback. The reason this crashes is due to the internal javascript engine.
As a long time Rubular user, I will be switching to Debuggex.
Generally I try these sorts of things out on a small but non-trivial example. Unfortunately it failed, so while this regex debugger shows a great deal of potential, there is still a bit more work to be done. My inputs:
Regex: TVo[12].\d.* [Aa] ..[^k]
Test string: TVo1-0:01.0-1:01.0 A Nashville
Very nice. I'd recommend making the text to match field a text area and doing line based matches.
The text to match field is a text area; it will auto-expand as you type into it.
This is really cool! One thing that would be really awesome would be if you added a way to switch between disambiguation strategies. At the moment, it seems like the default strategy is greedy parsing (i.e. the "Perl way"). For instance, when matching the string "ab" against (a|ab)(b?) the first group matches "a" while the second matches "b". With the POSIX strategy, the first group will match "ab" and the second will match the empty string.
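A quick sketch of that difference in plain JavaScript, using the pattern (a|ab)(b?), which is the form under which group 1 can capture either "a" or "ab". JavaScript only offers the Perl-style ordered choice, so the second pattern below is just an emulation: it reorders the alternatives to get the same captures that a POSIX leftmost-longest matcher would report for this particular input. It is not a real POSIX engine.

```javascript
// Perl-style (ordered choice): the first alternative that leads to a match wins.
const perlStyle = "ab".match(/(a|ab)(b?)/);
console.log(perlStyle.slice(1)); // [ 'a', 'b' ]

// Emulation only: putting the longer alternative first yields the captures
// a POSIX leftmost-longest matcher would give for this input.
const longestFirst = "ab".match(/(ab|a)(b?)/);
console.log(longestFirst.slice(1)); // [ 'ab', '' ]
```

Both patterns match the whole string "ab"; only the way the match is split between the groups differs.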
Not bad. Have you thought about supporting a much larger string area? The re editor in Slickedit lets me paste multiple lines of text and see what parts get matched by the regex which is super useful for searching and replacing code and also very useful for multi-line matches.
This is really awesome, and it's immediately going into my batbelt bookmark folder.
This makes regex fun; I could actually see myself relying on it more. If I'm not writing them every day, I'm not sure I'll remember a year from now what \dd does, but now I have a good site to go to when I need a refresher. Nice site.
[+] [-] UnoriginalGuy|13 years ago|reply
Unfortunately if you don't "understand" RegEx it won't help much. It is more for people who already have it down.
For me I am still stuck in copy/paste land. I could never get my head around the "logic" of RegEx, it just seems completely random and arbitrary.
Plus they reuse the same characters with multiple meanings (e.g. ^ for NOT and for START).
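For anyone stuck on the ^ case specifically, here is a small sketch in JavaScript of the positions that decide which meaning applies. The variable names are made up for illustration.

```javascript
// Outside brackets, ^ anchors the match to the start of the input.
const startsWithA = /^a/;
// As the first character inside brackets, ^ negates the character class.
const notA = /[^a]/;
// Anywhere else inside brackets, ^ is just a literal caret.
const aOrCaret = /[a^]/;

console.log(startsWithA.test("abc")); // true
console.log(startsWithA.test("bac")); // false
console.log(notA.test("aaa"));        // false: every character is an "a"
console.log(notA.test("aab"));        // true: "b" is not an "a"
console.log(aOrCaret.test("^"));      // true
```

So the meaning is decided by position rather than being arbitrary: start of the pattern means START, start of a bracket expression means NOT.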
[+] [-] kgen|13 years ago|reply
[+] [-] tsergiu|13 years ago|reply
I'm thinking of doing some tutorials geared towards teaching students in grade school how to use them. I think a visual representation would help significantly.
[+] [-] anigbrowl|13 years ago|reply
I find it sort of sad that several people have responded by linking to their preferred (but clearly inferior) Regex pages, which detracts from the accomplishment of this one.
[+] [-] gizmo686|13 years ago|reply
The way I learned RegEx was simply spending 2 work days writing a parser with it. I think the problem is that there is a moment when RegEx suddenly makes sense, and you cannot understand how anyone can be confused by it (even when you yourself were confused just 5 minutes ago).
[+] [-] jacobparker|13 years ago|reply
String: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab
This kills the browser.
ADDENDUM:
A good read on executing regular expressions in linear (and thus predictable) time is http://swtch.com/~rsc/regexp/regexp1.html
Many other algorithms have exponential edge cases. This can open you up to DoS attacks if you accept regular expressions from users (e.g. in a search feature).
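To give a feel for the linear-time idea from that article, here is a minimal sketch that tracks a set of possible states instead of backtracking. It is not taken from the article or from any real engine: it only handles a hypothetical mini-syntax (literal characters, each optionally followed by `*`), it only does exact matches, and the names `parseItems`, `addState` and `matchLinear` are invented for illustration.

```javascript
// Hypothetical mini-syntax: literal characters, each optionally followed by '*'.
function parseItems(pattern) {
  const items = [];
  for (let i = 0; i < pattern.length; i++) {
    const star = pattern[i + 1] === "*";
    items.push({ ch: pattern[i], star });
    if (star) i++; // skip over the '*'
  }
  return items;
}

// Add state i plus every state reachable by skipping starred items.
function addState(states, items, i) {
  states.add(i);
  while (i < items.length && items[i].star) {
    i++;
    states.add(i);
  }
}

// Exact match: advance the whole set of states one input character at a time.
function matchLinear(pattern, text) {
  const items = parseItems(pattern);
  let current = new Set();
  addState(current, items, 0);
  for (const c of text) {
    const next = new Set();
    for (const i of current) {
      if (i < items.length && items[i].ch === c) {
        // A starred item can repeat, so it stays at i; otherwise move on.
        addState(next, items, items[i].star ? i : i + 1);
      }
    }
    current = next;
    if (current.size === 0) return false; // no state survived this character
  }
  return current.has(items.length);
}

console.log(matchLinear("a*a*b", "aaab")); // true
console.log(matchLinear("a*a*b", "aaaa")); // false
```

The set can never hold more states than the pattern has items plus one, so the work per input character is bounded by the pattern size and the input is only scanned once.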
[+] [-] thomasahle|13 years ago|reply
1) In the user's program the regex is not going to be run on a DFA (since we are talking about the javascript variation, which has back references). It makes more sense to warn the user about bad performance than to make them believe they are safe.
2) A debugger has to be true to the input. If the user wants to debug (a) it doesn't help that the debugger just casually transforms it into a*. That wouldn't make the diagrams fun at all.
3) It is entirely possible that in the future, the author wants to expand the awesome tool to a larger subset of javascript regex. This would probably make it break out of the finite automata space.
I do however agree that it's a pity how many good regular expressions are run on stupid backtracking systems out there.
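As a small illustration of the back reference point in 1), here is a sketch of a JavaScript pattern that a plain finite automaton cannot handle. The pattern is made up for this example: `\1` must repeat exactly the text that group 1 captured, which means the matcher has to remember an arbitrarily long string.

```javascript
// \1 must repeat exactly what group 1 captured, so this accepts
// n a's, a single b, then the same n a's again.
const mirrored = /^(a+)b\1$/;

console.log(mirrored.test("aabaa"));  // true: "aa", "b", "aa"
console.log(mirrored.test("aabaaa")); // false: the two runs of a's differ in length
```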
[+] [-] tsergiu|13 years ago|reply
In order to ensure that my engine (I simulate a kind of NFA) matches what javascript's engine matches, every time I match on my engine, I also try to exact match using javascript's engine. Unfortunately, javascript's engine always uses backtracking, even when it doesn't need to. Obviously this code should have been turned off for production, and I'll fix it on the next push.
To replicate the crash on your own, try typing: 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab'.match(/^(a)$/) in your console.
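A hedged sketch of the same effect with smaller inputs, so it does not lock up the console. The pattern as printed above may have lost some characters in formatting, so this sketch assumes a nested quantifier, `/^(a+)+$/`, which is a well-known pathological case for backtracking engines and not necessarily the exact pattern from the report. `timeFailingMatch` is a made-up helper name.

```javascript
// Assumed pathological pattern: a quantified group inside another quantifier.
const nested = /^(a+)+$/;

// Time a match that must fail: n a's followed by a single "b".
function timeFailingMatch(n) {
  const input = "a".repeat(n) + "b";
  const start = Date.now();
  const matched = nested.test(input);
  return { n, matched, ms: Date.now() - start };
}

// On a backtracking engine the time tends to roughly double per extra "a".
// Keep n small; values in the high twenties or thirties can hang the page.
for (const n of [10, 14, 18, 22]) {
  console.log(timeFailingMatch(n));
}
```

Exact timings depend on the engine and machine, so none are shown here.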
[+] [-] abecedarius|13 years ago|reply
[+] [-] tsergiu|13 years ago|reply
[+] [-] albemuth|13 years ago|reply
Surprised no one brought it up
[+] [-] andrewguenther|13 years ago|reply
It just feels right to me. It explains your regex to you, which, in my mind, is a much better way to debug a regex than to supply a large set of test strings.
[+] [-] gojomo|13 years ago|reply
My entry into this category:
http://regex.powertoy.org/
Important caveat: it makes use of a hidden Java applet -- so that it can support the somewhat larger Java regex syntax, doesn't send your data anywhere else for matching, and can hook into the string-probing to animate the process. So dig out whatever browser you use for Java applets (if you still have one) to test.
Regarding the animation, click the 'animate?' link to show the animation step/speed controls. For example, you can watch the regex that tests whether a number is prime (by failing) or composite (by succeeding) via these two animations:
49: http://regex.powertoy.org/?pat=/^1%3F%24|^%2811+%3F%29\1+%24....
47: http://regex.powertoy.org/?pat=/^1%3F%24|^%2811+%3F%29\1+%24....
I really want to get rid of the applet requirement; I might someday cross-compile the JDK7 regex support to JS so that the full syntax and animation can still be supported, without an applet.
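For readers without an applet-capable browser, the same check can be sketched in plain JavaScript. The pattern below is what the visible, URL-encoded part of the links above decodes to (`^1?$|^(11+?)\1+$`); the links are truncated, so treat it as an assumption that nothing in the cut-off part changes the pattern. The number is written in unary, as a string of n ones, and `isPrime` is a made-up helper name.

```javascript
// Decoded from the visible part of the links above (an assumption, since
// the links are truncated). Succeeds on 0, 1 and composite numbers in unary.
const notPrimeUnary = /^1?$|^(11+?)\1+$/;

// n is prime exactly when the pattern fails on a string of n ones.
function isPrime(n) {
  return !notPrimeUnary.test("1".repeat(n));
}

console.log(isPrime(47)); // true: the match fails
console.log(isPrime(49)); // false: the match succeeds with a group of seven ones
```

The group `(11+?)` tries each candidate divisor of at least 2, and `\1+` checks whether repeating it fills the rest of the string exactly.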
[+] [-] japaget|13 years ago|reply
[+] [-] tsergiu|13 years ago|reply
However, if you use the slider to slide to just past the "s" in Nashville, you can see that the end state does indeed light up.
[+] [-] Guillaume86|13 years ago|reply
Just use: TVo[12].\d.* [Aa] ..[^k].* and it works
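A sketch of why the trailing `.*` helps, using the regex and test string posted earlier in the thread. The tool only does exact matches, which is emulated here by wrapping the pattern in `^(?:...)$`; the helper name `exactMatch` is made up.

```javascript
const source = "TVo[12].\\d.* [Aa] ..[^k]";
const text = "TVo1-0:01.0-1:01.0 A Nashville";

// Emulate an exact (whole-string) match by anchoring both ends.
function exactMatch(src, input) {
  return new RegExp("^(?:" + src + ")$").test(input);
}

console.log(exactMatch(source, text));        // false: "hville" is left over
console.log(exactMatch(source + ".*", text)); // true: .* consumes the rest

// An ordinary unanchored search finds the prefix ending just past the "s".
console.log(new RegExp(source).exec(text)[0]); // "TVo1-0:01.0-1:01.0 A Nas"
```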
[+] [-] DEinspanjer|13 years ago|reply
When I need to haul out the big guns, I load up RegexBuddy in a Wine bottle and dump a screenful of text into it along with the regex to figure out where I went wrong.
They have a very different way of visualizing the step by step, but both are great tools.
[+] [-] tsergiu|13 years ago|reply
However, only exact matches are supported for the first release. I wanted to get user feedback before I built any more features. I think I have an intuitive way to visualize findAll() type matches.
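For comparison, here is a rough sketch of the two matching modes in plain JavaScript. This is not the tool's code: the pattern and input are made up, the exact match is emulated by anchoring both ends, and the findAll() style uses the standard `String.prototype.matchAll` with the `g` flag.

```javascript
const word = "[a-z]at";
const text = "cat bat rat";

// Exact match: the whole input has to be consumed by the pattern.
const exact = new RegExp("^(?:" + word + ")$");
console.log(exact.test(text));  // false
console.log(exact.test("cat")); // true

// findAll() style: every non-overlapping occurrence, with its position.
const all = Array.from(text.matchAll(new RegExp(word, "g")), (m) => ({
  text: m[0],
  index: m.index,
}));
console.log(all); // cat at 0, bat at 4, rat at 8
```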
[+] [-] eridius|13 years ago|reply
Sadly it doesn't seem to understand (?i).
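In plain JavaScript, case-insensitive matching is normally requested with the `i` flag rather than a leading `(?i)`. A minimal workaround sketch, assuming the pattern arrives as a string: strip the prefix and pass the flag instead. `fromInlineFlag` is a made-up helper, not part of the tool, and it only handles a leading `(?i)`.

```javascript
// Turn a leading "(?i)" into the JavaScript "i" flag; other patterns pass through.
function fromInlineFlag(source) {
  const prefix = "(?i)";
  if (source.startsWith(prefix)) {
    return new RegExp(source.slice(prefix.length), "i");
  }
  return new RegExp(source);
}

console.log(fromInlineFlag("(?i)hello").test("HeLLo")); // true
console.log(fromInlineFlag("hello").test("HELLO"));     // false
```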
[+] [-] tsergiu|13 years ago|reply
[+] [-] ajacksified|13 years ago|reply
[+] [-] ulrikrasmussen|13 years ago|reply
I think these subtle differences lead to a lot of confusion when users are not aware that the underlying implementation is different from what they are used to.
[+] [-] tsergiu|13 years ago|reply
[+] [-] arrakeen|13 years ago|reply
[+] [-] martin_|13 years ago|reply
[+] [-] northisup|13 years ago|reply
[+] [-] cromwellian|13 years ago|reply
Here's another one http://ocpsoft.org/tutorials/regular-expressions/java-visual...
Done with GWT and Errai, source here: https://github.com/ocpsoft/regex-tester/tree/master/src/main...
[+] [-] greggman|13 years ago|reply
http://www.slickedit.com/demo/high/RegexEvaluator/RegexEvalu...
[+] [-] tsergiu|13 years ago|reply
[+] [-] ericcholis|13 years ago|reply
[+] [-] tsergiu|13 years ago|reply
[+] [-] benth|13 years ago|reply
[+] [-] michaelt|13 years ago|reply
Unless you want to do that match-across-newlines witchcraft.
[+] [-] tsergiu|13 years ago|reply
[+] [-] spankalee|13 years ago|reply
One quick UI note: The reference table is much easier to read if the lines are left-aligned. With centering and two columns, it's hard to tell at first which descriptions the escape sequences belong to.
[+] [-] fernly|13 years ago|reply
Chrome, mac os.
[+] [-] tsergiu|13 years ago|reply
[+] [-] jebblue|13 years ago|reply
[+] [-] tsergiu|13 years ago|reply
[+] [-] jes|13 years ago|reply
[+] [-] tsergiu|13 years ago|reply