It seems like the "ASCII puke" concrete syntax for regex patterns doesn't scale that well. Regular expressions have binary operators, parenthesization, named groups, lookaheads, etc.-- if you're building a sophisticated regex of more than 10 characters or so, why not have some kind of an object model for this stuff so you can have reasonable forms of composition, naming of intermediate values in construction of a larger pattern, and the ability to attach modifiers to things without needing to pack more @#$%&*!^ un-Googleable junk into string literals?
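To make the idea concrete, here is a minimal sketch of such an object model in Python. The `Pattern` class and the `lit` and `chars` helpers are hypothetical names invented for illustration, not an existing library; the sketch only shows composition, naming of intermediate values, and attaching modifiers, and it compiles down to the standard `re` module.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical wrapper around a fragment of regex source."""
    source: str

    def __add__(self, other: "Pattern") -> "Pattern":
        # Sequence: match self, then other.
        return Pattern(self.source + other.source)

    def __or__(self, other: "Pattern") -> "Pattern":
        # Alternation, wrapped in a non-capturing group to keep precedence safe.
        return Pattern(f"(?:{self.source}|{other.source})")

    def many(self) -> "Pattern":
        # One or more repetitions.
        return Pattern(f"(?:{self.source})+")

    def optional(self) -> "Pattern":
        # Zero or one occurrence.
        return Pattern(f"(?:{self.source})?")

    def named(self, name: str) -> "Pattern":
        # Named capturing group.
        return Pattern(f"(?P<{name}>{self.source})")

    def compile(self, flags: int = 0):
        # Hand the assembled source to the standard library.
        return re.compile(self.source, flags)


def lit(text: str) -> Pattern:
    """Literal text, escaped so metacharacters lose their meaning."""
    return Pattern(re.escape(text))


def chars(spec: str) -> Pattern:
    """Character class; spec is raw class content such as "0-9"."""
    return Pattern(f"[{spec}]")


# Intermediate values get ordinary variable names.
digits = chars("0-9").many()
sign = (lit("+") | lit("-")).optional()
integer = (sign + digits).named("int")
fraction = (lit(".") + digits).named("frac").optional()
number = (integer + fraction).compile()

m = number.fullmatch("-12.5")
print(m.group("int"), m.group("frac"))
```

The generated regex is uglier than a hand-written one, but nobody has to read it; the readable part is the Python that builds it.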
ebiester|9 years ago
sublimeloge|9 years ago
norswap|9 years ago
I should try to implement something like this, to see how hard it would be (probably not very, but I might be overlooking something).
theamk|9 years ago
I think the article would be easier to read if all the regexes were in extended form, but I suppose the author is an expert regex user, so the examples were easy enough for him.
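For anyone who hasn't seen extended form, here is a small illustration using Python's `re.VERBOSE` flag. The timestamp pattern is my own made-up example, not one from the article; both versions match the same strings.

```python
import re

# Compact form: correct, but hard to scan.
compact = re.compile(r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})")

# Extended form: whitespace in the pattern is ignored and '#' starts a
# comment, so the pieces can be laid out and labelled.
extended = re.compile(
    r"""
    (?P<year>\d{4}) - (?P<month>\d{2}) - (?P<day>\d{2})   # date
    T                                                     # separator
    (?P<hour>\d{2}) : (?P<minute>\d{2})                   # time of day
    """,
    re.VERBOSE,
)

m = extended.fullmatch("2016-07-04T09:30")
print(m.group("year"), m.group("hour"))
```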
And finally, Perl 6 totally re-did text matching with "grammars" (https://docs.perl6.org/language/grammars.html) -- they use much more readable syntax, nameable groups, etc... It really is quite a wonderful thing, I wish it were available in other languages.
petercooper|9 years ago
draegtun|9 years ago
Here's ebiester's Icon example in the parse dialect:
And here is one translation of falsedan's example:
falsedan|9 years ago
e.g.
would be written as
elchief|9 years ago
brudgers|9 years ago
It appears from time to time on Hacker News. The JavaScript implementation has nearly 10k stars.
Johnny_Brahms|9 years ago
For anything more complex than small things that can be understood in less than 20 seconds, I use a parser generator, be it parsack (a Racket version of Haskell's Parsec) or whichever I have at hand.
derefr|9 years ago
Regexps do not exist to be a self-documenting syntax for writing code that gets read and maintained. If you are going to sit down, write, debug, commit, and PR some code that matches strings, for heaven's sake just write your pattern in BNF and apply a parser generator to it, or use a parser combinator library.
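As a rough sketch of what the parser-combinator route looks like, here is a tiny hand-rolled set of combinators in Python. None of this is a real library: `char_in`, `many`, `seq`, and `fmap` are hypothetical helpers written for this example, and the grammar (a comma-separated list of integers) is just an assumed toy case.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position),
# or None when it does not match at that position.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def char_in(allowed: str) -> Parser:
    """Match a single character drawn from `allowed`."""
    def parse(text, pos):
        if pos < len(text) and text[pos] in allowed:
            return text[pos], pos + 1
        return None
    return parse


def many(p: Parser, minimum: int = 0) -> Parser:
    """Apply `p` repeatedly; fail if it matched fewer than `minimum` times."""
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None or result[1] == pos:  # stop on failure or no progress
                break
            value, pos = result
            values.append(value)
        return (values, pos) if len(values) >= minimum else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Apply each parser in order; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def fmap(f: Callable[[Any], Any], p: Parser) -> Parser:
    """Transform the value produced by `p`, leaving the position alone."""
    def parse(text, pos):
        result = p(text, pos)
        if result is None:
            return None
        value, pos = result
        return f(value), pos
    return parse


# BNF:  number ::= digit+
#       list   ::= number ("," number)*
number = fmap(lambda ds: int("".join(ds)), many(char_in("0123456789"), minimum=1))
tail = many(fmap(lambda pair: pair[1], seq(char_in(","), number)))
number_list = fmap(lambda parts: [parts[0]] + parts[1], seq(number, tail))

print(number_list("1,22,333", 0))
```

Each rule of the grammar maps to one named value, which is the maintainability argument in a nutshell.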
Regexps are intended as a fluent syntax for interacting with data. Regexps exist to be arguments to sed, awk, and vim's :s command. Regexps exist to let you type an SQL query into psql that finds rows with columns matching a pattern. They're meant to be a hand-tool, used by a craftsman during the manual work of analysis that comes before the job is planned.
And as such, regexp syntax features aren't meant to be composed into multi-line monstrosities that do all the work at once; they're meant to let you match chunks, and then pipe that to another regexp that winnows those chunks down, and then another that substitutes one part of each chunk, etc.
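Roughly what I mean, sketched in Python rather than as a shell pipeline. The log lines and field names are invented sample data; the point is only that each step is one small regexp.

```python
import re

# Made-up sample input.
log = """\
2016-07-04 09:30:01 INFO  user=alice action=login
2016-07-04 09:31:17 ERROR user=bob   action=upload size=120MB
2016-07-04 09:32:40 ERROR user=carol action=upload size=8MB
"""

# Step 1: match the chunks (lines) of interest.
errors = [line for line in log.splitlines() if re.search(r"\bERROR\b", line)]

# Step 2: winnow each chunk down to the part we care about.
fields = [re.search(r"user=\w+.*", line).group(0) for line in errors]

# Step 3: substitute one part of each chunk (collapse runs of whitespace).
cleaned = [re.sub(r"\s+", " ", field) for field in fields]

for row in cleaned:
    print(row)
```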
If you've ever seen a Perl script written in "imperative mode", where every line relies on the implicit assignment of its result to the $_ variable, each line doing one more thing to that variable, each little regexp sawing off one edge or patching one hole: that's an example of the proper use of regexps. Such a script is effectively less a "program", and more simply a record of someone's keystrokes at a REPL.
And because of that, I honestly find it a bit strange that modern compiled languages build in "first-class" native support for regexps. They make sense in "scripting" languages like Ruby and Python because those languages can indeed be used for "scripting": writing code in their REPLs to do some manual tinkering, and then maybe saving a record of what you just did in case you need it again. But in languages like Go or Elixir? Why not just give the developer a batteries-included parser-combinator library instead? (If you, as a developer, need to parse regexps to support your users querying your system by passing it regexps, they could still be available from a library. But there's no need for a literal syntax for them in such languages.)
That being said, I wouldn't mind if an IDE for a particular compiled language accepted regular-expression syntax as a sort of Input Method Editor: you'd hit Ctrl+Shift+R or somesuch, a little "Input regexp: " window would pop up over your cursor, and then as you wrote and modified the regexp in the window, the equivalent BNF grammar would appear inside a text-selection at the cursor. That's a good use of regexps: to allow you to fluently, quickly create BNF grammars. As if they were a synthesizer keyboard, with each keystroke immediately performing a function.