Is it a must for every programmer to learn regular expressions?

31 points | riyadparvez | 13 years ago | programmers.stackexchange.com

58 comments

bermanoid | 13 years ago
Yes, it is, without a doubt. It's one of the most universal tricks of the trade that you'll literally never regret learning, mainly because just about any environment you'll ever work in will by necessity support regexes, and in many, it will be the primary way you interact with text.

But it's also a must that you realize that despite the fact that they're exceptionally useful and widely supported, regexes are a disgusting abomination, one that we should be absolutely mortified to be associated with. It's one of the worst syntaxes to ever be invented, and every one of us should feel the cold stink of UX failure wash over us every time we write a regex. If we ever catch ourselves writing a DSL that in any way, shape, or form resembles regular expressions, we should stop immediately and ask what the fuck is wrong with us and why we're being so opaque and random. Regular expressions are quite literally one of the worst syntaxes to ever be introduced in our field.

I worry a lot when someone doesn't know regular expressions at all. But I worry far more when someone thinks they're beautiful. That person has far too high a tolerance for unintuitive syntax and code, and will cause vastly more damage to my codebase than even the rank amateur that still uses "goto" on a regular basis.

Which is not to say that we shouldn't lean on regexes heavily anyway when they're appropriate - as programmers our primary job description is that we're paid good money to work with shitty interfaces in order to express simple ideas and algorithms.

padolsey | 13 years ago
How would you design a regular expression syntax more intuitively? Personally, I find beauty and simplicity in regular expressions. Sure, they can grow to hideous atrocities, but you can achieve such disastrous feats with any language/syntax. Maybe you could back up your claim of regexes being a disgusting abomination with, at the very least, anecdotal evidence.
grovulent | 13 years ago
Hmm, you know, I actually think it's not correct to look at regex as an interface, even though we use it as one. It's really more accurate to look at it as a grammar (Type 3 in the Chomsky hierarchy, if I remember correctly).

Anything that comes out of the whole Chomsky hierarchy stuff isn't going to look intuitive. But the point is that it is a particular, very rigorously defined system of representation. And various systems of representation are always more or less intuitively accessible, and come with a whole set of trade-offs around what they can represent vs. their ease of use, etc.

These things just are - written into the laws of the world. They are discovered - not invented. We pick them up and use them as we would rocks left lying around. We find better ones when we can and fashion better ones when we can too...

Karunamon | 13 years ago
>It's one of the worst syntaxes to ever be invented, and every one of us should feel the cold stink of UX failure wash over us every time we write a regex.

Okay, so they're ugly and hard to read at first glance. Considering the purpose of a regex, I can't think of another way to implement them that doesn't involve typing more characters needlessly (which would make them even harder to comprehend).

qlkzy | 13 years ago
I think that regular expressions have a good syntax for most typical (small) uses of regular expressions. Maybe the choice of special characters isn't ideal, and it might be nice to have 'English' versions of more special characters (e.g. [[:digit:]] in POSIX), but for small regexes the (mostly) one-to-one mapping between characters in the pattern and characters in the string is a very nice and intuitive syntax.

I think the real problem is that we lack (or don't learn) good tools to bridge the gap between regular expressions and 'custom parser'. We're reluctant to refactor from '1 line of just-starting-to-be-horrible regex' to tens or hundreds of lines (depending on language and libraries) to do it 'properly', and so we end up stretching regular expressions beyond the point where they make life easier.

Perl has Parse::RecDescent (and probably several others), which is pretty close to the right thing, and clearly it's very doable in a lot of languages - anyone got any suggestions in other languages?
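As a rough illustration of the 'custom parser' end of that gap, here is a minimal hand-written recursive-descent evaluator in Python for integer arithmetic with nested parentheses. Arbitrarily deep nesting is something a true regular expression cannot match. The grammar and the `evaluate` function are hypothetical and only meant as a sketch. They do not reproduce Parse::RecDescent's API.

```python
def evaluate(text: str) -> int:
    """Evaluate +, -, * and parenthesised integer expressions by recursive descent.

    Hypothetical grammar, one function per rule:
        expr   := term (('+' | '-') term)*
        term   := factor ('*' factor)*
        factor := INTEGER | '(' expr ')'
    """
    s = text.replace(" ", "")
    pos = 0

    def peek() -> str:
        # Current character, or "" at end of input.
        return s[pos] if pos < len(s) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def factor() -> int:
        nonlocal pos
        if peek() == "(":
            expect("(")
            value = expr()  # recursion handles arbitrary nesting depth
            expect(")")
            return value
        start = pos
        while peek().isdigit():
            pos += 1
        if start == pos:
            raise SyntaxError(f"expected a number at position {pos}")
        return int(s[start:pos])

    def term() -> int:
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def expr() -> int:
        nonlocal pos
        value = term()
        while peek() in ("+", "-"):  # loop keeps + and - left-associative
            op = peek()
            pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if pos != len(s):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result


print(evaluate("2 * (3 + 4) - 5"))  # 9
```

Each grammar rule maps to one small function. This is roughly the "tens of lines" of code the comment above describes, and it is the point where this style starts to pay off compared with a growing regex.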

itmag | 13 years ago
I could imagine a jQuery-like DSL where something like /^[A-Z]+[0-9]{2}$/ could be expressed as Match(str).BeginsWith().BigCaseAlpha().AtLeastOne().Numeric().FixedLength(2).EndsWith()

Of course, you would also have to be able to nest these for more advanced matching...

Is there something like this already in existence? :)
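One way to sketch that idea in Python is a small builder that emits an ordinary regex string, so the standard `re` engine still does the matching. The `Pattern` class and all of its method names are hypothetical. They are not meant to mirror any existing library.

```python
import re


class Pattern:
    """Hypothetical fluent builder that assembles a regular-expression string."""

    def __init__(self) -> None:
        self._parts: list[str] = []

    def begins(self) -> "Pattern":
        self._parts.append("^")
        return self

    def uppercase(self) -> "Pattern":
        self._parts.append("[A-Z]")
        return self

    def digit(self) -> "Pattern":
        self._parts.append("[0-9]")
        return self

    def one_or_more(self) -> "Pattern":
        # Quantifies the previous element, like '+'. Use one quantifier per element.
        self._parts[-1] += "+"
        return self

    def exactly(self, n: int) -> "Pattern":
        # Quantifies the previous element, like '{n}'.
        self._parts[-1] += f"{{{n}}}"
        return self

    def group(self, inner: "Pattern") -> "Pattern":
        # Nesting: wrap another builder's output in a non-capturing group.
        self._parts.append(f"(?:{inner.source()})")
        return self

    def ends(self) -> "Pattern":
        self._parts.append("$")
        return self

    def source(self) -> str:
        return "".join(self._parts)

    def matches(self, text: str) -> bool:
        return re.search(self.source(), text) is not None


p = Pattern().begins().uppercase().one_or_more().digit().exactly(2).ends()
print(p.source())          # ^[A-Z]+[0-9]{2}$
print(p.matches("AB12"))   # True
print(p.matches("AB123"))  # False
```

The `group` method covers the nesting case: `Pattern().begins().group(Pattern().uppercase().digit()).one_or_more().ends()` builds `^(?:[A-Z][0-9])+$`.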

InclinedPlane | 13 years ago
Please name 2 concrete ways to improve the "interface" of a standard regex.
peter_l_downs | 13 years ago
I really liked eykanal's answer [1].

> Regular expressions are a tool. It happens to be a very useful tool, so many people choose to learn how to use it. However, there's no "requirement" for you to learn how to use this particular tool, any more than there is a "requirement" for you to learn anything else.

Nails it. I do think most programmers will eventually run across a problem to which the solution is 'Use Regex', but it's not an absolute "must" like boolean logic.

[1] http://programmers.stackexchange.com/questions/133968/is-it-...

stinos | 13 years ago
Good point indeed. I spent my first years programming DSP algorithms on embedded systems. Hardly any strings were used, let alone any need for regular expressions. Learning their ins and outs back then would have been a waste of time, like a bricklayer carrying a fork in his toolbag.
Drakim | 13 years ago
No requirement to learn anything else? Reading and writing are pretty good skills no matter what your situation.
wwweston | 13 years ago
Probably worth revisiting this bit of commentary from Rob Pike:

"Regular expressions are hard to write, hard to write well, and can be expensive relative to other technologies... Standard lexing and parsing techniques are so easy to write, so general, and so adaptable there's no reason to use regular expressions.

"Another way to look at it is that lexers and parsing are matching statically-defined patterns, but regular expressions' strength is that they provide a way to express patterns dynamically. They're great in text editors and search tools, but when you know at compile time what all the things are you're looking for, regular expressions bring far more generality and flexibility than you need.

"Encouraging regular expressions as a panacea for all text processing problems is not only lazy and poor engineering, it also reinforces their use by people who shouldn't be using them at all."

http://commandcenter.blogspot.com/2011/08/regular-expression...

http://news.ycombinator.com/item?id=2915137

Personally, I think they're pretty darn useful and too powerful to not learn, but Pike's comment makes me think that maybe they're also a crutch that I've relied on too much rather than learning enough about lexing/parsing.
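To make Pike's point more concrete, here is a minimal Python sketch of the kind of hand-written lexer he is arguing for: one loop over the characters that emits tokens without any regex. The token kinds and the `lex` function are illustrative assumptions. This is not code from Pike's post.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)


def lex(src: str) -> List[Token]:
    """Split src into IDENT, NUMBER and single-character PUNCT tokens."""
    tokens: List[Token] = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1  # skip whitespace between tokens
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1
            tokens.append(("IDENT", src[start:i]))
        elif ch.isdigit():
            start = i
            while i < len(src) and src[i].isdigit():
                i += 1
            tokens.append(("NUMBER", src[start:i]))
        elif ch in "=+-*/();":
            tokens.append(("PUNCT", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


print(lex("total = price * 3;"))
# [('IDENT', 'total'), ('PUNCT', '='), ('IDENT', 'price'), ('PUNCT', '*'), ('NUMBER', '3'), ('PUNCT', ';')]
```

Because every token kind is known in advance, each branch is a plain loop that is easy to step through and extend. This is the "statically-defined patterns" case the quote describes.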

xyzzyz | 13 years ago
While knowing how to use regular expressions is invaluable, I also recommend learning how they actually work under the hood. It teaches you a good lesson about when regular expressions are applicable and when they're not. In my experience, while many programmers are adept with tools like regexps or grammar-to-parser generators, they very rarely know how these actually work, which results in people trying to parse HTML with regexps and similar things. It is also a good starting point for some very interesting theory, like the theory of computation.
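For a taste of what happens under the hood, below is a Python port of the tiny backtracking matcher Rob Pike wrote in C for Kernighan and Pike's *The Practice of Programming*. It supports only literal characters, `.`, `*`, `^` and `$`. Production engines are far more elaborate, and some (RE2, for example) simulate automata instead of backtracking, so treat this as a teaching sketch.

```python
def match(regexp: str, text: str) -> bool:
    """Search for regexp anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    # Try every starting position, including the empty suffix at the end.
    for i in range(len(text) + 1):
        if match_here(regexp, text[i:]):
            return True
    return False


def match_here(regexp: str, text: str) -> bool:
    """Match regexp at the beginning of text."""
    if regexp == "":
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return text == ""
    if text and (regexp[0] == "." or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False


def match_star(c: str, regexp: str, text: str) -> bool:
    """Match c* followed by regexp, trying the shortest expansion first."""
    i = 0
    while True:
        # Does the rest of the pattern match after consuming i copies of c?
        if match_here(regexp, text[i:]):
            return True
        if i < len(text) and (c == "." or text[i] == c):
            i += 1
        else:
            return False


print(match("^ab*c$", "abbbc"))  # True
print(match("^a.c$", "ac"))      # False
```

Even at this size, the matcher makes one lesson visible. The engine only moves left to right, trying alternatives as it goes. It keeps no stack of open tags, so it has no way to "remember" nesting like the nesting in HTML.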
InclinedPlane | 13 years ago
Also, you see people doing simple string comparisons with regexes, which is OK sometimes but is an easy target if your system has performance issues.
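For example, in Python a plain prefix or substring test needs no regex at all. The built-in string methods express the same check more directly and typically run faster. The URL is made up for illustration.

```python
import re

url = "https://example.com/api/users"

# Regex versions of simple comparisons...
uses_https_re = re.match(r"https://", url) is not None
mentions_api_re = re.search(r"/api/", url) is not None

# ...and the plain string equivalents, which need no pattern syntax.
uses_https = url.startswith("https://")
mentions_api = "/api/" in url

print(uses_https, mentions_api)  # True True
```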
johnwatson11218 | 13 years ago
I find one of the hardest aspects of using regexes is all the subtle differences between languages and environments. I use them about four times a year, but I feel like I have to go read a mini tutorial each time. I know the concepts, but I can't remember the special character for whitespace or digits in the particular language I'm working in. Also, the Java implementation is so bad: having to encode the pattern as a string with its own escaping rules is not easy, and then the three-line API usage is a real hassle.
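Python has the same double-escaping problem with ordinary string literals. Raw string literals (`r"..."`) avoid it, and the module-level `re` functions keep typical usage to a single call. Here is a minimal comparison with a made-up pattern and input:

```python
import re

# In an ordinary literal, each regex backslash must itself be escaped...
escaped = "\\d+\\s+\\w+"
# ...while a raw string passes backslashes through untouched.
raw = r"\d+\s+\w+"
print(escaped == raw)  # True: both are the same 9-character pattern

# One call compiles (and caches) the pattern and performs the search.
m = re.search(raw, "order 42 shipped")
print(m.group())  # 42 shipped
```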
yen223 | 13 years ago
The ability to parse text quickly is invaluable when writing code.

I would say it's a must to take your coding skills to the next level.

acqq | 13 years ago
There's no "a must." But I don't consider a programmer to be a good one if he doesn't know regexps.

If you ask "do you personally really need regexps?", I'll tell you: don't learn them. Since you're asking that question at all, I gather that you're not interested in learning them and are looking for an excuse not to, so do something that interests you.

richardw | 13 years ago
I regularly forget regex intricacies, so I now have an app called RegexBuddy for Windows. Very, very useful: it has a killer help file and can adjust and test regexes for many languages.

Surprisingly, here: http://www.regexbuddy.com/

The help file alone will make you want to buy it.

scott_w | 13 years ago
I think they're valuable as a simple tool, e.g. using :s/^#// in vim.
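The same substitution is a one-liner with Python's `re.sub`. With `re.MULTILINE`, `^` anchors at the start of every line, so this version behaves roughly like vim's `:%s/^#//` (whole buffer) rather than `:s/^#//` (current line only). The config text is invented for the example.

```python
import re

config = "#debug = true\n#verbose = false\nport = 8080\n"
# Remove one leading '#' from every line of the text.
uncommented = re.sub(r"^#", "", config, flags=re.MULTILINE)
print(uncommented, end="")
```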

However, I try to avoid using them in my code unless they improve readability. Using re.VERBOSE can help in Python.
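Here is how `re.VERBOSE` helps. Whitespace inside the pattern is ignored and `#` starts a comment, so each piece of the pattern can be documented in place. The ISO-style date below is just an example, and the pattern only checks the shape of the date, not whether the month or day is in range.

```python
import re

ISO_DATE = re.compile(
    r"""
    ^
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    $
    """,
    re.VERBOSE,
)

m = ISO_DATE.match("2012-08-14")
print(m.group("year"), m.group("month"), m.group("day"))  # 2012 08 14
```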

If you find a regex online, such as one for validating a UK postcode, you should definitely reference its source in your code to give readers some background.

aidos | 13 years ago
I taught my friend how to use regular expressions for search and replace. He's found them invaluable, and he's NOT a developer. He uses them to clean up and filter lists of keywords from AdWords.

I reach for regex frequently, but almost never as something I add to the code I'm writing. They're just amazing for filtering and bulk-editing text.

einhverfr | 13 years ago
I was thinking about this question and something occurred to me: the programmers who probably don't need to know regexes[1] almost certainly already learned them. So I guess that's a yes.

[1] Thinking of embedded systems developers creating code for, say, automotive entertainment systems, or control code for scientific or medical hardware.

DavidSJ | 13 years ago
> [1] Thinking of embedded systems developers creating code for, say, automotive entertainment systems, or control code for scientific or medical hardware.

Don't they have log files to read?
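As a small example of that kind of log reading, the sketch below uses named groups to pull the error lines out of a log. The log format and its contents are made up for illustration.

```python
import re

# Hypothetical log format: "<timestamp> <LEVEL> <subsystem>: <message>"
LINE = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<subsystem>\w+): (?P<msg>.*)$")

log = """\
00:00:01 INFO audio: amplifier ready
00:00:07 ERROR can: bus timeout on node 3
00:00:09 WARN tuner: weak signal
"""

errors = []
for line in log.splitlines():
    m = LINE.match(line)
    if m and m.group("level") == "ERROR":
        errors.append((m.group("ts"), m.group("subsystem"), m.group("msg")))

print(errors)  # [('00:00:07', 'can', 'bus timeout on node 3')]
```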

Auguste | 13 years ago
I think every programmer should have an understanding of what regular expressions are, how they can be used, and where to find a cheat sheet or reference (like www.regular-expressions.info).

It helps to know the basic syntax by memory, but you can just as easily look it up if you understand how they work.

lparry | 13 years ago
I think every programmer should know them, but should try to avoid using them unless they really are the best solution to the problem. Too many people reach for them too soon, leaving hard-to-understand, hard-to-maintain code that could have been better expressed some other way.
Axsuul | 13 years ago
Yes! They're not that hard and there's only so much syntax to them (vs. an actual programming language). I essentially learned from this site by just entering in random strings and trying to match them. http://rubular.com/
ExpiredLink | 13 years ago
Regex is a good example of a bad interface. When half of the input usually needs to be escaped, something must have gone fundamentally wrong. I regularly forget regular expressions and look them up again when I really need them.
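When the text you want to match is a literal string full of metacharacters, Python's `re.escape` can do the escaping for you, so you don't have to write the backslashes by hand. The strings here are only examples.

```python
import re

literal = "price: $5.00 (approx.)"
pattern = re.escape(literal)  # backslash-escapes the regex metacharacters

text = "Listed as price: $5.00 (approx.) today"
print(re.search(pattern, text) is not None)  # True
# Unescaped, '$' becomes an end-of-string anchor, so the search fails.
print(re.search(literal, text) is not None)  # False
```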
mootothemax | 13 years ago
I think it's essential to know of their existence, but not necessarily to know their syntax inside out. Basically, know enough that you use a regex where applicable instead of writing looping and parsing code yourself.
subsystem | 13 years ago
The problem is that regexps are often misused for things like HTML parsers and e-mail validators. Learning regexp syntax without learning when and how to use it makes you a worse programmer, not a better one.
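For the HTML case, Python's standard library already ships a real (if lenient) parser, `html.parser.HTMLParser`. Below is a minimal sketch that collects link targets. The markup is invented, and it includes cases that would trip up a naive regex such as `<a href="(.*?)">`: attribute order, single quotes, uppercase tags, comments, and a tag split across lines.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


page = """<p><a class="x" href='/one'>One</a>
<!-- <a href="/commented-out">ignored</a> -->
<A HREF="/two"
   title="multi-line tag">Two</A></p>"""

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['/one', '/two']
```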
brunnsbe | 13 years ago
At least it's good to know what can be solved with regular expressions; then, if you need to solve something, you can always look it up in a book or use some software for it (I use RegexBuddy).