The #regex IRC channel had an IRC bot with a 28-level quiz.
Anything sensible ended after level 14 or so. At that point it was just "how deep does the PCRE rabbit hole go?"
But there was a lot of useful, non-trivial stuff too, most notably lookaheads/lookbehinds, non-greedy matching, back-references, named capture groups, character classes, and anchors.
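The features listed above can be tried out in a few lines. Here is a minimal sketch using Python's `re` module. It is a backtracking engine in the same family as PCRE, but not identical to it, and the sample strings are invented for illustration.

```python
import re

# Lookahead: a number only if "px" follows, without consuming "px".
sizes = re.findall(r"\d+(?=px)", "width: 120px; height: 40em; margin: 8px")
print(sizes)  # ['120', '8']

# Lookbehind: a number only if "$" precedes it.
prices = re.findall(r"(?<=\$)\d+", "cost $30, tax 5, tip $7")
print(prices)  # ['30', '7']

# Non-greedy: stop at the first closing quote instead of the last one.
quoted = re.findall(r'".*?"', 'say "hi" and "bye"')
print(quoted)  # ['"hi"', '"bye"']

# Named capture groups: refer to parts of the match by name.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "released 2024-03-15")
print(m.group("year"), m.group("month"), m.group("day"))  # 2024 03 15

# Back-reference: \1 must repeat whatever group 1 matched (doubled words).
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test case")
print(doubled)  # ['is', 'test']
```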
When I learned jq, I went much the same way: I started hanging out in #jq IRC channels and trying to answer jq questions on Stack Overflow. Sadly, I was outperformed for the first six months, until it finally clicked.
The resources from Jan Goyvaerts / Just Great Software are great! His guides, and to some extent his tools, are how I learned it too. Today I'm often the regex go-to guy among colleagues, seemingly all because I properly got the hang of the basics via Jan's resources.
I read Mastering Regular Expressions by Jeffrey Friedl front to back 20 years ago, when I was in middle school, and it's probably been the best investment of reading time I've ever made.
Some years ago I read on a tech site that Friedl worked at Yahoo for a while, IIRC in a role involving a lot of text munging. That would probably have meant heavy use of regular expressions, maybe across the many web properties Yahoo had in that period, including Yahoo Search, Yahoo Mail, Yahoo Groups, Yahoo Finance, and many more.
Found that interesting.
I bought his book around that time or sometime later, but never read it fully. Part of the reason was that I used to go cross-eyed reading text with all the italics and other highlighting (of the regexes in action) in a small font. That was probably needed to explain regex concepts, but still...
But after reading this thread, I feel motivated to dive into regex again, at least at the shallow end of the pool, although I have dabbled in it and used it now and then in my work.
I'm really surprised by how few people learned by trying, and how many instead read whole books or manuals. I had some code or whatnot that needed mass-replacing, so I used the built-in regex find-and-replace (I think it was EditPlus in those days). First I learned how to match the exact string, then extrapolate from there using {}, (), replacing, etc. It's a lot easier to learn when you need to solve a practical, immediate problem.
That's how I learned. I'm a self-taught dev, so I pretty much just took the same approach: read documentation, try it out, read more docs, try it out, read some examples, search for how other people do it, etc. At a certain point, you just know it and can solve 90% of your needs without looking at docs. Although, tbh, I haven't written a very complicated regex in probably a decade and would need some warm-up reps if I had to today.
I agree you don’t need “whole books or manuals”, but how do you learn by trying alone?
The search space is enormous, and even if you stumble on a code fragment that appears to work, how do you know your code actually does what you think it does, and how do you know there isn’t a more efficient or readable way to do what you want to do? Case in point, you wrote:
> then extrapolate from there using {}, (), replacing, etc.
How did you find out about those, if not from reading (likely followed by some trying to check your understanding)? I think you have to read, not “whole books”, but ‘just’ the right documentation, where ‘right’ depends on the tool you use. For example, `man regex` may be sufficient. That you can read in a few minutes.
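A lightweight way to answer "how do you know your code actually does what you think it does" is to pin a pattern down with inputs that must match and inputs that must not. Below is a minimal sketch in Python. The date pattern and test strings are made up, and `check` is a hypothetical helper, not a library function.

```python
import re

# Hypothetical pattern under test: YYYY-MM-DD, month 01-12, day 01-31.
DATE = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

should_match = ["2024-01-31", "1999-12-01"]
should_not_match = ["2024-13-01", "2024-1-01", "2024-01-32", "x2024-01-01"]


def check(pattern, good, bad):
    """Return the inputs the pattern gets wrong (an empty list means all pass)."""
    failures = []
    for s in good:
        # fullmatch: the whole string must match, not just a prefix of it.
        if not pattern.fullmatch(s):
            failures.append(("should match", s))
    for s in bad:
        if pattern.fullmatch(s):
            failures.append(("should not match", s))
    return failures


print(check(DATE, should_match, should_not_match))  # []
```

Using `fullmatch` means a match on only part of the string doesn't count. This pattern still accepts impossible dates such as 2024-02-30. That is exactly the kind of gap a longer list of test inputs tends to expose.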
Yeah this is wild to me, maybe it’s a generational thing? I never “learned” regex. I’ve written hundreds of them but I figured out what I needed and then I moved on.
"Learned" them in university, but it wasn't until Jeff Friedl's Perl Conference talks that I really became one with the regex engine. He taught you how to think like the regex engine, and therefore how regular expressions would be interpreted and how to write them. Then I got a master class in REs from Tom Christiansen when we were writing the Perl Cookbook.
Jeff wrote "Mastering Regular Expressions", which grew from that talk. You probably want a copy even though it was first released in 1997. For the mindset of RE, you can't beat it.
Learning REs is a roll through:
* how matching happens (advancing, matching, backtracking)
* using * ? and {} to match repetitions
* greediness and stinginess within the RE
* character classes, both [manual] and escapes like \s, \W, etc.
* anchors and "what a line is"
* grouping and backreferences
* accessing groups outside the RE
* substitution, and accessing backrefs in substitutions
* find ALL the matches
* complex parsing (just don't, it's rare not to regret it)
and then it's an absolutely epic deep dive into the minutiae: what line starts and ends might be, Unicode and regex, code executed from within the regex engine, using code to BUILD a regex and worrying about when escaping happens or doesn't, denial-of-service regexes, etc. That will take you through ASCII, various Unix toolchains over time, and a bunch of other fun stuff.
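Most of the bullet points above fit in a line or two each. Here is a sketch using Python's `re` module. The sample text is invented, and other flavors differ in details such as replacement syntax.

```python
import re

text = "Total: 1234 items, ordered 2024-03-15 by [Ann] and [Bob]"

# Repetition with {}: exactly four digits in a row (first run found).
four_digits = re.search(r"\d{4}", text).group()        # '1234'

# Greedy vs. lazy: .* runs to the LAST "]", .*? stops at the first one.
greedy = re.search(r"\[.*\]", text).group()            # '[Ann] and [Bob]'
lazy = re.search(r"\[.*?\]", text).group()             # '[Ann]'

# Character classes: [A-Z] followed by one or more [a-z].
capitalized = re.findall(r"[A-Z][a-z]+", text)         # ['Total', 'Ann', 'Bob']

# Anchors: ^ and $ tie the pattern to the start and end of the string.
starts_ok = re.search(r"^Total", text) is not None     # True
ends_with_bob = re.search(r"Bob$", text) is not None   # False: text ends in "]"

# Groups, read back outside the regex.
parts = re.search(r"(\d{4})-(\d{2})-(\d{2})", text).groups()  # ('2024', '03', '15')

# Substitution using back-references to those groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# Find ALL the matches, not just the first.
bracketed = re.findall(r"\[(.*?)\]", text)             # ['Ann', 'Bob']

# Building a regex from data: re.escape keeps "." and "+" literal.
words = ["a.b", "c+d"]
escaped = re.findall("|".join(re.escape(w) for w in words), "a.b axb c+d ccd")
unescaped = re.findall("|".join(words), "a.b axb c+d ccd")
print(escaped, unescaped)  # ['a.b', 'c+d'] ['a.b', 'axb', 'ccd']
```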
I need to build a Regex a couple of times a year, and have always wondered whether others learn it and store it in their brain-cache, or whether they too need to look it up each time.
Learned regex in the '90s from the Perl documentation, or possibly one of the O'Reilly Perl references. That was a time when printed language references were more convenient than searching the internet. Perl still includes a command-line tool for accessing its documentation, which was invaluable in those ancient times. Perl's regex documentation is rather fantastic.
A simple way to test a regex you're building is this website, https://regexr.com/, which offers immediate parsing and documentation of your regex, lets you test it against various inputs, and lets you choose which language's regex parser you are targeting.
Practice, the more you use them the easier they become. I never studied them but knew when to use them, then just tinkered and iterated until the pattern did what I needed it to. After a while you can mostly just write and read them without much tinkering.
So you can observe what kind of state machine is produced from any given regular expression. You can also use it to merge and otherwise manipulate state machines, or to simplify regular expressions.
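The state-machine idea can be made concrete without relying on greenery's API, which isn't shown here. Below is a hand-built DFA for the small pattern `a(b|c)*d`, simulated in Python and cross-checked against `re.fullmatch`. The states and transition table were written by hand for this illustration and are not greenery output.

```python
import re

# Hand-built DFA for a(b|c)*d. States: 0 = start, 1 = after 'a' (looping
# on b/c), 2 = accept after 'd'. A missing transition means "reject".
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 1,
    (1, "d"): 2,
}
ACCEPTING = {2}


def dfa_accepts(s):
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # fell into the implicit dead state
            return False
    return state in ACCEPTING


for s in ["ad", "abcbd", "abd", "a", "abx", "add", ""]:
    # The DFA and Python's regex engine should agree on every input.
    assert dfa_accepts(s) == bool(re.fullmatch(r"a(b|c)*d", s))
    print(repr(s), dfa_accepts(s))
```

The DFA reads each character exactly once and never backtracks. That property is what makes automaton-based matching predictable.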
Easy: I learn it every time I unfortunately need to use it, through painful trial and error and searching. Thankfully there are online evaluators now, but now you need to figure out which regex flavor is being used.
Then I forget it, and have unreadable mystery functions lying around that I hope don’t have bugs.
But at least it’s a single line!
Seriously though, my actual need for them is low, so I avoid the things as much as I would avoid inlining assembly.
That is a hard question because there are so many ways that one can understand regex. I learned how to read and use them using Unix tools like sed, but I think that my path to starting to understand them probably began with papers like "Regular Expression Matching Can Be Simple And Fast" by Russ Cox (https://swtch.com/~rsc/regexp/regexp1.html), well after feeling like I was pretty good at using them.
Then, as an expert in linguistic morphology, I started learning about things like subregular languages, as talked about in works such as Aural Pattern Recognition Experiments and the Subregular Hierarchy, by Rogers and Pullum (https://www.cs.earlham.edu/~jrogers/JoLLI.pdf). And I continue to wonder what the relationship is between these classes of languages and word formation.
Piece by piece, googling "how to do X in regex". But that was slow and didn't have a great foundation.
Then I learned Perl and started learning RegEx properly. Now somehow I've turned into one of those wizards I admired in the Stack Overflow answers section. It wasn't until I had to teach RegEx to a junior that I realized how far I'd come.
One of the things I remember being difficult at the beginning was the subtle differences between implementations, like `^` meaning "beginning of line" in Ruby (and others) but meaning "beginning of string" in JavaScript (and others).
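Python's `re` shows both behaviors in one place. By default `^` matches only at the start of the string, and the `re.MULTILINE` flag makes it match at the start of every line. The Ruby and JavaScript comparisons come from the comment above; the code itself only demonstrates Python.

```python
import re

text = "first line\nsecond line"

# Default: ^ matches only at the very start of the string
# (compare JavaScript's ^ without the m flag).
default_hits = re.findall(r"^\w+", text)
print(default_hits)  # ['first']

# re.MULTILINE: ^ also matches right after each newline
# (compare Ruby, where ^ is always start-of-line).
multiline_hits = re.findall(r"^\w+", text, re.MULTILINE)
print(multiline_hits)  # ['first', 'second']

# In Python, \A means start of string no matter which flags are set.
anchored_hits = re.findall(r"\A\w+", text, re.MULTILINE)
print(anchored_hits)  # ['first']
```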
If you're just starting out, it'd be helpful to read about how a regex engine evaluates an expression against a string so that you can understand the "order of operations" and how repeating elements are matched.
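A tiny backtracking matcher is one way to see that "order of operations". Below is a Python adaptation of the small C matcher Rob Pike wrote for Kernighan and Pike's book The Practice of Programming. It supports only literals, `.`, `^`, `$` and `*`. One deliberate change: `*` is greedy here, consuming as much as possible and then giving characters back, which mirrors how Perl-style engines backtrack. The original tries the shortest repetition first.

```python
def match(regexp, text):
    """Search for regexp anywhere in text. Supports c, ., ^, $ and *."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    # Try every starting position, including the empty suffix.
    for i in range(len(text) + 1):
        if match_here(regexp, text[i:]):
            return True
    return False


def match_here(regexp, text):
    """Does regexp match at the beginning of text?"""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return text == ""
    if text and (regexp[0] == "." or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False


def match_star(c, regexp, text):
    """Match c* followed by regexp, greedily, backing off one char at a time."""
    i = 0
    # Greedy phase: consume as many c's as possible...
    while i < len(text) and (c == "." or text[i] == c):
        i += 1
    # ...then backtrack, giving characters back until the rest matches.
    while i >= 0:
        if match_here(regexp, text[i:]):
            return True
        i -= 1
    return False


print(match("a*a", "aaa"))        # True: a* takes 3, then gives one back
print(match("^ab*c$", "abbbc"))   # True
print(match("^ab*c$", "abbbd"))   # False
```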
For me the biggest hurdle was learning what they were 'for' and that took a long time. The real magic for me was capture groups - I could now suddenly see why you'd have a regex and not just string matching.
Then it was about recognizing a situation or problem where regexes would apply, and knowing how to look up the things I needed to solve it. Some regex 'phrases' are good for grepping, others for find-and-replace. Some will help you swap names around, some reformat phone numbers.
After a while the phrases give way to general understanding and certain things become fluent.
I still only really write short or basic regexes, but I use them all the time in editing text or doing things that are a little bit complicated but actually a short regex just turns it from a hard problem into an easy problem.
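Swapping names and reformatting phone numbers are classic capture-group substitutions. Here is a sketch in Python with made-up inputs. The accepted phone format (three digits, three digits, four digits, separated by `.` or `-`) is an assumption.

```python
import re

# Swap "Last, First" into "First Last": group 2, then group 1.
names = ["Lovelace, Ada", "Hopper, Grace"]
swapped = [re.sub(r"^(\w+), (\w+)$", r"\2 \1", n) for n in names]
print(swapped)  # ['Ada Lovelace', 'Grace Hopper']

# Reformat 555.123.4567 / 555-123-4567 style numbers as (555) 123-4567.
phones = "call 555.123.4567 or 555-987-6543"
formatted = re.sub(r"\b(\d{3})[.-](\d{3})[.-](\d{4})\b", r"(\1) \2-\3", phones)
print(formatted)  # call (555) 123-4567 or (555) 987-6543

# The same phrase without groups works for grepping: just list the matches.
found = re.findall(r"\b\d{3}[.-]\d{3}[.-]\d{4}\b", phones)
print(found)  # ['555.123.4567', '555-987-6543']
```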
Start with https://regexone.com/, a fun, puzzle-style interactive tutorial, to grasp the basics.
After that, it's a matter of either using it with your CLI tools or applying it to problems you are working on.
My first jobs were heavily focused on parsing data from HTML, and regex was (and still is) the most common solution for the majority of cases.
To learn it, I played a lot of regex golf [1]
I also enabled regular expressions in my code editor's Find feature, so every search I made used regex. Having it enabled in my editor made learning it more immersive and useful, especially when combined with things like find-and-replace. I highly recommend permanently enabling that in your editor as well.
Also, challenge your coworkers to see who can write the shortest patterns for a variety of cases, and whose are the most versatile. It's always a fun time.
That's my go-to these days, but sometimes I like to see a diagram from this one: https://regexper.com
I've just slowly learnt it by experimenting with it over the past few years. People have mostly mentioned matching, but I use it more for string manipulation.
I'm still not as intermediate a programmer as I'd like to be, so it's great when I need to invert a design decision, for example, where a similar code structure appears in multiple places, maybe across multiple files. It also means I don't miss anything, as I might if I did it manually.
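Here is a sketch of that kind of mechanical refactor. It assumes a hypothetical `connect(port, host)` function whose argument order is being flipped to `connect(host, port)`. The "files" are an in-memory dict so the example is self-contained. `re.subn` returns a replacement count, which helps confirm nothing was missed.

```python
import re

# Hypothetical sources: in-memory stand-ins for several files on disk.
files = {
    "client.py": "sock = connect(8080, 'localhost')\n",
    "worker.py": "a = connect(port, host)\nb = connect(PORT, cfg.host)\n",
}

# Swap the two arguments of every connect(...) call. Arguments containing
# commas or parentheses are deliberately not matched.
CALL = re.compile(r"\bconnect\(([^,()]+),\s*([^,()]+)\)")

total = 0
for name, source in list(files.items()):
    new_source, count = CALL.subn(r"connect(\2, \1)", source)
    files[name] = new_source
    total += count
    print(f"{name}: {count} call(s) rewritten")

print(files["worker.py"])
print(total)  # 3
```

Because the pattern refuses nested arguments, a call like `connect(get_port(), host)` is left alone rather than mangled. Comparing the count with a plain text search for `connect(` is a cheap way to find the calls that need a human.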
Regex the "env specific variant" or regex the concept (as it applies to theory of computation) etc?
The former I can never remember beyond the basics (*, +, ?, |). Even with | I go extra cautious and put in tons of parentheses. If I ever need matching and grouping, I resort to RTFM.
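The caution about `|` is well founded. Alternation has the lowest precedence, so an anchor binds to only one side of it. Here is a small Python comparison with invented inputs.

```python
import re

# | has the lowest precedence, so this means (^cat) OR (dog$) ...
loose = re.compile(r"^cat|dog$")
# ... while parentheses confine the alternation between the anchors.
strict = re.compile(r"^(cat|dog)$")

for s in ["cat", "dog", "catalog", "hotdog"]:
    print(s, bool(loose.search(s)), bool(strict.search(s)))
# cat True True
# dog True True
# catalog True False
# hotdog True False
```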
Now the latter, that's the more interesting and fun one!! Learnt it in college decades ago, but really drilled it in by reimplementing the Thompson NFA from Russ Cox's amazing blog post and breakdown in TypeScript!
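For the theory side, here is a compact Python sketch in the spirit of the Thompson NFA approach described in Russ Cox's article. It builds an NFA with epsilon edges from the pattern, then simulates it by tracking the set of all live states, so there is no backtracking. It supports only literals, `.`, `*`, `+`, `?`, `|` and parentheses, with no escaping. It is neither Cox's code nor the commenter's TypeScript: Cox converts the pattern to postfix first, while this version uses a small recursive-descent parser.

```python
import re


class State:
    """An NFA state with labelled edges; a label of None is an epsilon edge."""

    def __init__(self):
        self.edges = []  # list of (label, target_state)


def compile_nfa(pattern):
    """Thompson-style construction. Returns (start_state, accept_state)."""
    pos = 0

    def parse_alt():
        nonlocal pos
        start, end = parse_concat()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            right_start, right_end = parse_concat()
            s, e = State(), State()
            s.edges += [(None, start), (None, right_start)]
            end.edges.append((None, e))
            right_end.edges.append((None, e))
            start, end = s, e
        return start, end

    def parse_concat():
        start = end = State()  # empty fragment
        while pos < len(pattern) and pattern[pos] not in "|)":
            frag_start, frag_end = parse_repeat()
            end.edges.append((None, frag_start))
            end = frag_end
        return start, end

    def parse_repeat():
        nonlocal pos
        start, end = parse_atom()
        while pos < len(pattern) and pattern[pos] in "*+?":
            op = pattern[pos]
            pos += 1
            s, e = State(), State()
            s.edges.append((None, start))
            end.edges.append((None, e))
            if op in "*?":
                s.edges.append((None, e))  # allow skipping the fragment
            if op in "*+":
                end.edges.append((None, start))  # allow repeating it
            start, end = s, e
        return start, end

    def parse_atom():
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        if ch == "(":
            frag = parse_alt()
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return frag
        s, e = State(), State()
        s.edges.append((ch, e))  # '.' is kept as a label meaning "any char"
        return s, e

    start, accept = parse_alt()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return start, accept


def epsilon_closure(states):
    """All states reachable from `states` using only epsilon edges."""
    seen = set(states)
    stack = list(states)
    while stack:
        for label, target in stack.pop().edges:
            if label is None and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def nfa_fullmatch(pattern, text):
    """Advance all NFA paths in lockstep: one pass over text, no backtracking."""
    start, accept = compile_nfa(pattern)
    current = epsilon_closure({start})
    for ch in text:
        step = {t for s in current for label, t in s.edges
                if label is not None and (label == "." or label == ch)}
        current = epsilon_closure(step)
    return accept in current


# Cross-check against Python's backtracking engine on small inputs.
for pat, s in [("a(b|c)*d", "abcbd"), ("(a|b)*abb", "babaabb"), ("a+b?", "b")]:
    assert nfa_fullmatch(pat, s) == bool(re.fullmatch(pat, s))

# Cox's pathological case a?^n a^n against a^n stays fast here.
n = 25
print(nfa_fullmatch("a?" * n + "a" * n, "a" * n))  # True
```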
sshine | 1 year ago
I only used the bare minimum for years.
I also hung out on a #regex IRC channel, so I got exposed to questions and answers by many people.
Later I read up on https://www.regular-expressions.info/ which has a lot of very good explanations.
samuell | 1 year ago
murderfs | 1 year ago
fuzztester | 1 year ago
proactivesvcs | 1 year ago
floxy | 1 year ago
RhysU | 1 year ago
hiAndrewQuinn | 1 year ago
orochimaaru | 1 year ago
That being said - regex is a superpower.
jeff-hykin | 1 year ago
gregjor | 1 year ago
pluc | 1 year ago
conductr | 1 year ago
Someone | 1 year ago
idontwantthis | 1 year ago
gnat | 1 year ago
urbandw311er | 1 year ago
profsummergig | 1 year ago
IncandescentGas | 1 year ago
`perldoc perlre` from your terminal.
or https://perldoc.perl.org/perlre
https://regexr.com/
rtheunissen | 1 year ago
Regex101 is an excellent tool.
pjkundert | 1 year ago
https://github.com/qntm/greenery
Quite helpful.
Modified3019 | 1 year ago
sinkasapa | 1 year ago
RadiozRadioz | 1 year ago
nnf | 1 year ago
riffraff | 1 year ago
It's been many years but I remember it as both thorough and easy to understand.
Zhyl | 1 year ago
userm0d | 1 year ago
Minor49er | 1 year ago
[1] https://alf.nu/RegexGolf
pseudo_meta | 1 year ago
Cordiali | 1 year ago
jacinda | 1 year ago
Super intuitive and great definitions / descriptions of everything.
mhotchen | 1 year ago
and
https://blog.stevenlevithan.com/
wcarss | 1 year ago
Later, grepping logs was a pretty similar application that needed and extended those skills.
flashgordon | 1 year ago