(no title)
danielparks | 1 year ago
I found the operator precedence unnatural, and it looks like a lot of other folks in this thread did too. I would naturally assume `cat:dog` would be equivalent to `(cat):(dog)` rather than `ca(t:d)og`.
bane|1 year ago
> I would naturally assume `cat:dog` would be equivalent to `(cat):(dog)` rather than `ca(t:d)og`
It was confusing to me too until I remembered that we all kind of use regexes sort of wrong. They're "really" supposed to be considered as generators and not matchers. So IIRC cat|dog as a "regular expression" (not a regex) is supposed to formally expand to
{catog,cadog}
For matching, this set of strings can then be substring matched against some larger text.
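A minimal sketch of that "generate first, then substring match" idea, taking the two-string set above as given. The helper name `matches` is made up for illustration and does not come from any regex library:

```python
def matches(generated, text):
    """Return the generated strings that occur as substrings of text."""
    return sorted(s for s in generated if s in text)

# The set of strings from the expansion above, taken as given.
generated = {"catog", "cadog"}

print(matches(generated, "a cadog walked by"))  # ['cadog']
print(matches(generated, "cat or dog"))         # []
```

This only works for finite sets; once a Kleene star is involved the set is infinite and cannot be enumerated up front.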
The problem is that almost no regex matching engine actually does this, and so now they'll do all kinds of strange things either to meet our expectations, or for efficiency or something.
If you go and try a bunch of different regex tools you'll get variations that match either (cat)|(dog), or (cat)|(dog)|(ca[td]og), or something else.
So from a more formal conceptualization I think cat:dog should produce ca(t:d)og not (cat):(dog). But our experience with "regex" tools has subverted that formalization and now everybody just puts parens around expressions they want to alternate.
My real, if minor, issue with this proposal, as interesting and well thought out as it is, is that it feels like an attempt to get back to regular expressions as generators, which is what they actually are, but it's coming from the far side of a few decades of us abusing them as regexes to meet user expectations. In other words, the problem is the tooling, not the syntax.
source: I've worked adjacent to this space in the past and if you've never thought of regexes as string set generators you can toy with the idea here
https://onlinestringtools.com/generate-string-from-regex
but again, how these generator tools work is also very specific. The ones I used to work with had a variety of ways to specify constraints on closures and such to restrict the generators.
Timwi|1 year ago
In pure theoretical computer science, regular expressions exist as an abstract concept independent from syntax or parsers. They are an “algebra”, which means they are composed of elements connected with operators, but they are not inherently tied to a syntax. In the most fundamental formulation of regular expressions (the one in the Chomsky hierarchy), the only operators are alternation (which modern syntaxes express as “|”), the Kleene star (“*”) and, notably, concatenation, which modern syntaxes simply omit, in a way comparable to how modern mathematics notation omits the multiplication operator when you write “2x”.
In the same way that maths needs rules to define whether “2x²” means “(2x)²” or “2(x²)”, regex syntax needs such rules too. This is called operator precedence. I’m sure you’ve heard that before, but you just might not have realized that the regular expression “ab” has an operator in it because it is typically not written.
Now I’m not going to argue that the operator precedence in maths notation is haphazard or without reason, but it is arbitrary. It was arbitrarily chosen to be the most useful to mathematicians using the notation. And it turns out that giving exponentiation higher precedence than (invisible) multiplication (meaning: “2x²” means “2(x²)” rather than “(2x)²”) is more useful.
So coming back to the original example, whether “cat:dog” means “ca(t:d)og” or “(cat):(dog)” is simply a matter of defining the precedence of the “:” operator relative to the concatenation operator. You can argue (and I would agree with you) that one is more useful than the other, and therefore preferable (in the same way that “(cat)|(dog)” is more useful than “ca(t|d)og”), but neither of them is more fundamentally correct or primal or, as you put it, “supposed to formally expand to”.
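To make the point concrete, here is a rough sketch that reads the same pattern under both precedence choices and lists the strings each reading generates. It uses “|” as in the analogy above, handles only literal characters and “|” (no parentheses, no star), and the function name `language` and its flag are invented for this illustration:

```python
from itertools import product

def language(pattern, alt_binds_tighter):
    """Set of strings generated by `pattern` under one of two precedence rules."""
    if not alt_binds_tighter:
        # Concatenation binds tighter: "cat|dog" is read as (cat)|(dog).
        return set(pattern.split("|"))
    # Alternation binds tighter: "cat|dog" is read as ca(t|d)og.
    groups = []          # one set of alternative characters per position
    join_next = False    # True right after a "|" has been seen
    for ch in pattern:
        if ch == "|":
            join_next = True
        elif join_next:
            groups[-1].add(ch)   # alternative for the previous position
            join_next = False
        else:
            groups.append({ch})  # new position in the concatenation
    return {"".join(choice) for choice in product(*groups)}

print(sorted(language("cat|dog", alt_binds_tighter=False)))  # ['cat', 'dog']
print(sorted(language("cat|dog", alt_binds_tighter=True)))   # ['cadog', 'catog']
```

The input string is identical in both calls; only the precedence rule differs. The sketch does no error handling, so a pattern that starts with “|” will fail in the second mode.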
unknown|1 year ago
[deleted]
c0nstantine|1 year ago
If I shift it behind concatenation there could be another problem. E.g. if ':' is non-associative, chaining it should be illegal, and I am not sure how to treat this:
cat:dog:mouse
In the current version I inject the epsilon (empty string). It looks natural. E.g. to remove every second letter I could run '..:', which is technically '.(.:eps)':
echo 'abcde' | ./trre '..:'
result: 'ace'
actually ':' association could have a meaning as a composition of regular relations; but I found it too complicated for now.
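For readers without the tool at hand, the '.(.:eps)' reading of '..:' can be imitated in plain Python. This is only a sketch of the behaviour described above, not how trre is implemented; in particular, passing a trailing unpaired character through unchanged is an assumption chosen to reproduce the 'ace' result:

```python
def drop_every_second(text):
    """Imitate '.(.:eps)' applied repeatedly from left to right."""
    out = []
    i = 0
    while i < len(text):
        out.append(text[i])       # '.'      : copy one character as-is
        if i + 1 < len(text):
            i += 2                # '.:eps'  : map the next character to the empty string
        else:
            i += 1                # assumption: a leftover single character is kept
    return "".join(out)

print(drop_every_second("abcde"))  # ace
```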
kccqzy|1 year ago
I would not worry about "cat:dog:mouse" because intuitively it is clearly correct and it means replacing cat with mouse. With parentheses it could be written as "((cat:dog):mouse)".
twiss|1 year ago
kazinator|1 year ago
Then there is no syntactic special case. This is just EXPR:EXPR; the special case is that both EXPR are character class syntax, and so the tr-like range mapping applies.
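A small sketch of what a tr-like range mapping between two character classes could look like. The class bodies are written without brackets, the helper names `expand_class` and `range_map` are hypothetical, and requiring both classes to have the same length is an assumption rather than something stated in the thread:

```python
def expand_class(body):
    """Expand a simple class body such as 'a-c' or 'a-cx' into a list of characters."""
    chars = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range like 'a-c' contributes every character from 'a' to 'c'.
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars

def range_map(src, dst):
    """Build a translation table pairing the n-th char of src with the n-th char of dst."""
    s, d = expand_class(src), expand_class(dst)
    if len(s) != len(d):
        raise ValueError("assumed here: both classes must expand to the same length")
    return str.maketrans(dict(zip(s, d)))

# Something like [a-z]:[A-Z], applied character by character.
table = range_map("a-z", "A-Z")
print("cat:dog".translate(table))  # CAT:DOG
```

Characters outside the source class, such as the ':' in the example input, are left untouched, which is also how `tr` behaves.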