The article would be helped by using a full regex for IPv4 addresses. The one it uses would match invalid numbers (999.999.999.5, for instance), but the proper one is more complex (and would probably make for a better example as a result).
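For reference, a stricter pattern could look something like the sketch below. This is one common way to bound each octet to 0-255, not the pattern from the article; the names `OCTET`, `IPV4_STRICT`, and `is_ipv4` are made up here, and rejecting leading zeros such as `01` is a choice of this sketch rather than a requirement.

```python
import re

# Hypothetical names. Each octet is limited to 0-255 with no leading zeros:
#   25[0-5]     -> 250-255
#   2[0-4][0-9] -> 200-249
#   1[0-9]{2}   -> 100-199
#   [1-9]?[0-9] -> 0-99
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4_STRICT = re.compile(r"(?:" + OCTET + r"\.){3}" + OCTET)


def is_ipv4(text):
    """Return True only if the whole string is a dotted quad with octets in 0-255."""
    return IPV4_STRICT.fullmatch(text) is not None
```

With this, `is_ipv4("999.999.999.5")` is rejected while `is_ipv4("192.168.0.1")` is accepted, at the cost of a noticeably longer pattern.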
Though to be fair, it's not obvious where to encode validation. You're saying it should be intrinsic to the parsing rules for IPv4. But there are reasons why you would want to validate in a separate pass.
Kind of looks like grok. Grok lets you name patterns, then build up larger patterns from those names, and then also name the groups that it matches to those sub-patterns so you can refer to them in the data. It's built on top of regex, as each pattern can be defined by a mix of other patterns and/or regex.
This is a very cool tool but I feel like I'm missing something. This looks like any other PEG parser generator. The only difference I see is that it will automatically handle the case where a valid match starts somewhere other than at the start of the stream. I'm not sure that this justifies calling it a whole language unto itself.
Yeah.. that got me thinking too. I would not have called it "new".
The example they give isn't really convincing to me. I can see the use case for this kind of language, but for e.g. searching for a pattern on the shell that isn't just one of a few predefined special cases, it seems like it'd still be a lot easier to compose regexes on the fly.
I don't see the difference between this and any other Context-Free Grammar specification language. Yacc is an industry standard, and even SNOBOL4 (1967) had first-class CFG datatypes. Maybe he's just excited about being able to use CFGs in the cmdline?
I was going to agree with everyone about how it’s not a language, but reading into it more, I proved myself wrong.
This is nice within a specific use case: being able to make files with all the pattern chunks you use repeatedly, so you can reuse them and add to them. If you can't make files, it at least looks no worse than composing regexes on the command line, but it also doesn't look all that different.
[+] [-] greggyb|7 years ago|reply
I can do this very easily on the command line or in a .<shell>rc file:
I am bad at regex, so I'm sure the example is a poor definition, but the idea is there. I can make variables that hold components of a regex, and since a regex is just a string I can compose these via concatenation.
If I did this a lot, I could build a small helper script (or probably just a set of shell functions) to maintain a library file of regex components that I can use in the shell with grep.
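A minimal sketch of the same idea, written in Python rather than shell. The fragment names are made up for illustration, and the octet pattern is deliberately loose: regex pieces live in plain string variables and larger patterns are built by concatenation.

```python
import re

# Hypothetical fragment names; each one is an ordinary string holding a piece of a regex.
OCTET = r"[0-9]{1,3}"            # loose on purpose: also accepts values such as 999
IPV4 = r"\.".join([OCTET] * 4)   # four octets joined by literal dots
PORT = r"[0-9]{1,5}"
ENDPOINT = IPV4 + ":" + PORT     # composition is just string concatenation


def find_endpoints(text):
    """Return every host:port substring that matches the composed pattern."""
    return re.findall(ENDPOINT, text)
```

For example, `find_endpoints("connect 10.0.0.1:8080 failed, retry 192.168.1.20:443")` picks out both endpoints. Putting such definitions in one module would play the role of the "library file" described above.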
[+] [-] taeric|7 years ago|reply
[+] [-] msoucy|7 years ago|reply
Also I think there's something wrong with this blog's formatting, it appears to be replacing underscores with italics even within code samples.
[+] [-] setr|7 years ago|reply
Ideally syntax vs semantics should be decoupled in most parsing (hence the AST)
[+] [-] wild_preference|7 years ago|reply
For example, the more erroneous input you can still parse, the better the error messages and introspection (structured data) you have when you want to validate.
It's a classic trade-off.
You're right though, doesn't make a very good example.
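A rough sketch of the separate-pass approach, assuming a deliberately loose dotted-quad pattern and a hypothetical `validate_ipv4` helper: because the loose parse succeeds on bad input, the validation step has structured data to point at.

```python
import re

# Loose parse: accepts any 1-3 digit groups, including out-of-range values.
LOOSE_IPV4 = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")


def validate_ipv4(text):
    """Return a list of error messages; an empty list means the address is valid."""
    match = LOOSE_IPV4.fullmatch(text)
    if match is None:
        # Nothing was parsed, so only a generic message is possible.
        return ["not in dotted-quad form: %r" % text]
    errors = []
    # Second pass: check each parsed octet and report exactly which one is wrong.
    for position, octet in enumerate(match.groups(), start=1):
        if int(octet) > 255:
            errors.append("octet %d is out of range: %s" % (position, octet))
    return errors
```

A strict pattern could only say "no match" for 999.999.999.5; here each offending octet is reported individually.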
[+] [-] drivers99|7 years ago|reply
For example this grok pattern (taken from [1] )
%{TIMESTAMP_ISO8601:timestamp} \[%{IPV4:ip};%{WORD:environment}\] %{LOGLEVEL:log_level} %{GREEDYDATA:message}
refers to a pattern called TIMESTAMP_ISO8601 and calls it "timestamp" in the resulting output data structure.
In logstash, TIMESTAMP_ISO8601 is predefined in a patterns file, such as [2], which is made up of a mix of regex and other patterns like YEAR, MONTHNUM, etc.
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
MONTHNUM is a regex (optional 0 followed by 1-9; or 10 through 12):
MONTHNUM (?:0?[1-9]|1[0-2])
I'm not sure what all of Rosie's base patterns are. This appears to be a valid regex though, from the example: [:alpha:]+ (regex character class and "+" meaning 1 or more). It's C instead of Java, which is useful in more/different places.
[1] https://www.elastic.co/blog/do-you-grok-grok
[2] https://github.com/logstash-plugins/logstash-patterns-core/b...
[+] [-] eutropia|7 years ago|reply
[+] [-] neurotrace|7 years ago|reply
What separates this from tools like PEG.js[1] or pest[2]?
[1]: https://pegjs.org/ [2]: https://github.com/pest-parser/pest
[+] [-] yAak|7 years ago|reply
I guess pest is comparable then, but wasn't mature when the author started work on Rosie?
(I'd be curious for a proper comparison, but I'm not really knowledgeable in this area -- I had no idea there were so many alternatives to regex: https://en.wikipedia.org/wiki/Comparison_of_parser_generator...)
[+] [-] gabiruh|7 years ago|reply
As of version 5.10, Perl's regex engine supports complete recursive-descent parsing, allowing things like Regexp::Grammars[0] to exist. Perl also has a nice PEG parser framework called Pegex[1].
-- [0] - https://metacpan.org/pod/Regexp::Grammars [1] - https://metacpan.org/pod/Pegex
[+] [-] KeyboardFire|7 years ago|reply
[+] [-] jmaa|7 years ago|reply
[+] [-] dblotsky|7 years ago|reply
This is a different language insofar as it describes PEGs, not regexes, which is fundamentally different and more powerful (it can parse more things).
The naming of patterns isn’t unique, since you can just put regexes in variables in every other language too. However, the syntax in Rosie seems nicer, and sharing is easier.
[+] [-] msla|7 years ago|reply
Edit: OK, I was wrong. It is strictly more powerful than regexes, in that it can correctly match nested pairs.
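To illustrate the nested-pairs point: a classical regular expression (without recursion extensions) cannot check arbitrarily deep nesting, but a recursive rule can. The sketch below hand-codes one PEG-style rule, `balanced <- ('(' balanced ')')*`, in Python. It is an illustration only and does not use Rosie; the function names are made up.

```python
def match_balanced(text, pos=0):
    """PEG-style rule: balanced <- ('(' balanced ')')*

    Returns the index just past the longest balanced prefix starting at pos.
    """
    while pos < len(text) and text[pos] == "(":
        inner_end = match_balanced(text, pos + 1)  # recurse for the nested part
        if inner_end >= len(text) or text[inner_end] != ")":
            return pos  # this group never closes, so stop before it
        pos = inner_end + 1  # consume the closing parenthesis
    return pos


def is_balanced(text):
    """True if the whole string consists of correctly nested parentheses."""
    return match_balanced(text) == len(text)
```

`is_balanced("(()())")` holds while `is_balanced("(()")` does not, regardless of how deep the nesting goes.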
[+] [-] ketralnis|7 years ago|reply
[+] [-] AndrewOMartin|7 years ago|reply
[+] [-] neurotrace|7 years ago|reply
> Rosie has several benefits over traditional regexes, including the ability to parse recursive structures like HTML and JSON
[+] [-] zwieback|7 years ago|reply
Rosie has several benefits over traditional regexes, including the ability to parse recursive structures like HTML and JSON, to create new patterns by combining other patterns, and to name patterns. You can combine these named patterns into libraries which you can import and use elsewhere.
[+] [-] busterarm|7 years ago|reply