Tarean | 1 year ago

For Regex I like lens-regex-pcre

    > :set -XQuasiQuotes -XOverloadedStrings
    > import Control.Lens
    > import Control.Regex.Lens.Text
    > import qualified Data.Text as T
    > "Foo, bar" ^.. [regex|\p{L}+|] . match
    ["Foo","bar"]
    > "Foo, bar" & [regex|\p{L}+|] . ix 1 . match %~ T.intersperse '-' . T.toUpper
    "Foo, B-A-R"

For web requests, wreq has a nice interface. The OpenSSL bindings come from a different library, so it does need an extra config line. The wreq docs have this example:

    import OpenSSL.Session (context)
    import Network.HTTP.Client.OpenSSL

    let opts = defaults & manager .~ Left (opensslManagerSettings context)
    withOpenSSL $
      getWith opts "https://httpbin.org/get"

There are native Haskell TLS implementations that you could plug into the manager config, but OpenSSL is probably the more mature option.
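
For comparison, a minimal sketch of plugging a native implementation into the same `manager` option. It assumes the `http-client-tls` package (backed by the pure-Haskell `tls` library) and its `tlsManagerSettings`; that choice of package is my assumption, not something the wreq example above prescribes, and I have not compared it against the OpenSSL setup.

```haskell
import Control.Lens ((&), (.~), (^.))
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Network.Wreq (defaults, getWith, manager, responseStatus, statusCode)

main :: IO ()
main = do
  -- Same shape as the OpenSSL example, but with manager settings from
  -- http-client-tls, so no withOpenSSL wrapper is needed.
  let opts = defaults & manager .~ Left tlsManagerSettings
  r <- getWith opts "https://httpbin.org/get"
  -- Print the numeric HTTP status code of the response.
  print (r ^. responseStatus . statusCode)
```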

HelloNurse|1 year ago

You are matching ASCII letters? Cute. What about Unicode character classes like \p{Spacing_Combining_Mark} and non-BMP characters?

Can you translate the examples at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... to Haskell? This Control.Regex.Lens.Text library doesn't seem to believe in documenting the supported syntax, options, etc.

tome|1 year ago

"Cute" comes across as very dismissive. I'm not sure if you intended that. lens-regex-pcre is just a wrapper around PCRE, so anything that works in PCRE will work, for example, from your Mozilla reference:

    ghci> "California rolls $6.99\nCrunchy rolls $8.49\nShrimp tempura $10.99" ^.. [regex|\p{Sc}\s*[\d.,]+|] . match
    ["$6.99","$8.49","$10.99"]

"Spacing combining mark" seems to be "Mc" (https://unicode.org/reports/tr18/#General_Category_Property), so this works:

    ghci> "foo bar \x093b baz" ^.. [regex|\p{Mc}|] . match
    ["\2363"]

(U+093b is a spacing combining mark, according to https://graphemica.com/categories/spacing-combining-mark)
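
The category can also be checked without any regex library, using only `Data.Char` from base. A small sketch; `inCategory` is a hypothetical helper name, and the result depends on the Unicode tables shipped with your GHC:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- | Keep only the characters of a string that fall in the given
-- Unicode general category (hypothetical helper, not a library function).
inCategory :: GeneralCategory -> String -> String
inCategory cat = filter ((== cat) . generalCategory)
```

With this loaded, `inCategory SpacingCombiningMark "foo bar \x093b baz"` should pick out the single U+093B character, mirroring the `\p{Mc}` match above.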

I think in general that Haskellers would probably move to parser combinators in preference to regex when things get this complicated. I mean, who wants to read "\p{Sc}\s*[\d.,]+" in any case?
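
To make that concrete, here is a rough sketch of the price example written with parser combinators. It uses `Text.ParserCombinators.ReadP` from base only to stay dependency-free (in practice people would more likely reach for megaparsec or attoparsec), and the names `price` and `prices` are made up for illustration. Unlike the regex match, it drops any whitespace between the symbol and the amount.

```haskell
import Data.Char (GeneralCategory (CurrencySymbol), generalCategory, isDigit, isSpace)
import Text.ParserCombinators.ReadP (ReadP, munch, munch1, readP_to_S, satisfy)

-- | One price: a currency symbol (Unicode category Sc), optional
-- whitespace, then a non-empty run of ASCII digits, dots and commas.
price :: ReadP String
price = do
  sym <- satisfy ((== CurrencySymbol) . generalCategory)
  _ <- munch isSpace
  amount <- munch1 (\c -> isDigit c || c `elem` ".,")
  pure (sym : amount)

-- | Scan a string left to right and collect every price found,
-- skipping one character whenever no price starts at the current position.
prices :: String -> [String]
prices [] = []
prices s@(_ : rest) =
  case readP_to_S price s of
    ((p, remaining) : _) -> p : prices remaining
    [] -> prices rest
```

    ghci> prices "California rolls $6.99\nCrunchy rolls $8.49\nShrimp tempura $10.99"
    ["$6.99","$8.49","$10.99"]

It is longer than the regex, but each piece has a name and a type.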

Tarean|1 year ago

Either Hacker News or autocorrect ate the p. It was supposed to be \p{L}, which is a Unicode character class.

As the other comment mentioned, PCRE-compatible regexes are a standard, though the PCRE spec isn't super readable. Some projects, like MariaDB and PHP, have more readable docs, but it doesn't really make sense to repeat the spec in library docs: https://www.php.net/manual/en/regexp.reference.unicode.php

There are libraries for PCRE2 or GNU regex syntax with the same API, if you prefer those.