
PEP 622 – Structural Pattern Matching

285 points | kragniz | 5 years ago | python.org

122 comments

[+] justusw|5 years ago|reply
Very interesting. This PEP is still in draft state, but I am interested to see how the community will react. For me, I have a few thoughts:

1) This is really close to Erlang/Elixir pattern matching and will make fail-early code much easier to write and easier to reason about.

2) match/case means double indentation, which I see they reasoned about later in the "Rejected ideas". Might have a negative impact on readability.

3) Match is an already used word (as acknowledged by the authors), but I think this could have been a good case for actually using hard syntax. For me, perhaps because I'm used to it, Elixir's "{a, b, c} = {:hello, "world", 42}" just makes sense.

4) I hope there won't be a big flame-war debacle like with :=

5) And then finally there is the question of: "It's cool, but do we really need it? And will it increase the surprise factor?" And here I'm not sure. And again, this was the concern with the new assignment expression. The assignment expression is legitimately useful in some use cases (no more silly while True), but it might reduce the learnability of Python. Python is often used as an introductory programming language, so the impact would be that curricula need to be adjusted or beginner programmers will encounter some surprising code along the road.

I can't say this is a good or bad proposal, I want to see what other opinions are out there, and what kind of projects out there in the world would really benefit from syntax like this.

[+] wool_gather|5 years ago|reply
One difference I noticed from Elixir was this:

> While matching against each case clause, a name may be bound at most once, having two name patterns with coinciding names is an error.

  match data:
    case [x, x]:  # Error!
      ...
Which is a bit of a shame. This comes in handy in Elixir to say "the same value must appear at these places in the collection". I.e. for a Python tuple pattern `(x, y, x)`, `(3, 4, 5)` would not match but `(3, 4, 3)` would.

Overall, though, I think this will be a great addition to Python. Pattern matching is generally a huge boost for expressiveness and readability, in my opinion.

[+] sillysaurusx|5 years ago|reply
I have a visceral dislike of pattern matching. Lisp shows just how much people will abuse it in real-world production codebases. It becomes impossible to understand even simple logic without comments. I’d link to some examples, but I’m on mobile; suffice to say, pull up the emacs codebase and read through some of the more advanced modules like edebug.el. I’m not certain that one uses pattern matching, but it’s a perfect example of “this codebase cannot be understood without extensive study of language features.”

You may argue that I am simply not versed enough in pattern matching. “You should study harder.” I would argue that simplicity is worth striving for.

I hope this PEP never moves beyond draft.

It’s also shocking that most people here seem to be tacitly supporting this, or happy about it. Yes, it’s cool. Yes, it might simplify a few cases. But it will also give birth to codebases that you can’t read in about, say, 5 years. And then you’ll have a bright line between people in the camp of “This is perfectly readable; it does so and so” and the rest of us regular humans that just want to build reliable systems.

And oh yes, it becomes impossible to backport to older python versions. Lovely.

[+] mdrachuk|5 years ago|reply
So it’s actually a smart switch statement.

Seems like it doesn’t create instances when you’re doing

  Node(children=[Leaf(value="("), Node(), Leaf(value=")")])
instead:

1. Node means "is instance of Node".

2. Everything in between () is "has an attribute with value".

3. List means "the attribute should be treated as a tuple of".. etc..

Very confusing, this definitely needs another syntax, because both newcomers and experienced devs will be prone to read it as plain `==`, since that's how enums and primitives will be working.

This syntax goes against the Zen: It’s implicit -- inside a match, case expressions don't mean what they regularly mean. It’s complicated -- basically it’s another language (like regex) which is injected into Python.

I’m a big believer in this feature, it just needs some other syntax. Using {} instead of () makes it a lot better. Now there's no way to confuse it with simple equality.

  match node:
    case Node{children=[Leaf{value="("}, Node{}, ...]}:
[+] andolanra|5 years ago|reply
It's worth noting that there's a truly massive amount of precedent in other languages for Python implementing it using the syntax as proposed. Languages that have or are planning to include pattern-matching where the pattern syntax exactly mirrors the expression syntax like this include Rust, Swift, OCaml, Haskell, C++, Ruby, Erlang, and many, many more.

I understand the worry that newcomers might struggle, but I don't think it's going to be the case: newcomers regularly learn the languages listed above without stumbling across that problem. And if Python did choose a syntax like the one you're proposing, it'd also be the odd one out among dozens of mainstream languages including this feature, which I think would be even more confusing!

[+] masklinn|5 years ago|reply
> Very confusing, this definitely needs another syntax

The entire point of structural pattern matching is that structuring and destructuring look the same.

> This syntax goes against Zen: It’s implicit -- when using match case expressions don't mean what they regularly mean.

There's nothing implicit to it. The match/case tells you that you're in a pattern-matching context.

> I’m a big believer in this feature, it just needs some other syntax. Using {} instead of () makes it a lot better. Now no way to confuse it with simple equality.

Makes it even better by… looking like set literals and losing the clear relationship between construction and deconstruction?

[+] jerf|5 years ago|reply
At the current rates, it seems like it's only going to be another 5 years or so before Python is straight-up a more complicated language than Perl 5. What it lacks in frankly bizarre corner cases it's going to make up for in subtly bizarre corner cases.

I used to feel like I could define __getattr__ or __setattr__ and understand the implications, but that's getting increasingly terrifying.

[+] BiteCode_dev|5 years ago|reply
I prefer the PEP syntax: it looks like the instantiation of the object I'm trying to match, so it makes sense to me.
[+] Tyr42|5 years ago|reply
I would say it's not a smart switch statement, since you can bind variables.

  match shape:
    case Point(x, y):
        ...
    case Rectangle(x, y, _, _):
        ...
  print(x, y)  # This works
[+] kunfuu|5 years ago|reply
I would prefer something like:

    match node:
        on (time _ 1 @ 0, foo _ 2, bar _ 3, baz _ any, status _ 1 @ -1):
            do_something()
            rematch do_something_2(foo)
'_' indicating span, and '@' indicating position of the first item. Both can be omitted.
[+] LockAndLol|5 years ago|reply
At first glance I like your suggestion better. It's really distinct from instantiation and won't lead to confusion (imo).
[+] uryga|5 years ago|reply
glad to see this! though it's a shame that the proposed `match/case` is a statement, not an expression:

> "We propose the match syntax to be a statement, not an expression. Although in many languages it is an expression, being a statement better suits the general logic of Python syntax."

no matching in lambdas unless those get overhauled too :(

instead, let's get excited for a whole bunch of this:

  match x:
    case A:
      result = 'foo'
    case B:
      result = 'bar'
(i guess i'm a little salty...)
[+] rattray|5 years ago|reply
Yeah, came here to say the same thing. Disappointing to have to write this:

    match shape:
        case Square(l):
            area = l * l
        case Rectangle(l, w):
            area = l * w
        case Circle(r):
            area = PI * r ** 2
when I want to just write this:

    area = match shape:
        case Square(l):
            l * l
        case Rectangle(l, w):
            l * w
        case Circle(r):
            PI * r ** 2
I'm almost guaranteed to forget (or mistype) the `area = ` at least once in any match clause of length.
[+] BiteCode_dev|5 years ago|reply
> glad to see this! though it's a shame that the proposed `match/case` is a statement, not an expression:

> no matching in lambdas unless those get overhauled too :(

Hate it or like it, but it's congruent with the rest of Python design.

Guido has been hostile to introducing too much FP in Python.

While I find it frustrating from time to time, especially when I come back from another language to Python, in the long run, I must say I understand.

I don't like to work with FP languages, because unless your team is really good, the code ends up hard to read and debug. It's possible to make very beautiful code in FP, but it makes it so easy to create huge chains of abstract things.

And devs are not reasonable creatures. Give them a gun with 6 bullets, they'll shoot them all, and throw some stones once it's empty.

To me, the average dev is not responsible enough with code quality, and should not be trusted. I'm all for tooling enforcing as much as you can. Linters and formatters help a lot. But sometimes, language design limiting them is a godsend.

I've seen the result with Python: you can put a math teacher, a biologist and a frontend dev on their first backend project, and they will make something I can stand reading.

[+] yowlingcat|5 years ago|reply
I wonder if this is intentional. FP constructs are increasingly discouraged in Python, taking reduce [1] as an example. Not that this is necessarily a bad thing, per se -- I think it does simplify the language. But it's an opinionated decision that will discourage FP enthusiasm from finding a home in Python.

[1] http://lambda-the-ultimate.org/node/587

[+] dragonwriter|5 years ago|reply
Just as named functions replace multiline lambdas, I expect that named matcher functions (functions that consist entirely of a match where each arm is a return) will replace match expressions in python.

Abstractly, I'd rather have match expressions, too, but this does fit the rest of Python better.

[+] MattConfluence|5 years ago|reply
Lately, Elixir has dethroned Python as the language that I get the most joy from using. Pattern matching is one of the big reasons. Great to see that Python core contributors (including Guido!) want to see this feature in Python as well! Hopefully it will be well integrated and not feel like a tacked on feature.

If you're curious about why this is so useful, and reading the (quite dry) PEP isn't your thing, I would heartily recommend playing with Elixir for a few hours. Pattern matching is a core feature of the language, you won't be able to avoid using it. The language is more Ruby-like than Python-like, but Python programmers should still have an easy time grokking it. When I was getting started I used Exercism [1] to have some simple tasks to solve.

[1] https://exercism.io/tracks/elixir

[+] mkl|5 years ago|reply
"The match and case keywords are proposed to be soft keywords, so that they are recognized as keywords at the beginning of a match statement or case block respectively, but are allowed to be used in other places as variable or argument names."

That's interesting. Python 3.6 had "async" and "await" as soft keywords, before they became reserved keywords in 3.7 [1]. However, soft keywords have just been added to Python more generally [2], so aren't such a special case anymore.

[1] https://www.python.org/dev/peps/pep-0530/

[2] https://github.com/python/cpython/pull/20370, https://github.com/python/cpython/pull/20370/files

[+] aparmentier|5 years ago|reply
Interesting proposal, but I'm cringing at yet another overload for the * symbol.

a * b == "a times b"

a * * b == "a to the power of b"

f(* a) == "call f by flattening the sequence a into args of f"

f(* * a) == "call f by flattening the map a into key value args for f"

[* a] == "match a sequence with 0 or more elements, call them a"

Am I missing something? I know these all occur in different contexts, still the general rule seems to be "* either means something multiplication-y, or means 'having something to do with a sequence' -- depends on the context". It's getting to be a bit much, no?

Note: HN is making me put spaces between * to avoid interpretation as italics.

[+] duckerude|5 years ago|reply
It's already used that way for unpacking, e.g.

  >>> [x, *other] = range(10)
  >>> x
  0
  >>> other
  [1, 2, 3, 4, 5, 6, 7, 8, 9]
So this instance of the syntax is not that novel. If it's a mistake, it's too late to fix it.

A unary asterisk before a name means the name represents a sequence of comma-separated items. If it's a name you're assigning to (LHS) that means packing the sequence into the name, if it's a name you're reading from (RHS) that means unpacking a sequence out of the name.

[+] strbean|5 years ago|reply
You're missing

    def f(*args, **kwargs):
doing the opposite of

    f(*a)
    f(**a)
[+] dragonwriter|5 years ago|reply
> Am I missing something?

Yes, the form used in function declaration (the inverse of the form in function calls) which is pretty much exactly the same as the new use, “collect a sequence of things specified individually into a list with the given name”.

[+] maxnoe|5 years ago|reply
Yes, that it is pretty much already part of the language like this.

This works:

a, *b = (1, 2, 3, 4)

b will be [2, 3, 4]

[+] BiteCode_dev|5 years ago|reply
So far I like it.

Unpacking was huge 10 years ago, but nowadays even JS has object destructuring, so python was lagging behind.

It feels like they really spent a lot of time in designing this: Python has clearly not been made for that, and they have to balance legacy design with the new feature.

I think the matching is a success in that regard, and the __match__ method is a great idea. The guards will be handy, while the '_' convention is finally something official. And thank god for not doing the whole async/await debacle again. Breaking people's code is bad.

On the other hand, I understand the need for @sealed, but this is the kind of thing that shows that Python was not designed with type hints from the beginning. Haskell devs must be having a laugh right now.

We can thank Guido for the PEG parser in 3.9 which allows him to co-author this as well.

I expect some adjustments to be made, because we will discover edge cases and performance issues, for sure. Maybe they'll change their mind on generalized unpacking: I do wish to be able to use that for dicts without having to create a whole block.

But all in all, I believe it will be the killer feature of 3.10, and while I didn't see the need to move from 3.7, walrus or not, 3.10 will be my next target for upgrade.

[+] csantini|5 years ago|reply
If you want it today, Pampy does most of it:

https://github.com/santinic/pampy

Even match on Point(x, y, _)

[+] JNRowe|5 years ago|reply
There is also switchlang¹ which isn't quite the same thing, but provides some of the functionality of PEP-622 and pampy. I believe it is notable for including a nice descriptive README, and also having a small/simple implementation.

I personally prefer the pampy internals, but quite like the context manager usage from switchlang. I don't even know which bikeshed I want to paint, let alone the colour.

1. https://github.com/mikeckennedy/python-switch

[+] whalesalad|5 years ago|reply
This is the first PEP I’m really excited about. I hope the design is given more careful consideration, though, because destructuring in this manner more broadly across the language would also be killer.

My biggest concern is the class matching syntax. I feel like that would be much better deferred to a lambda style function or similar. The syntax matches instantiating a new class instance exactly, which seems like it could cause a lot of problems for tools that read and manipulate syntax.

[+] hnlmorg|5 years ago|reply
Can someone explain to me the history behind Python's aversion to switch statements? I get Python is opinionated and I'm not trying to start a language war, it was just never clear to me why the `if ... elif` pattern was the preferred idiom.
[+] tclancy|5 years ago|reply
I came to Python after ~5 years of programming in C-style languages (about 15 years ago) and the lack of a switch statement may be the one thing about Python that demonstrably made me a better coder.

When I discovered there wasn't one, I was really annoyed and went digging for an explanation. The one I found was a suggestion if you are reaching for a switch statement, you're (probably) doing something wrong. This is not to impugn anyone else's approach or style and I will freely admit there are times when all you need is a switch and if/ else if/ else gets ugly, but most times I find the replacement for switch is not that but a dictionary holding a callable or similar. I recently showed this approach to a peer in PR and watching it click for her was awesome. She ripped out most of what she'd done and replaced some of our more ponderous permission checking with a dictionary of functions to apply.

I'd say the other thing I do when I wish I had a switch statement is realize I am writing code that is halfway to doing things The Right Way and refactor the block into smaller functions.
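
A minimal sketch of the dictionary-of-callables approach described above, using a permission check as the example. All names here (`PERMISSION_CHECKS`, `is_allowed`, the `role` key) are hypothetical and only stand in for whatever the real code used:

```python
def can_read(user):
    # Everyone may read.
    return True


def can_write(user):
    return user.get("role") in {"editor", "admin"}


def can_delete(user):
    return user.get("role") == "admin"


# The dictionary plays the role of the switch: action name -> callable.
PERMISSION_CHECKS = {
    "read": can_read,
    "write": can_write,
    "delete": can_delete,
}


def is_allowed(user, action):
    """Look up the check for the action; unknown actions are denied (the 'default' branch)."""
    check = PERMISSION_CHECKS.get(action)
    if check is None:
        return False
    return check(user)


print(is_allowed({"role": "editor"}, "write"))   # True
print(is_allowed({"role": "editor"}, "delete"))  # False
```

Adding a new action is then a one-line change to the dictionary rather than another elif branch.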

[+] lucideer|5 years ago|reply
Taking this question from the opposite angle, what benefits do switch statements give us over if...elif?

Not actually a heavy Python user, but even though most languages I use regularly are more switch/case-heavy, I've never quite grasped why there's two largely interchangeable ways to do the one thing.

[+] mumblemumble|5 years ago|reply
I'm not averse to switch statements in general, but, in Python specifically, I just fail to see the point. In a higher-syntax language like C, a switch statement can save a lot of clutter, and also has some different semantics. In a language like ML, match statements come with a whole lot of extra static checking.

But Python's if/elif/else syntax is already clean enough that there's just not much clutter to remove. And C-style semantics on switch statements wouldn't be acceptable. And Python is a very dynamic language. So, in the end, you would end up with something that's generally the same line count and the same semantics as an equivalent if-statement, meaning it would be more like semantic Splenda than semantic sugar.

[+] lincolnq|5 years ago|reply
I think culturally, because “explicit is better than implicit” (from the Zen of Python). Switch statements have a lot of implicit-ness to them (implicit invocation of equality comparison, to start) and it never seemed quite necessary, given how spare Python syntax is anyway.
[+] quietbritishjim|5 years ago|reply
The article links to PEP 3103 [1], which is a proposal to add a switch statement to the language. Interestingly, it was written by Guido himself, which you'd usually expect to give a pretty good chance of being accepted, especially since he was the sole decider of what would be accepted at the time! The rejection notice says simply:

> A quick poll during my keynote presentation at PyCon 2007 shows this proposal has no popular support. I therefore reject it.

[1] https://www.python.org/dev/peps/pep-3103/

[+] fulafel|5 years ago|reply
It's a remnant from the time when Python strived for simplicity and resisted adding syntax. Just like the decision against adding a ternary operator, etc. The alternative ways (dict of functions or if chain) were considered good enough weighed against the sin of adding relatively rarely used syntax.
[+] bjourne|5 years ago|reply
Case statements are an anti-pattern. The proper way to implement that design pattern is using either dispatch dictionaries or plain-old polymorphism.
[+] jjice|5 years ago|reply
Rust was my first experience with pattern matching, and I really learned to love it there. Seeing it come to Python will be great as well. I'm glad this is on the docket, and I can't wait to see how this draft evolves.
[+] henryiii|5 years ago|reply
I'm excited, but it seems like setting names by default is very odd. Quite a bit of the PEP is dedicated to the odd situations created by "case x" actually setting x rather than reading x ("case .x" would read x). Wouldn't this be a natural place for := ? So you would do:

  case x := _
to match and assign to x. "_" would always be the matcher. You always have access to the original item that the match was made on, so pulling out the matched items is often not needed, AFAICT. This would be explicit, and not too surprising. Then the whole dotted names part can be dropped - it works like normal Python at that point.

The PEP already suggests this for capturing parts of the match, why not just use it for all saved matches? It's more verbose, but consistent, with fewer caveats, and not always needed.

Disclaimer: My languages don't happen to include one with good pattern matching, so I'm not strongly familiar with it.

[+] dragonwriter|5 years ago|reply
With pattern matching you normally want bindings, at least local to the match construct (that the PEP proposes bindings with normal Python function scope, rather than local to the construct has plusses and minuses), so creating extra verbosity for bindings is complicating the normal case.
[+] lincolnq|5 years ago|reply
This is very exciting!

One subtle thing which I noticed is the distinction between class patterns and name patterns (bindings). In particular, it is possibly confusing that the code `case Point:` matches anything and binds it to the name Point, whereas `case Point():` checks if the thing is an instance of Point and doesn’t bind anything.

[+] duckerude|5 years ago|reply
Yeah, that seems like it's going to cause trouble, because you can make a mistake without noticing. If you write `case Point:` when you mean `case Point():` you won't get an exception or a missing name, it'll just look like it thinks all objects are Points.

Linters could help. You're shadowing `Point`, and because `case Point:` matches any value, if there's another case after that then something is wrong. But you can't always rely on linters.

[+] Spiritus|5 years ago|reply
Note that this is just a draft/proposal. And there's heavy activity on the mailing list.

With that said, this has been suggested and discussed many times in the past. I imagine this will be as controversial as the walrus operator[1] was.

[1] https://www.python.org/dev/peps/pep-0572/

[+] Narann|5 years ago|reply
> case Node(children=[LParen(), RParen()]):

Will this create a second Node instance and compare it to node?

If so, is it not less efficient performance-wise than its "counterpart", isinstance() + properties comparison?

If this method is less efficient, it could be confusing, especially for newcomers.

Am I missing something?

[+] fulafel|5 years ago|reply
The implicit isinstance and the repurposing of the constructor syntax all seem very un-Pythonic.
[+] vslira|5 years ago|reply
This is great. I'm currently trying to rewrite a heavily object-oriented library into a more functional one (the rewrite is necessary because of licensing issues, and functional because the original code is a clusterf*ck of mutation) and despite the whole company working on top of Python, I was seriously considering implementing it in SML[1] specifically due to pattern matching making the underlying algorithm of the main data structure incredibly easier to reason about and implement.

[1] Yes I know coconut-lang is a thing, but I didn't want to introduce something that looks a lot like Python but isn't in our codebase

[+] loa_in_|5 years ago|reply
>Note that because equality (__eq__) is used, and the equivalency between Booleans and the integers 0 and 1, there is no practical difference between the following two:

>case True: ... case 1: ...

From a practical perspective this is great, but I can imagine many cases where one could want to differentiate between those.

On one hand the number usually can be nested inside a structure and matching on == is more flexible. On the other hand matching on 'is' is still letting users relax this behaviour and allows matching on type of primitives as well.