PEP 584 – Add + and - operators to the built-in dict class

[+] kbd|7 years ago|reply

> An alternative to the + operator is the pipe | operator, which is used for set union. This suggestion did not receive much support on Python-Ideas.

That's disappointing. It's always been on my Python wish list that dicts would subclass sets, as dicts are essentially sets with values attached. Pretty much everywhere you can use a set you can use a dict and it acts like the set of its keys. For example:

    >>> s = {'a','b','c'}
    >>> d = {i: i.upper() for i in s}
    >>> list(d) == list(s)
    True

Dictionaries have been moving in this more ergonomic direction for a while. Originally, to union two dictionaries you had to say:

    >>> d2 = {'d': 'D'}
    >>>
    >>> d3 = d.copy()
    >>> d3.update(d2)
    >>> d3
    {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}

Nowadays, as the PEP points out, you can just say:

    >>> {**d, **d2}
    {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}

There's no reason you shouldn't have always been able to say d | d2, same as sets. Now I finally get my wish that dictionaries will behave more similarly to sets and they use the wrong set of operators.

[+] dan-robertson|7 years ago|reply

The most compelling reason to not do this is that (I claim) it’s not super obvious what to do when the keys are equal. In:

  { 'a' : 1 } | { 'a' : 2 }

Should the result be:

  { 'a' : 1 }

(prioritise the left hand side), or

  { 'a' : 2 }

(prioritise the right hand side), or should it raise an error? Maybe a fourth option would be to downgrade to sets of keys and give:

  { 'a' }

A fifth option is to magically merge values:

  { 'a' : 3 } or { 'a' : (1,2) }

For the first two choices one loses commutativity which means that code then suddenly has to have previously cared about it (or it will do the wrong thing), even though it didn’t previously matter, and one is always potentially losing data. The third choice is safe but could cause unforeseen problems later if shared keys only happen rarely. The fourth choice also forgets a bunch of information held in the dict.

In a language like Haskell, one can use type classes to specify how to merge values (Monoid), but without type classes (and a way to choose which instance to use) I think some kind of magic merge is not great.

I claim the operations one should really want with dicts are not set operations but rather more relational ones, ie {inner,outer,left,right} joins on the keys followed by some mapping to decide how to merge values.
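A minimal sketch of that idea, assuming hypothetical helper names `inner_join` and `outer_join` (neither exists in the standard library). The caller supplies the function that merges values for shared keys, so no value is dropped silently:

```python
from operator import add


def inner_join(left, right, combine):
    """Keep only keys present in both dicts; merge their values with combine."""
    return {k: combine(left[k], right[k]) for k in left if k in right}


def outer_join(left, right, combine):
    """Keep every key; call combine only where both dicts supply a value."""
    result = dict(left)
    for key, value in right.items():
        result[key] = combine(result[key], value) if key in result else value
    return result


a = {'a': 1, 'b': 10}
b = {'a': 2, 'c': 20}

summed = outer_join(a, b, add)                  # shared 'a' values are added
paired = inner_join(a, b, lambda x, y: (x, y))  # shared 'a' values are kept as a pair
```

Left and right joins would follow the same pattern, iterating over only one side's keys.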

[+] eesmith|7 years ago|reply

"dicts would subclass sets, as dicts are essentially sets with values attached"

Such a derivation would violate the Liskov substitution principle. Consider the following with set:

  x = {"one", "two"}
  y = set()
  y.update(x)
  y

It results in y being {'two', 'one'}.

Now, do the same with dict:

  y = dict()
  y.update(x)

This gives the exception: "ValueError: dictionary update sequence element #0 has length 3; 2 is required"

This means that dict cannot be used anywhere that a set can be used, which means it violates the Liskov substitution principle (see https://en.wikipedia.org/wiki/Liskov_substitution_principle ) which means that if covariant methods are needed for good design then dict cannot be a subclass of set.

[+] Sean1708|7 years ago|reply

> as dicts are essentially sets with values attached.

Interestingly enough some languages actually do the opposite. In Rust for example a set is literally just a dictionary with unit as the value[0] and unit is essentially a way of expressing the absence of a value (it takes up no space in memory, and you just can't do anything with it).

[0]: https://doc.rust-lang.org/stable/src/std/collections/hash/se...

For posterity, the above link shows:

  pub struct HashSet<T, S = RandomState> {
      map: HashMap<T, (), S>,
  }

[+] dtech|7 years ago|reply

> It's always been on my Python wish list that dicts would subclass sets, as dicts are essentially sets with values attached.

> There's no reason you shouldn't have always been able to say d | d2, same as sets.

I don't agree with this view, mainly because merging dicts is not commutative, while set union is.

The actual operation for "a + b" is "add everything from b to a", and + more closely resembles that than |.

[+] guitarbill|7 years ago|reply

dict.keys() pretty much does just that:

    >>> a = {"foo": 1}
    >>> b = {"bar"}
    >>> a.keys() | b
    {'bar', 'foo'}

As an aside, I like the plus operator. Being able to merge two dictionaries in one line and have the result be a new dict is something I've needed often enough.

    {**d, **d2}

works, but is pretty recent and still feels weird to me (not coming from a language that makes use of destructuring a lot, like JavaScript).

[+] Someone|7 years ago|reply

"that dicts would subclass sets, as dicts are essentially sets with values attached."

I think a variant of https://en.wikipedia.org/wiki/Composition_over_inheritance applies here. Inheritance, in general, is only a good idea if there is a strong is-a relation. A dictionary isn’t a set of keys; it has a set of keys.

If you want to see a dictionary as a set, I think the better view would be to see it as a set of (key,value) pairs where equality of pairs is defined as equality of the key parts, ignoring the ‘value’ parts.

I think it makes sense to require that such a set should behave identically to a dictionary, and that providing a ‘real’ dictionary is just an optimization, plus the addition of convenience functions, e.g. to get the set of keys.

If one sees things that way, one could even define the dictionary interface as taking an equality operation on the keys and a ‘value combiner’ function that combines values, and will be used in the cases you outline (that function could add integers, concatenate strings, keep the larger value, or whatever the programmer specifies)
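The standard library already ships one fixed instance of this idea: `collections.Counter` is a dict subclass whose `+` combines clashing values by addition and whose `|` keeps the larger one. A short illustration (the variable names are made up, and note that `Counter` discards entries whose result is zero or negative):

```python
from collections import Counter

left = Counter({'a': 1, 'b': 2})
right = Counter({'a': 5, 'c': 7})

added = left + right      # clashing values are summed
larger = left | right     # clashing values keep the maximum
```

A general version would take the combiner as a parameter instead of baking in addition or maximum.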

[+] 103e|7 years ago|reply

If you treat dictionaries as sets of tuples, union doesn't work as expected: {('a', 1)} | {('a', 2)} = {('a', 1), ('a', 2)}. The same key maps to two values.
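This is easy to check with plain sets of tuples. A small sketch with illustrative variable names:

```python
left = {('a', 1)}
right = {('a', 2)}

union = left | right             # both pairs survive: tuples compare on key and value
as_dict = dict(sorted(union))    # converting back keeps only the last value seen for 'a'
```

The set holds two entries for 'a', while the dict built from it can hold only one.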

[+] sametmax|7 years ago|reply

Practically, what would your wish accomplish? Will it make most people more productive? Produce fewer bugs? Help them learn faster?

Most Python coders don't even use sets more than once a year. Hell, I use collections.deque more than sets.

But dicts? We use them all the time. In fact, {} + {} failing is a recurring disappointment in all my classrooms.

Plus, in PHP and JS, arrays/objects are the "do-it-all" data structure. And it's horrible. You see the same data structure everywhere. You have to read in detail what something is, and what it's for.

It's very nice that dict and set are very distinct, and that they have distinct sets of operators. This way it's super easy to scan the code and know what the data structure is, and what it's used for. That's why I always teach set([1, 2]) and not {1, 2} first. It helps people to make a clear distinction in their mind.

[+] benj111|7 years ago|reply

Yes, it's not so bad for the + case, but the - case seems non-obvious at best.

[+] zestyping|7 years ago|reply

len(dict1 + dict2) does not equal len(dict1) + len(dict2) so using the + operator is nonsense.

The operators should be |, &, and -, exactly as for sets, and the behaviour defined with just three rules:

1. The keys of dict1 [op] dict2 are the elements of dict1.keys() [op] dict2.keys().

2. The values of dict2 overwrite the values of dict1.

3. When either operand is a set, it is treated as a dict whose values are None.

This yields many useful operations and is simple to explain.

merge and update:

    {'a': 1, 'b': 2} | {'b': 3, 'c': 4} => {'a': 1, 'b': 3, 'c': 4}

pick some items:

    {'a': 1, 'b': 2} & {'b': 3, 'c': 4} => {'b': 3}

remove some items:

    {'a': 1, 'b': 2} - {'b': 3, 'c': 4} => {'a': 1}

reset values of some keys:

    {'a': 1, 'b': 2} | {'b', 'c'} => {'a': 1, 'b': None, 'c': None}

ensure all keys are present:

    {'b', 'c'} | {'a': 1, 'b': 2} => {'a': 1, 'b': 2, 'c': None}

pick some items:

    {'b', 'c'} & {'a': 1, 'b': 2} => {'b': 2}

remove some items:

    {'a': 1, 'b': 2} - {'b', 'c'} => {'a': 1}
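A minimal sketch of the three rules as a plain function, assuming hypothetical names `_as_dict` and `combine`. The key operation is passed in from the `operator` module, so one function covers |, & and -:

```python
from operator import and_, or_, sub


def _as_dict(operand):
    """Rule 3: a set is treated as a dict whose values are all None."""
    if isinstance(operand, (set, frozenset)):
        return dict.fromkeys(operand)
    return operand


def combine(d1, d2, key_op):
    """Rule 1: key_op picks the keys. Rule 2: d2's value wins where it has one."""
    d1, d2 = _as_dict(d1), _as_dict(d2)
    keys = key_op(d1.keys(), d2.keys())   # dict key views support |, & and -
    return {k: d2[k] if k in d2 else d1[k] for k in keys}


merged = combine({'a': 1, 'b': 2}, {'b': 3, 'c': 4}, or_)
picked = combine({'a': 1, 'b': 2}, {'b': 3, 'c': 4}, and_)
removed = combine({'a': 1, 'b': 2}, {'b', 'c'}, sub)
reset = combine({'a': 1, 'b': 2}, {'b', 'c'}, or_)
```

The key order of the result follows set iteration order in this sketch, so it is not guaranteed to match insertion order.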

[+] gpderetta|7 years ago|reply

The length (i.e. magnitude) of the sum of two algebraic vectors is also not the sum of the lengths of the two original vectors.

Would you not use + to represent vector sum?

[+] mwkaufma|7 years ago|reply

Why should that constraint hold? It's not even true for simple vectors under the euclidean norm:

||<1,0>|| + ||<0,1>|| != ||<1,0> + <0,1>||

[+] rbanffy|7 years ago|reply

    > {'a': 1, 'b': 2} - {'b': 3, 'c': 4} => {'a': 1}

I think that removing the key from d1 would be a bad idea if the value is not the same on both dicts. If you think the dict is a vector of named dimensions, should 'c' be -4 in the result?

I'd totally support it resulting in:

    {'a': 1, 'b': -1, 'c': -4}

[+] petters|7 years ago|reply

So much better! Overloading addition for something that behaves differently is not good.

[+] antt|7 years ago|reply

>The operators should be |, &, and -, exactly as for sets, and the behaviour defined with just three rules:

len(a) - len(b) != len(a-b) either.

I'm not really sure why you think that length should be a linear map for dictionaries. Their length is their least interesting property.

[+] jerf|7 years ago|reply

Unlike some of the other commenters, I'm fine with the + specification. + hasn't been commutative in Python for a long time.

But the - bothers me, and nobody else seems to have mentioned this. {"a": 1} - {"a": 1} = {}, sure, but it is way less obvious to me that {"a": 1} - {"a": 2} = {}, and not {"a": 1}. If you consider dictionaries as an unordered list of tuples (key, value) where keys happen to be unique and as a result of that you get nice O()-factors on access, that doesn't make sense. You went to remove ("a", 2), but saw ("a", 1) and thought, "eh, close enough". But it's not the same thing.

If you think of a dict as a set that happens to have associated values, the specification makes more sense, but if you dig into that line of thought, that turns out to be a rather weird way of thinking of them. Values really shouldn't be thought of as second-class citizens of a dict. If you are going to go this route though, {"a": 1} - {"a"} = {} (where the right-hand side is a set) actually makes more sense, without the spurious value on the right-hand side.

I'd actually rather conceive of the - operation as a "dict minus an iterable that will yield keys to remove". This has the advantage of recovering the original {"a": 1} - {"a": 2} = {} semantics that probably is what people want in practice, just via a different method. But locking the right-hand side to a dict makes it weird.
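A sketch of that reading, using a hypothetical helper name `without_keys`. Because iterating over a dict yields its keys, a dict on the right-hand side works the same way as a set or a list:

```python
def without_keys(d, keys):
    """Return a copy of d with every key yielded by `keys` removed."""
    drop = set(keys)
    return {k: v for k, v in d.items() if k not in drop}


from_set = without_keys({'a': 1, 'b': 2}, {'a'})
from_list = without_keys({'a': 1, 'b': 2}, ['a', 'zzz'])
from_dict = without_keys({'a': 1}, {'a': 2})   # the value 2 is never looked at
```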

[+] marcosdumay|7 years ago|reply

> Values really shouldn't be thought of as second-class citizens of a dict.

Aren't they? If I do `d["a"] = 1; d["a"] = 2` the first assignment is completely gone after the second, I don't get a set with superimposed values on the "a" key.

[+] Areading314|7 years ago|reply

It's consistent with iteration over a dict:

  for k in my_dict:
      print(my_dict[k])

In this example it is implied that unless you specify .items(), you are only considering keys in the iteration. This would apply to the + and - operations too, as I understand it.

[+] wodenokoto|7 years ago|reply

> Analogously with list addition, the operator version is more restrictive, and requires that both arguments are dicts, while the augmented assignment version allows anything the update method allows, such as iterables of key/value pairs.

    >>> d + [('spam', 999)]
    Traceback (most recent call last):
      ...
    TypeError: can only merge dict (not "list") to dict
    >>> d += [('spam', 999)]
    >>> print(d)
    {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

While I get the "Because this is what lists do"-argument, I am still wondering why there is a difference in the types allowed for `+` and `+=`?
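For reference, this is the existing list behaviour the PEP is pointing at. A small sketch with made-up variable names:

```python
items = [1, 2]

try:
    items + (3,)          # binary +: the right operand must be a list
    plus_error = None
except TypeError as exc:
    plus_error = exc

items += (3,)             # in-place form accepts any iterable, like items.extend((3,))
```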

[+] amelius|7 years ago|reply

For sets I can understand what + and - mean: you can add or subtract the sets (not add or remove an element directly). This should be like lists, e.g.

    [10,20] + [30]

But what + and - would mean in the case of dicts is obscure. Better to just use full method names imho.

[+] sametmax|7 years ago|reply

All my students disagree with you. They all try addition, and all expect a resulting dict with keys from both dicts. The fact the keys from the one side are prioritized is something they will learn once, just like with dict.update().

[+] comex|7 years ago|reply

The + operator looks great – I've personally experienced the papercut this solves multiple times, where it would be most natural to have a "combine two dicts" operator:

    return {'a': 'b'} + other_dict

but instead I had to assign to a variable and mutate with .update(), which is much more verbose:

    x = {'a': 'b'}
    x.update(other_dict)
    return x

However, I was working in Python 2; Python 3 has

    {'a': 'b', **other_dict}

and even

    {**one_dict, **other_dict}

though the PEP mentions that the latter doesn't work in all circumstances. Still, it will be nice to have a more general operator; I personally don't really care whether it's called + or |.
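One limitation I am sure of (whether it is the one the PEP has in mind is my assumption): the display syntax always builds a plain dict, so the type of a dict subclass is lost. A short sketch using `OrderedDict` as the example subclass:

```python
from collections import OrderedDict

one = OrderedDict(a=1)
other = OrderedDict(b=2)

merged = {**one, **other}    # the result is a plain dict, not an OrderedDict
```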

On the other hand, the - operator seems... strange, in that it only considers the keys of its right-hand argument, and ignores the values. Seems like a footgun.

[+] petters|7 years ago|reply

I think overloading + so that a + b != b + a is problematic.

I know this is the case for strings and lists, but those cases are very well established.

[+] albntomat0|7 years ago|reply

It intuitively makes sense for lists and strings, as those have an order that matters.

I agree with you (and disagree with other commenters) that this particular case is more problematic

[+] quietbritishjim|7 years ago|reply

That is already the case for + on sequences though (e.g. string and list). In my experience it never causes confusion in practice in those circumstances.

[+] johncolanduoni|7 years ago|reply

I would think that adding two dictionaries will make people think of the established cases for other collections, rather than the less related case of numbers.

[+] sametmax|7 years ago|reply

And a = 1 is not equality.

Welcome to the world of programming, where we don't all try to match mathematical conventions because many of us suck at maths and are practical.

[+] mk89|7 years ago|reply

Reminds me of Scala Maps[0].

Edit: after reading more carefully,...

> Analogously with list addition, the operator version is more restrictive, and requires that both arguments are dicts, while the augmented assignment version allows anything the update method allows, such as iterables of key/value pairs.

But why? Consistency in API behavior is important, and as a user I don't want to have to read that I can add lists of pairs only with assignments. I hope the draft gets fixed.

[0]: https://docs.scala-lang.org/overviews/collections/maps.html

[+] guitarbill|7 years ago|reply

You can always allow it later, but deprecating such a "feature" is a pain. And subtle errors/outright abuse can happen with some of these automatic coercions, so Python tends to be a bit more conservative than other dynamic languages. The most (in)famous example being comparison (edit: not addition) of an integer and `None`. Allowed in Python 2, non-intuitive IMO, and responsible for a few bugs in its time. Disallowed in Python 3:

> TypeError: '<' not supported between instances of 'int' and 'NoneType'

[+] sametmax|7 years ago|reply

It's consistent with [] + () not working, while [*[], *()] works.

[+] speedplane|7 years ago|reply

Python continues to introduce more non-intuitive semantics that may be a small boon to the expert class of programmers, but come at the expense of ease of adoption for beginners. It started by making everything a generator, which are not very easy to master, and for which there were plenty of perfectly good substitutes (e.g., xrange, iteritems). And now you "add" sets of items (which you can't do in math), even though the update function worked well.

Python 3 is such a sad mess.

[+] hjk05|7 years ago|reply

In my teaching of python to newcomers (mostly coming from matlab/R or no programming background) they often try to do dict_a + dict_b, and are confused as to why that doesn’t work when list_a + list_b works fine.

I think it’s an extreme stretch to claim it’s non-intuitive.

[+] meowface|7 years ago|reply

I couldn't disagree more. Python 2 was a mess. range vs. xrange, items vs. iteritems, keys vs. iterkeys, input vs. raw_input, strings vs. Unicode strings, integer vs. float division were a mess, and were especially confusing and inconsistent for beginners.

Teaching Python 2 to beginners was always annoying for them: "ok so there's this function called input() but NEVER use it, always use raw_input(), unless you like RCE", "although all the tutorials say `for i in range()`, you should really get in the habit of using xrange() because...". Generators don't need to be explained in detail or understood by a beginner; all that really needs to be taught is the concept of iterators, and eventually, at an intermediate stage, the idea that some iterators are lazily-evaluated.

A simple dict "copy + merge" addition operator is a perfectly reasonable idea that will help beginners, not hurt them.

[+] dbrgn|7 years ago|reply

To me, the + operator for merging lists seems very intuitive.

[+] Areading314|7 years ago|reply

The update function does not work well. It is very cumbersome to have to do an in-place update. A frequent bug I see is

  def my_func(d1, d2):
      """Returns a merged dict"""
      d1.update(d2)
      return d1

The problem here is that now the d1 you have passed in has been modified to contain all the keys of d2, overriding any keys that appear in both with d2's value. Having a first-class operation that does a merge without mutating the inputs will make the language easier, not harder.
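A non-mutating version is short once you know to write it. A minimal sketch, with `merged` as a hypothetical name:

```python
def merged(d1, d2):
    """Return a new dict; neither argument is modified."""
    result = dict(d1)     # shallow copy of the left operand
    result.update(d2)     # d2's values win on shared keys
    return result


defaults = {'colour': 'red', 'size': 1}
overrides = {'colour': 'blue'}
settings = merged(defaults, overrides)
```

The copy is shallow, so nested mutable values are still shared with the inputs.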

[+] varelaz|7 years ago|reply

Yeah, I hate this. Now dict.items() is no longer thread-safe, just because of iterators. It could crash at any time just because you modified the dict in another thread while iteration is in progress.
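For what it's worth, the failure mode can be reproduced without threads. A minimal sketch: in CPython, adding a key while iterating over `d.items()` raises a RuntimeError rather than crashing the interpreter, and iterating over a `list(...)` snapshot sidesteps it. The `_copy` key suffix is only for illustration:

```python
d = {'a': 1, 'b': 2}

try:
    for key, value in d.items():
        d[key + '_copy'] = value      # adds a key while the view is being iterated
    error = None
except RuntimeError as exc:
    error = exc                       # "dictionary changed size during iteration"

# Iterating over a snapshot avoids the problem at the cost of a copy.
safe = {'a': 1, 'b': 2}
for key, value in list(safe.items()):
    safe[key + '_copy'] = value
```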

[+] rurban|7 years ago|reply

dict.merge(d, ...) and dict.diff(d, ...) are more expressive and have cleaner semantics.

Overloading arithmetic ops for string, list or dict ops might look elegant at first sight, but the discrimination needs to be done at runtime, slowing down the most important arithmetic ops, and it does not help the casual code reader much. It also cannot be used in normal Python code, as older Python versions will fail; only in special internal code.

Normal method names can be provided by external modules, so they are backwards compatible and will find wider adoption.

[+] sametmax|7 years ago|reply

Teacher here.

All my students eventually try {} + {}.

I'll bet on it to be the most intuitive.

[+] Znafon|7 years ago|reply

There is a counter-proposal from another core developer to use a classmethod like this.

[+] s17n|7 years ago|reply

It's weird to use + for a non-commutative operation, right?

[+] js2|7 years ago|reply

> The implementation will be in C. (The author of this PEP would like to make it known that he is not able to write the implementation.)

I hope this is for a reason other than the author being unfamiliar with C. Otherwise the author is cheating themselves, because adding functionality to an existing code base is probably my favorite motivator for learning a new language.

[+] varelaz|7 years ago|reply

I don't like the idea, because a + b should produce a new dict c without modifying either operand, which is not memory-optimal and has a non-obvious cost. Also, dict is subclassed a lot, and this could break a lot of existing functionality with potentially no benefit for most developers. I don't think that merging is a very common operation for dicts, and even so it can be done with one or two update calls, which makes it obvious, while '+' deep in duck-typed code is not. Also, the absence of a '+' operation for dicts acts as a kind of guard for type validation in case someone passes a dict instead of an integer, which is pretty common when you parse some JSON from a client.

[+] sametmax|7 years ago|reply

One of the long awaited features, rejected by Guido many times, and finally accepted. Maybe we'll get list.get(), functools.partial() as a c written builtin, pathlib.Path() as a primitive or inline try/except, one day.

[+] mroche|7 years ago|reply

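    def __add__(self, other):
        if isinstance(other, dict):
            new = type(self)()  # May be a subclass of dict.
            new.update(self)    # Copy the left operand first.
            new.update(other)   # Right operand wins on shared keys.
            return new
        return NotImplemented   # Let Python try the other operand.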

Is there something I’m missing? To me it would be cleaner and more memory/time performant to just `self.update(other)` rather than having a third dict instance at operation time. But that would really only apply if you have truly massive dicts.
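One consequence of the `self.update(other)` variant is that the left operand changes as a side effect of what looks like a pure expression. A sketch with a hypothetical subclass `InPlaceAdd`, written only to show the effect:

```python
class InPlaceAdd(dict):
    """Hypothetical variant that reuses self instead of building a new dict."""

    def __add__(self, other):
        if isinstance(other, dict):
            self.update(other)      # mutates the left operand
            return self
        return NotImplemented


defaults = InPlaceAdd(colour='red')
settings = defaults + {'colour': 'blue'}
# defaults has been overwritten too, because settings is the same object
```

That matches what `+=` is expected to do, which may be why the draft builds a new dict for `+`.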

[+] projektfu|7 years ago|reply

Is there a real-world demand for the dictionary difference operator or is it just being proposed for completeness? I'm racking my brain to think of reasons to use it that would be more expressive than simply giving a list of keys to delete.

[+] Grue3|7 years ago|reply

I like this. The current kwargs syntax is very confusing since it behaves very differently from funcall kwargs syntax.

[+] IceDane|7 years ago|reply

This is what you get when you try to implement general mathematical concepts in a language that is horrible at expressing them. What a clusterfuck python is going to be in a few years.
