Python Default Dict

[+] abcxjeb284|4 years ago|reply

Great for expressivity of multi-level dicts (excuse the goofy example):

    state2name2visited = defaultdict(lambda: defaultdict(list))
    state2name2visited["PA"]["Joe"].append("Pittsburgh")

[+] rekwah|4 years ago|reply

You can expand this a bit to make it n-level depth (until you blow the stack).

  def tree():
    return defaultdict(tree)

  >>> t = tree()
  >>> t['a']['b']['c'] = 10
  >>> t
  defaultdict(<function tree at 0x10c40df28>, {'a': 
  defaultdict(<function tree at 0x10c40df28>, {'b': 
  defaultdict(<function tree at 0x10c40df28>, {'c': 10})})})

[+] earthboundkid|4 years ago|reply

Python has too many ways to do it:

    >>> state2name2visited = {}
    >>> state2name2visited.setdefault("PA", {}).setdefault("Joe", []).append("Pittsburgh")
    >>> state2name2visited
    {'PA': {'Joe': ['Pittsburgh']}}

[+] solaxun|4 years ago|reply

I came across this once in Peter Norvig's Udacity CS 212 course - I think it was in the discussion forums for one of the lessons. My head promptly exploded.

[+] colpabar|4 years ago|reply

And here I was thinking you could only pass a type.

[+] mylons|4 years ago|reply

oo nice. i haven't seen this before and was doing something similar in a much uglier fashion.

[+] hyperpl|4 years ago|reply

python-box does a pretty good job on this I find.

[+] jbotz|4 years ago|reply

Here's a little "default dict puzzle" for you. A common use case for default dict is counting the number of occurrences of some string in an input file/stream by providing a default of zero and just incrementing the value for every key you see. But what if you want to use it to count the order in which we see keys, i.e. the first failed lookup initializes to 1, the second to 2, etc.? There are a number of solutions to this. I found 5 that are a few lines each, although a couple of these can be shoehorned into one-liners.

[+] BugsJustFindMe|4 years ago|reply

> The most common use case for default dict is counting the number of occurrences of some string

I don't see how you can declare what the most common use of a standard function is, so, uh, citation needed.

> But what if you want to use it to count the order in which we see keys

Don't bother. Since python 3.7 (really 3.6) you can just look at the key order after adding your strings to a normal dict comprehension because dicts are ordered now.

{s: 0 for s in strings}.keys() will give you the order. If you really want a number associated with each, you can wrap that in enumerate.
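
A minimal sketch of that idea, assuming Python 3.7+ where dict insertion order is guaranteed. The names `strings` and `first_seen` are illustrative and not from the thread. `dict.fromkeys` keeps one entry per key in first-seen order, and `enumerate(..., start=1)` attaches the position:

```python
strings = ["a", "b", "c", "a", "d"]  # hypothetical input

# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
first_seen = {s: i for i, s in enumerate(dict.fromkeys(strings), start=1)}

print(first_seen)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```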

[+] klyrs|4 years ago|reply

One horribly fragile solution to this depends on global name resolution:

  d = defaultdict(lambda: len(d))

However, it's probably better to avoid defaultdict altogether and implement __missing__ in a subclass of dict.
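
A rough sketch of the `__missing__` approach, assuming a made-up class name `OrderDict`. Note that `lambda: len(d)` numbers keys from 0, because the factory runs before the new key is stored; the sketch adds 1 to match the puzzle's 1-based numbering:

```python
class OrderDict(dict):
    """Hypothetical dict subclass: a missing key gets the next 1-based position."""

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent.
        value = len(self) + 1
        self[key] = value
        return value


d = OrderDict()
order = [d[c] for c in "abcad"]
print(order)  # [1, 2, 3, 1, 4]
```

Unlike the lambda version, this does not depend on the global name `d` still pointing at the same dict.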

[+] scienceman|4 years ago|reply

Most "straightforward" way I could think of:

```
import itertools, collections

cnt = itertools.count(1)
d = collections.defaultdict(lambda: next(cnt))
s = "abcad"
[d[c] for c in s]
```

[+] qwertox|4 years ago|reply

  days = {}

  day = '2021-05-05'
  if day not in days:
    days[day] = []
  days[day].append(event)

vs. just

  days = defaultdict(list)

  day = '2021-05-05'
  days[day].append(event)

what a blessing! Thanks for posting this. I do this so often.

Now maybe add an `OrderedDefaultDict`.

[+] kazinator|4 years ago|reply

  This is the TXR Lisp interactive listener of TXR 257.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  Caution: objects in heap are farther from reality than they appear.
  1> (defun pend (list item) (append list (list item)))
  pend
  2> (define-modify-macro pendto (item) pend)
  pendto
  3> (defvar x)
  x
  4> (pendto x 3)
  (3)
  5> (pendto x 4)
  (3 4)
  6> (defvar days (hash))
  days
  7> (pendto [days "2021-05-01"] 'event0)
  (event0)
  8> (pendto [days "2021-05-01"] 'event1)
  (event0 event1)
  9> (pendto [days "2021-05-01"] 'event2)
  (event0 event1 event2)
  10> days
  #H(() ("2021-05-01" (event0 event1 event2)))
  11> [days "2021-05-01"]
  (event0 event1 event2)

[+] Xophmeister|4 years ago|reply

Dictionaries are ordered since, I think, 3.6. I believe this is an implementation detail of CPython, rather than it being “standard” (e.g., I don’t know if you can rely on, say, PyPy respecting that, although I suspect it does.)
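
For reference, insertion order was a CPython implementation detail in 3.6 and became a documented language guarantee in Python 3.7. `defaultdict` is a `dict` subclass, so it keeps insertion order too. A short check, with made-up dates and event names:

```python
from collections import defaultdict

days = defaultdict(list)
days["2021-05-05"].append("event0")
days["2021-05-01"].append("event1")
days["2021-05-05"].append("event2")

# Keys come back in the order they were first inserted, not sorted.
key_order = list(days)
print(key_order)  # ['2021-05-05', '2021-05-01']
```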

[+] nick238|4 years ago|reply

There's also the `setdefault` method you could use on ordinary dicts:

  -if day not in days:
  -  days[day] = []
  +days.setdefault(day, [])
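
Since `setdefault` returns the stored value, the append can be chained onto it. A small sketch, with `event` standing in for whatever is being recorded (the values here are hypothetical):

```python
days = {}
day = "2021-05-05"
event = "standup"  # hypothetical event value

# setdefault returns the list stored under `day`, inserting [] first if needed.
days.setdefault(day, []).append(event)
days.setdefault(day, []).append("retro")

print(days)  # {'2021-05-05': ['standup', 'retro']}
```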

[+] unknown|4 years ago|reply

[deleted]

[+] jhgb|4 years ago|reply

Interestingly, in Common Lisp,

  (push event (gethash day days))

does this with zero extra effort, since the default value for fetching from a hashtable is NIL. The magic of reasonable defaults... I feel like it's underappreciated sometimes.

[+] m_mueller|4 years ago|reply

If you didn't know that yet, watch Raymond Hettinger's talk "Beautiful and idiomatic python". I consider it mandatory for any professional python programmer.

[+] luizfzs|4 years ago|reply

This post shows up on the same day I was looking at exactly this. Quite interesting.

Another way of avoiding KeyError is using

  dict.get(val, default_val)

I find it a bit cleaner, since you don't have to create a function or import collections.

[1] https://docs.python.org/3/library/stdtypes.html?highlight=di...

[+] BugsJustFindMe|4 years ago|reply

> Another way of avoiding KeyError is using dict.get(val, default_val)

Except that

   my_dict = {}
   my_dict[val] = my_dict.get(val, []).append('foo')

doesn't work because list.append() doesn't return anything, while

   my_dict = defaultdict(list)
   my_dict[val].append('foo')

does the right thing.

Defaultdict is sugar for dict.setdefault, not dict.get.
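
A side-by-side sketch of the three spellings, using throwaway names (`plain`, `stored`, `dd`) chosen only for illustration:

```python
from collections import defaultdict

plain = {}
result = plain.get("k", []).append("foo")  # appends to a throwaway list
# list.append returns None, and get never stored anything in `plain`.
print(result, plain)  # None {}

stored = {}
stored.setdefault("k", []).append("foo")   # setdefault stores the list first

dd = defaultdict(list)
dd["k"].append("foo")                      # factory result is stored on lookup

print(stored, dict(dd))
```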

[+] northisup|4 years ago|reply

You may also be interested in `dict.setdefault`. It has the same behavior as above but also sets the value for the key if it was previously missing.

    dict.setdefault(val, default_val)

[1] https://docs.python.org/3/library/stdtypes.html#dict.setdefa...

[+] tedmiston|4 years ago|reply

But of course doing it the way you described is subtly unintuitive because it (1) leads to duplication of the default value (or the need to pass it around as a constant to every access point) and (2) does not work anywhere the dict is accessed directly like:

    d['foo']

whereas defaultdict handles the default value for you correctly independently of the access method.

IMO defaultdict is ideal for this use case.

[+] unknown|4 years ago|reply

[deleted]

[+] prepend|4 years ago|reply

This is a simple class but has been a real time saver of not having to check if keys exist all the time.

I use it quite a bit for metadata structures with optional elements so reads will still give something back, even if an empty string or a default value.

[+] WhyCause|4 years ago|reply

You do know about `.get()`, right?

if `a = {'a': 1, 'b': 2}`, and I do `cval = a.get('c', 3)`, `cval` will contain 3.

If I do `cval = a['c']`, I get an exception

[+] mazatta|4 years ago|reply

This is one of those things that I always assume other Python developers know about, but often don't.

[+] ali_m|4 years ago|reply

IMO defaultdicts are kind of dangerous, especially when passed as arguments to other calls that expect a normal dict. Silently returning a default value instead of raising a KeyError can lead to hard-to-find bugs.

I generally prefer to use .setdefault(key, default_value) with regular dicts, as it's much more explicit. If I do use a defaultdict for convenience, I will usually only use it within a limited scope, and if I'm returning it then I'll cast it back to a normal dict to avoid surprising the caller.
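
A sketch of that "limited scope, then cast back" pattern. The function name `group_by_length` and its input are hypothetical:

```python
from collections import defaultdict


def group_by_length(words):
    """Hypothetical helper: group words by length, returning a plain dict."""
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    # Convert so callers get a KeyError on missing keys instead of a new entry.
    return dict(groups)


result = group_by_length(["a", "bb", "cc", "ddd"])
print(result)  # {1: ['a'], 2: ['bb', 'cc'], 3: ['ddd']}
```

An alternative that avoids the copy is setting `groups.default_factory = None` before returning, after which missing keys raise KeyError again, though the object is still a defaultdict.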

[+] dragonwriter|4 years ago|reply

> IMO defaultdicts are kind of dangerous, especially when passed as arguments to other calls that expect a normal dict.

A defaultdict is a normal dict[0], it's just a space-efficient way of expressing a large normal dict, most of whose keys won’t be accessed.

The default function itself should throw KeyError on values that are logically not in the dict (including, due to Python’s dynamically typed nature, those which are outside of the key domain because of type). Though in some uses you can skip out on this because it's used in a very limited scope where you know it's not going to be indexed improperly.

> I generally prefer to use .setdefault(key, default_value) with regular dicts, as it's much more explicit.

I’m not sure why one would prefer one of those over the other, as they have very different use cases; certainly defaultdict isn’t a great choice for places where .setdefault makes sense, but that’s true in reverse, too.

[0] well, except for the unfortunate .get() behavior; a ReallyDefaultDict where rdd.get(k, default) works more sensibly, returning rdd[k] unless that throws KeyError, and default otherwise, would be better.
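
A possible sketch of such a class, under the assumption that `get` should simply defer to normal indexing. `ReallyDefaultDict` is the hypothetical name from the footnote, not an existing type:

```python
from collections import defaultdict


class ReallyDefaultDict(defaultdict):
    """Hypothetical subclass: get() goes through the default factory."""

    def get(self, key, default=None):
        try:
            return self[key]  # may call default_factory and store the result
        except KeyError:
            return default


stock = defaultdict(int)
print(stock.get("x", 5), "x" in stock)  # stock get() bypasses the factory

rdd = ReallyDefaultDict(int)
print(rdd.get("x", 5), "x" in rdd)      # factory value wins and is stored

no_factory = ReallyDefaultDict()        # default_factory is None
print(no_factory.get("x", 5))           # lookup raises KeyError, so default is used
```

One design consequence worth noting: in this sketch `get` can insert keys, which the stock `get` never does.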

[+] st0le|4 years ago|reply

Personally, I think Ruby got it right: there you can pass a lambda that takes the `key` as a parameter.

[+] orf|4 years ago|reply

You can with Python, it accepts any callable
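
One caveat: `defaultdict` calls its `default_factory` with no arguments, so the callable never sees the missing key, whereas a Ruby `Hash.new` block receives the hash and the key. A sketch of a key-aware variant built on `__missing__`; the class name `KeyDefaultDict` and its `factory` attribute are made up for illustration:

```python
class KeyDefaultDict(dict):
    """Hypothetical dict whose factory receives the missing key."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        value = self.factory(key)  # compute the default from the key
        self[key] = value
        return value


squares = KeyDefaultDict(lambda k: k * k)
print(squares[4], squares)  # 16 {4: 16}
```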

[+] ReflectedImage|4 years ago|reply

Used it all the time in my commercial python jobs. The other common one is namedtuple.

[+] gradys|4 years ago|reply

Shout out to the attrs library. If you find yourself using namedtuple a lot, it's worth considering despite it being an external dependency.

https://www.attrs.org/en/stable/

[+] skindoe|4 years ago|reply

[deleted]

[+] unknown|4 years ago|reply

[deleted]

[+] sungri|4 years ago|reply

Learned it from the book “Elements of programming interviews” along with namedtuple and a few other things

[+] unknown|4 years ago|reply

[deleted]

[+] jl2718|4 years ago|reply

Unnecessary and awkward unless you really need a functional default. Use dict.setdefault for static defaults.

[+] klyrs|4 years ago|reply

You've got that backwards. If you have a static default, that's what defaultdict is best at. If you need the default to depend on the key and you're the sole consumer of the dict, then you can use get and setdefault -- but you incur the expense of computing that default whether or not you use it. If you need the default to depend on the key, and you can't control the scope that your dict will be used in, or you want to avoid computing the default when it's not needed, do that with a custom __missing__ method.
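
A small sketch of the eager-evaluation cost mentioned above. `expensive_default` is a hypothetical stand-in that records each call:

```python
from collections import defaultdict

calls = []


def expensive_default():
    """Hypothetical stand-in for a costly default computation."""
    calls.append("called")
    return []


plain = {"k": [1]}
# The argument is evaluated before setdefault runs, even though "k" exists.
plain.setdefault("k", expensive_default())
eager_calls = len(calls)

lazy = defaultdict(expensive_default, {"k": [1]})
lazy["k"]  # key present, so the factory is not called
lazy_calls = len(calls) - eager_calls

print(eager_calls, lazy_calls)  # 1 0
```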
