top | item 24815618

The Set data structure saved me from a world of pain

6 points| root993 | 5 years ago |sankalpjonna.com

8 comments

order

hprotagonist|5 years ago

sets are great. A slightly janky but very tidy way to know if a list of (hashable) things has any duplicates in python is to test

  len(set(mylist)) == len(mylist) 
which probably loses on big O to some more clever approaches but is just fine for small n, and very readable.

Other trivially answerable questions include “what’s in me but not in this other collection”, “what’s in both of us”, “what’s in exactly one of us”, ... etc.

cgriswald|5 years ago

I've had numerous occasions where doing set() on a python list was demonstrably faster for very large n with an unknown distribution rather than iterating through and potentially exiting early. I've always just assumed set() is doing the equivalent thing, but in the underlying C rather than in python.

root993|5 years ago

Yup, this is exactly why I went a Set. The fact that it can't contain duplicates is super handy

abhishekjha|5 years ago

Yes. Any kind of membership question is very easy to answer with a set/dict data structure.

afandian|5 years ago

Surely a SQL query with SELECT, COUNT and GROUP would be simpler (and give you all counts for the given user in one go). It would also be faster to implement reliably, faster to run, more scalable and more maintainable.

This solution requires an extra instance of Redis in addition to a SQL db (which I assume is already in the mix), and has the potential for consistency drift.

And SQL is a very close cousin of set theory.

root993|5 years ago

For a very large database where the count would be filtered based on status of the conversations and running this query for every single support channel would be expensive would it not?

aYsY4dDQ2NrcNzA|5 years ago

When I am reviewing junior engineers’ code, one of their common blind spots is recognizing when to use sets.

Time and again I have found nested loops for determining whether any elements of A are missing from B.