mmmmpancakes | 2 years ago
Where multiprocessing shines is when you have an algorithm that can be fully parallelized and represented in a baby map-reduce framework, where the data being sent to each process isn't too big. The idiom I often reuse is literally the first example in the Python docs:
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
        # to give more of a map-reduce flavor:
        print(sum(p.map(f, [1, 2, 3])))
This can be modified to fit a pretty big range of tasks in my use cases and chop hours off my workflows. It's so much faster in DS workflows to just have extra cores and use them than to spin up a cluster for distributed compute.
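As a sketch of how that idiom stretches to a slightly bigger map-reduce shape (the word-count task and the helper names here are my own illustration, not something from the docs example):

```python
from multiprocessing import Pool
from collections import Counter

def count_words(chunk):
    # map step: count words in one chunk of text
    return Counter(chunk.split())

def merge(counters):
    # reduce step: combine the per-chunk counts
    total = Counter()
    for c in counters:
        total.update(c)
    return total

if __name__ == '__main__':
    chunks = ["the quick brown fox", "the lazy dog", "the fox"]
    with Pool(3) as p:
        # for long input lists, a chunksize > 1 batches tasks
        # and cuts down on inter-process overhead
        totals = merge(p.map(count_words, chunks, chunksize=1))
    print(totals.most_common(2))
```

Same pattern: a pure function over independent inputs, p.map to fan out, and a cheap reduce on the results back in the parent process.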