mmmmpancakes | 2 years ago
Where multiprocessing shines is when you have an algorithm that can be fully parallelized and represented in a baby map-reduce framework, where the data being sent to each process isn't too big. The idiom I often reuse is literally the first example in the Python docs:
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
        # to give more of a map-reduce flavor:
        print(sum(p.map(f, [1, 2, 3])))
This can be modified to fit a pretty big range of tasks in my use cases and chop hours off my workflows. It's so much faster in DS workflows to just have extra cores and use them than to spin up a cluster for distributed compute.
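As a sketch of how that idiom stretches to a slightly bigger map-reduce shape (the word-count task and the helper names here are my own illustration, not something from the docs example):

```python
from multiprocessing import Pool
from collections import Counter

def count_words(chunk):
    # map step: count words in one chunk of text
    return Counter(chunk.split())

def merge(counters):
    # reduce step: combine the per-chunk counts
    total = Counter()
    for c in counters:
        total.update(c)
    return total

if __name__ == '__main__':
    chunks = ["the quick brown fox", "the lazy dog", "the fox"]
    with Pool(3) as p:
        # for long input lists, a chunksize > 1 batches tasks
        # and cuts down on inter-process overhead
        totals = merge(p.map(count_words, chunks, chunksize=1))
    print(totals.most_common(2))
```

Same pattern: a pure function over independent inputs, p.map to fan out, and a cheap reduce on the results back in the parent process.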