item 35212232

pidge | 3 years ago

Yeah, given that everything is now multi-core, it makes sense to use a natively parallel tool for anything compute-bound. And Spark will happily run locally and (unlike previous big data paradigms) doesn’t require excessive mental contortions.

Of course, while you’re at it, you should probably just convert all your JSON to Parquet to speed up subsequent queries…
