item 37129687

alex-korr | 2 years ago

Maybe I am missing something, but would there ever be a scenario where taking a single, albeit large, SQL statement and rewriting it as several PySpark scripts would result in a faster runtime for your data pipeline? In most cases, this will be much, much slower.

0cf8612b2e1e | 2 years ago

Greatly depends on your environment. I am thankfully in an area where there are very modest timeliness requirements, so improving the speed of a job means little to me. However, improving debuggability, or checkpointing so you can recover when things go wrong, is always valuable.