(no title)
nevi-me | 3 months ago
Beyond Spark (one shouldn't really be using vanilla Spark anyway; see Apache Comet or Databricks Photon), distributing my compute makes sense because if a job takes an hour to run (ignoring overnight jobs), there will be a bunch of people waiting for that data for an hour.
If I run a 6-node cluster that makes the data available in 10 minutes, I save everyone 50 minutes of waiting. And if I have 10 of those jobs that need to run at the same time, I need a burst of compute to handle them all at once.
That 6-node cluster might not make sense on-prem unless I can use the compute for something else, which is where pay-as-you-go (PAYG) pricing on some cloud vendor comes in.
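
To make that concrete, here's a rough back-of-the-envelope sketch in Python. The 1-hour single-node runtime, 10-minute cluster runtime, 6 nodes, and 10 concurrent jobs are from the argument above; the 20 data consumers and the near-linear scaling are assumptions for illustration.

    SINGLE_NODE_MIN = 60   # job runtime on one node (from the comment)
    CLUSTER_MIN = 10       # same job on the 6-node cluster (from the comment)
    NODES = 6              # cluster size (from the comment)
    CONCURRENT_JOBS = 10   # peak burst of simultaneous jobs (from the comment)
    CONSUMERS = 20         # assumption: people waiting on the output

    # Human time saved per run: each consumer waits 50 fewer minutes.
    saved_person_min = CONSUMERS * (SINGLE_NODE_MIN - CLUSTER_MIN)

    # Assuming near-linear scaling, the compute spend is roughly a wash:
    # 1 node * 60 min vs 6 nodes * 10 min == 1 node-hour either way.
    single_node_hours = SINGLE_NODE_MIN / 60
    cluster_node_hours = NODES * CLUSTER_MIN / 60

    # But the peak is spiky: 10 jobs landing at once needs 60 nodes for 10 min.
    burst_nodes = CONCURRENT_JOBS * NODES

    print(f"person-minutes of waiting saved per run: {saved_person_min}")
    print(f"node-hours, single node:  {single_node_hours:.1f}")
    print(f"node-hours, 6-node burst: {cluster_node_hours:.1f}")
    print(f"peak nodes for {CONCURRENT_JOBS} concurrent jobs: {burst_nodes}")

Under these assumptions the compute bill is roughly the same either way; what changes is the waiting time per run and the spiky 60-node peak, which is exactly the shape PAYG pricing handles well and an on-prem cluster doesn't.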