
goodwanghan | 2 years ago

Hello whinvik, I totally agree that the debugging experience of Spark is suboptimal.

But why?

I don't think the main reason is too much 'magic'; I think it is the coupling of the user's logic with the complicated backend logic. When there is an issue, not only do you have to wait a long time to get the error back, but you also get the error trace of the whole execution chain, and you have to find the real error in it, which is challenging even for experienced Spark users.

This is not unique to Spark: if you use Dask or Ray, you will find a very similar experience. Can we directly make Spark, Dask, or Ray easier to debug?

I doubt it. The complexity of these systems is necessary, and since the problem can happen anywhere, either on the user side or on the backend side, the whole stack has to be reported. That is inevitable; otherwise, people would start to complain that the real errors are not captured by the system.

What can we do about it?

Fugue's idea is to let you iterate on your own logic locally before bringing in these complex systems, so you get quicker and less noisy feedback. The challenges are how to unify the semantics so you don't need to change code when moving between local and distributed execution, and how to trim down the system errors so you can focus on your own problems. These are the problems Fugue solves.
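To make that concrete, here is a minimal sketch of the pattern (not Fugue's actual internals): the business logic is a plain pandas function with no Spark/Dask/Ray imports, so it can be developed and tested locally on small data. The `add_flag` function and the sample data are hypothetical; the commented-out `transform` call at the end shows roughly how an abstraction layer like Fugue would later run the same function on a distributed engine, and is left as a comment because it needs a cluster.

```python
import pandas as pd

# Backend-agnostic logic: a plain function on a plain DataFrame.
# Nothing here imports Spark, Dask, or Ray, so it can be unit-tested
# locally with fast, noise-free feedback.
def add_flag(df: pd.DataFrame) -> pd.DataFrame:
    # Mark rows whose value exceeds a threshold (toy example).
    return df.assign(flag=(df["value"] > 10).astype(int))

# Iterate locally on small data first.
local = pd.DataFrame({"value": [5, 15]})
print(add_flag(local))

# Later, an abstraction layer could submit the unchanged function to a
# distributed engine, e.g. (hypothetical sketch, requires fugue and a
# running Spark session):
#   from fugue import transform
#   transform(big_df, add_flag, schema="*,flag:int", engine=spark_session)
```

The point is that all debugging of `add_flag` happens locally; only once the logic is correct does the complex backend enter the picture.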

It's all about good practices

It's common sense that if you keep your logic agnostic to backends and iterate locally on small data, the development experience can be a lot better. But why do big data developers tend not to follow these practices? Because the distributed backends, intentionally or unintentionally, create too much discrepancy between local and distributed execution. We created such an abstraction so that big data developers get more reward than punishment for following good practices, and are therefore more willing to follow them.
