(no title)
robertkoss | 5 months ago
If you use Athena you still have to worry about shuffling and joining, it is just hidden.. It is Trino / Presto under the hood and if you click explain you can see the execution plan, which is essentially the same as looking into the SparkUI.
Who cares about JVM versions nowadays? No one is hosting Spark themselves.
Literally every tool now supports DataFrame AND SQL APIs and to me there is no reason to pick up SQL if you are familiar with a little bit of Python
datadrivenangel|5 months ago
https://ludic.mataroa.blog/blog/get-me-out-of-data-hell/
drej|5 months ago
Yes, I did write tests and no, I did not write 1000-line SQL (or any SQL for that matter). But I could see analysts struggle and I could see other people in other orgs just firing off simple SQL queries that did the same as non-portable Python mess that we had to keep alive. (Not to mention the far superior performance of database queries.)
But I knew how this all came to be - a manager wanted to pad their resume with some big data acronyms and as a result, we spent way too much time and money migrating to an architecture, that made everyone worse off.