top | item 33791080


apohn | 3 years ago

IMO the reason behind this is that a lot of "data science" driven decisions are short-term decisions. You can put something on a PowerPoint, not really care whether it's wrong unless you personally will get fired if it turns out to be wrong, and back out of it a quarter later when it does turn out to be wrong. IME there's no shortage of justifications or pivots when it comes to a decision you made a quarter ago. The consequences are relatively small, so the caring is mostly bravado, not real caring.

When it comes to disastrous long-term decisions, there's plenty of time to get input from multiple stakeholders. I always remember the armies of companies that went chasing after Hadoop because Big Data was going to transform something or other. All the stakeholders were on board, from the CEO and CTO down to IT and engineering management. So much money and time got flushed down the toilet trying to implement Hadoop and extract value from data with it. The only people who paid the consequences were the employees at Hadoop companies who thought their stock options would be worth something.


icedchai | 3 years ago

About 10 years ago, I worked at a company that really wanted to use Hadoop for some reason, so I was forced to use it for a project. The amount of data we were processing was minuscule (a few hundred megabytes per run). It could've been done with a simple script on a single EC2 instance for the entire duration of the project without any scalability issues. Instead, I had to provision Hadoop clusters (dev, staging, production), fit the script into the map-reduce paradigm, write another script to kick off the job and process the results, etc. But at least we were using Hadoop.
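To put the commenter's point in concrete terms: the actual workload isn't specified, but assuming it was the classic key-based aggregation that MapReduce is usually sold on, the whole "cluster job" at a few hundred megabytes per run is a short single-process script. This is a hypothetical sketch, not the commenter's actual code:

```python
from collections import Counter

def aggregate(rows):
    """Sum the 'value' field per key -- the entire 'map-reduce job'
    for a few hundred MB of input, done in one process.
    `rows` would be read from the run's input file in practice."""
    totals = Counter()
    for key, value in rows:
        totals[key] += int(value)
    return dict(totals)

# Tiny in-memory sample standing in for one run's input.
sample = [("a", 1), ("a", 2), ("b", 5)]
print(aggregate(sample))  # {'a': 3, 'b': 5}
```

A single pass like this streams the input once and keeps only the per-key totals in memory, so even at hundreds of megabytes there's nothing for a cluster to parallelize that justifies provisioning dev, staging, and production Hadoop environments.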