hobos_delight | 2 years ago
The reason we use these is for when the data set is _larger_ than what can be processed on a single machine.
saberience | 2 years ago
I’ve heard this anecdote on HN before, but without ever seeing actual evidence that it happened, it reads like an old wives’ tale and I’m not sure I believe it.
I’ve worked on a Hadoop cluster, and setting one up and running it takes quite serious technical skill and experience — and those same skills and experience would mean the team wouldn’t be doing it unless they needed it.
Can you really imagine senior data and infrastructure engineers setting up 100 nodes knowing it was for 60 GB of data? Does that make any sense at all?
OskarS | 2 years ago
> Hopefully this has illustrated some points about using and abusing tools like Hadoop for data processing tasks that can better be accomplished on a single machine with simple shell commands and tools.
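The quoted point — that many "big data" jobs fit on one machine — comes down to the fact that such jobs are often a single streaming pass over line-oriented data. A minimal Python sketch of that kind of aggregation (the record format here is made up for illustration, not taken from the article):

```python
# Streaming aggregation over line-oriented data: the kind of job
# that needs no cluster, just a sequential pass with constant memory.
from collections import Counter
import io

def count_results(lines):
    """Tally the first whitespace-separated field of each line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts

# Usage, with an in-memory stand-in for a large file on disk:
data = io.StringIO("win extra\nloss extra\nwin extra\n")
print(count_results(data))  # Counter({'win': 2, 'loss': 1})
```

Swapping `io.StringIO` for `open(path)` streams a file of any size at disk speed, which is the single-machine baseline the article compares Hadoop against.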
MrBuddyCasino | 2 years ago
And this isn’t even wrong, because what they need is a long-term maintainable method that scales up _if_ needed (rarely), is documented, and survives the loss of institutional knowledge three layoffs down the line.
dagw | 2 years ago
Sometimes the killer feature of that data analytics pipeline isn't scalability, but robustness, reproducibility, and consistency.