joshhart | 2 years ago

This was cancelled over a year ago, which the article notes; it's old news. It was clear the effort would have required a very significant push, including a large halt in product development, and management wasn't willing to stomach that given the high growth in 2020/2021. Which made sense. But LinkedIn's revenue growth has slowed heavily with the pullback in tech hiring, so they had the space to do it and could treat it as optimization time.

Also, as part of Blueshift the plan was to do batch processing first, but LinkedIn had a cultural belief in colocating batch compute and storage, which runs against the disaggregated-storage paradigm we see now. IMO this led to some foot-dragging.

Source: Worked at LinkedIn 12 years, am a director at Databricks now.

ThomasMoll | 2 years ago

Not only that, but the Hadoop team literally had the guy who wrote the original HDFS whitepaper. Moving a service with that much in-house expertise first never made sense. I worked on one of the original Azure PoCs for Hadoop, even before Blueshift, and it was immediately clear that we operated at a scale Azure couldn't handle at the time. Our biggest cluster held over 500 PB, and in total we had over an exabyte as of 2021 [1]. It was exorbitantly expensive to run a similar setup on VMs, and at our scale I think it would have taken around 4,000-5,000 separate Azure Data Lake namespaces to support one of our R&D clusters. I believe most of this "make the biggest cluster you can" mentality was a holdover from the Yahoo! days.

[1] https://engineering.linkedin.com/blog/2021/the-exabyte-club-...
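
For what it's worth, the namespace estimate above works as a back-of-envelope. A minimal sketch, assuming the binding constraint was an effective per-namespace capacity on the order of 100-125 TB; that figure is back-solved from the comment's own numbers, and the real limit at the time may have been throughput or request-rate quotas rather than raw capacity:

    # Back-of-envelope for the 4,000-5,000 namespace estimate above.
    # ASSUMPTION: an effective per-namespace capacity of 100-125 TB,
    # back-solved from the comment's numbers; the actual constraint
    # may have been throughput or request quotas, not raw capacity.

    CLUSTER_PB = 500   # size of the biggest R&D cluster, per the comment
    TB_PER_PB = 1_000

    for per_namespace_tb in (100, 125):
        namespaces = CLUSTER_PB * TB_PER_PB / per_namespace_tb
        print(f"{per_namespace_tb} TB/namespace -> {namespaces:,.0f} namespaces")

    # Output:
    # 100 TB/namespace -> 5,000 namespaces
    # 125 TB/namespace -> 4,000 namespaces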