ericjang | 2 years ago

Jeff was very early to the "just scale up the big brain" idea, perhaps as early as 2012 (when Andrew Ng was training networks on thousands of CPUs). This vision is loosely summarized in https://blog.google/technology/ai/introducing-pathways-next-... and fleshed out more in https://arxiv.org/abs/2203.12533, but he had been promoting the idea internally since before 2016.

When I joined Brain in 2016, I thought training billion/trillion-parameter sparsely gated mixtures of experts was a huge waste of resources and an incredibly naive idea. But it turns out he was right, and it took ~6 more years before that became abundantly obvious to the rest of the research community.
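
To make "sparsely gated mixture of experts" concrete: a learned gate routes each token to only a few of the experts, so parameter count can grow without per-token compute growing with it. Here's a toy NumPy sketch of top-k gating in the spirit of the Shazeer et al. 2017 MoE paper (arXiv:1701.06538); all names and sizes are illustrative, not from any real codebase:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 16, 8, 2

    # Each "expert" here is just a single weight matrix; real MoE experts
    # are small feed-forward networks.
    experts = [rng.standard_normal((d_model, d_model)) * 0.02
               for _ in range(n_experts)]
    w_gate = rng.standard_normal((d_model, n_experts)) * 0.02

    def moe_layer(x):
        logits = x @ w_gate                # one gating score per expert
        top = np.argsort(logits)[-top_k:]  # indices of the top-k experts
        weights = np.exp(logits[top])
        weights /= weights.sum()           # softmax over only the selected k
        # Only k of n_experts matrices touch this token, so parameters can
        # scale with n_experts while per-token compute stays roughly flat.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

    token = rng.standard_normal(d_model)
    print(moe_layer(token).shape)          # (16,)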

Here's his Scholar page (h-index of 94): https://scholar.google.com/citations?hl=en&user=NMS69lQAAAAJ...

As a leader, he also managed the development of TensorFlow and the TPU. Consider the context and time frame: the year is 2014/2015 and a lot of academics still don't believe deep learning works. Jeff pivots a >100-person org to go all-in on deep learning, invests in an upgraded version of Theano (TF) and then gives it away to the community for free, and develops Google's own training chip to compete with Nvidia. These are highly non-obvious bets that show much more spine & vision than most tech leaders have. Not to mention he designed & coded large parts of TF himself!

And before that, he was doing systems engineering on non-ML problems. It's rare for a very senior engineer to pivot to a completely new field and then do what he did.

Jeff has certainly made mistakes as a leader (failing to translate Google Brain's numerous fundamental breakthroughs into more ambitious AI products, and to consolidate the redundant big-model efforts within Google Research), but I would consider his high-level directional bets incredibly prescient.


HarHarVeryFunny | 2 years ago

OK - I can see the early ML push as obviously massively impactful, although by 2014/2015 we were already a couple of years past AlexNet, and other frameworks such as Theano and Torch (already 10+ years old at that point) existed, so the idea of another ML framework wasn't exactly revolutionary. I'm not sure how you'd characterize Jeff Dean's role in TensorFlow given that you're saying he led a 100-person org, yet coded much of it himself... a hands-on technical lead, perhaps?

I wonder if you know any of the history of exactly how TF's predecessor DistBelief came into being, given that this was during Andrew Ng's time at Google - whose idea was it?

The Pathways architecture is very interesting... what is the current status of this project? Is it still going to be a focus after the reorg, or is it too early to tell?

ericjang | 2 years ago

Jeff was the first author on the DistBelief paper - he's always been big on model parallelism and on distributing a neural network's knowledge across many machines: https://research.google/pubs/pub40565/. I really have to emphasize that model parallelism for a big network sounds obvious today, but it was totally non-obvious in 2011 when they were building it out.
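
To illustrate the model-parallel idea: one layer's weight matrix is too big for a single machine, so its columns are split across workers and each computes a slice of the output independently. A toy NumPy sketch where the "workers" are just array chunks (a real system like DistBelief sharded across actual machines, coordinated through a parameter server):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, n_workers = 8, 12, 3

    W = rng.standard_normal((d_in, d_out))
    # Each worker holds only d_out / n_workers columns of the weight matrix.
    shards = np.split(W, n_workers, axis=1)

    def parallel_forward(x):
        # Workers multiply against their shard independently; no
        # communication is needed until the output slices are joined.
        partials = [x @ shard for shard in shards]
        return np.concatenate(partials)

    x = rng.standard_normal(d_in)
    assert np.allclose(parallel_forward(x), x @ W)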

DistBelief was tricky to program because it was written entirely in C++ and Protobufs, IIRC. The development of TFv1 preceded my time at Google, so I can't comment on who contributed what.

panabee | 2 years ago

Thanks for this insightful perspective.

1. What was the reasoning behind thinking billion/trillion-parameter models would be naive and wasteful? Perhaps parts of that reasoning were right and could inform improvements today.

2. Can you elaborate on the failure to translate research breakthroughs, of which there are many, into ambitious AI products? Do you mean commercializing them, or pursuing something like AlphaFold? This question is especially relevant: everyone is watching to see if recent changes can bring Google to its rightful place at the forefront of applied AI.