top | item 44048935

erinaceousjones | 9 months ago

I heard it's been a difficult project to justify spending research and compute time on at scale. The models use an equivalent amount of total compute for training and inference, but the work is more parallelizable, so you can throw 5 times as many compute units at it and get the work done 5 times faster. At Google scale, that meant the hard internal sell of burning through $25 million worth of compute in 1 day instead of $5 million a day for 5 days. Something like that.
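
To make the arithmetic concrete, here is a rough sketch using the figures from the comment. The `spend_profile` helper and its field names are made up for illustration, and the even spread of cost across days is an assumption. The total bill is the same either way; only the peak daily burn changes.

```python
def spend_profile(total_cost_musd: float, days: int) -> dict:
    """Spread a fixed total compute bill (in $ millions) evenly over `days` days."""
    daily = total_cost_musd / days
    return {
        "days": days,
        "daily_burn_musd": daily,
        "total_musd": daily * days,
    }

# Same total job, two schedules (figures taken from the comment).
serial = spend_profile(25.0, days=5)    # fewer compute units, 5 days
parallel = spend_profile(25.0, days=1)  # 5x the compute units, 1 day

for name, p in (("serial", serial), ("parallel", parallel)):
    print(f"{name}: ${p['daily_burn_musd']:.0f}M/day for {p['days']} day(s), "
          f"${p['total_musd']:.0f}M total")
```

So the pitch is not about spending more overall, it's about getting approval for a much bigger single-day number.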
