top | item 38391503


electricships | 2 years ago

here is my good deed for the day:

modern AI is just vector multiplication. any AI chip is just tens of thousands of very simple cores which can do vector float operations and little else. this also entails clever trade-offs in shared cache and internal bandwidth.
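A toy model of one of those "very simple cores" might look like this: all it does is an elementwise fused multiply-add over fixed-width vectors (the width here is an arbitrary choice for illustration, not any particular chip's):

```python
# Toy model of a "very simple core": elementwise fused multiply-add
# over a fixed-width vector, and nothing else. Width is illustrative.
VECTOR_WIDTH = 8

def vector_fma(a, b, c):
    """Return a * b + c, elementwise, over one hardware-width vector."""
    assert len(a) == len(b) == len(c) == VECTOR_WIDTH
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]
```

Tens of thousands of these, each with a slice of shared cache, is roughly the mental picture.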

(as a thought experiment, consider a naive million-by-million matrix multiplication. this will take a single cpu about 1 year! how do we reduce this to 1s?)
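The back-of-envelope arithmetic behind that "about 1 year" checks out if you assume something like 60 GFLOP/s sustained on one well-vectorized core (that figure is an assumption, not from the original comment):

```python
# Back-of-envelope for the naive million x million matmul.
# The 60 GFLOP/s single-core rate is an assumed figure.
n = 1_000_000
flops = 2 * n**3                  # one multiply + one add per inner-loop step
single_core_rate = 60e9           # assumed sustained FLOP/s for one cpu
seconds_per_year = 3.15e7
years = flops / single_core_rate / seconds_per_year   # ~1 year
rate_for_one_second = flops       # ~2 exaFLOP/s needed to finish in 1 s
```

So "reduce this to 1s" means finding roughly eight orders of magnitude of throughput, which is the whole point of the parallel-hardware discussion below.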

the end

discuss


Symmetry|2 years ago

Nowadays AI chips are specialized not just in vector multiplication but in matrix multiplication. Just as moving from scalar math to vectors brings savings in control and routing logic, moving from vectors to matrices does the same. Taking a result from a floating point unit, moving it to a big, multi-ported register file, and then reading it out again to feed into another floating point unit often draws much more power than the multiplication or addition itself. To the extent you can minimize that by feeding the results of one operation directly into the processing structures for the next, you've got a big win.
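The data-movement point can be sketched in software terms (in spirit only; real hardware does this with wiring, not variables): keep the running sum in one local accumulator instead of writing each partial result out and reading it back:

```python
# Toy illustration of minimizing round trips: the partial sum lives in
# a single local accumulator ("in register") for the whole inner loop,
# with exactly one write-back per output element.
def matmul_local_accumulate(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0                        # stays local; no per-step store
            for p in range(k):
                acc += A[i][p] * B[p][j]     # fused multiply-add per step
            C[i][j] = acc                    # one write-back per element
    return C
```

Matrix engines take this further: the result of each multiply-add is routed straight into the neighboring compute element, so most intermediates never touch a register file at all.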

automatic6131|2 years ago

>vector float operations and little else

I thought they were generally int8 or int16 vector multiply-adds, with float16 occasionally added in.

exikyut|2 years ago

As someone with a lot of interest in, but no fluency with, chip design (or the dividing and conquering of math within silicon, for that matter): how would you actually multiply a million-by-million matrix?

financltravsty|2 years ago

Parallelization.

Each "unit of work" in matrix multiplication is not dependent on any other unit of work. Stuff as many cores as you can into a chip, and then simply feed in all your vectors at the same time.

I.e. basically a beefed up GPU or an "AI" chip.
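The independence claim above can be sketched directly: each row of the output depends only on one row of A and all of B, so rows can be farmed out to workers with no coordination (this uses a thread pool as a stand-in for "as many cores as you can"):

```python
# Sketch of the parallelization argument: output rows of C = A x B are
# independent units of work, so they can be computed concurrently.
from concurrent.futures import ThreadPoolExecutor

def row_times_matrix(row, B):
    m = len(B[0])
    return [sum(row[p] * B[p][j] for p in range(len(B))) for j in range(m)]

def parallel_matmul(A, B, workers=4):
    # No unit of work reads another's result, so no synchronization
    # beyond the final gather is needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: row_times_matrix(row, B), A))
```

A GPU or AI chip is morally this, with the "pool" being thousands of hardware lanes and the scheduling done in silicon.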