top | item 39634643


pillusmany | 2 years ago

Tensor cores massively accelerate matrix multiplication with no algorithmic breakthrough, just by being smart about how values are moved, cached, and reused: operations which classical analysis treats as O(1).
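A minimal sketch of the idea in plain Python (illustrative only; tensor cores do this in hardware, and real blocked kernels are tuned to cache sizes): both versions perform exactly the same O(n^3) multiply-adds, but the blocked version reuses each small tile many times while it is "hot", which is what makes the difference on a real memory hierarchy.

```python
# Illustrative sketch: a cache-blocked matrix multiply does the same
# O(n^3) arithmetic as the naive version, only with better data reuse.
def matmul_naive(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_blocked(A, B, bs=2):
    # Same multiply-adds, but each bs x bs tile of A and B is loaded
    # once and reused across the inner loops -- the reuse pattern that
    # tiled/tensor-core kernels exploit. bs is an illustrative choice.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for jj in range(0, n, bs):
            for kk in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for j in range(jj, min(jj + bs, n)):
                        for k in range(kk, min(kk + bs, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

In Big-O terms the two functions are indistinguishable; the win is entirely in the memory traffic that the notation ignores.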

There should be an O notation that takes everything into account: the various kinds of memory latency, the speed of logic operations, and so on. Obviously we already have wall-clock time and total energy used.
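One hedged sketch of what such a model can look like is a toy roofline-style estimate, which bounds runtime by whichever resource is the bottleneck: arithmetic throughput or memory bandwidth. All the numbers and the function name below are made up for illustration, not any real chip's specs.

```python
# Toy roofline-style cost model (illustrative numbers, not real hardware):
# predicted time is set by the slower of compute and memory traffic.
def roofline_time(flops, bytes_moved, peak_flops=1e12, peak_bw=1e11):
    compute_time = flops / peak_flops    # seconds if compute-bound
    memory_time = bytes_moved / peak_bw  # seconds if memory-bound
    return max(compute_time, memory_time)

# An n x n matmul does ~2n^3 flops. With good tile reuse it moves only
# ~3n^2 values; with no reuse, closer to ~2n^3 (4 bytes each here).
n = 1024
flops = 2 * n**3
t_good_reuse = roofline_time(flops, 3 * n**2 * 4)
t_no_reuse = roofline_time(flops, 2 * n**3 * 4)
```

Same flop count, wildly different predicted times: exactly the gap that plain O notation cannot express.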


david-gpu | 2 years ago

> There should be an O notation which takes into account everything - various kinds of memory latencies, speed of logic operations

O notation is useful as-is because it simplifies an algorithm down to the bare bones. If such simplification doesn't capture what is important to you, you need a more complex model.

The folks designing GPUs use multiple different such models, ranging from the simple, inaccurate, understandable and fast; to the obnoxiously detailed, precise, abstruse and slow.

No one model will ever have all the desirable properties, because they are in direct contradiction.

_kulang | 2 years ago

Isn’t O notation asymptotic, and therefore a truncation? Adding an additional O(n) term to an O(n^2) algorithm still results in an O(n^2) algorithm, as far as I understand. But you could certainly write a more descriptive expression for the running time based on what you know about the operations.
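A quick numeric sketch of why that truncation is justified: the lower-order term's share of the total cost vanishes as n grows, so the notation is right to drop it.

```python
# Why O(n) + O(n^2) is still O(n^2): the O(n) term's contribution
# becomes negligible as n grows.
def total_cost(n):
    return n + n**2  # e.g. an O(n) pass followed by an O(n^2) pass

ratio_small = total_cost(10) / 10**2          # 1.1: the O(n) term is visible
ratio_large = total_cost(10**6) / (10**6)**2  # ~1.000001: it has vanished
```

The constant factors and lower-order terms are exactly the information you would put back in a more descriptive cost expression.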