(no title)
flebron | 10 months ago
It's a chip (and associated hardware) that can do linear algebra operations really fast. XLA and TPUs were co-designed, so as long as what you are doing is expressible in XLA's HLO language (https://openxla.org/xla/operation_semantics), the TPU can run it, and in many cases run it very efficiently. TPUs have different scaling properties than GPUs (think sparser but much larger communication), no graphics hardware inside them (no shader hardware, no raytracing hardware, etc), and a different control flow regime ("single-threaded" with very-wide SIMD primitives, as opposed to massively-multithreaded GPUs).
jxjnskkzxxhx|10 months ago
Edit: And btw, another question that I had had before was what's the difference between a tensor core and a GPU, and based on your answer, my speculative answer to that would be that the tensor core is the part inside the GPU that actually does the matmuls.