
austinvhuang | 1 year ago

The data that is out there is reasonably promising, with WebGPU already in use in some production ML inference engines. TVM of course is way ahead of the curve as usual - https://tvm.apache.org/2020/05/14/compiling-machine-learning... though that post is quite old now.

It's still early days for pushing compute use cases to WebGPU (OctoML being super early notwithstanding). There's a small matmul in the examples directory, but it only has the most basic tiling optimizations. One of my goals over the next few weeks is porting the transformer block kernels from llm.c - I think that will flesh out the picture far better. If there's enough interest, happy to collaborate and could potentially do a writeup.
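For readers unfamiliar with what "basic tiling" means here: the idea is to compute the output matrix in small blocks so each block's inputs stay in fast memory (shared/workgroup memory on a GPU, cache on a CPU). A minimal CPU-side C++ sketch of the concept, not gpu.cpp's actual kernel and with a hypothetical TILE size chosen only for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Tiled (cache-blocked) matmul over n x n row-major float matrices.
// Each TILE x TILE block of C is accumulated while the corresponding
// slices of A and B stay resident in fast memory. In a WGSL compute
// shader the same structure maps tiles onto workgroup shared memory.
constexpr int TILE = 4; // illustrative tile size, not tuned

void matmul_tiled(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, int n) {
  std::fill(c.begin(), c.end(), 0.0f);
  for (int i0 = 0; i0 < n; i0 += TILE)
    for (int k0 = 0; k0 < n; k0 += TILE)
      for (int j0 = 0; j0 < n; j0 += TILE)
        // Inner loops walk one tile; bounds clamp at the matrix edge.
        for (int i = i0; i < std::min(i0 + TILE, n); ++i)
          for (int k = k0; k < std::min(k0 + TILE, n); ++k) {
            float aik = a[i * n + k];
            for (int j = j0; j < std::min(j0 + TILE, n); ++j)
              c[i * n + j] += aik * b[k * n + j];
          }
}
```

The GPU version adds more layers on top of this (per-thread register tiles, vectorized loads, avoiding bank conflicts), which is exactly the headroom the current example leaves on the table.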

There are always tradeoffs that come with portability, but part of my goal with gpu.cpp is to create a scaffold to experiment with and see how far we can push portable GPU performance.
