(no title)
Eliovp | 3 months ago
We actually think solutions like theirs are good for the ecosystem: they make it easier for people to at least try AMD without throwing away their CUDA code.
Our point is simply this: if you want top-end performance (big LLMs, specific floating-point formats, serious throughput/latency), translation alone is not enough. At that point you have to focus on hardware-specific tuning: CDNA kernel shapes, MFMA GEMMs, ROCm-specific attention and tensor-parallel (TP) kernels, KV-cache handling, etc.
That’s the layer we work on: we don’t replace people’s engines; we just push the AMD hardware as hard as it will go.
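To give a concrete sense of what "MFMA GEMMs" means at the lowest level, here is a minimal HIP sketch of a single CDNA matrix-core step. It assumes a gfx908+ target and the LLVM builtin __builtin_amdgcn_mfma_f32_16x16x16f16; the kernel name and fragment layout are illustrative, not our production code:

    #include <hip/hip_runtime.h>

    // Vector types the MFMA builtin expects: 4 halves in, 4 floats out.
    typedef __fp16 half4   __attribute__((ext_vector_type(4)));
    typedef float  floatx4 __attribute__((ext_vector_type(4)));

    // One 64-lane wavefront performs a 16x16x16 fp16 GEMM step with fp32
    // accumulation on a CDNA matrix core. Real kernels tile, swizzle, and
    // double-buffer the A/B fragments; this shows only the inner step.
    __global__ void mfma_step(const half4* A, const half4* B, floatx4* C) {
        int lane = threadIdx.x & 63;   // lane id within the wavefront
        floatx4 acc = C[lane];         // load this lane's C fragment
        // v_mfma_f32_16x16x16f16; trailing args are cbsz/abid/blgp = 0
        acc = __builtin_amdgcn_mfma_f32_16x16x16f16(A[lane], B[lane], acc, 0, 0, 0);
        C[lane] = acc;                 // store the updated fragment
    }

Almost all of the tuning work lives around that one instruction: picking tile shapes that keep the matrix cores fed, staging fragments through LDS, and matching everything to the model's actual GEMM sizes.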