(no title)
areddyyt | 1 year ago
Part of making the one line of code work is addressing programmability. If you're on a Jetson, we should load the CUDA kernels built for Jetson. If you're on a CPU, we should load the CPU kernels. If that CPU supports AVX-512, we should load the kernels that use AVX-512 instructions, and so on.
The end goal is that when we introduce our custom silicon, one line of code should make it far easier to bring customers over from Jetson/any other platform because we handle loading the correct backend for them.
We know this will be bordering on impossible, but it's critical that we take on that burden rather than shifting it to the ML engineer.
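To make the backend selection described above concrete, here is a rough sketch of what the dispatch could look like. This is not our actual loader: the backend names, `HostInfo`, `detect_host`, and `select_backend` are all hypothetical, and the detection heuristics (checking for `/etc/nv_tegra_release`, looking up the CUDA driver library, reading `/proc/cpuinfo`) are assumptions that only make sense on Linux.

```python
import ctypes.util
import os
import platform
from dataclasses import dataclass


@dataclass(frozen=True)
class HostInfo:
    """What we know about the machine (hypothetical structure)."""
    machine: str
    has_cuda: bool
    is_jetson: bool
    cpu_flags: frozenset


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return CPU feature flags from a Linux cpuinfo file, or an empty set."""
    try:
        with open(path) as f:
            for line in f:
                key, _, value = line.partition(":")
                # x86 reports "flags", ARM reports "Features".
                if key.strip().lower() in ("flags", "features"):
                    return frozenset(value.split())
    except OSError:
        pass  # Not Linux, or the file is unreadable.
    return frozenset()


def detect_host():
    """Best-effort detection; every check here is a heuristic assumption."""
    is_jetson = os.path.exists("/etc/nv_tegra_release")
    has_cuda = ctypes.util.find_library("cuda") is not None
    return HostInfo(platform.machine(), has_cuda, is_jetson, read_cpu_flags())


def select_backend(host):
    """Pick the most specific backend name (names are hypothetical)."""
    if host.is_jetson and host.has_cuda:
        return "cuda-jetson"
    if host.has_cuda:
        return "cuda"
    if "avx512f" in host.cpu_flags:
        return "cpu-avx512"
    return "cpu-generic"


if __name__ == "__main__":
    print(select_backend(detect_host()))
```

The point of keeping `select_backend` separate from detection is that the user-facing call never changes: supporting a new target would mean adding one more branch and one more kernel package, not asking the ML engineer to change their code.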
danjl|1 year ago
areddyyt|1 year ago
On a side note, I looked deeply into every company in the space and was thoroughly unimpressed by how little they cared about the software stack needed to make their hardware work seamlessly. So even if I did go to work at some other hardware company, I doubt many customers would actually use the hardware.