jgoertler | 1 year ago
I would imagine that model compilation works quite similarly, but I'm not sure whether TVM supports palettization.
What I believe is unique to Talaria is that it can recommend optimizations to the user for each layer in the network.
The system allows the user to quickly identify "problematic" layers through either the table view or the graph viewer. This works based on simulated metrics (energy consumption, latency, ...) that are collected for each layer. It then offers optimization choices for each layer, together with the implied changes to the overall (total) metrics. I'm not sure if TVM collects or exposes similar metrics.
So a large part of the system focuses on the user-in-the-loop aspect of optimizing a network for inference, which is also why this paper was presented at a conference on human-computer interaction (SIGCHI).
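
To make the per-layer workflow described above a bit more concrete, here is a rough Python sketch. It is not Talaria's actual API or data model: the names `LayerStats`, `Option`, `worst_layers`, and `implied_totals`, as well as all the numbers, are made up for illustration. It only assumes that each layer has simulated latency and energy values, and that an optimization choice replaces those values for a single layer.

```python
from dataclasses import dataclass


@dataclass
class LayerStats:
    """Simulated metrics for one layer (hypothetical structure)."""
    name: str
    latency_ms: float
    energy_mj: float


@dataclass
class Option:
    """A candidate optimization for one layer (hypothetical structure)."""
    layer: str
    label: str
    latency_ms: float  # simulated latency of that layer after the change
    energy_mj: float   # simulated energy of that layer after the change


def totals(layers):
    """Sum the per-layer metrics into overall (latency, energy) totals."""
    return (sum(l.latency_ms for l in layers),
            sum(l.energy_mj for l in layers))


def worst_layers(layers, k=1):
    """Return the k layers with the highest simulated latency."""
    return sorted(layers, key=lambda l: l.latency_ms, reverse=True)[:k]


def implied_totals(layers, option):
    """Totals if `option` were applied: swap out one layer's contribution."""
    latency, energy = totals(layers)
    old = {l.name: l for l in layers}[option.layer]
    return (latency - old.latency_ms + option.latency_ms,
            energy - old.energy_mj + option.energy_mj)


# Illustrative numbers only, not measurements from any real model.
layers = [
    LayerStats("conv1", 2.0, 1.0),
    LayerStats("conv2", 6.0, 3.0),
    LayerStats("fc", 1.0, 0.5),
]
option = Option("conv2", "4-bit palettization", 3.0, 1.5)

print(worst_layers(layers)[0].name)    # the "problematic" layer
print(totals(layers))                  # current totals
print(implied_totals(layers, option))  # totals if the option is applied
```

The point of the sketch is just the shape of the interaction: rank layers by a simulated metric, then show how a single per-layer choice would shift the totals before the user commits to it.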
efnx | 1 year ago