Embedding tables aren't hard on the GPU (they're only a lookup table), and the output softmax still requires you to do the full matrix multiply. The label may be sparse, but the computation is far from sparse.
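A minimal NumPy sketch of that point (toy sizes, not from any particular model): even though the target label is a single index, the softmax normalizer needs all V logits, which means a full dense product against the output projection.

```python
import numpy as np

# Hypothetical sizes: hidden dimension d, vocabulary size V.
d, V = 512, 50_000

rng = np.random.default_rng(0)
h = rng.standard_normal(d)           # hidden state for one token
W_out = rng.standard_normal((V, d))  # output projection, one row per vocab word

# Dense work: a full (V, d) @ (d,) matrix-vector product, O(V * d).
logits = W_out @ h                   # shape (V,)
probs = np.exp(logits - logits.max())
probs /= probs.sum()

label = 42                           # sparse target: just one index...
loss = -np.log(probs[label])         # ...but the normalizer touched all V rows
print(logits.shape)                  # → (50000,)
```

The lookup side of an embedding layer, by contrast, only reads the rows it needs.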
> The reverse is true, embeddings are both the performance and memory-footprint bottleneck of modern NN models.
yvdriess|6 years ago
Check Figure 6 of https://arxiv.org/pdf/1906.00091.pdf (the DLRM paper).
Embeddings are used to look up sparse features, so you have those pesky data-dependent lookups.
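To make "data-dependent lookups" concrete, here's a hedged NumPy sketch (toy sizes of my own choosing): each batch gathers arbitrary, input-determined rows from a large table, so the memory access pattern is scattered and unknown until runtime.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((100_000, 32))   # large embedding table

# Sparse categorical features arrive as arbitrary ids; each batch triggers
# scattered, data-dependent reads into the table (a gather).
batch_ids = np.array([7, 99_512, 3, 42_001])
vecs = E[batch_ids]                       # gather: shape (4, 32)
print(vecs.shape)                         # → (4, 32)
```

This access pattern is why embedding lookups tend to be memory-bound rather than compute-bound.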
zeroxfe|6 years ago
They may be a bottleneck, but the alternative is worse -- you can't fit complex models with large vocabularies into GPU memory using sparse one-hot encodings.
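A small sketch of the trade-off this comment relies on: an embedding lookup computes exactly what multiplying by a one-hot vector would, without ever materializing the V-dimensional one-hot input (toy sizes, NumPy).

```python
import numpy as np

V, d = 10_000, 64                     # toy vocabulary and embedding sizes
rng = np.random.default_rng(0)
E = rng.standard_normal((V, d))       # embedding table

idx = 123                             # a token id (sparse feature)

# One-hot route: build a V-dim vector, then a dense (V,) @ (V, d) product.
one_hot = np.zeros(V)
one_hot[idx] = 1.0
via_matmul = one_hot @ E              # O(V * d) work, O(V) extra memory

# Lookup route: just index the table -- O(d) work, no one-hot vector at all.
via_lookup = E[idx]

print(np.allclose(via_matmul, via_lookup))  # → True
```

With realistic vocabularies the one-hot route's per-token memory and compute scale with V, which is exactly what the embedding layer avoids.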
Der_Einzige|6 years ago
See libraries like Magnitude for proper embedding-lookup implementations.