k2so | 1 year ago
Recently I was trying to generate text embeddings from a Hugging Face model. NVIDIA Triton and text-embeddings-inference (built by Hugging Face) were my two options.
> why large companies are generally incapable of delivering great developer experience.
I wanted to curl up and cry while trying to make NVIDIA Triton spit out embeddings. The error messages are cryptic, and you need Jedi-like intuition to get it to work. I finally got it working after about 2 days of wrangling with the extremely verbose and long-winded documentation (thanks in part to Claude, which helped me understand it with better examples).
Triton's documentation starts off with core principles, and throughout it links to other badly written documentation to make sure you know the core concepts. The only reason I endured this was the performance gains Triton promised, but it underdelivered (most likely because I missed some config or core concept and did not get all the juice).
On the other hand, text-embeddings-inference has a two-line, front-and-centre command to pull the Docker image and get running. The only delay before it started serving embeddings was my internet speed. Deploying it on our k8s infra was then a breeze: minor modifications to the Dockerfile and we were running. On top of that, it's more performant than Triton!