Show HN: UForm v2 – tiny CLIP-like embeddings in 21 languages and Graphcore API
16 points | vov_or | 2 years ago | github.com
It has 40% fewer parameters than vanilla CLIP while performing much better on text-to-image retrieval, where it's also beneficial that our output embeddings have 2x fewer dimensions (256 vs. 512).
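The benefit of compact 256-dim embeddings is cheaper retrieval: with L2-normalized vectors, cosine similarity is just a matrix-vector product, and halving the dimensionality halves both storage and compute. A minimal sketch of that ranking step, using random vectors as stand-ins for actual UForm outputs (the names and shapes here are illustrative, not the library's API):

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# Stand-ins for precomputed image embeddings and one text query embedding.
image_embeddings = normalize(rng.normal(size=(1000, 256)))  # 256-dim, as in UForm
query_embedding = normalize(rng.normal(size=(256,)))

# On unit-length vectors, cosine similarity is a single matrix-vector product.
scores = image_embeddings @ query_embedding
top_k = np.argsort(-scores)[:5]  # indices of the 5 best-matching images
```

At scale you would hand the normalized vectors to an ANN index (e.g. USearch, as in the demo) instead of brute-forcing the product.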
Moreover, it supports 21 languages, including widely spoken ones like English, Hindi, Chinese, and Arabic, as well as lower-resource languages like Ukrainian, Hebrew, and Armenian.
We ship the models in ONNX and CoreML formats and provide PyTorch inference code for CPUs and GPUs, plus PopTorch code for Graphcore IPUs.
Demo: http://usearch-images.com/ Blog: https://www.unum.cloud/blog/2023-08-17-uform-graphcore
Looking forward to your feedback!
isaacfung|2 years ago
It seems CLIP performs better for prompts like "three birds" or "man and woman".