Show HN: UForm v2 – tiny CLIP-like embeddings in 21 languages and Graphcore API
16 points | vov_or | 2 years ago | github.com
It has 40% fewer parameters than vanilla CLIP while performing much better on text-to-image retrieval, where it's also beneficial that our output embeddings have 2x fewer dimensions (256 vs. 512).
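The benefit of compact 256-dim embeddings is cheaper retrieval: with L2-normalized vectors, cosine similarity is just a matrix-vector product, and halving the dimensionality halves both storage and compute. A minimal sketch of that ranking step, using random vectors as stand-ins for actual UForm outputs (the names and shapes here are illustrative, not the library's API):

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so a dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# Stand-ins for precomputed image embeddings and one text query embedding.
image_embeddings = normalize(rng.normal(size=(1000, 256)))  # 256-dim, as in UForm
query_embedding = normalize(rng.normal(size=(256,)))

# On unit-length vectors, cosine similarity is a single matrix-vector product.
scores = image_embeddings @ query_embedding
top_k = np.argsort(-scores)[:5]  # indices of the 5 best-matching images
```

At scale you would hand the normalized vectors to an ANN index (e.g. USearch, as in the demo) instead of brute-forcing the product.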
Moreover, it supports 21 languages, including widely spoken ones like English, Hindi, Chinese, and Arabic, as well as lower-resource languages like Ukrainian, Hebrew, and Armenian.
We ship the models in ONNX and CoreML formats and provide PyTorch inference code for CPUs and GPUs, plus PopTorch code for Graphcore IPUs.
Demo: http://usearch-images.com/ Blog: https://www.unum.cloud/blog/2023-08-17-uform-graphcore
Looking forward to your feedback!
isaacfung|2 years ago
It seems CLIP performs better for prompts like "three birds" or "man and woman".