jabron | 3 months ago
What do you mean "bounding boxes"? They were talking about captions and embeddings, so a vision language model is required.
Glemkloksdjf|3 months ago
I suggested YOLO and other non-LLM-VL models as a much faster alternative. Of course, CLIP would be the other option besides a big LLM-VL model.