Unified Vision-Language Agents – Detect, Segment, OCR, Generate and More
item 46139074 | 5 points | fzysingularity | 2 months ago | github.com | 1 comment
fzysingularity | 2 months ago

Here's a short cookbook exploring an agentic approach to vision–language tasks: detection, segmentation, OCR, generation, and combining classical CV tools with VLM reasoning.

Happy to run examples if you leave a comment.

[1] IPython notebook: https://github.com/vlm-run/vlmrun-cookbook/blob/main/noteboo...
[2] Colab: https://colab.research.google.com/github/vlm-run/vlmrun-cook...