Released last week, and it looks like all the weights are now published. Don't sleep on the SAM 3D series — it's seriously impressive. They have a human pose model that rigs multiple humans in a scene with objects, all from one 2D photo (!), and their plain object 3D model is by far the best I've played with: it produced a really good lamp, with translucency and woven gems, in usable shape in under 15 seconds.
Are those the actual wireframes they're showing in the demos on that page? As in, do the produced models have "normal" topology? Or are they still just kinda blobby with a ton of polygons?
Side question: what are the current go-to open models for image captioning and for building image-embedding databases, with somewhat reasonable hardware requirements?
For pure image embeddings, I find DINOv3 quite good. For multimodal embeddings, maybe RzenEmbed. For captioning I would use a regular multimodal LLM (Qwen 3, Gemma 3, or similar), if your compute budget allows.
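If you go the embedding-DB route, the retrieval side is simple no matter which encoder produces the vectors. A minimal sketch of brute-force cosine-similarity search with NumPy; the toy 4-dimensional vectors stand in for real encoder outputs (e.g. DINOv3), and the function names are my own:

```python
import numpy as np

def build_index(embeddings):
    # L2-normalize rows so cosine similarity reduces to a dot product.
    embs = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(embs, axis=1, keepdims=True)
    return embs / np.clip(norms, 1e-12, None)

def search(index, query, k=3):
    # Return the k most similar rows as (row_index, cosine_similarity).
    q = np.asarray(query, dtype=np.float64)
    q = q / max(np.linalg.norm(q), 1e-12)
    sims = index @ q
    top = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in top]

# Toy 4-dim "embeddings" standing in for real encoder outputs.
db = build_index([[1, 0, 0, 0], [0, 1, 0, 0], [0.9, 0.1, 0, 0]])
print(search(db, [1, 0, 0, 0], k=2))
```

For databases up to a few million vectors, this brute-force approach is often fast enough before reaching for an approximate-nearest-neighbour index.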
I would suggest YOLO. Depending on your domain, you might also fine-tune these models. It's relatively easy, since they are not big LLMs but comparatively small image-classification or bounding-box models.
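For bounding-box models, the standard overlap metric used in training and evaluation is intersection-over-union (IoU). A minimal, self-contained sketch, with boxes in the common (x1, y1, x2, y2) convention:

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175
```

Detection benchmarks typically count a prediction as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.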
Alternative downloads exist. You can find torrents and match their checksums against the HF downloads, but there are also mirrors and clones right on HF that you can download without even having to log in.
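Matching checksums is just hashing the downloaded file and comparing against the published digest. A small sketch using Python's standard hashlib (the function name is my own):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    # Stream the file in 1 MiB chunks so multi-GB weight
    # shards don't need to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Compare the returned hex digest against the checksum published alongside the official weights; a mismatch means a corrupted or tampered download.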
For a long time I've wanted to use something like this to remove advertisements from hockey games. The moving ads on the boards are really annoying. Maybe I'll get around to actually doing that one of these days.
This would be convenient for post-production and editing of video, e.g. to aid colour grading in DaVinci Resolve. Currently a lot of manual labour goes into tracking and hand-masking during grading.
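As a toy illustration of the idea (not Resolve's actual pipeline), here is how a per-frame matte from a segmenter could gate a grade so only the tracked region is touched; the names and the simple gain adjustment are my own assumptions:

```python
import numpy as np

def grade_masked(frame, mask, gain=1.2):
    # frame: HxWx3 float image in [0, 1]; mask: HxW boolean matte.
    # Apply the gain only where the matte is on; leave the rest untouched.
    out = frame.copy()
    out[mask] = np.clip(out[mask] * gain, 0.0, 1.0)
    return out

# Tiny 2x2 grey frame with the matte covering one pixel.
frame = np.full((2, 2, 3), 0.5)
mask = np.array([[True, False], [False, False]])
graded = grade_masked(frame, mask)
print(graded[0, 0], graded[0, 1])  # lifted pixel vs untouched pixel
```

A real pipeline would also feather the matte edge and propagate it across frames, which is exactly the tracking work a segmentation model could automate.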
I wonder if this could be used to track an object's speed, e.g. a vehicle on a road. It would need to recognize shapes, e.g. a specific car model or the average size of a bike, to estimate real-world scale and hence speed.
I do a test on multimodal LLMs where I show them a dog with 5 legs, and ask them to count how many legs the dog has. So far none of them can do it. They all say "4 legs".
Segment Anything, however, was able to segment all 5 dog legs when prompted to. That means Meta is doing something else under the hood here, and it may lend itself to a very powerful future LLM.
Right now some of the biggest complaints people have about LLMs stem from their incompetence at processing visual data. Maybe Meta is onto something here.
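A sketch of why segmentation helps with counting: instead of asking a model for a number, you count the instance masks it returns. This assumes a SAM-like interface that yields one boolean mask per predicted instance; the helper function and thresholds here are hypothetical:

```python
import numpy as np

def count_instances(masks, min_area=10):
    # masks: iterable of HxW boolean arrays, one per predicted instance.
    # Count only masks whose area clears a small noise threshold.
    return sum(1 for m in masks if np.asarray(m).sum() >= min_area)

# Five fake "leg" masks of 8 pixels each, plus one empty/noise mask.
h = w = 8
legs = []
for i in range(5):
    m = np.zeros((h, w), dtype=bool)
    m[:, i] = True  # an 8-pixel column per leg
    legs.append(m)
legs.append(np.zeros((h, w), dtype=bool))
print(count_instances(legs, min_area=5))
```

Counting discrete masks sidesteps the prior that makes an LLM answer "4 legs" regardless of what the image shows.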
trevorhlynn|3 months ago
https://news.ycombinator.com/item?id=45982073
dang|3 months ago
Meta Segment Anything Model 3 - https://news.ycombinator.com/item?id=45982073 - Nov 2025 (133 comments)
p.s. This was lobbed onto the frontpage by the second-chance pool (https://news.ycombinator.com/item?id=26998308) and I need to make sure we don't end up with duplicate threads that way.
Glemkloksdjf|3 months ago
I would recommend bounding boxes.
bradyriddle|3 months ago
Check out https://github.com/MiscellaneousStuff/meta-sam-demo
It's a rip of the previous SAM playground. I use it for a bunch of things.
SAM 3 is incredible. I'm surprised it's not getting more attention.