bobnarizes | 2 months ago
I’m using SBERT-style embedding models for semantic matching, which works very well in practice.
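To make the matching step concrete, here's a minimal sketch of what "semantic matching over embeddings" boils down to, with tiny toy vectors standing in for real SBERT output (an actual model like all-MiniLM would give ~384-dimensional vectors; the function names here are just illustrative):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_match(query_vec, doc_vecs):
    # Index of the document embedding closest to the query embedding.
    return max(range(len(doc_vecs)),
               key=lambda i: cosine_similarity(query_vec, doc_vecs[i]))

# Toy 3-d "embeddings" standing in for real model output.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.05, 0.0]
print(best_match(query, docs))
```

In a real app the vectors come from the embedding model and you'd typically use a vector index rather than a linear scan, but the similarity computation is the same.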
For non-text content, the app also analyzes images (OCR + object recognition) using Apple’s Vision framework. That part is surprisingly powerful, especially on Apple Silicon.
> I need to do something for images that are already classified/tagged via FastVLM
What’s the concrete use case you’re targeting with this?