wizee | 6 months ago
I don’t have much experience with local vision models, but for text questions the latest local models are quite good. I’ve been using Qwen 3 Coder 30B-A3B a lot to analyze code locally and it has been great. While not as good as the latest big cloud models, it’s roughly on par with SOTA cloud models from late last year in my usage. I also run Qwen 3 235B-A22B 2507 Instruct on my home server, and it’s great, roughly on par with Claude 4 Sonnet in my usage (but slow of course running on my DDR4-equipped server with no GPU).
LinXitoW | 6 months ago
Basically, isn't your data as safe/unsafe in a sharepoint folder as it is sending it to a paid inference provider?
Foobar8568 | 6 months ago
Managing private clients' data is still a concern if it can be directly linked to them.
Only JB, I believe, has on-premise infrastructure for these use cases.
exasperaited | 6 months ago
(Worth noting that "run it locally" is already Canva/Affinity's approach for Affinity Photo. Instead of a cloud-based model like Photoshop, their optional AI tools run using a local model you can download. Which I feel is the only responsible solution.)
mark_l_watson | 6 months ago
Someone else responded to you about working for a financial organization and not using public APIs - another great use case.
gorbypark | 6 months ago
Here's the ollama version (4.6-bit quant, I think?) run with --verbose:

    total duration:       21.193519667s
    load duration:        94.88375ms
    prompt eval count:    77 token(s)
    prompt eval duration: 1.482405875s
    prompt eval rate:     51.94 tokens/s
    eval count:           308 token(s)
    eval duration:        19.615023208s
    eval rate:            15.70 tokens/s
15 tokens/s is pretty decent for a low-end MacBook Air (M2, 24GB of RAM). Yes, it's not the ~250 tokens/s of 2.5-flash, but for my use case anything above 10 tokens/sec is good enough.
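For anyone unfamiliar with the --verbose output, the rates are just token counts divided by wall-clock durations. A minimal sketch (the helper name is my own, not part of ollama; the numbers are from the run above):

```python
# Sketch: how ollama's --verbose "rate" lines are derived from the
# reported counts and durations. Numbers copied from the run above.

def tokens_per_second(token_count: int, duration_s: float) -> float:
    """Throughput = tokens processed / wall-clock seconds."""
    return token_count / duration_s

# prompt eval: 77 tokens in 1.482405875 s
prompt_rate = tokens_per_second(77, 1.482405875)
# generation (eval): 308 tokens in 19.615023208 s
eval_rate = tokens_per_second(308, 19.615023208)

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")  # → 51.94
print(f"eval rate: {eval_rate:.2f} tokens/s")           # → 15.70
```

So the ~15.7 tokens/s figure is pure generation speed; the prompt is ingested about 3x faster.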