Just ordered a $12k Mac Studio with 512GB of integrated RAM.
Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with Safari.
LM Studio is newish, and it's not a perfect interface yet, but it's fantastic at what it does, which is bringing local LLMs to the masses without them having to know much.
Exo (https://github.com/exo-explore/exo) is this radically cool tool that automatically clusters all hosts on your network running Exo and uses their combined GPUs for increased throughput.
As in HPC environments, you're going to need ultra-fast interconnects, but it's all just IP-based.
I'm using it on MacBook Air M1 / 8 GB RAM with Qwen3-4B to generate summaries and tags for my vibe-coded Bloomberg Terminal-style RSS reader :-) It works fine (the laptop gets hot and slow, but fine).
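The summarize-and-tag call is nothing fancy; it's roughly this shape against LM Studio's OpenAI-compatible endpoint (port 1234 is its default). The prompt, model identifier, and JSON fields here are just my illustration, not anything the app prescribes:

```python
import json
import urllib.request

# LM Studio's local server speaks the OpenAI chat-completions API and
# defaults to port 1234; adjust if you changed it in the app.
ENDPOINT = "http://localhost:1234/v1/chat/completions"

def build_request(article_text: str) -> dict:
    """OpenAI-style payload asking for a summary plus tags as JSON."""
    return {
        "model": "qwen3-4b",  # whatever identifier your runtime exposes
        "messages": [
            {"role": "system",
             "content": 'Reply with JSON: {"summary": str, "tags": [str, ...]}'},
            {"role": "user", "content": article_text},
        ],
        "temperature": 0.2,
    }

def summarize(article_text: str) -> dict:
    """POST to the local endpoint and parse the model's JSON reply."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(article_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return json.loads(body["choices"][0]["message"]["content"])
```

Small models don't always return clean JSON, so in practice you want a retry or a lenient parse around that last line.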
Probably should just use llama.cpp server/ollama and not waste a gig of memory on Electron, but I like GUIs.
I'd love to host my own LLMs, but I keep getting held back by the quality and affordability of cloud LLMs. Why go local unless there's private data involved?
I did this a month ago and don't regret it one bit. I had a long laundry list of ML "stuff" I wanted to play with or questions to answer. There's no world in which I'm paying by the request, or token, or whatever, for hacking on fun projects. Keeping an eye on the meter is the opposite of having fun and I have absolutely nowhere I can put a loud, hot GPU (that probably has "gamer" lighting no less) in my fam's small apartment.
I genuinely cannot wrap my head around spending this much money on hardware that is dramatically inferior to hardware that costs half the price. macOS is not even great anymore; they stopped improving their UX like a decade ago.
If the rumors about splitting CPU/GPU in new Macs are true, your Mac Studio will be the last one capable of running DeepSeek R1 671B Q4. It looks like Apple had an accidental winner that will go away with the end of unified RAM.
LM Studio has quickly become the best way to run local LLMs on an Apple Silicon Mac: no offense to vllm/ollama and other terminal-based approaches, but LLMs have many levers for tweaking output and sometimes you need a UI to manage it. Now that LM Studio supports MLX models, it's one of the most efficient too.
I'm not bullish on MCP, but at the least this approach gives a good way to experiment with it for free.
I just wish they'd give the UI a facelift. Right now it's too colorful for me, with many different shades of similar colors. I wish they'd copy a color palette from Google AI Studio, or from Trae or PyCharm.
MCP terminology is already super confusing, but this seems to just introduce "MCP Host" randomly in a way that makes no sense to me at all.
> "MCP Host": applications (like LM Studio or Claude Desktop) that can connect to MCP servers, and make their resources available to models.
I think everyone else is calling this an "MCP Client", so I'm not sure why they would want to call themselves a host - makes it sound like they are hosting MCP servers (definitely something that people are doing, even though often the server is run on the same machine as the client), when in fact they are just a client? Or am I confused?
The initial experience with LMStudio and MCP doesn't seem to be great, I think their docs could do with a happy path demo for newcomers.
Upon installing, the first model offered is google/gemma-3-12b, which in fairness is pretty decent compared to others.
It's not obvious how to show the right sidebar they're talking about; it's the flask icon, which turns into a collapse icon when you click it.
I set the MCP up with playwright, asked it to read the top headline from HN and it got stuck on an infinite loop of navigating to Hacker News, but doing nothing with the output.
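For context, wiring up Playwright comes down to an entry along these lines in LM Studio's mcp.json (the same `mcpServers` shape Claude Desktop uses); the args are from the Playwright MCP server's README, but treat this as a sketch and double-check against the current docs:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```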
I wanted to try it out with a few other models, but figuring out how to download new models isn't obvious either; it turned out to be the search icon. Anyway, other models didn't fare much better: some outright ignored the tools despite claiming capacity for 'tool use'.
Gemma3 models can follow instructions but were not trained to call tools, which is the backbone of MCP support. You would likely have a better experience with models from the Qwen3 family.
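Concretely, "trained to call tools" means the model responds to the `tools` array in the OpenAI-style request with a structured `tool_calls` message instead of prose. A rough sketch of both sides of that handshake (the tool name and schema here are made up for illustration):

```python
def make_tool_request(question: str) -> dict:
    """OpenAI-style chat request advertising one callable tool."""
    return {
        "model": "qwen3-8b",  # a tool-call-capable model
        "messages": [{"role": "user", "content": question}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "fetch_page",
                "description": "Fetch a URL and return its text",
                "parameters": {
                    "type": "object",
                    "properties": {"url": {"type": "string"}},
                    "required": ["url"],
                },
            },
        }],
    }

def wants_tool_call(message: dict) -> bool:
    # A tool-trained model answers with a tool_calls entry; models that
    # merely follow instructions tend to reply with plain text instead,
    # which is why they loop or ignore the tools.
    return bool(message.get("tool_calls"))
```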
Others mentioned qwen3, which works fine with HN stories for me, but the comments still trip it up: after a while it starts thinking the comments are part of the original question.
I also tried the recent deepseek 8b distill, but it was much worse for tool calling than qwen3 8b.
Great to see more local AI tools supporting MCP! Recently I've also added MCP support to recurse.chat. When running locally (llama.cpp and Ollama), it still needs to catch up on tool-calling capabilities (for example, tool-call accuracy and parallel tool calls) compared to the well-known providers, but it's starting to get pretty usable.
What models are you using on LM Studio for what task and with how much memory?
I have a 48GB MacBook Pro, and Gemma3 (one of the abliterated ones) fits my non-code use case perfectly (generating crime stories where the reader tries to guess the killer). For code, I still call Google to use Gemini.
I wish LM Studio had a pure daemon mode. It's better than ollama in a lot of ways but I'd rather be able to use BoltAI as the UI, as well as use it from Zed and VSCode and aider.
What I like about ollama is that it provides a self-hosted AI provider that can be used by a variety of things. LM Studio has that too, but you have to have the whole big chonky Electron UI running. Its UI is powerful but a lot less nice than e.g. BoltAI for casual use.
Oh, that horrible Electron UI. Under Windows it pegs a core on my CPU at all times!
If you're just working as a single user via the OpenAI protocol, you might want to consider koboldcpp: https://github.com/LostRuins/koboldcpp/releases
It bundles a GUI launcher, then starts in text-only mode. You can also tell it to just run a saved configuration, bypassing the GUI; I've successfully run it as a system service on Windows using nssm.
Though there are a lot of roleplay-centric gimmicks in its feature set, its context-shifting feature is singular. It caches the intermediate state used by your last query, extending it to build the next one. As a result you save on generation time with large contexts, and also any conversation that has been pushed out of the context window still indirectly influences the current exchange.
I wonder how LM Studio and AnythingLLM will compare, especially in the coming months... I like AnythingLLM's workflow editor.
I'd like something to grow into for my doc-heavy job. Don't want to be installing and trying both.
I’ve been wanting to try LM Studio but I can’t figure out how to use it over local network. My desktop in the living room has the beefy GPU, but I want to use LM Studio from my laptop in bed.
Use an OpenAI-compatible API client on your laptop and LM Studio on your desktop, and point the client at the desktop. LM Studio's server can serve an LLM on a desired port using the OpenAI-style chat-completions API. You can also install Open WebUI on the desktop, connect to it in a web browser, and configure it to use the LM Studio connection for its LLM.
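A minimal sketch of the laptop side, assuming the desktop's server is enabled and reachable over the LAN (the hostname is a placeholder; port 1234 is LM Studio's default):

```python
def lmstudio_base_url(host: str, port: int = 1234) -> str:
    """OpenAI-compatible endpoint exposed by LM Studio's local server."""
    return f"http://{host}:{port}/v1"

def ask(host: str, prompt: str) -> str:
    # Any OpenAI-compatible client works; the official SDK shown here is
    # one option (pip install openai). The api_key is a dummy value --
    # LM Studio doesn't check it.
    from openai import OpenAI
    client = OpenAI(base_url=lmstudio_base_url(host), api_key="lm-studio")
    resp = client.chat.completions.create(
        model="local-model",  # whatever model the desktop has loaded
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Remember to bind the server to the LAN interface (not just localhost) in LM Studio's server settings, or the laptop won't reach it.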
I really like LM Studio but their license / terms of use are very hostile. You're in breach if you use it for anything work related - so just be careful folks!
I’m looking for something like this too. Msty is my favourite LLM UI (supports remote + local models) but unfortunately has no MCP support. It looks like they’re trying to nudge people into their web SaaS offering which I have no interest in.
zackify|8 months ago
Get the RTX Pro 6000 for $8.5k with double the bandwidth. It will be way better.
incognito124|8 months ago
Oof you were NOT joking
sneak|8 months ago
I haven’t been using it much. All it has on it is LM Studio, Ollama, and Stats.app.
> Can't wait for it to arrive and crank up LM Studio. It's literally the first install. I'm going to download it with safari.
lol, yup. same.
mkagenius|8 months ago
I have one running locally with this config:
1. CodeRunner: https://github.com/BandarLabs/coderunner (I am one of the authors)
minimaxir|8 months ago
I'm not bullish on MCP, but at the least this approach gives a good way to experiment with it for free.
chisleu|8 months ago
You gotta help me out. What do you see holding it back?
qntty|8 months ago
https://modelcontextprotocol.io/specification/2025-03-26/arc...
rshemet|8 months ago
I'd love to learn more about your MCP implementation. Wanna chat?
visiondude|8 months ago
Nice to have a local option, especially for some prompts.
jtreminio|8 months ago
Any suggestions?
jmetrikat|8 months ago
just added the `Add to LM Studio` button to the anytype mcp server, looks nice: https://github.com/anyproto/anytype-mcp
simonw|8 months ago
Are you sharing any of your revenue from that $79 license fee with the https://ollama.com/ project that your app builds on top of?