I run LM Studio for ease of use on several Mac Studios, fronted by a small token-aware router that estimates resource usage on each machine.
There's a lot of optimization left to do there, but the systems are pinned most of the time, so I'm not focused on it at the moment: the GPUs are the bottleneck, not the queuing.
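For illustration only (this is not the router described above, whose details aren't given here), a minimal sketch of what token-aware routing could look like. Everything in it is an assumption: the `Backend` fields, the rough 4-characters-per-token estimate, and the per-host token budget are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    name: str                  # hypothetical host label, e.g. "studio-1"
    capacity_tokens: int       # assumed budget of in-flight tokens per host
    inflight_tokens: int = 0   # tokens currently reserved on this host


def estimate_tokens(prompt: str, max_new_tokens: int) -> int:
    # Crude assumption: about 4 characters per token, plus requested output.
    return len(prompt) // 4 + max_new_tokens


def pick_backend(backends: List[Backend], cost: int) -> Optional[Backend]:
    # Keep only hosts that can take the request without exceeding their budget.
    fits = [b for b in backends if b.inflight_tokens + cost <= b.capacity_tokens]
    if not fits:
        return None  # nothing fits; the caller would queue the request
    # Prefer the host with the lowest load fraction after adding the request.
    return min(fits, key=lambda b: (b.inflight_tokens + cost) / b.capacity_tokens)


def reserve(backend: Backend, cost: int) -> None:
    # Record the estimated tokens as in flight on the chosen host.
    backend.inflight_tokens += cost


def release(backend: Backend, cost: int) -> None:
    # Return the reservation once the request finishes.
    backend.inflight_tokens = max(0, backend.inflight_tokens - cost)
```

When every host is already pinned, `pick_backend` returns `None` and requests simply wait, which fits the point that the GPUs, not the queuing, are the limit.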
I would like to hear more about your setup if you’re willing. Is the token-aware router you’re using publicly available, or is it something you’ve written yourself?
nickreese|3 months ago
grosswait|3 months ago