top | item 47073784 (no title)

terhechte | 10 days ago
Did you try an MLX version of this model? In theory it should run a bit faster. I'm hesitant to download multiple versions, though.
tarruda | 10 days ago
Haven't tried. I'm too used to llama.cpp at this point to switch to something else. I like being able to just run a model and automatically get:
- OpenAI completions endpoint
- Anthropic messages endpoint
- OpenAI responses endpoint
- A slick-looking web UI
without having to install anything else.
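For reference, llama.cpp's bundled `llama-server` speaks the OpenAI chat completions protocol, so any OpenAI-style client can talk to it. A minimal sketch, assuming a server running on `localhost:8080` (the default port) and using only the standard library; the network call is left commented out since it needs a live server:

```python
import json

# Hypothetical local address; llama-server exposes an
# OpenAI-compatible endpoint at /v1/chat/completions.
BASE_URL = "http://localhost:8080"

# Build an OpenAI-style chat completion request body.
payload = {
    "model": "local-model",  # llama-server serves whatever model it was started with
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 64,
}
body = json.dumps(payload)
print(body)

# To actually send it (requires a running server):
# import urllib.request
# req = urllib.request.Request(
#     BASE_URL + "/v1/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI wire format, official OpenAI client libraries also work by pointing their base URL at the local server.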
KerrAvon | 10 days ago
Is there a reliable way to run MLX models? On my M1 Max, LM Studio sometimes outputs garbage through its API server even when the LM Studio chat with the same model is perfectly fine. llama.cpp variants generally just work.