(no title)
cschneid | 3 months ago
They've claimed repeatedly in their discord that they don't quantize models.
The speed of things does change how you interact with it I think. I had this new GLM model hooked up to opencode as the harness with their $50/mo subscription plan. It was seriously fast to answer questions, although there are still big pauses in workflow when the per-minute request cap is hit.
I got a meaningful refactor done, maybe a touch faster than I would have in claude code + sonnet? But my human interaction with it felt like the slow part.
alyxya|3 months ago