rfoo | 4 months ago
IMO this is likely what you get from running the model correctly as-is (i.e. using the same weight and activation dtype), so Together is not bad.
Moonshot AI themselves and Groq likely use some sampler tricks to eliminate schema validation errors.
So really the only thing this shows is that Nebius, Chutes, and AtlasCloud could be running something else (for example, a further-quantized model). Or they have bugs.
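For anyone wondering what a "sampler trick" could look like: one well-known option is constrained decoding, where tokens that would break the schema are masked out before picking the next token. The toy sketch below is only an illustration of that idea, not what Moonshot AI or Groq actually run. The vocabulary, the logits, and every function name are made up, and the "schema" is just an enum of two allowed strings.

```python
import math

# Hypothetical toy setup: the tool argument must be one of these JSON strings.
VALID = {'"celsius"', '"fahrenheit"'}
VOCAB = ['"cel', '"fahr', 'sius"', 'enheit"', '"kelvin"']


def toy_logits(prefix):
    """Stand-in for a model: fixed scores that favor a schema-invalid token."""
    return {'"kelvin"': 3.0, '"cel': 2.0, '"fahr': 1.0, 'sius"': 0.5, 'enheit"': 0.4}


def allowed_tokens(prefix, vocab, valid_outputs):
    """Tokens that keep prefix + token a prefix of at least one valid output."""
    return [t for t in vocab if any(v.startswith(prefix + t) for v in valid_outputs)]


def constrained_greedy(logits_fn, vocab, valid_outputs, max_steps=16):
    """Greedy decoding restricted to tokens that cannot violate the schema."""
    out = ""
    for _ in range(max_steps):
        if out in valid_outputs:
            return out
        logits = logits_fn(out)
        candidates = allowed_tokens(out, vocab, valid_outputs)
        if not candidates:
            raise ValueError("no token keeps the output schema-valid")
        # Pick the best-scoring token among the allowed ones only.
        out += max(candidates, key=lambda t: logits.get(t, -math.inf))
    if out in valid_outputs:
        return out
    raise ValueError("did not reach a valid output within max_steps")


# Unconstrained greedy would emit the invalid '"kelvin"';
# the constrained version is forced onto a valid value instead.
unconstrained_first = max(VOCAB, key=lambda t: toy_logits("")[t])
constrained = constrained_greedy(toy_logits, VOCAB, VALID)
```

A provider doing something like this would show zero schema validation errors even if the underlying model weights are identical to everyone else's, which is why the error count alone says little about the weights being served.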
wishawa | 4 months ago
Anyway, Novita is doing significantly better than Together on the vendor verifier chart, so the low quality must be at least partially Together's fault.
rfoo | 4 months ago