Fancy hardware with a bespoke production process, smaller economies of scale, and utilization that is probably not great since they position on per-user speed; they are also purportedly under-invested in their compiler, which has a hard job compiling for such an architecture anyway. That ignores, for the moment, the cost of their bespoke software stack, which they can probably amortize away eventually.
According to OpenRouter, Cerebras charges $0.65/$0.85 per 1M input/output tokens for Llama 4 Scout. Google charges $0.25/$0.70 and lambda.ai charges $0.08/$0.30 for the same model.
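To make the spread concrete, here is a small sketch that computes per-request cost from those quoted per-1M-token rates. The request sizes (100k input / 10k output tokens) are illustrative assumptions, not from the comment:

```python
# Per-1M-token prices (input, output) in USD, as quoted above for Llama 4 Scout.
PRICES = {
    "Cerebras":  (0.65, 0.85),
    "Google":    (0.25, 0.70),
    "lambda.ai": (0.08, 0.30),
}

def request_cost(provider, input_tokens, output_tokens):
    """Dollar cost of one request at the quoted per-1M-token rates."""
    in_rate, out_rate = PRICES[provider]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a request with 100k input tokens and 10k output tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 100_000, 10_000):.4f}")
```

At those sizes Cerebras comes out roughly 2x Google and nearly 7x lambda.ai for the same model, which is the gap the parent comment is attributing to hardware economics.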
recursivecaveat|8 months ago
totaa|7 months ago