Groq's API performance comes close to this level as well. We've benchmarked it over time and >400 tokens/s has been sustained - see https://artificialanalysis.ai/models/mixtral-8x7b-instruct (the over-time view is at the bottom of the page).
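For anyone wanting to sanity-check numbers like these themselves, here's a minimal sketch of measuring tokens/s client-side. The `generate` function is a placeholder for whatever API client you use (it just needs to return the text and the completion token count); the stub below only simulates a call.

```python
import time

def measure_throughput(generate, prompt):
    """Time a completion call and report output tokens per second.

    `generate` is a placeholder: any callable that takes a prompt and
    returns (text, completion_token_count) from your API client.
    """
    start = time.perf_counter()
    text, n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub standing in for a real API call, for illustration only:
def fake_generate(prompt):
    time.sleep(0.05)          # simulate network + generation latency
    return "stub output", 20  # pretend 20 completion tokens

tps = measure_throughput(fake_generate, "Hello")
print(f"{tps:.0f} tokens/s")
```

Note this measures end-to-end throughput including network latency, so for short completions it will understate the raw generation speed; streaming and timing only the inter-token gaps gives a cleaner number.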