It's even more difficult because, while all the benchmarks provide some kind of 'averaged' performance metric for comparison, in my experience most users have pretty specific regular use cases and pretty specific personal background knowledge. For instance, I have a background in ML, 15 years of experience in full-stack programming, and primarily use LLMs to generate interface prototypes for new product concepts. We use a lot of React and Chakra UI for that, and I consistently get the best results out of Gemini Pro. I tried all the available options and settled on it as the best for me and my use case. It's not the best for marketing boilerplate, or probably a million other use cases, but for me, in this particular niche, it's clearly the best. Beyond that, the benchmarks are irrelevant.