IainIreland | 2 years ago

(I work on SpiderMonkey.)

Benchmarking is hard. It is very easy to write a benchmark where improving your score does not improve real-world performance, and over time even a good benchmark will become less useful as the important improvements are all made. This V8 blog post about Octane is a good description of some of the issues: https://v8.dev/blog/retiring-octane

Speedometer 3, in my experience, is the least bad browser benchmark. It hits code that we know from independent evidence is important for real-world performance. We've been targeting our performance work at Speedometer 3 for the last year, and we've seen good results. My favourite example: a few years ago, we decided that initial pageload performance was our performance priority for the year, and we spent some time trying to optimize for that. Speedometer 3 is not primarily a pageload benchmark. Nevertheless, our pageload telemetry improved more from targeting Speedometer 3 than it did when we were deliberately targeting pageload. (See the pretty graphs here: https://hacks.mozilla.org/2023/10/down-and-to-the-right-fire...) This is the advantage of having a good benchmark; it speeds up the iterative cycle of identifying a potential issue, writing a patch, and evaluating the results.

lapcat|2 years ago

This doesn't say anything about what the scores mean.

21 is apparently better than 20, but how much better? You could say "1 better", tautologically, but how does that relate to the real world?

Driving a car 1 mile per hour faster may be better, in a sense, but even if you drove 24 hours straight, it would only gain you 24 total miles, which is almost negligible on such a long trip. Nobody would be impressed by that difference.

charcircuit|2 years ago

It means it is 5% faster. You are overcomplicating it.

Vinnl|2 years ago

Iain explained that in a reply to your other comment: https://news.ycombinator.com/item?id=39672279

> "The score is a rescaled version of inverse time" is the key here.

> If you run all the tests in half the time, your Speedometer score will double. If your score improves by 1%, it implies that you are 1% faster on the subtests.

> (There are probably some subtleties here because we're using the geometric mean to avoid putting too much weight on any individual subtest, but the rough intuition should still hold.)
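To make that intuition concrete, here is a rough sketch of a score defined as a rescaled inverse of the geometric mean of subtest times. This is not Speedometer's actual scoring code: the `SCALE` constant, the `score` function, and the subtest times are all made up for illustration.

```python
import math

# Hypothetical rescaling constant; NOT the value Speedometer uses.
SCALE = 1000.0


def score(subtest_times_ms):
    """Rescaled inverse of the geometric mean of the subtest times."""
    log_mean = sum(math.log(t) for t in subtest_times_ms) / len(subtest_times_ms)
    geo_mean = math.exp(log_mean)
    return SCALE / geo_mean


# Made-up subtest times in milliseconds.
baseline = [40.0, 55.0, 120.0, 80.0]

# Every subtest runs in half the time: the score doubles.
halved = [t / 2 for t in baseline]
ratio_halved = score(halved) / score(baseline)

# Every subtest gets 1% more throughput: the score rises by 1%.
one_percent = [t / 1.01 for t in baseline]
ratio_one_percent = score(one_percent) / score(baseline)

# Halving only one subtest moves the score by the same factor,
# 2 ** (1 / 4), no matter which subtest it is. That is the point
# of the geometric mean: no single subtest dominates.
fast_first = [baseline[0] / 2] + baseline[1:]
fast_third = baseline[:2] + [baseline[2] / 2] + baseline[3:]
ratio_first = score(fast_first) / score(baseline)
ratio_third = score(fast_third) / score(baseline)

print(ratio_halved, ratio_one_percent, ratio_first, ratio_third)
```

Under this model only ratios of scores carry meaning, so going from 20 to 21 is a ratio of 21/20, or about 5% more work done per unit time on the subtests.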

bigfudge|2 years ago

I guess that’s why it’s fairly interesting to see scores thrown out in this thread on random hardware. It’s anecdata, but it gives a sense of the spread/variance of scores for common platforms. I don’t think this is a number that is ever going to make much sense for consumers to use, because without this sort of context it’s just going to be the Spinal Tap ‘this one goes to 11’ sort of problem.