
jgraham | 4 months ago

As someone who's been quite heavily involved with web-platform-tests, I'd caution against any use of the test pass rate as a metric for anything.

That's not to belittle the considerable achievements of Ladybird; their progress is really impressive, and if web-platform-tests are helping their engineering efforts I consider that a win. New implementations of the web platform, including Ladybird, Servo, and Flow, are exciting to see.

However, web-platform-tests specifically decided to optimise for being a useful engineering tool rather than being a good metric. That means there's no real attempt to balance the testsuite across the platform; for example a surprising fraction of the overall test count is encoding tests because they're easy to generate, not because it's an especially hard problem in browser development.

We've also consciously wanted to ensure that contributing tests is low friction, both technically and socially, in order that people don't feel inclined to withhold useful tests. Again that's not the tradeoff you make for a good metric, but is the right one for a good engineering resource.

The Interop Project is designed with different tradeoffs in mind, and overcomes some of these problems by selecting a subset of tests which are broadly agreed to represent a useful level of coverage of an important feature. But unfortunately the current setup is designed for engines that already implement enough features to be usable as general-purpose web browsers.


tssva|4 months ago

The tweet mentions that this is an arbitrary metric thrust upon them by Apple, so I don’t think they would necessarily disagree with you. During the monthly updates they do also show the passing number of tests without including the encoding tests because of how much they skew things.

troupo|4 months ago

The problem is, there's no other good metric. We used to have the Acid tests for CSS, but in their absence, it's as good a metric as any.

munchlax|4 months ago

Ladybird will be faster than anything with an arbitrary metric thrust

koolala|4 months ago

Could a hand-picked subset be selected to make that metric?

culi|4 months ago

Everything you said sounds very reasonable, yet the "Browser-Specific Failures" graph on the main page of the wpt.fyi website explicitly misleads us into thinking

PS I'm a big fan of the work and appreciate what you do. I check the interop page about once a week!

jebronie|4 months ago

As someone who's been quite heavily involved with having a brain, I'd advocate for using the test pass rate as a metric for how many tests are passed.

manmal|4 months ago

Why are you bringing this up, when it’s not been implemented as a metric here by choice, but because Apple requires it for iOS?

Klonoar|4 months ago

This is a headline that is very easy to misread or misunderstand. I don’t find their comment to be that out of place at all.

hamandcheese|4 months ago

> but because Apple requires it for iOS

Therefore it is a metric used by Apple.

sleepybrett|4 months ago

Then talk to apple. They are the ones who put this bar in place.