tartley | 13 years ago
You could imagine writing an 'is image almost equal' comparison, but I'm informed by those who have tried (pyglet developers) that this is substantially harder than it sounds - the differences between images are not what you would expect.
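To make the difficulty concrete, here is a minimal sketch of what such an 'is image almost equal' check might look like. Everything here is hypothetical: the images are assumed to arrive as equal-length flat lists of (r, g, b) tuples, and the two tolerances are invented parameters. The catch, as the pyglet developers found, is that real driver-to-driver differences are not uniform per-channel noise, so a simple tolerance like this tends to be either too strict or too loose.

```python
def images_almost_equal(a, b, channel_tolerance=8, max_bad_fraction=0.01):
    """Return True if two images differ only within the given tolerances.

    a, b: equal-length flat lists of (r, g, b) tuples (illustrative format).
    channel_tolerance: max per-channel difference before a pixel is 'bad'.
    max_bad_fraction: fraction of bad pixels allowed before the images
        count as different.
    """
    if len(a) != len(b):
        return False
    bad = 0
    for pa, pb in zip(a, b):
        # A pixel is 'bad' if any colour channel drifts past the tolerance.
        if any(abs(ca - cb) > channel_tolerance for ca, cb in zip(pa, pb)):
            bad += 1
    return bad / len(a) <= max_bad_fraction
```

A test that passes this check on one GPU can still fail on another, which is exactly the portability problem described above.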
The alternative, if you want anyone else to be able to run your tests, is to tie yourself to a particular OS/hardware/driver combo. Not appealing for many projects.
Even if this could be done, this sort of 'compare snapshot' test is brittle, because, of course, we're talking about high-level functional tests here: you'd be snapshotting your whole game/application, not just limited aspects of it in a limited environment. Hence the screenshots would change all the time. Every time you added or modified any functionality you'd get a failing test, have to manually compare the images, assert that the differences were OK, and then commit the new screenshot. This is ripe for overlooking small regressions, and makes subsequent bisection very difficult.
Of course, we haven't even got into the aspect that, as an end-to-end test, your test code would actually have to interpret the images and send mouse/key inputs to successfully play your game. Through to completion, of course - how else would you know your game-completion conditions were all wired up correctly?
jaredsohn | 13 years ago
Just wanted to add one technique that could allow this to work better when testing across different kinds of hardware / driver settings: share high-level results (i.e. for release 1200, these images seem okay) rather than actual images among testers. (So each tester would generate their own "correct" images.) Yes, it is possible that some of the images another tester assumes are correct aren't actually correct on their machine due to their hardware configuration. But if you care about that, you would not be able to share test results anyway. And you could still test actual rendering on different kinds of hardware in a separate pass.
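A rough sketch of what that per-tester baseline scheme might look like: each machine records hashes of its own "correct" screenshots, and only the shareable verdict (test name plus pass/fail) leaves the machine. The function names, the JSON store, and the verdict format are all illustrative assumptions, not anything from an actual tool.

```python
import hashlib
import json
import os

def _digest(image_bytes):
    # Hash the raw screenshot bytes; we never need to share the image itself.
    return hashlib.sha256(image_bytes).hexdigest()

def record_baseline(store_path, test_name, image_bytes):
    """First run on this machine: store this machine's own 'correct' hash."""
    store = {}
    if os.path.exists(store_path):
        with open(store_path) as f:
            store = json.load(f)
    store[test_name] = _digest(image_bytes)
    with open(store_path, "w") as f:
        json.dump(store, f)

def check_against_baseline(store_path, test_name, image_bytes):
    """Later runs: compare against this machine's baseline and return only
    a high-level, shareable verdict."""
    with open(store_path) as f:
        store = json.load(f)
    passed = store.get(test_name) == _digest(image_bytes)
    return {"test": test_name, "passed": passed}
```

Testers then exchange only the verdict dicts, sidestepping the question of whose hardware renders the "true" image.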