It would be useful to have a tool to compare web site rendering between Firefox versions. For example, this tool could determine if any popular web sites change in appearance when html5.enable is true, or when XBL is disabled for content.
To avoid false positives, we might need to:
* Load the sites simultaneously in each browser rather than minutes apart.
* If the renderings differ on the first load, load again to detect rotating advertisements, and heuristically ignore changes near the regions expected to change.
* Involve humans a bit, through mturk if needed.
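As a rough sketch of the reload heuristic (pure Python over pixel grids; the function names and the four-screenshot workflow are hypothetical, not an existing tool): pixels that change between two loads of the *same* browser are treated as dynamic content and masked out before comparing across browsers.

```python
def changed_pixels(img_a, img_b):
    """Return the set of (x, y) coordinates whose pixels differ
    between two same-sized renderings."""
    return {(x, y)
            for y, (row_a, row_b) in enumerate(zip(img_a, img_b))
            for x, (pa, pb) in enumerate(zip(row_a, row_b))
            if pa != pb}

def compare_ignoring_dynamic(old_load1, old_load2, new_load1, new_load2):
    """Compare two browser versions while masking out pixels that
    changed between reloads within the same version (rotating ads,
    clocks, and other dynamic content)."""
    dynamic = (changed_pixels(old_load1, old_load2)
               | changed_pixels(new_load1, new_load2))
    cross = changed_pixels(old_load1, new_load1)
    # Only differences not explained by dynamic content remain.
    return cross - dynamic
```

Here each image is a list of rows of pixel values; a real harness would feed in actual screenshots.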
Eventually I'd like to use this to automatically identify web-facing regressions from security fixes. Security fixes aren't supposed to affect web sites (or even reftest renderings), but can if they're buggy. The sooner we find out about regressions from security fixes, the better.
We did this once before, testing HTML5 parsing enabled versus disabled in the same build of Firefox. You can check out the code here:
What we found was that things didn't change all that much; it produced about 3-4 bugs. jgriffin would know more, as he dealt with tracking down the bugs and making sure they were filed.
We originally thought we'd use this system for comparing between versions of Firefox, but we immediately hit two issues:
1. Websites can change depending on the UA string you hit them with.
2. We found very subtle false positives when comparing different versions of Firefox, due to expected changes in the platform code. It became really tough to tell the false positives from the real bugs because so much changes from release to release.
All that said, if this looks like a tool you could find a way to use, let me know as we'd be happy to help you get it into production.
Awesome! Yes, please set this up to automatically test each branch nightly against the latest branch release.
For security fixes, the false-positive problems are minor.
1. If a popular site breaks due to a user-agent string change in a security update, we want to know.
* We can blacklist http://whatsmyuseragent.com/
* We can automatically tell which changes are UA issues by spoofing the UA.
2. Security fixes are not expected to change rendering, so there shouldn't be subtle false positives.
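The UA-spoofing check could work roughly like this (a sketch, not the tool's actual code; `general.useragent.override` is a real Firefox pref, but the function names and thresholds are hypothetical): render the site in the new browser twice, once normally and once spoofing the old UA, then compare the two diff scores.

```python
import json

def ua_override_pref(ua_string):
    """Build a user.js line that makes Firefox send a spoofed
    User-Agent header via the general.useragent.override pref."""
    return 'user_pref("general.useragent.override", %s);' % json.dumps(ua_string)

def classify(diff_normal, diff_spoofed, threshold=0.01):
    """diff_normal:  score for old vs. new browser, each with its own UA.
    diff_spoofed: score for old vs. new browser, with the new browser
    spoofing the old UA. If spoofing makes the difference vanish, the
    site is reacting to the UA string rather than to rendering changes."""
    if diff_normal > threshold and diff_spoofed <= threshold:
        return "ua-sniffing"
    if diff_normal > threshold:
        return "rendering-change"
    return "match"
```

The threshold here is an arbitrary placeholder; in practice it would be tuned against the tool's floating-point difference scores.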
How many sites can this tool test in a day?
It tests a site in 4-10 seconds, so quite a few per day, but the tool doesn't produce a rigid pass/fail.
If the sites differ, and they almost always do, it produces a floating point number that represents the difference.
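A score like that can be as simple as the fraction of differing pixels (a sketch; the tool's actual metric may weight or normalize differences differently):

```python
def diff_score(img_a, img_b):
    """Fraction of pixels that differ between two same-sized
    screenshots: 0.0 means identical, 1.0 means every pixel differs."""
    total = differing = 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if pa != pb:
                differing += 1
    return differing / total if total else 0.0
```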
We run the tests with Flashblock and Adblock, but there is still enough dynamic content on lots of sites that there will be small differences.
The tool has a web interface to view the results, and all sites that differ will have "diff images" created. These are 5 different images that highlight the pixel differences between the two renderings of each site, to help track down any actual issues.
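One of those highlight images can be produced along these lines (a sketch over raw RGB pixel grids; the tool's real diff images are richer than this):

```python
def highlight_diff(img_a, img_b, color=(255, 0, 0)):
    """Copy img_a, painting every pixel that differs from img_b in a
    bright highlight color so the differences jump out visually."""
    return [[color if pa != pb else pa
             for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]
```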
If I remember correctly, in the HTML5 testing jgriffin only found a few bugs, but they were bugs that probably couldn't have been found any other way.
Currently the site has the Alexa top 500 (minus all the porn). The idea was that people who maintain websites and want to make sure the next version of Firefox renders them correctly would put their site in the tool and get a notification when it differs over a certain floating point threshold. However, only the submit form and the email-sending code were completed; the rest of the code and workflow weren't. So you can add all the pages you want pretty easily :)
Another thing we really wanted was for those same people to be able to add a Greasemonkey script that removed any dynamic content from the page, but that also wasn't completed.
> If the sites differ, and they almost always do, it produces a floating point
> number that represents the difference.
Is this due to ad rotation? When this happens, we'd probably want to load the site a few more times in each browser, and complain only if the rendering is consistent within each version in addition to differing between the versions.
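That consistency check could look something like this (a hypothetical sketch; the scores are whatever floating-point difference numbers the tool emits, and the threshold is a placeholder):

```python
def consistent_regression(within_old, within_new, across, threshold=0.01):
    """within_old / within_new: diff scores between repeat loads in the
    same browser version. across: diff scores between the two versions.
    Complain only when each version agrees with itself but the two
    versions consistently disagree with each other."""
    stable = all(s <= threshold for s in within_old + within_new)
    differs = all(s > threshold for s in across)
    return stable and differs
```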