I have been developing a new regression test framework, primarily for layout, but since it's end-to-end it also tests HTTP, Gfx/Widget, the parser, content, views, etc. It's inspired by Hixie's test engine for Opera. It's fairly straightforward and requires only very minor changes to the Mozilla codebase.

Basically we set up a special local Web server that feeds Mozilla a XUL app to drive the tests, plus a set of testcases. The XUL app loads each testcase and then signals the server (by requesting a magic URL) that the testcase is loaded. The server then takes a screenshot of the Mozilla window and replies to the URL request, causing Mozilla to move to the next testcase. The result is a set of PNGs, one per testcase. Regression testing consists of building a baseline set of PNGs, rerunning the tests with a modified Mozilla, and comparing the PNGs. This is all implemented in one big Perl script.

Currently it only works on Unix-like systems because it uses fork() and X-specific graphics commands. I wanted to use Xvfb for headless operation and fast screenshotting, but Xvfb is broken in RH9, so for now I'm running the tests on the current X display and using ImageMagick's import command. Runtime is dominated by the time taken by the screenshots, so I'd like to get Xvfb working eventually.

There are some complications to the above description. Mozilla may crash or hang on some testcases, and the script needs to detect that, kill Mozilla, and resume with the remaining tests. Some tests are unsuitable for this approach because they're animated, so screenshots will not always return the same contents. My script supports a "classify" mode where it runs the testcases a few times and checks that Mozilla reports a consistent image every time.
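The control flow described above can be modelled in a few lines. This is a hypothetical Python sketch, not the actual Perl harness: the magic-URL handshake is reduced to a plain function call so the run/classify flow is easy to follow, and `run_suite`, `classify`, and `screenshot` are illustrative names.

```python
def run_suite(testcases, screenshot):
    """Feed each testcase to the browser and collect one image per test.

    In the real harness, the XUL driver loads the testcase and then
    requests a magic URL; the server screenshots the window and replies,
    which tells the driver to move on. Here `screenshot(case)` stands in
    for that whole round trip.
    """
    results = {}
    for case in testcases:
        results[case] = screenshot(case)
    return results

def classify(testcases, screenshot, iterations=3):
    """'classify' mode: run each testcase several times and check whether
    the rendering is stable enough for screenshot-based regression tests."""
    verdicts = {}
    for case in testcases:
        shots = {screenshot(case) for _ in range(iterations)}
        verdicts[case] = "OK" if len(shots) == 1 else "MISMATCH"
    return verdicts
```

Regression testing is then just `run_suite` against a baseline build, `run_suite` against the modified build, and a PNG-by-PNG comparison of the two result sets.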
To ensure that the screen is fully updated before the screenshot is taken, I patched Mozilla so that when the right environment variable is set, we flush all reflows and force a repaint after firing onload, and then print a message on STDOUT. The server watches for this message and takes the screenshot only after it appears. This isn't completely done yet, but it is usable now.

I want to add an "image comparison" feature that compares directories full of PNGs and generates a DHTML report visually highlighting any differences.
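The server side of that synchronisation is just "read the browser's stdout until the marker appears". A minimal Python sketch, assuming the patched build prints a line containing "PAINT FORCED" (the marker mentioned later in this bug; the exact text is whatever the patch emits):

```python
PAINT_MESSAGE = "PAINT FORCED"  # assumption: the marker the patched build prints

def wait_for_paint(browser_stdout, max_lines=10000):
    """Read the browser's stdout line by line until the paint marker
    appears; only then is it safe to take the screenshot.

    Returns False on EOF (the browser exited, e.g. a crash) or if the
    marker never shows up within max_lines."""
    for _ in range(max_lines):
        line = browser_stdout.readline()
        if not line:  # EOF: the browser went away
            return False
        if PAINT_MESSAGE in line:
            return True
    return False
```

The crash/hang handling described in comment 0 hangs off the same loop: EOF or a timeout here is what tells the script to kill Mozilla and resume with the remaining tests.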
Oh dear. I already checked in the paint-forcing patch by mistake!

http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsDocumentViewer.cpp&branch=&root=/cvsroot&subdir=mozilla/content/base/src&command=DIFF_FRAMESET&rev1=1.347&rev2=1.348
http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsPresShell.cpp&branch=&root=/cvsroot&subdir=mozilla/layout/html/base/src&command=DIFF_FRAMESET&rev1=3.679&rev2=3.680
(see the call to EndUpdateViewBatch)

Well, er, that simplifies things :-)
Created attachment 138877 [details]
testrunner.pl

Checkpoint of the current state of the testing script. All commands should work as advertised. There are still a few features I need to add:
-- an image comparison report generator
-- expose a 'chunk delay' option --- tells the server to pause in the middle of feeding HTML pages to Mozilla, to test incremental reflows
-- change the input syntax so that # introduces a line comment, and change the classifier output to report "# OK", "# FAILURE", "# MISMATCH", so you can write
   echo *.html | testrunner.pl classify | grep OK | testrunner.pl -m other/mozilla

And of course this needs to be run on larger test suites and any testrunner bugs fixed.
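The comment-based syntax is what makes that shell pipeline work: classifier output like "t1.html # OK" survives a `grep OK` and can be fed straight back in, because the parser strips everything after '#'. A hypothetical Python sketch of that parser (the real one would be Perl, and `parse_test_list` is an invented name):

```python
def parse_test_list(lines):
    """Strip '#' line comments and blank lines, returning testcase names.

    Classifier output such as "t1.html # OK" parses back to "t1.html",
    so one run's output can be the next run's input."""
    cases = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            cases.append(line)
    return cases
```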
A couple more notes before I forget:

> setTimeout(nextFrame, 1000);

I put this in because without it, no window ever appears. I'm not sure why. I should try reducing 1000 to 1, but since it's only used for the first frame it's not really an issue. Other than this there are no built-in delays; the tests run as fast as the system can go.

This probably leaves zombie processes around. I need to put a wait() after close(RUNNER), at least.
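The zombie issue is the standard Unix one: a forked child that exits stays in the process table until the parent wait()s for it. The fix is the same in any language; a Python sketch of the Perl wait()-after-close fix (hypothetical helper name):

```python
import os

def reap_child(pid):
    """Block until the forked child exits and collect its status, so the
    kernel can drop its zombie process-table entry.

    Returns the exit code, or minus the signal number if it was killed."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

In Perl this is just a `wait()` (or `waitpid($pid, 0)`) after the `close(RUNNER)`; without it, every killed or exited Mozilla instance lingers as a zombie for the life of the test run.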
My Xft build appears to produce different antialiasing pixels in different runs. Is there a way to stop Xft from antialiasing by setting an environment variable or something?
I guess I can launch mozilla with a custom fonts.conf pointed to by FONTCONFIG_FILE
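A minimal fonts.conf for that purpose might look like the following, pointed to via FONTCONFIG_FILE. This is a sketch using standard fontconfig syntax; a real file would also need `<dir>` elements listing the font directories, and the exact set of properties to pin down (antialias, hinting, rgba) is an assumption:

```xml
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <!-- Force deterministic rendering: no antialiasing for any font -->
  <match target="font">
    <edit name="antialias" mode="assign"><bool>false</bool></edit>
  </match>
</fontconfig>
```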
Um, that seems bad. Anti-aliasing should be completely deterministic.
I have subpixel positioning on. Maybe that's doing it.
It shouldn't, assuming your window is always in the same place (you full-screen the window, right?).
> you full-screen the window, right? At the moment I'm setting the window to 400x600. The window is not always at the same place. Anyway, I've written the code to turn off antialiasing, and I've made all the other changes mentioned here, and now I'm just polishing up the script so it's not as write-only.
> At the moment I'm setting the window to 400x600. The window is not always at > the same place. Ah. I recommend full-screening the window. :-) > Anyway, I've written the code to turn off antialiasing Generally for this kind of script you want the test to be as close as possible to what end-users are actually going to see.
Making the window any given size is easy, but the bigger it is, the slower everything runs. 400x600 seems like a good size for most testcases.

It would be nice if we could run with antialiasing, but with Xft, we can't. I did some more tests; turning off subpixel positioning helps, but there are still a few cases where I get different pixel values unless I turn off antialiasing completely.

I realized that background image loads don't block onload firing. This is a problem in some testcases. bz, if you're reading this, would it be hard to toggle that behavior if MOZ_FORCE_PAINT_AFTER_ONLOAD is set at runtime?

I still have at least one bug to shake out that is stopping me from running the testcases in layout/html/tests. There's another bug that is not too serious, but I don't know how to fix it yet: tests with IFRAMEs fire onload events when those IFRAMEs load, which spits out "PAINT FORCED" messages, and I don't know how to distinguish those from the message for the top-level document. Maybe I'll add some goop to my nsPresShell patch.
> would it be hard to toggle that behavior if MOZ_FORCE_PAINT_AFTER_ONLOAD is set > at runtime? See http://lxr.mozilla.org/seamonkey/source/layout/base/src/nsImageLoader.cpp#120 -- you'd want to not pass the LOAD_BACKGROUND flag there if you want them to affect onload (just pass nsIRequest::LOAD_NORMAL). As for iframes, is this the problem with load events bubbling in XUL and such? Or are you using a capturing listener or something?
Great, I'll make a patch to the image loader. Thanks! The problem is that my change to DocumentViewerImpl::LoadComplete prints a message after onload has fired and we've finished painting --- for any IFRAME that we load. So when we print the "paint forced" message, the server script doesn't know whether the message refers to a child frame or to the real testcase. Probably I should just have DocumentViewerImpl::LoadComplete include the document URL in the message.
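With the URL included in the message, the server-side filtering becomes trivial. A Python sketch, assuming a message format of "PAINT FORCED: <url>" (the actual format is whatever DocumentViewerImpl::LoadComplete ends up printing, and the function name here is invented):

```python
PAINT_PREFIX = "PAINT FORCED: "  # assumed message format

def is_toplevel_paint(line, testcase_url):
    """True only for the paint message of the testcase itself, ignoring
    messages emitted when a child IFRAME finishes loading."""
    return (line.startswith(PAINT_PREFIX)
            and line[len(PAINT_PREFIX):].strip() == testcase_url)
```

The server already knows which testcase URL it last served, so it can simply discard paint messages whose URL doesn't match.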
> Making the window any given size is easy, but the bigger it is, the slower > everything runs. 400x600 seems like a good size for most testcases. This seems weird... The Opera regression tests I did run at full-screen 1600x1200 and work fine. Is Mozilla really that much slower?
No, it's the time required for screenshotting that is the bottleneck. Also, I have a feeling that a narrower window will induce more interesting wrapping behaviours.
Created attachment 139591 [details]
New version

New iteration of the script. It does everything I've mentioned in this bug. I've successfully run it over all the testcases under layout/html/tests. Of these tests:
-- 138 are classified "FAILED" (Mozilla crashed, or hung, or the onload event failed to fire on the top-level document --- this seems to happen quite often on framesets, and it also happens on tests that try to print themselves)
-- 66 are classified "MISMATCH" (we got different results depending on the timing of the screenshot; I need to look into these more closely, but some of them are no doubt animated images or scripts --- a lot of print tests fell into this category too)
-- 1391 are classified "OK" (we got identical results over 3 iterations, with varying timing of the screenshot in each iteration)
Hmm... Pretty much anything in the FAILED section is a bug, no?
Yes, most FAILED testcases probably are bugs. Some of them could even be bugs in the test framework. I'll look into it.
Created attachment 139602 [details] [diff] [review]
more core changes

Here are the additional changes that I need in nsDocumentViewer and nsImageLoader, as discussed above.
Created attachment 139727 [details]
updated script

One more update. This makes the image diff do something sensible even if the images end up in a different format or size. It also gets rid of some alerts from the JS controller, because the alert box cripples any subsequent tests.
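The size-tolerant part of that diff can be sketched abstractly: compare over the union of the two canvases and treat a pixel that exists in only one image as a mismatch. This is a hypothetical pure-Python model working on pixel grids (the real script operates on PNG files, and `diff_images` is an invented name):

```python
_MISSING = object()  # sentinel for "no pixel here in this image"

def diff_images(a, b):
    """Return the (row, col) positions where two images differ.

    `a` and `b` are lists of rows of pixel values. Images of different
    sizes are compared over the union canvas; a pixel present in only
    one image always counts as a difference."""
    rows = max(len(a), len(b))
    cols = max((len(r) for r in a + b), default=0)

    def pixel(img, y, x):
        if y < len(img) and x < len(img[y]):
            return img[y][x]
        return _MISSING

    return [(y, x)
            for y in range(rows)
            for x in range(cols)
            if pixel(a, y, x) != pixel(b, y, x)]
```

An empty result means the screenshots match; a non-empty one gives exactly the pixels to highlight in the DHTML report.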
The classify run that I did had some problems, namely that my wife was using the computer at the time and that corrupted some of the tests :-). The real numbers are FAILED 133, MISMATCH 12, OK 1450. A demo of the imagediff report for the 12 mismatches is here: http://ocallahan.org/mozilla/testrunner/demo1/index.html
Note the disturbing one pixel difference in the table test case. I think Xft just isn't very good about deterministic rendering, even though all antialiasing is off.
I'm supposed to take delivery of a new home machine today. I'm planning to make my old machine into a headless server, running (among other things) a continuous layout regression tester based on this code.
I would actually expect that Xft would do deterministic rendering. Keith?
Comment on attachment 139602 [details] [diff] [review] more core changes Looks fine. r+sr=bzbarsky. I assume the idea is to be able to run this in non-debug mode, right?
Yes, absolutely. We want to be able to do regression tests with opt builds.
checked in patch 139602
Robert, is this bug obsolete?