Closed
Bug 230697
Opened 21 years ago
Closed 14 years ago
New automated regression test framework
Categories
(Core :: Layout, enhancement)
RESOLVED
WONTFIX
People
(Reporter: roc, Assigned: roc)
Attachments (2 files, 2 obsolete files)
2.59 KB, patch (bzbarsky: review+, bzbarsky: superreview+)
27.06 KB, text/plain
I have been developing a new regression test framework, primarily for layout, but since it's end-to-end it also tests HTTP, Gfx/Widget, parser, content, views, etc. It's inspired by Hixie's test engine for Opera. It's fairly straightforward and requires only very minor changes to the Mozilla codebase.

Basically we set up a special local Web server that feeds Mozilla a XUL app to drive the tests and a set of testcases. The XUL app loads each testcase and then signals the server (by requesting a magic URL) that the testcase is loaded. The server then takes a screenshot of the Mozilla window and replies to the URL request, causing Mozilla to move to the next testcase. The result is a set of PNGs, one per testcase. Regression testing consists of building a baseline set of PNGs and then rerunning the tests with a modified Mozilla, and comparing the PNGs.

This is all implemented in one big Perl script. Currently it only works on Unix-like systems because it uses fork() and X-specific graphics commands. I wanted to use Xvfb for headless operation and fast screenshotting, but Xvfb is broken in RH9, so for now I'm running the tests on the current X display and using ImageMagick's import command. Runtime is dominated by the time taken by the screenshots so I'd like to get Xvfb working eventually.

There are some complications to the above description. Mozilla may crash or hang on some testcases and the script needs to detect that, kill Mozilla, and resume with the remaining tests. Some tests are unsuitable for this approach because they're animated so screenshots will not always return the same contents. My script supports a "classify" mode where it runs the testcases a few times and checks that Mozilla reports a consistent image every time.
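The load/screenshot handshake described above can be sketched roughly as follows. This is a minimal Python illustration, not the Perl script itself: the magic path `/__loaded__`, the `make_server` helper, and the `take_screenshot` callback are all hypothetical names chosen for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MAGIC_PATH = "/__loaded__"  # hypothetical magic URL; the real one is not given here


def make_server(take_screenshot, port=0):
    """Build a local server that screenshots when the magic URL is requested.

    take_screenshot is a caller-supplied callback (an assumption) that receives
    the query string, used here to carry the testcase name.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            path, _, query = self.path.partition("?")
            if path == MAGIC_PATH:
                # The driver is blocked on this request until we reply, so the
                # browser does not move to the next testcase during the capture.
                take_screenshot(query)
                body = b"next"
            else:
                # Stand-in for serving the driver app and the testcase files.
                body = b"<html><body>placeholder testcase</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return HTTPServer(("127.0.0.1", port), Handler)
```

Because the reply is only sent after the callback returns, the screenshot and the advance to the next testcase cannot race each other.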
To ensure that the screen is fully updated before the screenshot is taken, I patched Mozilla so that when the right environment variable is set, we flush all reflows and force repaint after firing onload, and then also print a message on STDOUT. The server watches for this message and takes the screenshot only after the message appears.

This isn't completely done yet but it is usable now. I want to add an "image comparison" feature to compare directories full of PNGs and generate a DHTML report visually highlighting any differences.
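A first cut at comparing a baseline directory against a new run might look like the sketch below. It assumes one PNG per testcase with the same file name in both directories, and it compares file bytes rather than decoded pixels, which is a simplification: two PNGs with identical pixels can differ in encoding. The `compare_runs` name is made up for this example.

```python
import hashlib
from pathlib import Path


def digest(path):
    """Hash the raw file contents of one screenshot."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_runs(baseline_dir, candidate_dir):
    """Return (changed, missing, added) PNG names between two screenshot runs."""
    base = {p.name: digest(p) for p in Path(baseline_dir).glob("*.png")}
    cand = {p.name: digest(p) for p in Path(candidate_dir).glob("*.png")}
    # Present in both runs but with different bytes: candidate regressions.
    changed = sorted(n for n in base.keys() & cand.keys() if base[n] != cand[n])
    # Present only in the baseline: the testcase produced no screenshot this time.
    missing = sorted(base.keys() - cand.keys())
    # Present only in the new run: a testcase with no baseline yet.
    added = sorted(cand.keys() - base.keys())
    return changed, missing, added
```

The `changed` list is what a visual report would then need to render side by side.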
Assignee
Comment 1•21 years ago
oh dear. I already checked in the paint-forcing patch by mistake!
http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsDocumentViewer.cpp&branch=&root=/cvsroot&subdir=mozilla/content/base/src&command=DIFF_FRAMESET&rev1=1.347&rev2=1.348
http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsPresShell.cpp&branch=&root=/cvsroot&subdir=mozilla/layout/html/base/src&command=DIFF_FRAMESET&rev1=3.679&rev2=3.680
(see call to EndUpdateViewBatch)
Well, er, that simplifies things :-)
Comment 2•21 years ago
Sounds sexy.
Assignee
Comment 3•21 years ago
Checkpoint of the current state of the testing script. All commands should work as advertised. There are still a few features I need to add:
-- an image comparison report generator
-- expose 'chunk delay' option --- tells the server to pause in the middle of feeding HTML pages to Mozilla, to test incremental reflows
-- need to change input syntax so that # introduces a line comment, and change classifier output to report "# OK", "# FAILURE", "# MISMATCH", so you can write
echo *.html | testrunner.pl classify | grep OK | testrunner.pl -m other/mozilla
And of course this needs to be run on larger test suites and any testrunner bugs fixed.
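The proposed input syntax is simple enough to sketch. The Python below assumes that testcase names are whitespace-separated and that everything from `#` to end of line is a comment, so classifier output such as `foo.html # OK` can be fed straight back in. The function name is illustrative, and testcase URLs containing a `#` fragment would need a different rule.

```python
def parse_testcases(text):
    """Extract testcase names from input where '#' starts a line comment."""
    names = []
    for line in text.splitlines():
        # Drop the comment part, e.g. the "# OK" verdict added by the classifier.
        content = line.split("#", 1)[0]
        # Names are whitespace-separated, matching `echo *.html` style input.
        names.extend(content.split())
    return names
```

With this rule the `grep OK` stage in the pipeline above only filters lines; the verdict itself is ignored by the next run.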
Assignee
Comment 4•21 years ago
A couple more notes before I forget:
> setTimeout(nextFrame, 1000);
I put this in because without it, no window ever appears. I'm not sure why. I
should try reducing 1000 to 1, but as it's only used for the first frame it's
not really an issue. Other than this there are no built-in delays. The tests
will run as fast as the system can go.
This probably leaves zombie processes around. I need to put a wait() after
close(<RUNNER>), at least.
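For comparison, here is how the same concern (reaping the child so that no zombie is left, plus killing it if it hangs) might look in Python. This is a sketch under assumptions: `run_and_reap` is a made-up helper and the timeout value is up to the caller. `communicate()` both drains the pipe and waits for the process.

```python
import subprocess


def run_and_reap(cmd, timeout):
    """Run cmd, capture stdout, and always wait for the child to exit.

    Returns (output, returncode, hung).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        output, _ = proc.communicate(timeout=timeout)
        hung = False
    except subprocess.TimeoutExpired:
        # The child did not finish in time: kill it, then collect its exit
        # status so that it does not linger as a zombie.
        proc.kill()
        output, _ = proc.communicate()
        hung = True
    return output, proc.returncode, hung
```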
Assignee
Comment 5•21 years ago
My Xft build appears to produce different antialiasing pixels in different runs. Is there a way to stop Xft from antialiasing by setting an environment variable or something?
Assignee
Comment 6•21 years ago
I guess I can launch Mozilla with a custom fonts.conf pointed to by FONTCONFIG_FILE.
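A sketch of that idea in Python: write a fonts.conf that switches antialiasing off and build an environment with FONTCONFIG_FILE pointing at it. The font directory `/usr/share/fonts` and the helper name are assumptions for illustration; a real configuration would list whatever font directories the test machine uses.

```python
import os

# Minimal fontconfig file: one font directory (an assumed path) and a rule
# that turns antialiasing off for every matched font.
FONTS_CONF = """<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <dir>/usr/share/fonts</dir>
  <match target="font">
    <edit name="antialias" mode="assign"><bool>false</bool></edit>
  </match>
</fontconfig>
"""


def env_without_antialiasing(directory):
    """Write fonts.conf into directory and return an environment that uses it."""
    path = os.path.join(directory, "fonts.conf")
    with open(path, "w") as f:
        f.write(FONTS_CONF)
    env = dict(os.environ)
    env["FONTCONFIG_FILE"] = path  # fontconfig reads its configuration from here
    return env
```

The returned mapping can be passed as the `env` argument when launching the browser process, leaving the user's own font settings untouched.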
Comment 7•21 years ago
Um, that seems bad. Anti-aliasing should be completely deterministic.
Assignee
Comment 8•21 years ago
I have subpixel positioning on. Maybe that's doing it.
Comment 9•21 years ago
It shouldn't, assuming your window is always in the same place (you full-screen the window, right?).
Assignee
Comment 10•21 years ago
> you full-screen the window, right?
At the moment I'm setting the window to 400x600. The window is not always at the
same place.
Anyway, I've written the code to turn off antialiasing, and I've made all the
other changes mentioned here, and now I'm just polishing up the script so it's
not as write-only.
Comment 11•21 years ago
> At the moment I'm setting the window to 400x600. The window is not always at
> the same place.
Ah. I recommend full-screening the window. :-)
> Anyway, I've written the code to turn off antialiasing
Generally for this kind of script you want the test to be as close as possible to what end-users are actually going to see.
Assignee
Comment 12•21 years ago
Making the window any given size is easy, but the bigger it is, the slower everything runs. 400x600 seems like a good size for most testcases.

It would be nice if we could run with antialiasing, but with Xft, we can't. I did some more tests; turning off subpixel positioning helps, but there are still a few cases where I get different pixel values unless I turn off antialiasing completely.

I realized that background image loads don't block onload firing. This is a problem in some testcases. bz, if you're reading this, would it be hard to toggle that behavior if MOZ_FORCE_PAINT_AFTER_ONLOAD is set at runtime?

I still have at least one bug to shake out that is stopping me from running the testcases in layout/html/tests. There's another bug that is not too serious but I don't know how to fix yet: tests with IFRAMEs fire onload events when those IFRAMEs load, and that spits out "PAINT FORCED" messages, and I don't know how to distinguish those from the one for the top-level document. Maybe I'll add some goop to my nsPresShell patch.
Comment 13•21 years ago
> would it be hard to toggle that behavior if MOZ_FORCE_PAINT_AFTER_ONLOAD is set
> at runtime?
See http://lxr.mozilla.org/seamonkey/source/layout/base/src/nsImageLoader.cpp#120 -- you'd want to not pass the LOAD_BACKGROUND flag there if you want them to affect onload (just pass nsIRequest::LOAD_NORMAL).
As for iframes, is this the problem with load events bubbling in XUL and such? Or are you using a capturing listener or something?
Assignee
Comment 14•21 years ago
Great, I'll make a patch to the image loader. Thanks!

The problem is that my change to DocumentViewerImpl::LoadComplete prints a message after onload has fired and we've finished painting --- for any IFRAME that we load. So when we print the "paint forced" message, the server script doesn't know whether the message refers to a child frame or to the real testcase. Probably I should just have DocumentViewerImpl::LoadComplete include the document URL in the message.
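On the server side, the URL-in-message idea could be consumed along these lines. The exact message text is not specified in this bug, so the format assumed here (`PAINT FORCED` followed by the document URL) and the function name are purely illustrative.

```python
def is_testcase_painted(line, testcase_url, marker="PAINT FORCED"):
    """Return True only for a paint message that names the top-level testcase.

    Assumes a hypothetical message format: '<marker> <document URL>'.
    """
    line = line.strip()
    if not line.startswith(marker):
        return False  # unrelated output on stdout
    reported_url = line[len(marker):].strip()
    # Messages for child frames carry the frame's own URL and are skipped.
    return reported_url == testcase_url
```

The server would keep reading stdout lines until this returns True for the testcase it is currently waiting on, and only then take the screenshot.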
Comment 15•21 years ago
> Making the window any given size is easy, but the bigger it is, the slower
> everything runs. 400x600 seems like a good size for most testcases.
This seems weird... The Opera regression tests I did run at full-screen
1600x1200 and work fine. Is Mozilla really that much slower?
Assignee
Comment 16•21 years ago
No, it's the time required for screenshotting that is the bottleneck. Also, I have a feeling that a narrower window will induce more interesting wrapping behaviours.
Assignee
Comment 17•21 years ago
New iteration of the script. It does everything I've mentioned in this bug. I've successfully run this over all the testcases under layout/html/tests. Of these tests
-- 138 are classified "FAILED" (Mozilla crashed, or hung, or the onload event failed to fire on the top level document --- this seems to happen quite often on framesets, and it also happens on tests that try to print themselves)
-- 66 are classified "MISMATCH" (We got different results depending on the timing of the screenshot; I need to look into these more closely, but some of them are no doubt animated images, or scripts --- a lot of print tests fell into this category too)
-- 1391 are classified "OK" (We got identical results over 3 iterations with varying timing of the screenshot in each iteration)
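The three-way classification above reduces to a small decision rule. The Python sketch below assumes each iteration yields either the screenshot bytes or `None` when Mozilla crashed, hung, or never fired onload; the `classify` name and that representation are choices made for this example, not taken from the script.

```python
def classify(screenshots):
    """Classify one testcase from its per-iteration screenshots.

    screenshots: list with one entry per iteration, either the image bytes
    or None if that iteration produced no screenshot.
    """
    if not screenshots or any(shot is None for shot in screenshots):
        return "FAILED"  # at least one iteration crashed, hung, or never loaded
    if len(set(screenshots)) > 1:
        return "MISMATCH"  # result depends on timing, e.g. animation or scripts
    return "OK"  # identical image in every iteration
```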
Assignee
Updated•21 years ago
Attachment #138877 - Attachment is obsolete: true
Comment 18•21 years ago
Hmm... Pretty much anything in the FAILED section is a bug, no?
Assignee
Comment 19•21 years ago
Yes, most FAILED testcases probably are bugs. Some of them could even be bugs in the test framework. I'll look into it.
Assignee
Comment 20•21 years ago
Here are the additional changes that I need in nsDocumentViewer and nsImageLoader, as discussed above.
Assignee
Updated•21 years ago
Attachment #139602 - Flags: superreview+
Attachment #139602 - Flags: review+
Assignee
Updated•21 years ago
Attachment #139602 - Flags: superreview?(bz-vacation)
Attachment #139602 - Flags: superreview+
Attachment #139602 - Flags: review?(bz-vacation)
Attachment #139602 - Flags: review+
Assignee
Comment 21•21 years ago
One more update. This makes image diff do something sensible even if the images end up in a different format or size. Also, get rid of some alerts from the JS controller because the alert box cripples any subsequent tests.
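One way to make a pixel diff behave sensibly when the two images differ in size is to compare over the union of their bounds and treat any position that exists in only one image as a difference. The sketch below works on already-decoded images represented as lists of pixel rows, which is an assumption for illustration; decoding the PNGs is left out.

```python
def diff_pixels(a, b):
    """Return the (x, y) positions where two images differ.

    a, b: images as lists of rows, each row a list of pixel values.
    Positions outside one image count as differing from the other.
    """
    height = max(len(a), len(b))
    width = max(
        max((len(row) for row in a), default=0),
        max((len(row) for row in b), default=0),
    )

    def pixel(img, x, y):
        # None marks "no pixel here", which never equals a real pixel value.
        if y < len(img) and x < len(img[y]):
            return img[y][x]
        return None

    return [
        (x, y)
        for y in range(height)
        for x in range(width)
        if pixel(a, x, y) != pixel(b, x, y)
    ]
```

An empty result means the images match exactly; the returned positions are what a report would highlight.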
Assignee
Updated•21 years ago
Attachment #139591 - Attachment is obsolete: true
Assignee
Comment 22•21 years ago
The classify run that I did had some problems, namely that my wife was using the computer at the time and that corrupted some of the tests :-). The real numbers are FAILED 133, MISMATCH 12, OK 1450. A demo of the imagediff report for the 12 mismatches is here: http://ocallahan.org/mozilla/testrunner/demo1/index.html
Assignee
Comment 23•21 years ago
Note the disturbing one pixel difference in the table test case. I think Xft just isn't very good about deterministic rendering, even though all antialiasing is off.
Assignee
Comment 24•21 years ago
I'm supposed to take delivery of a new home machine today. I'm planning to make my old machine into a headless server, running (among other things) a continuous layout regression tester based on this code.
Comment 25•21 years ago
I would actually expect that Xft would do deterministic rendering. Keith?
Comment 26•21 years ago
Comment on attachment 139602 (more core changes)
Looks fine. r+sr=bzbarsky. I assume the idea is to be able to run this in non-debug mode, right?
Attachment #139602 - Flags: superreview?(bz-vacation)
Attachment #139602 - Flags: superreview+
Attachment #139602 - Flags: review?(bz-vacation)
Attachment #139602 - Flags: review+
Assignee
Comment 27•21 years ago
Yes, absolutely. We want to be able to do regression tests with opt builds.
Assignee
Comment 28•20 years ago
Checked in patch 139602.
Comment 29•14 years ago
Robert, this bug is obsolete?
Assignee
Comment 30•14 years ago
Absolutely.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX