Closed Bug 230697 Opened 21 years ago Closed 14 years ago

New automated regression test framework

Categories

(Core :: Layout, enhancement)

x86
Linux
enhancement
Not set
normal

RESOLVED WONTFIX

People

(Reporter: roc, Assigned: roc)

Attachments

(2 files, 2 obsolete files)

I have been developing a new regression test framework, primarily for layout,
but since it's end-to-end it also tests HTTP, Gfx/Widget, parser, content,
views, etc. It's inspired by Hixie's test engine for Opera.

It's fairly straightforward and requires only very minor changes to the Mozilla
codebase. Basically we set up a special local Web server that feeds Mozilla a
XUL app to drive the tests and a set of testcases. The XUL app loads each
testcase and then signals the server (by requesting a magic URL) that the
testcase is loaded. The server then takes a screenshot of the Mozilla window and
replies to the URL request, causing Mozilla to move to the next testcase. The
result is a set of PNGs, one per testcase. Regression testing consists of
building a baseline set of PNGs and then rerunning the tests with a modified
Mozilla, and comparing the PNGs.
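
The handshake described above can be sketched roughly as follows. This is a minimal Python illustration, not the Perl script itself: the magic path "/testcase-loaded", the reply body, and the take_screenshot callback are all assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MAGIC_PATH = "/testcase-loaded"  # hypothetical magic URL, not the script's real one


def respond(path, take_screenshot):
    """Return the reply body for one request, capturing first on the magic URL."""
    if path.startswith(MAGIC_PATH):
        # Screenshot before replying: the driver waits for this response
        # before it loads the next testcase.
        take_screenshot(path)
        return b"next"
    # Anything else would be a testcase or the driver app; a placeholder here.
    return b"<html><body>placeholder testcase</body></html>"


def make_handler(take_screenshot):
    """Build a request handler class bound to the given screenshot callback."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = respond(self.path, take_screenshot)
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


def serve(port, take_screenshot):
    """Serve until interrupted; requests are handled one at a time."""
    HTTPServer(("127.0.0.1", port), make_handler(take_screenshot)).serve_forever()
```

Because the reply to the magic URL is only sent after the capture returns, the browser cannot start loading the next testcase while the screenshot is still in progress.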

This is all implemented in one big Perl script. Currently it only works on
Unix-like systems because it uses fork() and X-specific graphics commands. I
wanted to use Xvfb for headless operation and fast screenshotting, but Xvfb is
broken in RH9, so for now I'm running the tests on the current X display and
using ImageMagick's import command. Runtime is dominated by the time taken by
the screenshots so I'd like to get Xvfb working eventually.

There are some complications to the above description. Mozilla may crash or hang
on some testcases and the script needs to detect that, kill Mozilla, and resume
with the remaining tests. Some tests are unsuitable for this approach because
they're animated so screenshots will not always return the same contents. My
script supports a "classify" mode where it runs the testcases a few times and
checks that Mozilla produces the same image every time.
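
A rough Python sketch of the classification rule, assuming each run yields either the raw PNG bytes or None when Mozilla crashed or hung. The function name and the use of SHA-256 digests are illustrative choices, not taken from the script.

```python
import hashlib


def classify(screenshots):
    """Classify one testcase from the screenshots of several runs.

    screenshots: one entry per run, PNG bytes or None if that run
    crashed, hung, or never signalled that it had loaded.
    """
    if not screenshots or any(shot is None for shot in screenshots):
        return "FAILED"
    # Comparing digests is equivalent to comparing the files byte for byte.
    digests = {hashlib.sha256(shot).hexdigest() for shot in screenshots}
    return "OK" if len(digests) == 1 else "MISMATCH"
```

For example, classify([png, png, png]) gives "OK" when all three runs produced identical bytes, and "MISMATCH" as soon as one run differs.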

To ensure that the screen is fully updated before the screenshot is taken, I
patched Mozilla so that when the right environment variable is set, we flush all
reflows and force repaint after firing onload, and then also print a message on
STDOUT. The server watches for this message and takes the screenshot only after
the message appears.
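
One way to watch for the message while also catching hangs is sketched below in Python. It assumes the browser is started as a child process whose STDOUT is a pipe; the class name, the run_once helper, and the timeout handling are assumptions for illustration only.

```python
import queue
import subprocess
import sys
import threading
import time


class StdoutWatcher:
    """Collect a child's STDOUT lines on a background thread."""

    def __init__(self, proc):
        self.proc = proc
        self.lines = queue.Queue()
        threading.Thread(target=self._pump, daemon=True).start()

    def _pump(self):
        for line in self.proc.stdout:
            self.lines.put(line)
        self.lines.put(None)  # sentinel: the child closed STDOUT (exit or crash)

    def wait_for(self, marker, timeout):
        """Return True if a line containing marker arrives within timeout seconds."""
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False  # treated as a hang
            try:
                line = self.lines.get(timeout=remaining)
            except queue.Empty:
                return False
            if line is None:
                return False  # output ended before the marker appeared
            if marker in line:
                return True


def run_once(cmd, marker, timeout):
    """Start cmd, wait for marker, then kill the child either way."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    found = StdoutWatcher(proc).wait_for(marker, timeout)
    # A real runner would take the screenshot here when found is True.
    proc.kill()
    proc.wait()  # reap the child so no zombie is left behind
    return found
```

A single watcher can be reused across testcases by calling wait_for once per testcase; run_once only exists to keep the example short.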

This isn't completely done yet but it is usable now. I want to add an "image
comparison" feature to compare directories full of PNGs and generate a DHTML
report visually highlighting any differences.
Sounds sexy.
Attached file testrunner.pl (obsolete) —
checkpoint of the current state of the testing script. All commands should work
as advertised. There are still a few features I need to add:
-- an image comparison report generator
-- expose 'chunk delay' option --- tells the server to pause in the middle of
feeding HTML pages to Mozilla, to test incremental reflows
-- need to change input syntax so that # introduces a line comment, and change
classifier output to report "# OK", "# FAILURE", "# MISMATCH", so you can write
echo *.html | testrunner.pl classify | grep OK | testrunner.pl -m other/mozilla


and of course this needs to be run on larger test suites and any testrunner
bugs fixed.
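
The proposed input syntax can be illustrated with a small Python sketch. It assumes testcase names are whitespace-separated and never contain "#"; both function names are hypothetical.

```python
def parse_test_list(text):
    """Yield testcase names, ignoring everything after '#' on each line."""
    for line in text.splitlines():
        # Assumption: names never contain '#', so the first '#' starts a comment.
        for name in line.split("#", 1)[0].split():
            yield name


def format_result(name, status):
    """Emit a classifier line that is itself valid input for a later run."""
    return "%s # %s" % (name, status)
```

With this convention a classifier line such as "a.html # OK" survives a grep for OK and is read back as just "a.html" by the next stage of the pipeline.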
A couple more notes before I forget:

> setTimeout(nextFrame, 1000);
I put this in because without it, no window ever appears. I'm not sure why. I
should try reducing 1000 to 1, but as it's only used for the first frame it's
not really an issue. Other than this there are no built-in delays. The tests
will run as fast as the system can go.

This probably leaves zombie processes around. I need to put a wait() after
close(<RUNNER>), at least.
My Xft build appears to produce different antialiasing pixels in different runs.
Is there a way to stop Xft from antialiasing by setting an environment variable
or something?
I guess I can launch Mozilla with a custom fonts.conf pointed to by FONTCONFIG_FILE.
Um, that seems bad. Anti-aliasing should be completely deterministic.
I have subpixel positioning on. Maybe that's doing it.
It shouldn't, assuming your window is always in the same place (you full-screen
the window, right?).
> you full-screen the window, right?

At the moment I'm setting the window to 400x600. The window is not always at the
same place.

Anyway, I've written the code to turn off antialiasing, and I've made all the
other changes mentioned here, and now I'm just polishing up the script so it's
not as write-only.
> At the moment I'm setting the window to 400x600. The window is not always at 
> the same place.

Ah. I recommend full-screening the window. :-)


> Anyway, I've written the code to turn off antialiasing

Generally for this kind of script you want the test to be as close as possible 
to what end-users are actually going to see.
Making the window any given size is easy, but the bigger it is, the slower
everything runs. 400x600 seems like a good size for most testcases.

It would be nice if we could run with antialiasing, but with Xft, we can't. I
did some more tests; turning off subpixel positioning helps, but there are still
a few cases where I get different pixel values unless I turn off antialiasing
completely.
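
For reference, turning antialiasing off through a private fonts.conf might look like the Python sketch below. The include path /etc/fonts/fonts.conf and the helper name are assumptions; only the FONTCONFIG_FILE variable and the antialias property come from fontconfig itself.

```python
import os
import tempfile

FONTS_CONF = """<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <!-- Assumed location of the system configuration; adjust as needed. -->
  <include ignore_missing="yes">/etc/fonts/fonts.conf</include>
  <match target="font">
    <edit name="antialias" mode="assign"><bool>false</bool></edit>
  </match>
</fontconfig>
"""


def no_antialias_env(directory, base_env=None):
    """Write fonts.conf into directory and return an environment that uses it."""
    path = os.path.join(directory, "fonts.conf")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(FONTS_CONF)
    env = dict(os.environ if base_env is None else base_env)
    env["FONTCONFIG_FILE"] = path  # fontconfig reads this file instead of the default
    return env
```

The returned dictionary would be passed as the environment of the browser process, so the setting affects only the test run and not the rest of the desktop.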

I realized that background image loads don't block onload firing. This is a
problem in some testcases. bz, if you're reading this, would it be hard to
toggle that behavior if MOZ_FORCE_PAINT_AFTER_ONLOAD is set at runtime?

I still have at least one bug to shake out that is stopping me from running the
testcases in layout/html/tests. There's another bug that is not too serious but
I don't know how to fix yet: tests with IFRAMEs fire onload events when those
IFRAMEs load, and that spits out "PAINT FORCED" messages, and I don't know how
to distinguish those from the message for the top-level document. Maybe I'll
add some goop to my nsPresShell patch.
> would it be hard to toggle that behavior if MOZ_FORCE_PAINT_AFTER_ONLOAD is set
> at runtime?

See
http://lxr.mozilla.org/seamonkey/source/layout/base/src/nsImageLoader.cpp#120 --
you'd want to not pass the LOAD_BACKGROUND flag there if you want them to affect
onload (just pass nsIRequest::LOAD_NORMAL).

As for iframes, is this the problem with load events bubbling in XUL and such? 
Or are you using a capturing listener or something?
Great, I'll make a patch to the image loader. Thanks!

The problem is that my change to DocumentViewerImpl::LoadComplete prints a
message after onload has fired and we've finished painting --- for any IFRAME
that we load. So when we print the "paint forced" message, the server script
doesn't know whether the message refers to a child frame or to the real
testcase. Probably I should just have DocumentViewerImpl::LoadComplete include
the document URL in the message.
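
On the server side, the check could then be as simple as the Python sketch below. The message format "PAINT FORCED <url>" is an assumption about what the patched LoadComplete would print, and the function names are hypothetical.

```python
PREFIX = "PAINT FORCED "  # assumed format: the marker followed by the document URL


def painted_url(line):
    """Return the URL named in a paint message, or None for any other line."""
    line = line.strip()
    if line.startswith(PREFIX):
        return line[len(PREFIX):]
    return None


def is_testcase_painted(line, testcase_url):
    """True only when the message refers to the top-level testcase document."""
    return painted_url(line) == testcase_url
```

Messages for child frames carry the frame's own URL, so they fail the comparison and the server keeps waiting.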
> Making the window any given size is easy, but the bigger it is, the slower
> everything runs. 400x600 seems like a good size for most testcases.

This seems weird... The Opera regression tests I did run at full-screen
1600x1200 and work fine. Is Mozilla really that much slower?
No, it's the time required for screenshotting that is the bottleneck.

Also, I have a feeling that a narrower window will induce more interesting
wrapping behaviours.
Attached file New version (obsolete) —
New iteration of the script. It does everything I've mentioned in this bug.
I've successfully run this over all the testcases under layout/html/tests. Of
these tests:
-- 138 are classified "FAILED" (Mozilla crashed, or hung, or the onload event
failed to fire on the top level document --- this seems to happen quite often
on framesets, and it also happens on tests that try to print themselves)
-- 66 are classified "MISMATCH" (We got different results depending on the
timing of the screenshot; I need to look into these more closely, but some of
them are no doubt animated images, or scripts --- a lot of print tests fell
into this category too)
-- 1391 are classified "OK" (We got identical results over 3 iterations with
varying timing of the screenshot in each iteration)
Hmm... Pretty much anything in the FAILED section is a bug, no?
Yes, most FAILED testcases probably are bugs. Some of them could even be bugs in
the test framework. I'll look into it.
Here are the additional changes that I need in nsDocumentViewer and
nsImageLoader, as discussed above.
Attachment #139602 - Flags: superreview?(bz-vacation)
Attachment #139602 - Flags: superreview+
Attachment #139602 - Flags: review?(bz-vacation)
Attachment #139602 - Flags: review+
Attached file updated script
One more update. This makes image diff do something sensible even if the images
end up in a different format or size. It also gets rid of some alerts from the
JS controller, because an alert box cripples any subsequent tests.
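
A pixel comparison that tolerates different sizes can be sketched in Python as below. It assumes both images have already been decoded into rows of pixel values by some imaging tool; the function names and the returned structure are illustrative, not what the script does.

```python
def pixel_at(image, x, y):
    """Return the pixel at (x, y), or None when the image does not cover it."""
    if y < len(image) and x < len(image[y]):
        return image[y][x]
    return None


def diff_images(a, b):
    """Compare two decoded images given as lists of rows of pixel values.

    Returns (changed, bbox): the differing coordinates and their bounding
    box as (left, top, right, bottom), or ([], None) when nothing differs.
    """
    height = max(len(a), len(b))
    width = max([len(row) for row in a] + [len(row) for row in b] + [0])
    changed = [
        (x, y)
        for y in range(height)
        for x in range(width)
        # Areas covered by only one image count as differences.
        if pixel_at(a, x, y) != pixel_at(b, x, y)
    ]
    if not changed:
        return changed, None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return changed, (min(xs), min(ys), max(xs), max(ys))
```

A report generator could use the bounding box to outline the affected region, which makes a single stray pixel easy to spot.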
The classify run that I did had some problems, namely that my wife was using the
computer at the time and that corrupted some of the tests :-). The real numbers
are FAILED 133, MISMATCH 12, OK 1450.

A demo of the imagediff report for the 12 mismatches is here:
http://ocallahan.org/mozilla/testrunner/demo1/index.html
Note the disturbing one pixel difference in the table test case. I think Xft
just isn't very good about deterministic rendering, even though all antialiasing
is off.
I'm supposed to take delivery of a new home machine today. I'm planning to make
my old machine into a headless server, running (among other things) a continuous
layout regression tester based on this code.
I would actually expect that Xft would do deterministic rendering.  Keith?
Comment on attachment 139602
more core changes

Looks fine.  r+sr=bzbarsky.

I assume the idea is to be able to run this in non-debug mode, right?
Yes, absolutely. We want to be able to do regression tests with opt builds.
checked in patch 139602
Robert, this bug is obsolete?
Absolutely.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX