Closed Bug 62589 Opened 24 years ago Closed 23 years ago

the layout regression tests need improvement

Categories

(Core :: Layout, defect, P1)

x86
Windows NT
defect

Tracking

()

VERIFIED FIXED
mozilla0.9.1

People

(Reporter: buster, Assigned: waterson)

References

Details

Attachments

(10 files)

currently,
1) some tests fail in unpredictable ways
2) it's hard to decipher the regression test output
3) it would be hard to use the regression tests as part of an automated process,
as in tinderbox
the problems can be addressed in this way:
1) separate out the tests into those that need delays because of asynch page
loading (image loading in particular)  Remove the delay for files that don't
need it (greatly speeding up the process.)  I've done this for block, and chris
is doing it for table.  Since this part of the regression test is not part of
the normal build, these changes do not need to be super-reviewed.  Chris and I
will review each other's changes to the data files.

2) I've changed the regression test output:
a) to mark any discrepancy as a failure,
b) to output more data about every failure (in particular, which frame was being
tested at the time of the failure),
c) to output in the format "regression test [passed|failed] <filename>" to make
searching for these strings easier
d) to output a summary after every url list that includes number of url's loaded
and number that failed.

3) the performance speed up from (1) plus the grep-friendly strings from (2)
should make it easy to hook up regression testing to a tinderbox.

Any other suggestions?
Status: NEW → ASSIGNED
Priority: P3 → P1
Chris K. and I were discussing how the delay between the loading of URLs happens
at the wrong time. We need the delay to happen after the load but before the
regerssion data is dumped, whereas currently the delay happens after the
regression data is dumped, and before the next URL is loaded. Moving the delay
to before the regression data is dumped would allow for the extra time we want
to let the layout complete.

I believe that Chris has changes for this in his tree, so we should consider
taking those changes as well. As it is now, I cannot see how the delay makes any
difference at all (although I've heard that it does).

Another problem (on Win32 platforms at least) is that assertions stop the run
until a dialog is dismissed. Is there a way to make assertions fail the URL but
continue the tests?
attached the diffs for nsWebCrawler.cpp and nsFrameUtil.cpp.
chris k or marc, please review
chris w, please sr.

to marc's point, chris and I had discussed this issue earlier today. chris can
either send me his changes or fold them in himself.
r=attinasi with small comments for each patch:

1) in part 1: '1024-1' should probably be 'sizeof(dirName)-1' (can you say
'nit'?) Also, the case in nsWebCrawler::PerformRegressionTest where you now
return NS_ERROR_FAILURE (nsIFrameUtil not created) might deserve an assertion too.

2) in part 2: The node-dumps you put in for the cases where the state and bbox
don't match should probably be put in for the case where the style data does not
match too.

With or without the above, the changes look good.
do what marc said, sr=waterson
Marking donttest to get this off the managerial radar. If this does in fact need
a reduced testcase, please remove the donttest keyword and explain what testcase
you would like. Thanks!
Keywords: donttest
-Will that patch fix the  'bogus bbox differences' karnaze mentioned on
2000-10-10 in n.p.m.layout?
- is there a way to get rid of  path's like s:\mozilla\dist\win32_d.obj\bin\viewer
I don't have a path s:\ and  in order to run the tests locally (outside
netscape) I have to edit all the scripts.
- This is especially true for the filelist.txt
- Another improvement would be the possibility to dump the content of the verify
run into one big file, that could be attached to a bug as a proof, that the
regression tests have been successful.

Currently there is only one assert in the table regression test's, much more
annoying are the assertions at the start up and shutdown.
>>Will that patch fix the  'bogus bbox differences' karnaze mentioned...
most, if not all, bbox false hits will be fixed by a combination of better
testing code and some shuffling around of the test cases themselves

>>is there a way to get rid of  path's like s:/ ....
sure, we could use local paths.  I'll look into that separately

>>Another improvement would be the possibility to dump the content of the verify
>>run into one big file, that could be attached to a bug as a proof, that the
>>regression tests have been successful.

I'm not sure I want to ask people to attach big output files to bugs as a matter
of course.  If we didn't trust someone to run the regression tests before
checking in, they wouldn't have gotten approval to check in from me or chris.
It would be good, to be able to run the regression tests as a single run, 
without leaving and restarting the viewer, by this one would have bloat 
statistics on a larger sample ( and get rid of the start and shutdown 
assertions)
moving to mozilla0.9
moving to mozilla0.9
Target Milestone: --- → mozilla0.9
Target Milestone: mozilla0.9 → mozilla0.8
Attached patch bbox patchSplinter Review
In my oppinion, it would be good if bbox differences would generate a failed
result for the corresponding regression test (see patch for a quick hack example).
Moving to Mozilla0.9
Target Milestone: mozilla0.8 → mozilla0.9
Taking this bug so I can get it checked in.

Chris, have you looked at the bbox patch attached by bernd? I'll try that out
too. It should probably be checked in separately. The first two patches, from
buster, have already been reviewed so they appear to be good to go. I'm
reapplying and retesting them now just to be sure...
Assignee: buster → attinasi
Status: ASSIGNED → NEW
my patch is obsolete, it has been fixed as a part of 1.18 checkin from Steve
(2001-01-16).
Accepting - moving out to Mozilla 1.0 as this is ongoing effort.
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9 → mozilla1.0
The idea behind the patch is to avoid to load the next url while the reflow has
not finished and we are still dumping regression data, may be this will reduce
the amount of false positives.
sr=attinasi on attachID=28738
Attached patch patch cosmeticsSplinter Review
bernd, could you post unified diffs from now on (cvs diff -u)? Also, I don't
understand why we'd startup dumping layout regression data for a *new* page
before the *old* page has completed. This all happens on the main thread.
With the above patch, I got four false positives (ten frame state mismatches)
running through the table regression tests. That said, I only made it through
563 of the tests, because it hangs sometimes. (That said, without the patch, I
only completed 460 of the tests due to crashes, and had 2,645 frame state
mismatches!)

The idea is to *always* do regression on a timer callback, even if the timeout
is zero. This lets all layout's timers clear before we try to validate.
...and it looks like the hang is caused by a missing image file; e.g., in

  mozilla/layout/html/tests/table/core/captions3.html

we've got

  <img src="../../images/vh40" alt="Valid HTML 4.0!" height="31" width="88">

Let me see if I can fix that.
Chris, I cannot tell what your patch is doing (yet), but the results sure sound
amazing! I'm applying your work-in-progress now to see for myself (and so I can
see the changes in context).
Chris, I just ran your patch (attachID 29257) with the same engine for the
baseline and the verify and I got 0 (that is Z E R O) false positives over the
table regression tests! Every test passed, no bbox or framestate errors - This,
I like very, VERY much! You da man, man!
Depends on: 74165
Well, I'm not having quite so much luck :-(.

Running through the table tests, I see three hangs, two from places where we
document.write() into a table cell. We also hang in the block tests this way,
too. I'm looking at that.

I still see about a dozen frame state mismatches. The flags that are mismatches
are consistently either NS_FRAME_HAS_LOADED_IMAGES or
NS_FRAME_EXTERNAL_REFERENCE. I've squelched a bunch of bogus false positives by
whacking nsFrameUtils.cpp to ignore frame state differences in ScrollbarFrame.
(The slider frame inside the scrollbar consistently generates dozens of
NS_FRAME_HAS_LOADED_IMAGES mismatches.)
>Also, I don't understand why we'd startup dumping layout regression data for a 
>*new* page before the *old* page has completed.

I have seen this several times, the usual indicator is that the page does not
make it to the screen, I have seen that a large page was unchanged on the screen
and in the status bar the URL's changed. In the previous alogorithm this would
happen under the following conditions:

LoadNextURL is executed and starts a timer 
with the OnStart observer we dump the regression data for the previous URL, and
may be this takes a longer time than the we specified for the timer. Under these
conditions the LoadNextURL routine will be executed before the URL made it to
the screen. 

This scenario became possible after the last change in the execution of
regression tests where the delay:= could be specified in the file_list.txt
files. My impression is that you are reverting to some extend this change, I
consider  this a good idea.
The build I reported teh excellent results from has no new ImageLib (it is a bit
old, obviously). HTat explains the frame-state mismatches maybe, but not the
hangs...
I tried Chris patch, Marc is right (You da man, man). It looks pretty promising,
but I had the feeling that the delay times inserted in the file_list.txt files
are ignored. 
Depends on: 53956
Depends on: 75198
Ok, things are getting warmer. With patches for above bugs, I'm now hanging 
only on table/bugs/bug11026.html (libpr0n! grr...) I'll try to get that nailed 
tomorrow, so I can get on with my block frame hackery...
With the last patch, plus a couple of random fixes from pavlov, I can get the
regression tests to run all the way through, with one false positive in the
block tests and six false positives in the table tests.

I made it so that we ignore the NS_FRAME_EXTERNAL_REFERENCE frame state flag: I
think these differences occur if viewer is the top window and you move the mouse
or something. The only places that the bit gets set are 1) in the caret code and
2) in the ESM.

I also made it so that we ignore the scrollbars, which were another source of
frequent false positives.
Waterson, this is great! Please check it in. I've been having problems with
linux patch files. r=karnaze.

Does this also mean that we can remove the delay:=n workarounds and speed things up?
I definitely want to get this in, but this'll make the tests unusable until the 
dependent bugs are fixed. Let me put some pressure on getting the fixes in for 
the dependents...
marc, I'm gonna steal this one. Hope you don't mind. With fixes for bug 75576, I
get zero false positives from bbox differences! Woo hoo! (Bug 75576 fixes some
cases where image requests weren't properly added to the load group.)
Assignee: attinasi → waterson
Status: ASSIGNED → NEW
Depends on: 75576
Target Milestone: mozilla1.0 → mozilla0.9.1
Status: NEW → ASSIGNED
Chris, I thought you owned it already - if not in name, in spirit. Looking
forward to this landing - if you need a review, just holler.
Attached patch updated patchSplinter Review
karnaze, dbaron, or attinasi, could you r= that latest patch?

- nsFrameUtil ignores differences in scrollbars and some (one) of the frame
  state bits

- reworked nsWebCrawler a bit, tried to clean up some of the timing code

- removed code that kills the event queue, as this was causing intermittent
  crashes on shutdown. (might be leaky, I didn't check...)
Chris, is there a possibility to get rid of the lines:

http://lxr.mozilla.org/seamonkey/source/layout/base/src/nsFrameUtil.cpp#648
http://lxr.mozilla.org/seamonkey/source/layout/base/src/nsFrameUtil.cpp#650

I commented them out a long time ago, dumping the critical nodes is for me
enough information. When this happens on a long testcase you can easily dump
0.5MB of data, which is completely redundant because you have the same
information in the .rgd files.
bernd: good idea. Why don't I add an extra parameter (``noisy regression data'') 
or something like that in case people want the old behavior.
I would like to 'r' your patch and separate my wish from this, in order to get
your fix into the tree faster ( I have your fix applied already, but maybee
smooth running regression tests are a good argument for others to use them :-))
ok, so dcone is going to be checking in the stuff bernd wanted in bug 78523, and 
jud is going to whack _all_ this stuff pretty hard in bug 15345. (After which, 
I'll need to update these patches.) Adding dependencies; and emailed jud asking 
him to land nsWebCrawler changes ASAP.
Depends on: 15345, 78523
Ok, stuff is looking good on both Windows and Linux, so I'd like to solicit a
final review. Here is a summary of the changes:

- Remove ``-d'' option, hard-code delay to 200msec.
- Collect regression data 200msec after page load has completed
- Add support for printing regression tests in Linux harness (still doesn't
  appear to work; looks like it actually wants to print!)
- Fix some bustage that jud introduced; specifically, need OnStateChange()
  to ignore any state changes for any web-progress but topmost.
- Remove code that was killing the event queue on shutdown; I don't think it's
  necessary, and it was causing periodic crashes.
- Set XPCOM_DEBUG_BREAK=warn on Win32, so that assertions turn into printf()'s
  instead of dialog boxes
- Sprinkle some nsCOMPtr's around
- Make nsFrameUtil ignore differences in scroll frames, or differences in
  some of the frame-state flags.
r=karnaze. bug 78523 should have fixed the printing problems. I have already 
checked in removal of delay:= in table file lists.
I still have to pull and patch to test this all out, but from inspecting the
patches I say 'super'! sr=attinasi
Keywords: patch
r=rpotts.  The changes to nsWebCrawler::OnStateChange(...) look fine to me.

I noticed that you are only handling the success case for document completion.
Do you need to do anything in the case that the load fails (with
NS_BINDING_ABORTED for example)?  This could be a fairly common situation with
meta-refresh...

-- rick
Changes checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Marking verified per last comments
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: