Closed Bug 474472 Opened 16 years ago Closed 16 years ago

sporadic failure of 459443-1.html / reftest snapshot doesn't always catch final paints in an iframe

Categories

(Core :: Layout, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: johnath, Assigned: roc)

References

Details

Attachments

(7 files, 1 obsolete file)

Examples:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1232474198.1232479346.15917.gz

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1232465679.1232471563.32718.gz

(These had multiple green cycles between the two failures, and nothing compelling in the window).

The failing test:

REFTEST TEST-UNEXPECTED-FAIL | file:///builds/moz2_slave/mozilla-central-linux-unittest/build/layout/reftests/bugs/459443-1.html |
Per comments in bug 459443, it sounds like we're not painting the iframe's content if the snapshot is taken during the iframe's onload handler.  Things work if it's done async off a timeout.

The only issue I've been able to think of in terms of the onload handler causing issues is that we're probably still suppressing painting on the subframe when onload fires.  But we only check paint suppression in the display list builder on the outermost presshell it's invoked on, so that shouldn't affect things here...

roc suggested a gDumpPaintList=1 log might help see what's going on.
OS: Mac OS X → Linux
Summary: Intermittent (?) reftest failures on Linux → Intermittent reftest failures on Linux: 459443-1.html
Here's the reftest failure log generated for the reftest in bug 474472 (before my patch to it landed)
Attachment #357915 - Attachment description: stout of reftest → stout of reftest | clean-reftest-output.pl
and here's the corresponding stderr (with gDumpPaintList=1)
How come your reftest stdout doesn't have the data: URLs for the images?
Could you add some dump statements to reftest.js and get a combined log of both output streams so we can see exactly which display lists correspond to the snapshot being taken by reftest.js?
(In reply to comment #4)
> How come your reftest stdout doesn't have the data: URLs for the images?

It does -- they're linkified via the clean-reftest-output.pl script.
(In reply to comment #5)
> Could you add some dump statements to reftest.js and get a combined log

Here's an attempt at that -- I've added dump() statements at entry & exit points for a number of functions.
sorry, that last attachment was an old version.  This one has more information.
Attachment #358049 - Attachment is obsolete: true
I think the problem might be this:
 * For the testcase, we initialize gCurrentCanvas here:
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/reftest.js#648
   inside of "continuation()"
 * More painting happens after that, after the iframe content loads
 * --BUT-- AFAICT, we never update gCurrentCanvas again for the testcase -- that is, we never call "UpdateCurrentCanvasForEvent" or "InitCurrentCanvasWithSnapshot" -- until after we've stored the canvas as gCanvas1 in DocumentLoaded().
Here's a patch with all the dump() statements I added to reftest.js
It looks like the canvas-update should be handled in a call to  "AfterPaintListener", which is registered as a listener for the MozAfterPaint event inside the method "continuation()".

However, according to my log, AfterPaintListener is never called.  So, I think that logic is breaking somehow.
If I remove this line from FinishWaitingForTestEnd:
  616   gBrowser.removeEventListener("MozAfterPaint", AfterPaintListener, false);
then we *do* call AfterPaintListener, which triggers a FlushFrames() call (and a bunch of dumped paints) and an update to the canvas.

So, I think the problem is that we're calling FinishWaitingForTestEnd (and removing the event-listener) when we shouldn't be doing so.

In fact, I think the problem is that we're trusting "utils.isMozAfterPaintPending" to tell us when we're done painting, when in fact I think it only recognizes pending-paints for the *outer* document (not the iframe).

The underlying implementation of that method is here:
http://mxr.mozilla.org/mozilla-central/source/dom/src/base/nsDOMWindowUtils.cpp#675
and that just checks a single PresContext for "IsDOMPaintEventPending()" -- I'm guessing it ignores the iframe's PresContext.

Assuming I'm correct in this diagnosis, we need to do one of the following:
 (a) bubble IsDOMPaintEventPending() status up from the iframe to the outer document
 (b) change reftest.js to check isMozAfterPaintPending on subdocuments as well as the main document
 (c) something else?
(In reply to comment #12)
> If I remove this line from FinishWaitingForTestEnd:
>   616   gBrowser.removeEventListener("MozAfterPaint", AfterPaintListener,
> false);
> then we *do* call AfterPaintListener, which triggers a FlushFrames() call (and
> a bunch of dumped paints) and an update to the canvas.

(and the test passes, with that modification -- I forgot to mention that part. :))
Summary: Intermittent reftest failures on Linux: 459443-1.html → sporadic failure of 459443-1.html / reftest snapshot doesn't always catch final paints in an iframe
Comment #12 sounds reasonable. However, isMozAfterPaintPending should be returning true, I would have thought, since the outer document's nsPresContext::mCrossDocDirtyRegion should be non-empty when nsPresContext::IsDOMPaintEventPending is called on the outer document and the inner document has a pending event.
Here's a new reftest to test this bug, based on 459443-1.html, but tweaked a bit to remove unnecessary bits and simplify the structure.
(In reply to comment #14)
> the outer document's
> nsPresContext::mCrossDocDirtyRegion should be non-empty when
> nsPresContext::IsDOMPaintEventPending is called on the outer document and the
> inner document has a pending event.

It looks like stuff happens in this order:
 - the iframe finishes loading
 - the reftest removes "reftest-wait" (triggering the false "isMozAfterPaintPending" check, and the resulting snapshot)
 - we unsuppress painting (which was presumably suppressed during loading)
 - We call a cascade of "InvalidateInternal" methods, with "nsPresContext::NotifyInvalidation" at the bottom, which sets mCrossDocDirtyRegion to be non-empty

So AFAICT, the problem is that the snapshot is triggered in an _onload_ handler, and painting is suppressed until loading is *over*.  Or something like that...
Here's a backtrace from the first time I hit NotifyInvalidation after the initial reflow.  This is the point where mCrossDocDirtyRegion is set to be non-empty.

Notice in particular these stack levels -- their names back up my previous comment's hypothesis.
#18 0xb317a954 in PresShell::UnsuppressAndInvalidate
#19 0xb317ab94 in PresShell::UnsuppressPainting 
#20 0xb31446a3 in DocumentViewerImpl::LoadComplete
#21 0xb2f57ba9 in nsDocShell::EndPageLoad

Also -- when I hit this breakpoint, these reftest.js dump-statements have already gone by:

****dholbert: OnDocumentLoad called.
****dholbert: OnDocumentLoad returning early, for subframe.
****dholbert: AttrModifiedListener called.
****dholbert: FlushRendering called.
[a bunch of gDumpPaintList output which I'm removing here]
****dholbert: FlushRendering returning.
****dholbert: AttrModifiedListener: utils.isMozAfterPaintPending == false
****dholbert: FinishWaitingForTestEnd called.
****dholbert: FinishWaitingForTestEnd returning.
****dholbert: AttrModifiedListener returning.

(and as mentioned previously -- as soon as we hit the "utils.isMozAfterPaintPending == false" clause & the resulting "FinishWaitingForTestEnd" call, we're hosed)
OK thanks! So the issue is that the invalidates that should have been pending from the IFRAME have actually been suppressed by the paint suppression in the IFRAME.

It's not immediately obvious how to fix this. I need to think a bit. It would probably be enough for AttrModifiedListener to do another setTimeout(0) before checking IsMozAfterPaintPending. Let's try that.
Attached patch fix? Splinter Review
This should fix it ... Daniel, please give it a try.
One more tidbit: it looks like painting is suppressed due to nsPresShell::mPaintSuppressionTimer.  

This timer's defaults to being 250ms-long (notably longer than the reftest takes to load), but it can be tweaked via the integer pref "nglayout.initialpaint.delay" (which doesn't exist by default, but can be created)

If I set this pref to 0, the reftest almost always passes (4/5 test runs passed)
If I set this pref to 20000 the reftest always fails (5/5 test runs failed)

(In reply to comment #19)
Thanks roc -- I'll try out that patch & report back shortly...
(In reply to comment #19)
> This should fix it ... Daniel, please give it a try.

Yeah -- that patch does appear to reliably fix it, even with nglayout.initialpaint.delay set to 20000.  Woo-hoo! :)
That looks safe enough, I guess.  That said, should IsMozAfterPaintPending be checking for paint-suppressed child presshells or something?

(And Daniel, thanks a ton for digging into this!)
(In reply to comment #22)
> That looks safe enough, I guess.  That said, should IsMozAfterPaintPending be
> checking for paint-suppressed child presshells or something?

I don't think so. The subdocument might be paint-suppressed for quite a while, and I don't think it would be useful for anyone for isMozAfterPaintPending to return true all that time.
Comment on attachment 358112 [details] [diff] [review]
fix? 

ok, r=dbaron.

Waldo tells me that setTimeout(0) isn't really 0, though, so we might at some point want to try speeding up reftest by making all the setTimeout(0) calls use XPCOM timers (or postMessage).
Attachment #358112 - Flags: review?(dbaron) → review+
IMHO there should be a Web-accessible DOM API, say, "window.schedule", that behaves like setTimeout(really-0).
Whiteboard: [needs landing]
Pushed http://hg.mozilla.org/mozilla-central/rev/23105976dd37

Can someone please undo the test changes that we had to work around this bug?
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing]
Whiteboard: [needs 191 landing]
Reftest (attachment 358086 [details] [diff] [review]) pushed: 
 http://hg.mozilla.org/mozilla-central/rev/8cbebee6cb8e

And yeah, I can undo the workaround in the other bug's test.
Flags: in-testsuite+
Whiteboard: [needs 191 landing] → [needs 191 landing][reftest needs 191 landing]
Workaround backed out:
  http://hg.mozilla.org/mozilla-central/rev/c0d1d49217cc

We need to backout the workaround on 1.9.1 once this lands there, too.  The changeset to be backed out there is http://hg.mozilla.org/releases/mozilla-1.9.1/rev/f9f526b04f36
Assignee: nobody → roc
Actually, I'm not sure that we want (or need?) to land this on 1.9.1 -- the fix patches a chunk of code that was created in Bug 471365, which hasn't landed on 1.9.1 and isn't marked as wanted on 1.9.1.

--> removing 191 statements in the whiteboard here (which I'd originally added)
Whiteboard: [needs 191 landing][reftest needs 191 landing]
Blocks: 471365
Target Milestone: --- → mozilla1.9.2a1
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: