Closed Bug 932233 Opened 11 years ago Closed 9 years ago

Android reftest intermittent TEST-UNEXPECTED-FAIL | [reftest-name] | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint), on Android

Categories

(Testing :: Reftest, defect, P3)

All
Android
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: cbook, Unassigned)

References

()

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file, 1 obsolete file)

Android 2.2 Armv6 Tegra mozilla-inbound opt test plain-reftest-3 on 2013-10-29 02:40:34 PDT for push e09fa7cac06b

slave: tegra-146

https://tbpl.mozilla.org/php/getParsedLog.php?id=29829175&tree=Mozilla-Inbound

REFTEST TEST-UNEXPECTED-FAIL | http://10.250.49.167:30146/tests/layout/reftests/svg/as-image/background-resize-1.html | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint)
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.49.167:30146/tests/layout/reftests/svg/as-image/background-resize-2.html | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint)
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.49.167:30146/tests/layout/reftests/svg/as-image/background-resize-3.html | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint)
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.49.167:30146/tests/layout/reftests/svg/as-image/background-resize-4.html | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint)
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.49.167:30146/tests/layout/reftests/svg/as-image/canvas-drawImage-simple-1b.html | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint)
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.49.167:30146/tests/layout/reftests/svg/as-image/canvas-drawImage-scale-1a.html | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint)
REFTEST TEST-UNEXPECTED-FAIL | http://10.250.49.167:30146/tests/layout/reftests/svg/as-image/canvas-drawImage-scale-1b.html | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint)
TEST-UNEXPECTED-FAIL | http://10.250.49.167:30146/tests/layout/reftests/svg/as-image/canvas-drawImage-scale-1c.html | application ran for longer than allowed maximum time
http://mxr.mozilla.org/mozilla-central/source/layout/reftests/svg/as-image/background-resize-1.html?force=1

Looks like a fairly simple reftest to me...
Why would we not get a MozReftestInvalidate event?
Flags: needinfo?(dbaron)
Priority: -- → P5
Seth might know more about this than I do.
Flags: needinfo?(seth)
Hmmm... very odd. AFAIK MozReftestInvalidate fires when the initial paint is finished and snapshotted, so this bug indicates that the initial paint never finishes. I'm not actually entirely sure how that could happen without a crash or nontermination. Any idea how this is even possible?
Flags: needinfo?(seth)
(In reply to Seth Fowler [:seth] from comment #4)
> Hmmm... very odd. AFAIK MozReftestInvalidate fires when the initial paint is
> finished and snapshotted, so this bug indicates that the initial paint never
> finishes

Not quite -- we don't fire MozReftestInvalidate firing until "we have no pending paints". (So if we do the first paint, and we already have more stuff to paint, then we won't fire MozReftestInvalidate.)

See e.g. bug 781362 comment 8 for (a bit) more.
It's also odd that all the following tests also times out waiting for paint events:

background-resize-1.html | timed out waiting for pending paint...
background-resize-2.html | timed out waiting for pending paint...
background-resize-3.html | timed out waiting for pending paint...
background-resize-4.html | timed out waiting for pending paint...
canvas-drawImage-simple-1b.html | timed out waiting for pending paint...
canvas-drawImage-scale-1a.html | timed out waiting for pending paint...
canvas-drawImage-scale-1b.html | timed out waiting for pending paint...

before the test framework gives up and kills it:

canvas-drawImage-scale-1c.html | application ran for longer than allowed maximum time

The logic that fires these events lives in nsPresContext, so it seems it should
be "reset" when a new document loads?  which would point to the error being in
reftest-content.js maybe?

Given that this test is a simple text book use of MozReftestInvalidate:
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/README.txt#424
it seems rather worrisome if such tests fails intermittently...

We currently have 18 open bugs on "waiting for pending paint count to reach zero".
The earliest of those is bug 922996, filed on 2013-10-02, so the root cause
might be some change(s) around that time.
BTW, all of these 18 bugs are intermittent-failures on Android.
This reftest framework change landed 2013-09-19, only affecting Android AFAICT -- 
could it be what triggered these failures?
https://hg.mozilla.org/integration/mozilla-inbound/rev/8445294582d1
Flags: needinfo?(dminor)
That change was made to disable using skiaGL due to it being slow for the reftest use case of creating many canvases. SkiaGL was only enabled around 20 August (or so) so if these intermittents did not occur between 20 August and 19 September, but happened prior and after, then re-enabling skiaGL might be a fix. Depending upon how often the intermittent occurs, this might be testable on Try. If you have further questions on this, :snorp is probably the person to ask.
Flags: needinfo?(dminor)
It's pretty bad when even the very first reftest-sanity/ test fails with this error (bug 935481).
Priority: P5 → P3
Blocks: 935481
Given the large amount of "waiting for pending paint count to reach zero" intermittent
failures on Android recently, I see no other explanation than a bug in the reftest
framework, or in the parts of Gecko that is expected to deliver the MozAfterPaint
event.  It seems unlikely that all these tests are at fault.
Component: SVG → Reftest
Keywords: regression
Product: Core → Testing
Hardware: ARM → All
Blocks: 936411
Going off the theory that this got way worse around late-October, I will attempt to bisect on Try. However, given that this happens on Tegras mainly, I expect it's going to be very slow-going.
Blocks: 940264
Blocks: 941512
Blocks: 942131
Blocks: 942784
Blocks: 943329
Blocks: 943377
This has been hard to narrow down because it started in early October at a low frequency and got a lot worse in late October (and has been bad ever since).

I *think* I've got the initial cause narrowed down to this push, though:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=51b36c5fd45f&tochange=33be4ad3a720

The pushes from Cameron certainly look like they might be related. Will continue bisecting within this push.
Nevermind, I managed to reproduce in an earlier push, so comment 113 is out. I have a changeset from October 1 that is consistently green (after 300+ runs), so I do have a lower bound on this. Like I said before, the problem is that this first started as an infrequent (and therefore hard to reproduce) failure early in October before turning into the frequent failure it now is in late October.

Right now I'm only trying to find the initial cause of this failure, not what made it worse later. Once we know that, we can decide if it's worth trying to bisect that or not.
Flags: needinfo?(dbaron)
Is it just me, or does bug 878935 stand out like a sore thumb in that range?
Flags: needinfo?(matspal)
Flags: needinfo?(jaywir3)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #128)
> Is it just me, or does bug 878935 stand out like a sore thumb in that range?

That would be my first guess too, but it was backed out the day after:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1382e5fe60f
and never relanded AFAICT.
Flags: needinfo?(matspal)
Flags: needinfo?(jaywir3)
Damn. Given the Tegra wait time situation, I won't be surprised if I don't get this narrowed down until the weekend. As before, don't wait on my behalf! ;)
(In reply to Mats Palmgren (:mats) from comment #132)
> That would be my first guess too, but it was backed out the day after:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/d1382e5fe60f
> and never relanded AFAICT.

As expected, green.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #138)
> As expected, green.

I'm having a hard time getting this to reproduce at this point. Mats, I'm not sure I'm going to be able to take this any farther :(
Flags: needinfo?(matspal)
Yeah, I'm not sure there even is one particular range that made it more frequent.
It might just be the overall volume of tests that is the problem.

I think we need to add more logging to at least figure out which part of the system
that fails.  I know next to nothing about how reftests are run on Android though,
so I'm of little use there.
Flags: needinfo?(matspal)
(In reply to Mats Palmgren (:mats) from comment #145)
> Yeah, I'm not sure there even is one particular range that made it more
> frequent.
> It might just be the overall volume of tests that is the problem.
> 
> I think we need to add more logging to at least figure out which part of the
> system
> that fails.  I know next to nothing about how reftests are run on Android
> though,
> so I'm of little use there.

Jet, I think we're at a point in this bug where we need some help from the Platform team on this. Can you please help?
Flags: needinfo?(bugs)
[dropping the test name from the summary, since this isn't about any one particular test anymore, per comment 12]
Summary: Intermittent TEST-UNEXPECTED-FAIL | tests/layout/reftests/svg/as-image/background-resize-1.html | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint) → Android reftest intermittent TEST-UNEXPECTED-FAIL | [reftest-name] | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint), on Android
(In reply to Carsten Book [:Tomcat] from comment #0)
> Android 2.2 Armv6 Tegra mozilla-inbound opt test plain-reftest-3 on
> 2013-10-29 02:40:34 PDT for push e09fa7cac06b
> 
> slave: tegra-146

On all the logs I've seen, only Android 2.2 Armv6 is affected. That is, only ARMv6 and no other devices. Is that true?
Flags: needinfo?(bugs) → needinfo?(ryanvm)
It affects all Tegra devices.
Flags: needinfo?(ryanvm)
Blocks: 952345
(In reply to Jet Villegas (:jet) from comment #153)

Any news here? We continue to hit this across all supported branches where Android tests run.
Flags: needinfo?(bugs)
(In reply to Mats Palmgren (:mats) from comment #9)
> FYI, here's a list of changes for end of September:
> http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2013-09-
> 20&enddate=2013-10-03

If we believe that the above range is our regression window, the sore thumb I see is bug 922007. 

Can we launch a few Android runs with these prefs set?
"layers.offmainthreadcomposition.enabled" == false
"layers.use-deprecated-textures" == true

I wouldn't be too surprised if the new textures and multi-threaded compositing introduced reftest timing changes for us.
Flags: needinfo?(bugs) → needinfo?(ryanvm)
(In reply to Jet Villegas (:jet) from comment #266)
> (In reply to Mats Palmgren (:mats) from comment #9)
> > FYI, here's a list of changes for end of September:
> > http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2013-09-
> > 20&enddate=2013-10-03
> 
> If we believe that the above range is our regression window, the sore thumb
> I see is bug 922007. 
> 
> Can we launch a few Android runs with these prefs set?
> "layers.offmainthreadcomposition.enabled" == false

At some point we only supported opengl compositing on android but I can't find the code that enforces that so maybe we have some support for basic layers now but I am not sure.

> "layers.use-deprecated-textures" == true

This pref is ignored everywhere except on windows now. The only behavior is to use new textures. windows will follow soon.
Blocks: 975053
(In reply to Jet Villegas (:jet) from comment #266)
> Can we launch a few Android runs with these prefs set?
> "layers.offmainthreadcomposition.enabled" == false

https://tbpl.mozilla.org/?tree=Try&rev=976a986ce77f

I'm not triggering a run with "layers.use-deprecated-textures" == true based on comment 268. Let me know if you want me to anyway. Note that Tegras are generally always backlogged and Try is very low on the totem pole priority-wise, so it's probably going to take awhile to get enough retriggers to run to draw any conclusions.
Flags: needinfo?(ryanvm)
Any thoughts on what we should do here.
Flags: needinfo?(roc)
We get into a loop of endless whole-canvas invalidation. With some logging it shouldn't be that hard to figure out what causes that.

This code doesn't look right:
http://dxr.mozilla.org/mozilla-central/source/layout/tools/reftest/reftest-content.js#918
    if (shouldSnapshotWholePage) {
I think we should actually be calling that function instead of testing whether it's defined :-)
Flags: needinfo?(roc)
That was added in bug 876626 by Matt. Matt, see previous comment ... are you interested in fixing that? :-)

With that fixed, we'll probably still have an infinite invalidation loop, but we'd be able to see which rects are being invalidated which would help narrow the problem down.
Flags: needinfo?(matt.woodrow)
Comment on attachment 8405505 [details] [diff] [review]
part 1 - Call shouldSnapshotWholePage rather than testing for its existence

This is going to need its own bug.
Attachment #8405505 - Attachment is obsolete: true
Attachment #8405505 - Flags: review?(roc)
bug 995410 is now landed on inbound.  So the next (or maybe slightly after, depending on merges) failure should have more useful logging info.
(In reply to Phil Ringnalda (:philor) from comment #354)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=38150498&tree=Mozilla-
> Inbound

So this log contains (after the first failure):

REFTEST TEST-UNEXPECTED-FAIL | http://10.26.84.3:30251/tests/layout/reftests/bugs/413292-1.html | load failed: timed out waiting for pending paint count to reach zero (waiting for MozAfterPaint)
REFTEST INFO | Saved log: START http://10.26.84.3:30251/tests/layout/reftests/bugs/413292-1.html
REFTEST INFO | Saved log: [CONTENT] OnDocumentLoad triggering WaitForTestEnd
REFTEST INFO | Saved log: [CONTENT] WaitForTestEnd: Adding listeners

After the second test failure the log looks similar to the second half there:
REFTEST INFO | Saved log: START http://10.26.84.3:30251/tests/layout/reftests/bugs/418766-1a.html
REFTEST INFO | Saved log: [CONTENT] OnDocumentLoad triggering WaitForTestEnd
REFTEST INFO | Saved log: [CONTENT] WaitForTestEnd: Adding listeners
REFTEST INFO | Saved log: Initializing canvas snapshot
REFTEST INFO | Saved log: DoDrawWindow 0,0,800,1000
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/418766-1a.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
REFTEST INFO | Saved log: [CONTENT] Rect: 0 0 800 743
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: DoDrawWindow 0,0,800,743
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/418766-1a.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/418766-1a.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects

That repeats a number of times.

REFTEST INFO | Saved log: Initializing canvas snapshot
REFTEST INFO | Saved log: DoDrawWindow 0,0,800,1000
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AttrModifiedListener fired
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/413292-1.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
REFTEST INFO | Saved log: [CONTENT] Rect: 0 0 800 743
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: DoDrawWindow 0,0,800,743
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in about:blank
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AttrModifiedListener fired
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/413292-1.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
REFTEST INFO | Saved log: [CONTENT] Rect: 8 8 312 162
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: DoDrawWindow 8,8,304,154
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in about:blank
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AttrModifiedListener fired
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/413292-1.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in data:text/html,iframe%201%20<iframe%20width='100'%20height='100'%20src='data:text/html,iframe%202'>
REFTEST INFO | Saved log: [CONTENT] flushWindow failed: TypeError: win.document.documentElement is null

Then that last section (the last 11 lines or so) are repeated a number of times, it looks like about 15 times total. Then the log contains these lines repeated ad infinitum:

REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AttrModifiedListener fired
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/413292-1.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects
REFTEST INFO | Saved log: Updating canvas for invalidation

After the second test failure the log contains:

REFTEST INFO | Saved log: START http://10.26.84.3:30251/tests/layout/reftests/bugs/418766-1a.html
REFTEST INFO | Saved log: [CONTENT] OnDocumentLoad triggering WaitForTestEnd
REFTEST INFO | Saved log: [CONTENT] WaitForTestEnd: Adding listeners
REFTEST INFO | Saved log: Initializing canvas snapshot
REFTEST INFO | Saved log: DoDrawWindow 0,0,800,1000
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/418766-1a.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
REFTEST INFO | Saved log: [CONTENT] Rect: 0 0 800 743
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: DoDrawWindow 0,0,800,743
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/418766-1a.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/418766-1a.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects

repeated many times.

dbaron: is that useful log spew?
Flags: needinfo?(matt.woodrow) → needinfo?(dbaron)
The cycle seems to be these 5 lines repeating:
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
> REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in http://10.26.84.3:30251/tests/layout/reftests/bugs/413292-1.html
> REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects
> REFTEST INFO | Saved log: Updating canvas for invalidation
Flags: needinfo?(dbaron) → needinfo?(matt.woodrow)
I'm not sure why this would happen.

When we check to see if a MozAfterPaint is pending (nsPresContext::IsDOMPaintEventPending) we insert an 'empty' (0 rect) invalidation if we're waiting on a refresh driver tick. This is because the tick might not actually invalidate anything, and we're promising a MozAfterPaint event will happen.

If for some reason the refresh driver is always pending (nsRefreshDriver::ViewManagerFlushIsPending), but we clear that, then we'd see this behaviour.

Either the refresh driver is ticking and something is causing it to re-set that state after each paint, or it's stopped with the state set.

In the latter case we'd fire MozAfterPaint anyway (nsPresContext::EnsureEventualDidPaint), but we'd then detect the refresh driver still being pending and leave IsDOMPaintEventPending returning true (and schedule the next eventual did paint).
Flags: needinfo?(matt.woodrow)
Inactive; closing (see bug 1180138).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: