Closed Bug 1351038 Opened 8 years ago Closed 8 years ago

Intermittent test_animations_playbackrate.html | animation with zero playback rate should stay in the same position and be running on the main thread: should NOT be animating on compositor

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: birtles)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(1 file)

Other frequent intermittents started in similar tests around the same time....
See Also: → 1351039, 1351042
:dao - Are these bugs regressions from bug 759252? Do you know what is going wrong?
Flags: needinfo?(dao+bmo)
(In reply to Geoff Brown [:gbrown] from comment #3) > :dao - Are these bugs regressions from bug 759252? Do you know what is going > wrong? I'm guessing the test falls over because we use CSS animations in the browser UI. Layout folks will need to take a closer look.
Flags: needinfo?(dao+bmo)
:jet, as the triage owner here, could you help find someone to look at this?
Flags: needinfo?(bugs)
Whiteboard: [stockwell needswork]
Brian: can someone on your team have a look at this one?
Flags: needinfo?(bugs) → needinfo?(bbirtles)
Looking at the log, it appears we're sending this animation to the compositor when we shouldn't. In bug 1305325 we changed the behavior so that even if the playback rate is zero, we might sometimes send the animation to the compositor (specifically https://hg.mozilla.org/mozilla-central/rev/60857c37bcfac0cac3609b146de0306b7699ba49). If another compositor animation happens to be running/scheduled on the same element I can see how we could end up sending it to the compositor, but bug 759252 shouldn't cause that that since those animations should be running on a different element in a different document. (The reverse case where we fail to send animations to the compositor would be more understandable since we do synchronize animations on independent elements that start at the same time.) At this stage I'm inclined to suspect either something is amiss in FindAnimationsForCompositor (e.g. we're finding animations in the wrong place -- wrong frame?) or something is wrong in our OMTA reporting (e.g. we're not actually sending the animation to the compositor but some stale state there thinks we are?).
Oh, looking at that test more closely it appears that we set the playbackRate mid-way through. So this is a failure to pull the animation off the layer.
Oh, and this only happens on e10s, and only debug/asan builds. I forget how this works but it does seem like we're hitting a race particularly when the machine is under load (perhaps the spinner just exacerbates the problem). I notice we fail to update certain state from nsDisplayList when there are no compositor animations but that might be ok. I'll look into other tests to see if we've come across this before. I can't reproduce locally so I sent a patch with extra debugging output to try in order to understand better what is happening: https://treeherder.mozilla.org/#/jobs?repo=try&revision=07712ed08a99f3b5080aac8d7e4f81633cbe8fe0
Hmm, this comment from file_animations_effect_timing_duration.html#77 seems relevant: // Setter of timing option should set up the changes to animations for the // next layer transaction but it won't schedule a paint immediately so we need // to tick the refresh driver before we can wait on the next paint. advance_clock(0);
We have the same pattern (and explanation) elsewhere including: file_animations_effect_timing_iterations.html#55 file_animations_pausing.html#56 So I think all this is, is a case of us getting lucky until now. Adding an advance_clock(0) here should hopefully do the trick.
Assignee: nobody → bbirtles
Status: NEW → ASSIGNED
Flags: needinfo?(bbirtles)
Unpatched try run (so we can get a sense for how many retriggers are needed to reliably reproduce): https://treeherder.mozilla.org/#/jobs?repo=try&revision=236f6358b987e7b4b9bc88bade16288cd2c5a350 Try run with advance_clock(0) added: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8ec921bdfe48113e7cbf785da29e31ff2c741707 Try run with advance_clock(0) added and waitForPaintsFlushed() converted to waitForPaints(): https://treeherder.mozilla.org/#/jobs?repo=try&revision=07aa2d83ac2e37e4d74fbc65febc33b95637b3b8 (I suspect we don't need to flush, but flushing was causing us to get lucky more often because some other change in the doc was causing us to schedule a paint.)
I agree that one of the failure reason is the lack of advance_clock(0). But it seems to me there is another issue because bug 1351042 (file_animations_iterationstart.html) does correctly call advance_clock there.
It seems like we have similar problems in: test_animations_effect_timing_enddelay.html test_animations_iterationstart.html The conditions seem to largely correspond to the cases we started sending to the compositor in bug 1305325 (zero-playback rate animations, animations in their end delay, animations filling forwards) so I'm a bit suspicious that something else might be amiss.
(In reply to Brian Birtles (:birtles) from comment #16) > It seems like we have similar problems in: > > test_animations_effect_timing_enddelay.html > test_animations_iterationstart.html > > The conditions seem to largely correspond to the cases we started sending to > the compositor in bug 1305325 (zero-playback rate animations, animations in > their end delay, animations filling forwards) so I'm a bit suspicious that > something else might be amiss. A possible reason what I can think of is that we got incorrect MozAfterPaint due to the throbber transform animation. See bug 1341294.
Ah, but this failure only happens on e10s...
Also, the advance_clock(0) does NOT appear to fix the issue. (At this stage, however, the try run with *both* advance_clock(0) added and waitForPaintsFlushed() converted to waitForPaints() seems not to have failed test_animations_playbackrate.html. That might be just luck, though, or a slightly faster machine. Or perhaps flushing influences this somehow.)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #18) > Ah, but this failure only happens on e10s... Actually, looking it to that bug, this seems likely to me, especially in e10s?
Oh, right. Compositor might be confused that a paint which caused by throbber is for *content* document?
Yeah, that's my suspicion at the moment. I don't have a really good theory for it though. I've pushed another patch with logging of transaction IDs to try[1] so I'll see if that helps explain it. Otherwise, it's nearly the weekend here so perhaps we should just disable those four tests by doing s/MainThread/Either/ with a comment to a bug where we can investigate further. [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=766c0de3f3c012f17090f21025cecec1808dfd73
See Also: → 1352336
(In reply to Brian Birtles (:birtles) from comment #22) > Yeah, that's my suspicion at the moment. I don't have a really good theory > for it though. I've pushed another patch with logging of transaction IDs to > try[1] so I'll see if that helps explain it. Otherwise, it's nearly the > weekend here so perhaps we should just disable those four tests by doing > s/MainThread/Either/ with a comment to a bug where we can investigate > further. > > [1] > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=766c0de3f3c012f17090f21025cecec1808dfd73 That log didn't provide anything useful. (Unless the "Not taking screenshot here: see the one that was previously logged" message is relevant?)
Comment on attachment 8853309 [details] Bug 1351038 - Disable failing tests in test_animations_playbackrate.html, test_animations_iterationstart.html, and test_animations_effect_timing_enddelay.html; https://reviewboard.mozilla.org/r/125394/#review127968
Attachment #8853309 - Flags: review?(hikezoe) → review+
That's really weird. After disabling those three problematic tests, I see a new failure: https://treeherder.mozilla.org/logviewer.html#?job_id=87808294&repo=try&lineNumber=2824 This one is, perhaps, slightly different. If I'm reading the log correctly, like other cases we are also failing to remove the animation from the compositor in time, but furthermore we are failing to update style on the main thread. Perhaps we are failing to schedule a paint when the animation finishes? I think the best thing to do for now, is to land attachment 8853309 [details] AND also disable test_animations_effect_timing_enddelay.html. Being the weekend I'll just go ahead and disable test_animations_effect_timing_enddelay.html with r=me for now.
Pushed by bbirtles@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/355a5b1ce195 Disable failing tests in test_animations_playbackrate.html, test_animations_iterationstart.html, and test_animations_effect_timing_enddelay.html; r=hiro https://hg.mozilla.org/integration/mozilla-inbound/rev/42d522a5f9b5 Completely disable test_animations_effect_timing_enddelay.html; r=me
One thing I just realized is that we don't set chrome refresh driver as testing mode when we set content refresh driver as testing mode. Right? In this circumstance what happens?
See Also: → 1352714
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(In reply to Hiroyuki Ikezoe (:hiro) from comment #29) > One thing I just realized is that we don't set chrome refresh driver as > testing mode when we set content refresh driver as testing mode. Right? In > this circumstance what happens? Right, I'm really not sure. Let's take this discussion to bug 1352336.
Whiteboard: [stockwell needswork] → [stockwell disabled]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: