Closed Bug 1145327 Opened 5 years ago Closed 5 years ago

css-animations/print-no-animations.html failing with the vsync refresh driver on os x

Categories

(Core :: Graphics, defect, P2)

x86_64
All
defect

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: mchang, Assigned: mchang)

References

()

Details

Attachments

(7 files, 10 obsolete files)

2.36 KB, text/plain
Details
1.16 KB, patch
mchang
: review+
Details | Diff | Splinter Review
1.28 KB, patch
mchang
: review+
dbaron
: review+
Details | Diff | Splinter Review
1.22 KB, patch
Details | Diff | Splinter Review
1.37 KB, patch
vlad
: review+
Details | Diff | Splinter Review
2.26 KB, patch
mchang
: review+
Details | Diff | Splinter Review
3.38 KB, patch
mchang
: review+
Details | Diff | Splinter Review
Interestingly, the reference and test file colors are inverted. The original commit of the test is: http://hg.mozilla.org/mozilla-central/rev/14b9894272cb
Attached file Frame tree of test
This is a very weird case. The reference image is actually just a div with the color olive [1]. In the reftest failure [2], the image is blue when the color blue isn't even in the test case at all. It almost seems like something is cached and we're displaying that.

From my reading of the code, the snapshot from the reftest framework actually doesn't composite, but just draws into a buffer and snapshots that. We don't force or flush composites, just paints, which should force us to redraw. I don't know where the color blue is coming from unless its leftover from the previous test case.

Also, from the original commit of the tests, I verified that we never return early due to a non-dynamic pres context. 

[1] https://dxr.mozilla.org/mozilla-central/source/layout/reftests/css-animations/print-no-animations-ref.html?from=print-no-animations-ref.html%20&case=true#1
[2] https://treeherder.mozilla.org/logviewer.html#?job_id=7810504&repo=mozilla-inbound
Hi David,

This test only fails with bug 1144321, which ticks the refresh drivers on vsync instead of software timers. There shouldn't be any good reason why ticking the refresh driver on a different interval should cause this bug unless there is an already existing race condition, which has been the case with every bug related to the vsync refresh driver.

Can I disable this test for now and re-enable it once I figure out what's going on? I'd like to land the vsync refresh driver on OS X before the branch date. Thanks!
Attachment #8580520 - Flags: review?(dbaron)
Test occurs on hg m-c 0a11b73c77b7 just for future reference.
Comment on attachment 8580520 [details] [diff] [review]
Disable css-animations/print-no-animations.html on OS X 10.10

r=dbaron if you use random-if() instead of skip-if(), and use == instead of =.
Attachment #8580520 - Flags: review?(dbaron) → review+
Comment on attachment 8580923 [details] [diff] [review]
Mark css-animations/print-no-animations.html as random on OS X

From irc, it's ok.
Attachment #8580923 - Flags: review?(dbaron) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/ea0035a9e043

Also it seems having random-if(cocoaWidget) fails doesn't actually work on the reftest-list. Needed random-if(cocoaWidget) fails-if(!cocoaWidget).
(In reply to Mason Chang [:mchang] from comment #9)
> Also it seems having random-if(cocoaWidget) fails doesn't actually work on
> the reftest-list.

Yeah, this is a bit counterintuitive. If you have two annotations on a single test, and they imply conflicting things, then the later one wins. This is semi-documented in REFTEST.txt: " later items within a line override earlier ones." http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/README.txt?mark=313-313#311

(That bit me in the past, and I filed bug 1054376 before realizing what was going on.)
I'm still not sure what's going on but it's interesting. What it looks like is that we start an animation in the nsAnimationManager. The test file has one animation which creates a blue line of text. This animation starts on the test file. When we load the reference file, which has no animations, the animation from the previous test is still occurring and running when we take the reference snapshot of the reference test. Thus the animation is what is getting snapshotted rather than the new page. The nsAnimationManager also listens to the refresh driver, which is probably why this is a weird race condition depending on when we tick the refresh driver.
We also get the inverse where the animation ends in the test file, causing the test file to be olive because the nsAnimationManager thinks there are no animations when the snapshot occurs in the reftest framework.
From what I've found out today, we send out an NS_ANIMATION_START message here[1]. After we send out the start message, we load the print-no-animations-ref.html file, which has no animations. At that point, we disconnect all animations at the CommonAnimationManager[2]. During this time, the nsRefreshDriver has not ticked.

On the passing cases, we've been able to tick the refresh driver a couple of times after the NS_ANIMATION_START. We also tick it once after the reftest loads the reference file, then disconnect all the animations. 

@Brian - Can you think of something leftover that we might not clean up if we start an animation then disconnect the CommonAnimationManager before the refresh driver ticks? Thanks!

[1] https://dxr.mozilla.org/mozilla-central/source/layout/style/nsAnimationManager.cpp#763
[2] https://dxr.mozilla.org/mozilla-central/source/layout/style/AnimationCommon.cpp#67
Flags: needinfo?(bbirtles)
I don't understand what reftest-print is doing. What causes the animation to end in comment 13? Do we make the prescontext non-dynamic?

After cancelling the animations we will still have a pending restyle which will mean we have an refresh driver observer.

You could try applying the patches from bug 1117603 (which got backed out) which make us unregister from the refresh driver a little less aggressively.
Flags: needinfo?(bbirtles) → needinfo?(mchang)
(In reply to Brian Birtles (:birtles) from comment #15)
> I don't understand what reftest-print is doing. What causes the animation to
> end in comment 13? Do we make the prescontext non-dynamic?
> 
> After cancelling the animations we will still have a pending restyle which
> will mean we have an refresh driver observer.
> 
> You could try applying the patches from bug 1117603 (which got backed out)
> which make us unregister from the refresh driver a little less aggressively.

I still don't really understand how we end animations yet, I was trying to find it. I just instrumented[1], but couldn't find anything yet. Also, I notice that we dispatch some DOM messages like animationstart and animationend to JS. Could that cause some timing issue in JS or are all animations controlled in the nsAnimationManager? Like if we reload the page before JS finishes processing the animationstart message?

[1]https://dxr.mozilla.org/mozilla-central/source/layout/style/nsAnimationManager.cpp#142
Flags: needinfo?(mchang)
Blocks: 1146014
See Also: → 1050323
See Also: → 1043078
From irc, many of the CSS animation failures that are popping up are due to a race condition where we can sometimes send a backwards in time TimeStamp in the nsRefreshDriver. With the vsync refresh driver, we tick with the vsync timestamp. However, we post a runnable from the vsync thread to the main thread to tick the refresh driver. If between the time when we post a runnable from the vsync thread and when the runnable actually runs, another refresh driver tick has occurred, we would tick the refresh driver with a timestamp that is in the past relative to the most recent tick.

With this patch, we always choose the latest timestamp to tick the refresh driver. This means we can possibly tick the refresh driver twice with the same timestamp, but we sometimes already do this.

From try, it looks like it resolves most of the CSS animation issues - https://treeherder.mozilla.org/#/jobs?repo=try&revision=ed7ab42ef5d2
Attachment #8581697 - Flags: review?(bbirtles)
After using these patches which clamps the refresh driver timestamps, I can no longer reproduce these failures locally which were occurring 1/3 times. I ran these tests 20 times with no failures, so let's re-enable these tests.
Attachment #8581698 - Attachment is obsolete: true
Attachment #8581698 - Flags: review?(bbirtles)
Attachment #8581705 - Flags: review?(bbirtles)
Blocks: 1144317
Comment on attachment 8581705 [details] [diff] [review]
Part 2: Add a test to ensure that requestAnimationFrame timestamps are always increasing

Review of attachment 8581705 [details] [diff] [review]:
-----------------------------------------------------------------

One nit on the test here:

::: dom/animation/test/chrome/test_request_animation_frame.html
@@ +12,5 @@
> +  // Test that all timestamps are either in the future or now
> +  // and does not ever go backwards
> +  function rafCallback(aTimestamp) {
> +    SimpleTest.ok(aTimestamp >= lastRequestAnimationFrameTimestamp,
> +      "New RequestAnimationFrame timestamp is later than the previous RequestAnimationFrame timestamp");

Nit: I'd suggest rewording this message, because
 (1) people will only look at it when it fails, and
 (2) in that scenario, the statement in the message is exactly wrong.

Perhaps s/is later/should be later/ ?
Updated with feedback from comment 22. Thanks for pointing that out!
Attachment #8581705 - Attachment is obsolete: true
Attachment #8581705 - Flags: review?(bbirtles)
Attachment #8581753 - Flags: review?(dholbert)
Attachment #8581753 - Flags: review?(bbirtles)
Comment on attachment 8581753 [details] [diff] [review]
Part 2: Add a test to ensure that requestAnimationFrame timestamps are always increasing

Review of attachment 8581753 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/animation/test/chrome/test_request_animation_frame.html
@@ +1,3 @@
> +<!DOCTYPE html>
> +<meta charset=utf-8>
> +<title>Test chrome-only MutationObserver animation notifications</title>

This title might need an update. (I'm not sure what MutationObserver has to do with the test - maybe this was copypasted accidentally?)

Also: does this really need to be a chrome mochitest? Despite the title, I don't see it doing any chrome-only stuff, so I think it could just be a normal mochitest, and hence should live in a different directory -- maybe /dom/animation/test/animation-timeline?

@@ +1,5 @@
> +<!DOCTYPE html>
> +<meta charset=utf-8>
> +<title>Test chrome-only MutationObserver animation notifications</title>
> +<script src="chrome://mochikit/content/tests/SimpleTest/SimpleTest.js"></script>
> +<script src="chrome://mochikit/content/tests/SimpleTest/EventUtils.js"></script>

I don't think you use EventUtils.js in this test at all (you're not synthesizing mouse events or anything), so you should probably remove this script tag.

@@ +5,5 @@
> +<script src="chrome://mochikit/content/tests/SimpleTest/EventUtils.js"></script>
> +<link rel="stylesheet" type="text/css" href="chrome://mochikit/content/tests/SimpleTest/test.css">
> +<script>
> +  var lastRequestAnimationFrameTimestamp = 0;
> +  var requestAnimationFrameCount = 100;

So, this means we'll call requestAnimationFrameCount 100 times, which means this test will take ~2 seconds, which is kind of a long time for a test. (It's OK if the test *needs* to be that long, but I'm not sure this test needs to be that long.)  Tests pile up, so it's nice to avoid having a single test eat up time, unnecessarily.

Could we reduce this count to, say, 20 or 30?

@@ +9,5 @@
> +  var requestAnimationFrameCount = 100;
> +  var currentCount = 0;
> +
> +  // Test that all timestamps are either in the future or now
> +  // and does not ever go backwards

s/does/do/  ("timestamps...do")
Updated with feedback from comment 24. Moved the file to animation-timelines, removed unnecessary includes, and lowered the requestAnimationFrame count to 20.
Attachment #8581753 - Attachment is obsolete: true
Attachment #8581753 - Flags: review?(dholbert)
Attachment #8581753 - Flags: review?(bbirtles)
Attachment #8581812 - Flags: review?(dholbert)
Attachment #8581812 - Flags: review?(bbirtles)
Updated with some nits in the comment regarding refresh driver timestamp clamping.
Attachment #8581697 - Attachment is obsolete: true
Attachment #8581697 - Flags: review?(bbirtles)
Attachment #8581813 - Flags: review?(bbirtles)
Comment on attachment 8581812 [details] [diff] [review]
Part 2: Add a test to ensure that requestAnimationFrame timestamps are always increasing

Review of attachment 8581812 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me!
Attachment #8581812 - Flags: review?(dholbert) → review+
Comment on attachment 8581813 [details] [diff] [review]
Part 1: Clamp Refresh Driver Timestamps to be monotonically increasing

Review of attachment 8581813 [details] [diff] [review]:
-----------------------------------------------------------------

This looks mostly fine. One nit is that I wonder if we really need the bug number references. The second reference should probably be bug 1028511 anyway. For the last sentence I'd probably just say, "To prevent the refresh driver time from going backwards we clamp to use the most recent timestamp just like we do when changing timers in EnsureTimerStarted." or something like that.

Also, Vlad should review this.

::: layout/base/nsRefreshDriver.cpp
@@ +1491,5 @@
>      // be completed (on the Compositor). Mark that we missed the paint
>      // and keep waiting.
>      return;
>    }
> +  mMostRecentTick = std::max(mMostRecentTick, aNowTime);

I think this could be mMostRecentTick = mMostRecentRefresh?

Since it seems like mMostRecentRefresh is always >= mMostRecentTick
Attachment #8581813 - Flags: review?(vladimir)
Attachment #8581813 - Flags: review?(bbirtles)
Attachment #8581813 - Flags: review+
Comment on attachment 8581812 [details] [diff] [review]
Part 2: Add a test to ensure that requestAnimationFrame timestamps are always increasing

Review of attachment 8581812 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me (except one nit below).

One question, however, does this test fail without part 1?

::: dom/animation/test/animation-timeline/test_request_animation_frame.html
@@ +11,5 @@
> +  // Test that all timestamps are either in the future or now
> +  // and do not ever go backwards
> +  function rafCallback(aTimestamp) {
> +    SimpleTest.ok(aTimestamp >= lastRequestAnimationFrameTimestamp,
> +      "New RequestAnimationFrame timestamp should be later than the previous RequestAnimationFrame timestamp");

"later than or equal to"?
Attachment #8581812 - Flags: review?(bbirtles) → review+
(In reply to Brian Birtles (:birtles) from comment #29)
> Comment on attachment 8581812 [details] [diff] [review]
> Part 2: Add a test to ensure that requestAnimationFrame timestamps are
> always increasing
> 
> Review of attachment 8581812 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good to me (except one nit below).
> 
> One question, however, does this test fail without part 1?

I can't get it to fail without part 1 locally but I could get the asserts to fail everywhere including locally when checking within the refresh driver itself that it did not go backwards. The failures locally  happened because we created the refresh driver at [1] before ticking the refresh driver.

[1] https://dxr.mozilla.org/mozilla-central/source/layout/base/nsRefreshDriver.cpp#1031
Comment on attachment 8581813 [details] [diff] [review]
Part 1: Clamp Refresh Driver Timestamps to be monotonically increasing

Cancelling review request for now since we are considering a different approach
Attachment #8581813 - Flags: review?(vladimir)
(In reply to Mason Chang [:mchang] from comment #20)
> @Brian - From
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=ed7ab42ef5d2
> 
> We still have two intermittents, like such -
> https://treeherder.mozilla.org/logviewer.html#?job_id=5859681&repo=try
> 
> Is the assertion at
> https://dxr.mozilla.org/mozilla-central/source/dom/animation/test/css-
> animations/test_animations-dynamic-changes.html?from=test_animations-dynamic-
> changes.html%20&case=true#43 still valid if we can tick the refresh driver
> twice with the same timestamp?

We discussed this briefly on IRC. Returning the same timestamp twice might be problematic if content in the wild is assuming that the times returned always increase (especially, e.g. if they are dividing by the difference in time since they'll get a divide by zero error). We already have this problem when switching timers but that doesn't happen very often so it's possible that such content has never hit that condition (and we should probably fix that case anyway).

We discussed a few options for fixing this:
a) When new time < old time, update the current time to old time + 1ms.
b) Simply skip the refresh driver tick when new time < old time.

Mason's going to try (b) for now since it potentially saves busy work and we already have cases where we abandon ticks (because we're waiting for paints etc.).
Flags: needinfo?(bbirtles)
From irc, we could have an issue with the approach in bug 8581813 where we give the same timestamp to requestAnimationFrames. Instead, if we get a timestamp that is in the past, skip this refresh driver tick. This should be safe since everyone assumes we have to wait for a refresh driver tick sometime anyway and we should've just ticked if we hit this case.
Attachment #8582104 - Flags: review?(bbirtles)
Updated to skip refresh driver ticks in <= timestamps, not just <.
Attachment #8582104 - Attachment is obsolete: true
Attachment #8582104 - Flags: review?(bbirtles)
Attachment #8582106 - Flags: review?(bbirtles)
Comment on attachment 8582106 [details] [diff] [review]
Part 1: Skip refresh driver tick if timestamp is in the past

Review of attachment 8582106 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me but I'd like Vlad or someone else who's more familiar with this to have a look though.
Attachment #8582106 - Flags: review?(vladimir)
Attachment #8582106 - Flags: review?(bbirtles)
Attachment #8582106 - Flags: review+
Updated to not skip ticks if the refresh driver is in test mode.

Successful try - https://treeherder.mozilla.org/#/jobs?repo=try&revision=d375923967da
Attachment #8581813 - Attachment is obsolete: true
Attachment #8582106 - Attachment is obsolete: true
Attachment #8582106 - Flags: review?(vladimir)
Attachment #8582444 - Flags: review?(vladimir)
Carrying r+, with Part 1, rAF should always be increasing, not >=.
Attachment #8581812 - Attachment is obsolete: true
Attachment #8582534 - Flags: review+
See Also: → 1146692
See Also: 1146692
Rebased for v2.2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 1123762, Project Silk
User impact if declined: A requestAnimationFrame or other animations can have timestamps in the past, which may make the animation janky or not start at all.
Testing completed: Manual, this patch also includes a new mochitest test for this bug. 
Risk to taking this patch (and alternatives if risky): Low - This just enforces an assumption that refresh driver ticks should always go forward in time.
String or UUID changes made by this patch: None
Attachment #8582832 - Flags: review+
Attachment #8582832 - Flags: approval-mozilla-b2g37?
Keywords: leave-open
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Attachment #8582832 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
You need to log in before you can comment on or make changes to this bug.