Closed Bug 1152051 Opened 5 years ago Closed 5 years ago

MOZ_CRASH() in mozilla::layers::Axis::GetOverscroll at Axis.cpp:220

Categories

(Core :: Panning and Zooming, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: gwagner, Assigned: botond)

Details

Attachments

(6 files, 1 obsolete file)

Attached file GDB
STR: on current trunk during monkey testing in accounts.firefox.com website
Attached file gdb for axis
Also seen in homescreen edit mode
[Parent 5025] ###!!! ASSERTION: GetOverscroll() (-18.670620) and first overscroll animation sample (6.224783) have different signs
: 'false', file ../../../gfx/layers/apz/src/Axis.cpp, line 219

For comment 2
(In reply to Gregor Wagner [:gwagner] from comment #1)
> gdb for axis

Thanks - this is quite helpful! Summarizing the relevant values:

  mOverscroll = 34
  mLastOverscrollPeak = 0
  mOverscrollScale = 1
  mFirstOverscrollAnimationSample = -5.8
  mVelocity = 2.4

This tells us that we've failed to detect a peak, even though one must have occurred, as the overscroll has changed sign since the first sample.
Analyzing the code some more, it looks like we can fail to detect a peak if one of the velocity samples is exactly zero.

Since our condition for detecting peaks is '(oldVelocity * mVelocity) < 0', if in one sample the velocity changes from say negative to zero, and on the next sample, from zero to positive, we fail to detect the peak on either sample.
Attached file MozReview Request: bz://1152051/botond (obsolete) —
/r/6749 - Bug 1152051 - During an overscroll animation, detect a peak even if a sample has a velocity of exaclty zero. r=Cwiiis

Pull down this commit:

hg pull -r 40d7d5dce4492fb3344b50f98930896aa61bfd27 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8589980 - Flags: review?(chrislord.net)
The patch fixes the problem and includes a gtest for exercising this code.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b3e6ec38a38b
Comment on attachment 8589980 [details]
MozReview Request: bz://1152051/botond

https://reviewboard.mozilla.org/r/6747/#review5785

Ship It!
Attachment #8589980 - Flags: review?(chrislord.net) → review+
https://hg.mozilla.org/mozilla-central/rev/6121b4a6c0c0
Assignee: nobody → botond
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
I'd like to uplift this fix to 2.2, but let's let is bake on m-c for a bit and get some confirmation that the assertion stopped firing.
Attached file gdb
I am still seeing this crash with current b2g-inbound trunk.
Flags: needinfo?(botond)
How current was your b2g-inbound trunk when you found this? The fix had only landed on m-c the previous day; it's possible it hadn't been merged to b2g-i yet at the time you pulled and built.
Flags: needinfo?(anygregor)
Attached file gdb 4/20 build
GDB of 4/20 build
Flags: needinfo?(anygregor)
This is still happening with todays b-i trunk.
Ok, thanks. Guess the patch didn't fix the problem them.

Looking at the values of the relevant variables from gdb again:

  mOverscroll = -4.3295083
  mLastOverscrollPeak = 0
  mOverscrollScale = 1
  mFirstOverscrollAnimationSample = 0.980073452
  mVelocity = -0.465937704

the problem is the same as in comment 4: we have failed to detect a peak, but, clearly, for some other reason than the one described in comment 5.
Status: RESOLVED → REOPENED
Flags: needinfo?(botond)
Resolution: FIXED → ---
I've set up monkey tests to run locally to reproduce this and be able to investigate with more information available.
I was able to reproduce this in the monkey test with some added logging, which revealed that something is setting the velocity to zero in between two samples of the overscroll animation.

The overscroll animation does not expect this (it expects that the outgoing velocity of one sample is equal to the incoming velocity of the next), and it fails to detect a peak as a result.

I audited the places in APZ code that set the velocity, and found the following:

  - As far as I can tell, there is no place where we set the velocity
    from outside the Sample() method of an animation, and allow an
    existing animation to continue and observe the changed velocity.
    In other words, I couldn't find a place where we "change the 
    velocity from under the animation's feet", even though this is
    what appears to be happening.

  - Accesses to Axis::mVelocity are not synchronized between the
    controller and compositor threads. *This* can give rise to the
    controller thread "changing the velocity from under an
    animation's feet" *if* the two threads are actually distinct.
    However, on B2G, the two threads are the same, so - while this
    is a problem we should fix - it can't be what's causing this bug.

I will continue to investigate to try to find the cause of this bug.
(In reply to Botond Ballo [:botond] from comment #18)
>   - Accesses to Axis::mVelocity are not synchronized between the
>     controller and compositor threads. *This* can give rise to the
>     controller thread "changing the velocity from under an
>     animation's feet" *if* the two threads are actually distinct.
>     However, on B2G, the two threads are the same, so - while this
>     is a problem we should fix - it can't be what's causing this bug.

Filed bug 1158970 for this.
(In reply to Botond Ballo [:botond] from comment #18)
> I was able to reproduce this in the monkey test with some added logging,
> which revealed that something is setting the velocity to zero in between two
> samples of the overscroll animation.

Further investigation revealed that this diagnosis was wrong. Nothing was setting the velocity to zero in between two samples of the same animation. Rather, something was clearing the animation, setting the velocity to zero, and starting a new overscroll animation, but it was failing to clear some state that led the animation (and me during my initial analysis) to believe the first sample of the new animation was in fact a subsequent sample of the previous one.

The solution should be simple: always clear the state when an animation is stopped.
Comment on attachment 8589980 [details]
MozReview Request: bz://1152051/botond

/r/6749 - Bug 1152051 - Ensure that destroying an overscroll animation always clears the state Axis tracks about it. r=Cwiiis

Pull down this commit:

hg pull -r 4b5df9acb0ad8e0093494dc1e9962cbca7289bcc https://reviewboard-hg.mozilla.org/gecko/
Attachment #8589980 - Flags: review+
(In reply to Botond Ballo [:botond] from comment #20)
> The solution should be simple: always clear the state when an animation is
> stopped.

The attached MozReview request implement this. I'll run the monkey test with it locally for a day or two to make sure it really did fix the problem, before posting the patch for review.
Attached patch GtestSplinter Review
Writing a gtest that triggered the right conditions for the assertion to fire was tricky, but I managed it.
(Posted as patch because MozReview is giving me trouble. As with the fix, I'll wait until the monkey test runs a while longer before posting for review.)
Haven't seen the crash in the monkey test with this patch! Try push is also clean. 

Posting patches for review.
Attachment #8589980 - Flags: review?(chrislord.net)
Comment on attachment 8589980 [details]
MozReview Request: bz://1152051/botond

/r/6749 - Bug 1152051 - Ensure that destroying an overscroll animation always clears the state Axis tracks about it. r=Cwiiis

Pull down this commit:

hg pull -r f3fc8ee4218de7b6efa62c82a27104d9dfa97b32 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8599630 - Flags: review?(chrislord.net)
Attachment #8599630 - Flags: review?(bugmail.mozilla)
Comment on attachment 8599630 [details] [diff] [review]
Gtest

Review of attachment 8599630 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/tests/gtest/TestAsyncPanZoomController.cpp
@@ +294,5 @@
> +    ParentLayerPoint pointOut;
> +    ViewTransform viewTransformOut;
> +    while (apzc->SampleContentTransformForFrame(testStartTime, &viewTransformOut, pointOut)) {
> +      // The reported scroll offset should be the same throughout.
> +      EXPECT_EQ(ParentLayerPoint(0, 90), pointOut);

The 0,90 should probably be a parameter to this function, since it's a helper.
Attachment #8599630 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8599630 [details] [diff] [review]
Gtest

Review of attachment 8599630 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm.

::: gfx/tests/gtest/TestAsyncPanZoomController.cpp
@@ +1173,5 @@
>  
> +void APZCBasicTester::PanIntoOverscroll(int& aTime)
> +{
> +  int touchStart = 500;
> +  int touchEnd = 10;

To my shame, I don't really know much about the apzc tests, but are these values guaranteed to always get into overscroll? They seem arbitrary.
Attachment #8599630 - Flags: review?(chrislord.net) → review+
(In reply to Chris Lord [:cwiiis] from comment #29)
> Comment on attachment 8599630 [details] [diff] [review]
> Gtest
> 
> Review of attachment 8599630 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> lgtm.
> 
> ::: gfx/tests/gtest/TestAsyncPanZoomController.cpp
> @@ +1173,5 @@
> >  
> > +void APZCBasicTester::PanIntoOverscroll(int& aTime)
> > +{
> > +  int touchStart = 500;
> > +  int touchEnd = 10;
> 
> To my shame, I don't really know much about the apzc tests, but are these
> values guaranteed to always get into overscroll? They seem arbitrary.

They are in APZCBasicTester and derived fixtures because the APZC in those has a scrollable range of 100 [1] [2].

(We could make this better by coordinating these two numbers, i.e. using 'SCROLLABLE_HEIGHT + 400' instead of '500' here).

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/tests/gtest/TestAsyncPanZoomController.cpp?rev=4dca08bb090a#232
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/tests/gtest/TestAsyncPanZoomController.cpp?rev=4dca08bb090a#257
Thanks for the review of the test, Chris!

I apologize if the mixture of MozReview and regular patches in this bug in confusing, but I'm also looking for a review on the actual fix, which is the MozReview request :)
Flags: needinfo?(chrislord.net)
Attachment #8589980 - Flags: review?(chrislord.net) → review+
Comment on attachment 8589980 [details]
MozReview Request: bz://1152051/botond

https://reviewboard.mozilla.org/r/6747/#review6787

Ship It!
Sorry, missed the other r? - Lgtm :)
Flags: needinfo?(chrislord.net)
https://hg.mozilla.org/mozilla-central/rev/c7b25a872235
https://hg.mozilla.org/mozilla-central/rev/e1b56bd22b3a
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Attachment #8589980 - Attachment is obsolete: true
Attachment #8619982 - Flags: review+
You need to log in before you can comment on or make changes to this bug.