Crash in mozilla::layers::MaybeTimeDuration::AssertSanity

RESOLVED FIXED in Firefox 54

Status

()

defect
--
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: marcia, Assigned: birtles)

Tracking

({crash, regression})

54 Branch
mozilla55
All
Windows
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53 unaffected, firefox54+ fixed, firefox55 fixed)

Details

(crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is 
report bp-85e0e25b-b561-4641-97d3-08b870170508.
=============================================================

Seen while looking at nightly crash stats - crashes started in 55 using 20170506030204:

Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0b255199db9d6a6f189b89b7906f99155bde3726&tochange=37a5b7f6f101df2eb292b1b6baaf1540c9920e20

Comment 1

2 years ago
this is now starting to show up on 54.0b8 as well, so probably something triggering this got uplifted. i'm seeing crashes already starting on nightly build 20170503030212. common landings from that range and in 54.0b8 would be from these two bugs: 
* Bug 1334583 - WebAnimation crash: Assertion failure: !IsNull() (Cannot compute with a null value) [@mozilla::EnableIf]
* Bug 1355480 - Add telemetry for measuring the average load caused by timeouts
Keywords: regression
OS: Windows 10 → Windows
Hardware: Unspecified → All

Comment 2

2 years ago
[Tracking Requested - why for this release]:
this is a late regression in this beta cycle and the top browser crash in early data from 54.0b8

Updated

2 years ago
Blocks: 1334583
Flags: needinfo?(bbirtles)
Version: 55 Branch → 54 Branch
Tracking as top crash in 54.0b8.
Per comment #2, this is late regression, we need to address it ASAP.

Hi :farre, 
Can you help check if this top crash is caused by bug 1355480?

Hi :bbirtles,
Can you help check if this is caused by bug 1334583?
Flags: needinfo?(afarre)
Bug 1334583 seems more likely.
Flags: needinfo?(afarre)
Assignee

Comment 6

2 years ago
I'll look into this today. For what it's worth, it would be fine to back bug 1334583 out of beta if we don't find a fix soon (and may be safer to do that anyway).
Flags: needinfo?(bbirtles)
Assignee

Comment 7

2 years ago
At a glance, I suspect the call to get_TimeDuration() in AnimationHelper::SampleAnimationForEachNode is failing because the TimeDuration is null. There's an assertion earlier in the function that checks the TimeDuration is non-null but it's debug-only.

I need to look into why we might end up getting here with a null start time (perhaps we failed to update the start time of a pending animation?). Given that bug 1334583 shouldn't really have changed the logic here, I wonder if this was an existing problem that has simply been exposed because MaybeTimeDuration uses MOZ_RELEASE_ASSERT when previously the only things that would have failed were debug-only MOZ_ASSERTs.
Assignee

Comment 8

2 years ago
These crashes all seem to happen on Windows and in at least one case I see the following sort of errors in the log:

  |[0][GFX1-]: GPU process disabled after 1 attempts (t=689.365) |[1][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[2][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[3][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[4][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[5]CP+[GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[6][GFX1-]: [D3D11] failed to get compositor device. (t=689.365) |[7][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=689.365) |[8][GFX1-]: Compositors might be mixed (4,1) (t=689.365) |[9][GFX1-]: [D3D11] failed to get compositor device. (t=689.609) |[10][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=689.609) |[11][GFX1-]: [D3D11] failed to get compositor device. (t=697.503) |[12][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=697.503) |[13][GFX1-]: [D3D11] failed to get compositor device. (t=698.379) |[14][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=698.379) 

My guess is that in some cases we have some sort of early return that means we fail to call StartPendingAnimations and we end up with pending animations on the compositor. That's probably a bug that it ever happens, but I'd rather not try and fix that underlying bug here since I'd only be guessing about what happens. Instead, we already have assertions that will fail if that situation ever reproduces in a debug build so I'd rather wait until that happens before trying to fix the underlying bug.

Instead, in this bug I think we should just harden the code in AnimationHelper::SampleAnimationForEachNode to make sure even in release builds it checks that the MaybeTimeDuration is non-null before trying to read its TimeDuration value.
Assignee: nobody → bbirtles
Status: NEW → ASSIGNED
Assignee

Comment 10

2 years ago
Posted patch PatchSplinter Review
Attachment #8868813 - Flags: review?(hikezoe)
Comment on attachment 8868813 [details] [diff] [review]
Patch

Thanks for the detailed commit message.  It makes sense to me.
I also suspect there is a bug when a bad thing happens on the GPU process.

To clarify StartPendingAnimations() is failed to be called, how about adding an assertion 'MOZ_ASSERT(anim.isNotPlaying() || animation.startTime().type() == MaybeTimeDuration::TTimeDuration' after updating startTime() in StartPendingAnimations()?
Attachment #8868813 - Flags: review?(hikezoe) → review+
Assignee

Comment 12

2 years ago
(In reply to Hiroyuki Ikezoe (:hiro) from comment #11)
> Comment on attachment 8868813 [details] [diff] [review]
> Patch
> 
> Thanks for the detailed commit message.  It makes sense to me.
> I also suspect there is a bug when a bad thing happens on the GPU process.
> 
> To clarify StartPendingAnimations() is failed to be called, how about adding
> an assertion 'MOZ_ASSERT(anim.isNotPlaying() || animation.startTime().type()
> == MaybeTimeDuration::TTimeDuration' after updating startTime() in
> StartPendingAnimations()?

I'm not sure I follow. As in:

          // If the animation is play-pending, resolve the start time.
          if (anim.startTime().type() == MaybeTimeDuration::Tnull_t &&
              !anim.originTime().IsNull() &&
              !anim.isNotPlaying()) {
            TimeDuration readyTime = aReadyTime - anim.originTime();
            anim.startTime() =
              anim.playbackRate() == 0
              ? readyTime
              : readyTime - anim.holdTime().MultDouble(1.0 /
                                                       anim.playbackRate());
            updated = true;
          }
          MOZ_ASSERT(anim.isNotPlaying() ||
                     anim.startTime.type() == MaybeTimeDuration::TTimeDuration);

?
(In reply to Brian Birtles (:birtles) from comment #12)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #11)
> > Comment on attachment 8868813 [details] [diff] [review]
> > Patch
> > 
> > Thanks for the detailed commit message.  It makes sense to me.
> > I also suspect there is a bug when a bad thing happens on the GPU process.
> > 
> > To clarify StartPendingAnimations() is failed to be called, how about adding
> > an assertion 'MOZ_ASSERT(anim.isNotPlaying() || animation.startTime().type()
> > == MaybeTimeDuration::TTimeDuration' after updating startTime() in
> > StartPendingAnimations()?
> 
> I'm not sure I follow. As in:
> 
>           // If the animation is play-pending, resolve the start time.
>           if (anim.startTime().type() == MaybeTimeDuration::Tnull_t &&
>               !anim.originTime().IsNull() &&
>               !anim.isNotPlaying()) {
>             TimeDuration readyTime = aReadyTime - anim.originTime();
>             anim.startTime() =
>               anim.playbackRate() == 0
>               ? readyTime
>               : readyTime - anim.holdTime().MultDouble(1.0 /
>                                                        anim.playbackRate());
>             updated = true;
>           }
>           MOZ_ASSERT(anim.isNotPlaying() ||
>                      anim.startTime.type() ==
> MaybeTimeDuration::TTimeDuration);
> 
> ?

Yes, this is exactly what I meant.  Redundant?
Assignee

Comment 14

2 years ago
It seems a little redundant to me although it might become more interesting once we add new types of timelines (where originTime could possibly be null for a playing animation). For now, though, I'd like to keep this patch as simple as possible so there is less risk of uplifting it.

Comment 15

2 years ago
Pushed by bbirtles@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/515fbfec6ab5
Check if the startTime is set before using it in SampleAnimationForEachNode; r=hiro

Comment 16

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/515fbfec6ab5
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee

Comment 17

2 years ago
Comment on attachment 8868813 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/Bug causing the regression]: bug 1334583
[User impact if declined]: Top crasher on Windows
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: No. I'm still waiting for Nightly to pick it up but I'm filing this preemptively since it's nearly the weekend here.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: Not really.
[Why is the change risky/not risky?]: It's a one liner that checks a value is set before reading it (plus tightening a debug-only assert). If it doesn't fix it, we just back out bug 1334583 from beta.
[String changes made/needed]: None
Attachment #8868813 - Flags: approval-mozilla-beta?
Assignee

Comment 18

2 years ago
Attachment 8868813 [details] [diff] should apply cleanly on beta (I actually wrote the patch on top of beta and then rebased it when pushing to inbound).
Comment on attachment 8868813 [details] [diff] [review]
Patch

Fix a top browser crash from 54.0b8. Beta54+. Should be in 54 beta 10.
Attachment #8868813 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Brian Birtles (:birtles) from comment #17)
> [Is this code covered by automated tests?]: Yes
> [Has the fix been verified in Nightly?]: No. I'm still waiting for Nightly
> to pick it up but I'm filing this preemptively since it's nearly the weekend
> here.
> [Needs manual test from QE? If yes, steps to reproduce]: No.

Setting qe-verify- based on Brian's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.