Closed
Bug 1363107
Opened 8 years ago
Closed 8 years ago
Crash in mozilla::layers::MaybeTimeDuration::AssertSanity
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | + | fixed |
firefox55 | --- | fixed |
People
(Reporter: marcia, Assigned: birtles)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
3.07 KB,
patch
|
hiro
:
review+
gchang
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-85e0e25b-b561-4641-97d3-08b870170508.
=============================================================
Seen while looking at nightly crash stats - crashes started in 55 using 20170506030204:
Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0b255199db9d6a6f189b89b7906f99155bde3726&tochange=37a5b7f6f101df2eb292b1b6baaf1540c9920e20
Comment 1•8 years ago
|
||
this is now starting to show up on 54.0b8 as well, so probably something triggering this got uplifted. i'm seeing crashes already starting on nightly build 20170503030212. common landings from that range and in 54.0b8 would be from these two bugs:
* Bug 1334583 - WebAnimation crash: Assertion failure: !IsNull() (Cannot compute with a null value) [@mozilla::EnableIf]
* Bug 1355480 - Add telemetry for measuring the average load caused by timeouts
status-firefox53:
--- → unaffected
status-firefox54:
--- → affected
status-firefox-esr52:
--- → unaffected
Keywords: regression
OS: Windows 10 → Windows
Hardware: Unspecified → All
Comment 2•8 years ago
|
||
[Tracking Requested - why for this release]:
this is a late regression in this beta cycle and the top browser crash in early data from 54.0b8
tracking-firefox54:
--- → ?
Updated•8 years ago
|
Flags: needinfo?(bbirtles)
Version: 55 Branch → 54 Branch
Comment 4•8 years ago
|
||
Per comment #2, this is late regression, we need to address it ASAP.
Hi :farre,
Can you help check if this top crash is caused by bug 1355480?
Hi :bbirtles,
Can you help check if this is caused by bug 1334583?
Flags: needinfo?(afarre)
Assignee | ||
Comment 6•8 years ago
|
||
I'll look into this today. For what it's worth, it would be fine to back bug 1334583 out of beta if we don't find a fix soon (and may be safer to do that anyway).
Flags: needinfo?(bbirtles)
Assignee | ||
Comment 7•8 years ago
|
||
At a glance, I suspect the call to get_TimeDuration() in AnimationHelper::SampleAnimationForEachNode is failing because the TimeDuration is null. There's an assertion earlier in the function that checks the TimeDuration is non-null but it's debug-only.
I need to look into why we might end up getting here with a null start time (perhaps we failed to update the start time of a pending animation?). Given that bug 1334583 shouldn't really have changed the logic here, I wonder if this was an existing problem that has simply been exposed because MaybeTimeDuration uses MOZ_RELEASE_ASSERT when previously the only things that would have failed were debug-only MOZ_ASSERTs.
Assignee | ||
Comment 8•8 years ago
|
||
These crashes all seem to happen on Windows and in at least one case I see the following sort of errors in the log:
|[0][GFX1-]: GPU process disabled after 1 attempts (t=689.365) |[1][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[2][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[3][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[4][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[5]CP+[GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[6][GFX1-]: [D3D11] failed to get compositor device. (t=689.365) |[7][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=689.365) |[8][GFX1-]: Compositors might be mixed (4,1) (t=689.365) |[9][GFX1-]: [D3D11] failed to get compositor device. (t=689.609) |[10][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=689.609) |[11][GFX1-]: [D3D11] failed to get compositor device. (t=697.503) |[12][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=697.503) |[13][GFX1-]: [D3D11] failed to get compositor device. (t=698.379) |[14][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=698.379)
My guess is that in some cases we have some sort of early return that means we fail to call StartPendingAnimations and we end up with pending animations on the compositor. That's probably a bug that it ever happens, but I'd rather not try and fix that underlying bug here since I'd only be guessing about what happens. Instead, we already have assertions that will fail if that situation ever reproduces in a debug build so I'd rather wait until that happens before trying to fix the underlying bug.
Instead, in this bug I think we should just harden the code in AnimationHelper::SampleAnimationForEachNode to make sure even in release builds it checks that the MaybeTimeDuration is non-null before trying to read its TimeDuration value.
Assignee: nobody → bbirtles
Status: NEW → ASSIGNED
Assignee | ||
Comment 9•8 years ago
|
||
Assignee | ||
Comment 10•8 years ago
|
||
Attachment #8868813 -
Flags: review?(hikezoe)
Comment 11•8 years ago
|
||
Comment on attachment 8868813 [details] [diff] [review]
Patch
Thanks for the detailed commit message. It makes sense to me.
I also suspect there is a bug when a bad thing happens on the GPU process.
To clarify StartPendingAnimations() is failed to be called, how about adding an assertion 'MOZ_ASSERT(anim.isNotPlaying() || animation.startTime().type() == MaybeTimeDuration::TTimeDuration' after updating startTime() in StartPendingAnimations()?
Attachment #8868813 -
Flags: review?(hikezoe) → review+
Assignee | ||
Comment 12•8 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #11)
> Comment on attachment 8868813 [details] [diff] [review]
> Patch
>
> Thanks for the detailed commit message. It makes sense to me.
> I also suspect there is a bug when a bad thing happens on the GPU process.
>
> To clarify StartPendingAnimations() is failed to be called, how about adding
> an assertion 'MOZ_ASSERT(anim.isNotPlaying() || animation.startTime().type()
> == MaybeTimeDuration::TTimeDuration' after updating startTime() in
> StartPendingAnimations()?
I'm not sure I follow. As in:
// If the animation is play-pending, resolve the start time.
if (anim.startTime().type() == MaybeTimeDuration::Tnull_t &&
!anim.originTime().IsNull() &&
!anim.isNotPlaying()) {
TimeDuration readyTime = aReadyTime - anim.originTime();
anim.startTime() =
anim.playbackRate() == 0
? readyTime
: readyTime - anim.holdTime().MultDouble(1.0 /
anim.playbackRate());
updated = true;
}
MOZ_ASSERT(anim.isNotPlaying() ||
anim.startTime.type() == MaybeTimeDuration::TTimeDuration);
?
Comment 13•8 years ago
|
||
(In reply to Brian Birtles (:birtles) from comment #12)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #11)
> > Comment on attachment 8868813 [details] [diff] [review]
> > Patch
> >
> > Thanks for the detailed commit message. It makes sense to me.
> > I also suspect there is a bug when a bad thing happens on the GPU process.
> >
> > To clarify StartPendingAnimations() is failed to be called, how about adding
> > an assertion 'MOZ_ASSERT(anim.isNotPlaying() || animation.startTime().type()
> > == MaybeTimeDuration::TTimeDuration' after updating startTime() in
> > StartPendingAnimations()?
>
> I'm not sure I follow. As in:
>
> // If the animation is play-pending, resolve the start time.
> if (anim.startTime().type() == MaybeTimeDuration::Tnull_t &&
> !anim.originTime().IsNull() &&
> !anim.isNotPlaying()) {
> TimeDuration readyTime = aReadyTime - anim.originTime();
> anim.startTime() =
> anim.playbackRate() == 0
> ? readyTime
> : readyTime - anim.holdTime().MultDouble(1.0 /
> anim.playbackRate());
> updated = true;
> }
> MOZ_ASSERT(anim.isNotPlaying() ||
> anim.startTime.type() ==
> MaybeTimeDuration::TTimeDuration);
>
> ?
Yes, this is exactly what I meant. Redundant?
Assignee | ||
Comment 14•8 years ago
|
||
It seems a little redundant to me although it might become more interesting once we add new types of timelines (where originTime could possibly be null for a playing animation). For now, though, I'd like to keep this patch as simple as possible so there is less risk of uplifting it.
Comment 15•8 years ago
|
||
Pushed by bbirtles@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/515fbfec6ab5
Check if the startTime is set before using it in SampleAnimationForEachNode; r=hiro
Comment 16•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee | ||
Comment 17•8 years ago
|
||
Comment on attachment 8868813 [details] [diff] [review]
Patch
Approval Request Comment
[Feature/Bug causing the regression]: bug 1334583
[User impact if declined]: Top crasher on Windows
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: No. I'm still waiting for Nightly to pick it up but I'm filing this preemptively since it's nearly the weekend here.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: Not really.
[Why is the change risky/not risky?]: It's a one liner that checks a value is set before reading it (plus tightening a debug-only assert). If it doesn't fix it, we just back out bug 1334583 from beta.
[String changes made/needed]: None
Attachment #8868813 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 18•8 years ago
|
||
Attachment 8868813 [details] [diff] should apply cleanly on beta (I actually wrote the patch on top of beta and then rebased it when pushing to inbound).
Comment 19•8 years ago
|
||
Comment on attachment 8868813 [details] [diff] [review]
Patch
Fix a top browser crash from 54.0b8. Beta54+. Should be in 54 beta 10.
Attachment #8868813 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 20•8 years ago
|
||
bugherder uplift |
Comment 21•8 years ago
|
||
(In reply to Brian Birtles (:birtles) from comment #17)
> [Is this code covered by automated tests?]: Yes
> [Has the fix been verified in Nightly?]: No. I'm still waiting for Nightly
> to pick it up but I'm filing this preemptively since it's nearly the weekend
> here.
> [Needs manual test from QE? If yes, steps to reproduce]: No.
Setting qe-verify- based on Brian's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in
before you can comment on or make changes to this bug.
Description
•