Closed Bug 1363107 Opened 8 years ago Closed 8 years ago

Crash in mozilla::layers::MaybeTimeDuration::AssertSanity

Categories

(Core :: Graphics: Layers, defect)

54 Branch
All
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 + fixed
firefox55 --- fixed

People

(Reporter: marcia, Assigned: birtles)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-85e0e25b-b561-4641-97d3-08b870170508. ============================================================= Seen while looking at nightly crash stats - crashes started in 55 using 20170506030204: Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0b255199db9d6a6f189b89b7906f99155bde3726&tochange=37a5b7f6f101df2eb292b1b6baaf1540c9920e20
this is now starting to show up on 54.0b8 as well, so probably something triggering this got uplifted. i'm seeing crashes already starting on nightly build 20170503030212. common landings from that range and in 54.0b8 would be from these two bugs: * Bug 1334583 - WebAnimation crash: Assertion failure: !IsNull() (Cannot compute with a null value) [@mozilla::EnableIf] * Bug 1355480 - Add telemetry for measuring the average load caused by timeouts
Keywords: regression
OS: Windows 10 → Windows
Hardware: Unspecified → All
[Tracking Requested - why for this release]: this is a late regression in this beta cycle and the top browser crash in early data from 54.0b8
Blocks: 1334583
Flags: needinfo?(bbirtles)
Version: 55 Branch → 54 Branch
Tracking as top crash in 54.0b8.
Per comment #2, this is late regression, we need to address it ASAP. Hi :farre, Can you help check if this top crash is caused by bug 1355480? Hi :bbirtles, Can you help check if this is caused by bug 1334583?
Flags: needinfo?(afarre)
Bug 1334583 seems more likely.
Flags: needinfo?(afarre)
I'll look into this today. For what it's worth, it would be fine to back bug 1334583 out of beta if we don't find a fix soon (and may be safer to do that anyway).
Flags: needinfo?(bbirtles)
At a glance, I suspect the call to get_TimeDuration() in AnimationHelper::SampleAnimationForEachNode is failing because the TimeDuration is null. There's an assertion earlier in the function that checks the TimeDuration is non-null but it's debug-only. I need to look into why we might end up getting here with a null start time (perhaps we failed to update the start time of a pending animation?). Given that bug 1334583 shouldn't really have changed the logic here, I wonder if this was an existing problem that has simply been exposed because MaybeTimeDuration uses MOZ_RELEASE_ASSERT when previously the only things that would have failed were debug-only MOZ_ASSERTs.
These crashes all seem to happen on Windows and in at least one case I see the following sort of errors in the log: |[0][GFX1-]: GPU process disabled after 1 attempts (t=689.365) |[1][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[2][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[3][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[4][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[5]CP+[GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=689.365) |[6][GFX1-]: [D3D11] failed to get compositor device. (t=689.365) |[7][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=689.365) |[8][GFX1-]: Compositors might be mixed (4,1) (t=689.365) |[9][GFX1-]: [D3D11] failed to get compositor device. (t=689.609) |[10][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=689.609) |[11][GFX1-]: [D3D11] failed to get compositor device. (t=697.503) |[12][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=697.503) |[13][GFX1-]: [D3D11] failed to get compositor device. (t=698.379) |[14][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=698.379) My guess is that in some cases we have some sort of early return that means we fail to call StartPendingAnimations and we end up with pending animations on the compositor. That's probably a bug that it ever happens, but I'd rather not try and fix that underlying bug here since I'd only be guessing about what happens. Instead, we already have assertions that will fail if that situation ever reproduces in a debug build so I'd rather wait until that happens before trying to fix the underlying bug. Instead, in this bug I think we should just harden the code in AnimationHelper::SampleAnimationForEachNode to make sure even in release builds it checks that the MaybeTimeDuration is non-null before trying to read its TimeDuration value.
Assignee: nobody → bbirtles
Status: NEW → ASSIGNED
Attached patch PatchSplinter Review
Attachment #8868813 - Flags: review?(hikezoe)
Comment on attachment 8868813 [details] [diff] [review] Patch Thanks for the detailed commit message. It makes sense to me. I also suspect there is a bug when a bad thing happens on the GPU process. To clarify StartPendingAnimations() is failed to be called, how about adding an assertion 'MOZ_ASSERT(anim.isNotPlaying() || animation.startTime().type() == MaybeTimeDuration::TTimeDuration' after updating startTime() in StartPendingAnimations()?
Attachment #8868813 - Flags: review?(hikezoe) → review+
(In reply to Hiroyuki Ikezoe (:hiro) from comment #11) > Comment on attachment 8868813 [details] [diff] [review] > Patch > > Thanks for the detailed commit message. It makes sense to me. > I also suspect there is a bug when a bad thing happens on the GPU process. > > To clarify StartPendingAnimations() is failed to be called, how about adding > an assertion 'MOZ_ASSERT(anim.isNotPlaying() || animation.startTime().type() > == MaybeTimeDuration::TTimeDuration' after updating startTime() in > StartPendingAnimations()? I'm not sure I follow. As in: // If the animation is play-pending, resolve the start time. if (anim.startTime().type() == MaybeTimeDuration::Tnull_t && !anim.originTime().IsNull() && !anim.isNotPlaying()) { TimeDuration readyTime = aReadyTime - anim.originTime(); anim.startTime() = anim.playbackRate() == 0 ? readyTime : readyTime - anim.holdTime().MultDouble(1.0 / anim.playbackRate()); updated = true; } MOZ_ASSERT(anim.isNotPlaying() || anim.startTime.type() == MaybeTimeDuration::TTimeDuration); ?
(In reply to Brian Birtles (:birtles) from comment #12) > (In reply to Hiroyuki Ikezoe (:hiro) from comment #11) > > Comment on attachment 8868813 [details] [diff] [review] > > Patch > > > > Thanks for the detailed commit message. It makes sense to me. > > I also suspect there is a bug when a bad thing happens on the GPU process. > > > > To clarify StartPendingAnimations() is failed to be called, how about adding > > an assertion 'MOZ_ASSERT(anim.isNotPlaying() || animation.startTime().type() > > == MaybeTimeDuration::TTimeDuration' after updating startTime() in > > StartPendingAnimations()? > > I'm not sure I follow. As in: > > // If the animation is play-pending, resolve the start time. > if (anim.startTime().type() == MaybeTimeDuration::Tnull_t && > !anim.originTime().IsNull() && > !anim.isNotPlaying()) { > TimeDuration readyTime = aReadyTime - anim.originTime(); > anim.startTime() = > anim.playbackRate() == 0 > ? readyTime > : readyTime - anim.holdTime().MultDouble(1.0 / > anim.playbackRate()); > updated = true; > } > MOZ_ASSERT(anim.isNotPlaying() || > anim.startTime.type() == > MaybeTimeDuration::TTimeDuration); > > ? Yes, this is exactly what I meant. Redundant?
It seems a little redundant to me although it might become more interesting once we add new types of timelines (where originTime could possibly be null for a playing animation). For now, though, I'd like to keep this patch as simple as possible so there is less risk of uplifting it.
Pushed by bbirtles@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/515fbfec6ab5 Check if the startTime is set before using it in SampleAnimationForEachNode; r=hiro
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8868813 [details] [diff] [review] Patch Approval Request Comment [Feature/Bug causing the regression]: bug 1334583 [User impact if declined]: Top crasher on Windows [Is this code covered by automated tests?]: Yes [Has the fix been verified in Nightly?]: No. I'm still waiting for Nightly to pick it up but I'm filing this preemptively since it's nearly the weekend here. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: Not really. [Why is the change risky/not risky?]: It's a one liner that checks a value is set before reading it (plus tightening a debug-only assert). If it doesn't fix it, we just back out bug 1334583 from beta. [String changes made/needed]: None
Attachment #8868813 - Flags: approval-mozilla-beta?
Attachment 8868813 [details] [diff] should apply cleanly on beta (I actually wrote the patch on top of beta and then rebased it when pushing to inbound).
Comment on attachment 8868813 [details] [diff] [review] Patch Fix a top browser crash from 54.0b8. Beta54+. Should be in 54 beta 10.
Attachment #8868813 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Brian Birtles (:birtles) from comment #17) > [Is this code covered by automated tests?]: Yes > [Has the fix been verified in Nightly?]: No. I'm still waiting for Nightly > to pick it up but I'm filing this preemptively since it's nearly the weekend > here. > [Needs manual test from QE? If yes, steps to reproduce]: No. Setting qe-verify- based on Brian's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: