Closed Bug 1321275 Opened 3 years ago Closed 2 years ago

[webvr] Crash in mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::gfx::PVRLayer::Transition

Categories

(Core :: WebVR, defect, P3, critical)

53 Branch
x86
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- disabled
firefox54 --- disabled
firefox55 --- fixed
firefox56 --- fixed

People

(Reporter: marcia, Assigned: kip)

References

Details

(Keywords: crash, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-e1611277-1097-466e-83d8-7d9472161130.
=============================================================

Seen while looking at crash stats: http://bit.ly/2gFAxfv. Started with Build 20161129171714. ni on kgilbert for ideas on a possible cause.
Flags: needinfo?(kgilbert)
Priority: -- → P3
Whiteboard: [gfx-noted]
To hit this call, the user must have been visiting a WebVR site.  The hardware is also typical of a high end VR computer, with 32GB of ram, an NVidia 1080, and an Oculus VR display.  The crash reports are showing that the GPU process has terminated abnormally, so this may be a driver or Oculus Compositor bug.  We have seen more stability in the 64-bit builds of Firefox which link against the 64-bit Oculus runtimes and recommend WebVR users to download the 64-bit version of Firefox.  In this instance, it appears to be a 32-bit instance of Firefox.
Flags: needinfo?(kgilbert)
The crash report shows that there were 36 instances, all on 32 bit Firefox, and only one 64 bit instance with the same crash.
Another common trend is that the crash appears to happen within the VRDisplay.exitPresent WebVR call.  Likely this is seen by users as a crash while exiting VR presentation.
Summary: Crash in mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::gfx::PVRLayer::Transition → [webvr] Crash in mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::gfx::PVRLayer::Transition
Component: Graphics → WebVR
Crash Signature: [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::gfx::PVRLayer::Transition] → [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::gfx::PVRLayer::Transition] [@ Abort | corrupted actor state | mozalloc_abort | NS_DebugBreak | mozilla::ipc::LogicError | mozilla::gfx::PVRLayer::Transition ]
There are 32 crashes across 21 different installations in the last 3 days.
I am thinking if the actor is destroyed at the parent side so we should override ActorDestroy() or  Recv__delete__() to do some check.
Comment on attachment 8864065 [details]
Bug 1321275 - Confirm VRLayerChild is not destroyed before sending destroy message to the parent side;

https://reviewboard.mozilla.org/r/135786/#review138910

LGTM, Thanks!
Attachment #8864065 - Flags: review?(kgilbert) → review+
I have confirmed that the actor of VRLayerChild will be destroyed when we exit the VR present mode. So, I think it should be a correct fix.
Assignee: nobody → dmu
Pushed by dmu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6f76ba14922c
Confirm VRLayerChild is not destroyed before sending destroy message to the parent side; r=kip
https://hg.mozilla.org/mozilla-central/rev/6f76ba14922c
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
This seems to not resolve the crash issue completely, we probably have to consider add some operation at  PVRLayerChild::Recv__delete__() instead of PVRLayerChild::ActorDestroy().
This is still crashing.
#13 topcrash in the Windows nightly of 20170511063838.
Flags: needinfo?(dmu)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1364856
Flags: needinfo?(dmu)
(In reply to Julian Seward [:jseward] from comment #12)
> This is still crashing.
> #13 topcrash in the Windows nightly of 20170511063838.

Let's try to fix it at Bug 1364856
The signature spiked the 2017-06-01 (59 crashes) and there are 54 crashes today.
Maybe it's related to the patch for bug 1362213.
:kip, an idea ?
Flags: needinfo?(kgilbert)
See Also: → 1370090
79 occurrences in Nightly 20170602030204, which makes it the #4 Windows topcrash.
I am reproduce it by follow the below steps:

1. Viewing a WebVR site like, https://webvr.info/samples/XX-vr-controllers.html, in the present mode.
2. Closing the SteamVR runtime. We can notice the WebVR page is suspended.
3. After press the exit VR button, the tab will crash, and the crash report is very similar as we have.

We still need to do some confirmation for Oculus Rift, but it is a reproducible case at least.
Depends on: 1371228
I can't reproduce this crash on Oculus Rift, but following the steps from Comment 16 is very easy for reproducing in HTC Vive. Let's fix the crash on Vive first.
I may have found the cause of this issue.

In VRManagerChild::DeallocPVRLayerChild, we delete the PVRLayerChild, rather than doing proper reference counting on the VRLayerChild object which has been deallocated but still referenced by VRDisplayPresentation::mLayers.  This happens when the GPU process is terminated.

When the user tries to stop the VR presentation or change the URL, PVRLayerChild::SendDestroy is called by VRDisplayPresentation::DestroyLayers.  This results in the assertion about "invalid actor state".

We can fix this by following the same pattern used by TextureClient:

-------------

layers::PTextureChild*
VRManagerChild::AllocPTextureChild(const SurfaceDescriptor&,
                                   const LayersBackend&,
                                   const TextureFlags&,
                                   const uint64_t&)
{
  return TextureClient::CreateIPDLActor();
}

bool
VRManagerChild::DeallocPTextureChild(PTextureChild* actor)
{
  return TextureClient::DestroyIPDLActor(actor);
}

PTextureChild*
TextureClient::CreateIPDLActor()
{
  TextureChild* c = new TextureChild();
  c->AddIPDLReference();
  return c;
}

// static
bool
TextureClient::DestroyIPDLActor(PTextureChild* actor)
{
  static_cast<TextureChild*>(actor)->ReleaseIPDLReference();
  return true;
}



  // AddIPDLReference and ReleaseIPDLReference are only to be called by CreateIPDLActor
  // and DestroyIPDLActor, respectively. We intentionally make them private to prevent misuse.
  // The purpose of these methods is to be aware of when the IPC system around this
  // actor goes down: mIPCOpen is then set to false.
  void AddIPDLReference() {
    MOZ_ASSERT(mIPCOpen == false);
    mIPCOpen = true;
    AddRef();
  }
  void ReleaseIPDLReference() {
    MOZ_ASSERT(mIPCOpen == false);
    Release();
  }


-------------

I'll take this bug and implement the fix.
Flags: needinfo?(kgilbert)
Assignee: dmu → kgilbert
Attachment #8885938 - Flags: review?(dmu)
I would recommend this patch for uplift to Beta, once it has been proven stable in Nightly.
Blocks: 1380495
No longer blocks: 1380495
Blocks: 1370090
Comment on attachment 8885938 [details]
Bug 1321275 - Fix reference counting of PVRLayerChild when GPU process has been terminated

https://reviewboard.mozilla.org/r/156724/#review161888

LGTM. Good job! kip.
Attachment #8885938 - Flags: review?(dmu) → review+
Unfortunately, the try run shows that there were some warnings and an assertion to fix before landing.  I've updated the patch and will re-run try to confirm the fix.
The try push looks clean now.
​
I have removed some unneeded variables causing warnings, corrected the initialized value of VRLayerChild::mIPCOpen to correct an assertion, and added a MOZ_COUNT_CTOR(VRLayerChild) to balance the MOZ_COUNT_DTOR(VRLayerChild).  This seems to have fixed the warnings and cleaned up the try build.

Could you please review the interdiff (2 revs) before I land?
Flags: needinfo?(dmu)
Comment on attachment 8885938 [details]
Bug 1321275 - Fix reference counting of PVRLayerChild when GPU process has been terminated

https://reviewboard.mozilla.org/r/156724/#review162694

::: gfx/vr/ipc/VRLayerChild.cpp:24
(Diff revisions 2 - 3)
>    : mCanvasElement(nullptr)
>    , mShSurfClient(nullptr)
>    , mFront(nullptr)
>    , mIPCOpen(false)
>  {
> +  MOZ_COUNT_CTOR(VRLayerChild);

It was removed at Bug 1321275 because the memory leak problem that is detected by our mochitests. If it doesn't have new leaks, it would be good to me.
Please take a look at what I mentioned at Comment 26. Thanks
Flags: needinfo?(dmu)
Pushed by kgilbert@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d97dc1dbc289
Fix reference counting of PVRLayerChild when GPU process has been terminated r=daoshengmu
https://hg.mozilla.org/mozilla-central/rev/d97dc1dbc289
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: mozilla55 → mozilla56
You need to log in before you can comment on or make changes to this bug.