Closed Bug 1564127 Opened 1 year ago Closed 3 months ago

Crash in [@ mozilla::ipc::IToplevelProtocol::OtherPid]

Categories

(Core :: WebVR, defect, P2, critical)

68 Branch
x86_64
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr68 --- fixed
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fixed
firefox72 --- fixed

People

(Reporter: worcester12345, Assigned: daoshengmu)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-6717a756-1682-4568-8095-f6a340190708.

Top 10 frames of crashing thread:

0 xul.dll mozilla::ipc::IToplevelProtocol::OtherPid ipc/glue/ProtocolUtils.cpp:586
1 xul.dll void mozilla::gfx::VRProcessParent::InitAfterConnect gfx/vr/ipc/VRProcessParent.cpp:156
2 xul.dll nsresult mozilla::ipc::TaskFactory<mozilla::RDDProcessHost>::TaskWrapper<mozilla::ipc::TaskFactory<mozilla::RDDProcessHost>::RunnableMethod<void  ipc/glue/TaskFactory.h:37
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1175
4 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
5 xul.dll nsThread::Shutdown xpcom/threads/nsThread.cpp:881
6 xul.dll nsresult mozilla::LazyIdleThread::ShutdownThread xpcom/threads/LazyIdleThread.cpp:276
7 xul.dll nsresult mozilla::LazyIdleThread::Notify xpcom/threads/LazyIdleThread.cpp:516
8 xul.dll nsresult nsTimerEvent::Run xpcom/threads/TimerThread.cpp:260
9 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1175

First Firefox crash on this computer since December!
Happened right after updating to:
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0 ID:20190705220548

Tabs in this window:133
Tabs in all windows:186

Hardware: Unspecified → x86_64
Component: General → WebVR
Product: Firefox → Core

IIRC, in Win 7, we don't have GPU process. So, we should not launch VR process too. It looks like the gpuChild is a nullptr [1] when running in Win 7. That makes sense to me because it has no GPU process there. However, our VR process only can be asked to launch by GPU process, so it should not happen. If we look into the callstack:

0 xul.dll mozilla::ipc::IToplevelProtocol::OtherPid ipc/glue/ProtocolUtils.cpp:586
1 xul.dll void mozilla::gfx::VRProcessParent::InitAfterConnect gfx/vr/ipc/VRProcessParent.cpp:156
2 xul.dll nsresult mozilla::ipc::TaskFactory<mozilla::RDDProcessHost>::TaskWrapper<mozilla::ipc::TaskFactory<mozilla::RDDProcessHost>::RunnableMethod<void ipc/glue/TaskFactory.h:37

It said the task in RDDProcessHost was asking to init VR process parent. That should not happen. I am wondering if it is our crash report gives us wrong information or there is something wrong in RDDProcessHost.

[1] https://searchfox.org/mozilla-central/rev/da3f3eaaacb6fb344fd21ac29ace2da0e33f12d3/gfx/vr/ipc/VRProcessParent.cpp#161

Updating affected versions based on crash-stats.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash

According to the version no. 7061 that is Win 7 SP1. It seems like GPU process is available there [1], so the questions will be why it only happens on Win 7, is it possible we just needs to check if gpuChild is a nullptr. Also, and why RDDProcessHost calls our VR process's callback [2].

[1] https://dxr.mozilla.org/mozilla-central/rev/5b91c8869f4275d2a2857114cf12fa7b7bf4190e/gfx/thebes/gfxWindowsPlatform.cpp#1647-1660
[2] https://searchfox.org/mozilla-central/rev/da3f3eaaacb6fb344fd21ac29ace2da0e33f12d3/gfx/vr/ipc/VRProcessParent.cpp#205

Priority: -- → P2
See Also: → 1565000

According to the report after FF 69 [1], the RDDProcessHost doesn't show up more. I would think we need to handle the situation when GPU process is shutdown but VR process is still trying to access GPU channels.

[1] https://crash-stats.mozilla.org/report/index/9fd7da2e-b7f2-4c00-a167-327b20190710

For the most reports are showing [0][GFX1-]: Killing GPU process due to IPC reply timeout, I doubt GPU is killed by someone but we were still trying to create a VR process. If so, we need to fallback to non-GPU process mode and create a VR service.

(In reply to Jan Andre Ikenmeyer [:darkspirit] and Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧

See Also: → 1564976
See Also: 1564976 →

1564976 is linux. This one is Windows 7.

I can't reproduce this crash in my local. I need more helpful information. :jld, may I know when the IToplevelProtocol::OtherPid() will be possible to be ACCESS_VIOLATION_READ? I saw we have the same situation at GPUChild[1] and VRProcessParent[2].

[1] https://crash-stats.mozilla.org/report/index/9fd7da2e-b7f2-4c00-a167-327b20190710
[2] https://crash-stats.mozilla.org/report/index/1be28cf7-4b17-4041-913e-65ce10190915

Flags: needinfo?(jld)

Note the address in both crashes: 0x38, which is in roughly the right range to be a struct field offset, and in fact is the offset of mOtherPid.

This means that IToplevelProtocol::OtherPid is being called on a null this pointer:

Flags: needinfo?(jld)
Assignee: nobody → dmu

I have checked

0x20 is address of &mVRChild->mOtherPid and &gpuChild->mOtherPid at x86 when the mChild is a 0x0.
0x38 is address of &mVRChild->mOtherPid and &gpuChild->mOtherPid at amd64 when the mChild is a 0x0.

:jld. Please help review. Thanks.

Flags: needinfo?(jld)
Pushed by dmu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f6b52e18b521
Check if GPU/VR process has connected with the parent process. r=kip

(In reply to Daosheng Mu[:daoshengmu] from comment #13)

:jld. Please help review. Thanks.

We wanna land it shortly. please ni? me if I need to change my mind. Thanks.

Flags: needinfo?(jld)
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

Please nominate this for Beta approval when you get a chance. Looks like we probably won't get much data on whether this fix helps with the crash until it hits release.

Comment on attachment 9094061 [details]
Bug 1564127 - Check if GPU/VR process has connected with the parent process.

Beta/Release Uplift Approval Request

  • User impact if declined: The Crash in [@ mozilla::ipc::IToplevelProtocol::OtherPid] crash will still happen when it is failed to get VR child IPC.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It tries to avoid crashes and will not happen other regressions.
  • String changes made/needed:
Flags: needinfo?(dmu)
Attachment #9094061 - Flags: approval-mozilla-beta?

Comment on attachment 9094061 [details]
Bug 1564127 - Check if GPU/VR process has connected with the parent process.

Crash fix, uplift approved for 71 beta 6, thanks.

Attachment #9094061 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

I see >100 crash reports for this signature on ESR68 - is this something we should consider fixing there too? It'll need a rebased patch if yes.

Flags: needinfo?(dmu)

Comment on attachment 9094061 [details]
Bug 1564127 - Check if GPU/VR process has connected with the parent process.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined: The Crash in [@ mozilla::ipc::IToplevelProtocol::OtherPid] crash will still happen when it is failed to get VR child IPC.
  • Fix Landed on Version: ESR 68
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It tries to avoid crashes and will not happen other regressions.
  • String or UUID changes made by this patch:
Flags: needinfo?(dmu)
Attachment #9094061 - Flags: approval-mozilla-esr68?

Comment on attachment 9094061 [details]
Bug 1564127 - Check if GPU/VR process has connected with the parent process.

Simple crash fix, approved for 68.3esr.

Attachment #9094061 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
You need to log in before you can comment on or make changes to this bug.