Crash in [@ logging::LogMessage::~LogMessage] in sandbox::PolicyBase::AddHandleToShare
Categories
(Core :: Audio/Video: Playback, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox67 | --- | unaffected |
firefox67.0.1 | --- | unaffected |
firefox68 | + | fixed |
firefox69 | + | fixed |
People
(Reporter: philipp, Assigned: mjf)
References
(Regression)
Details
(Keywords: crash, regression, Whiteboard: keep-open)
Crash Data
Attachments
(1 file)
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
This bug is for crash report bp-69db24bc-3735-4ad3-b150-9d38f0190528.
Top 10 frames of crashing thread:
0 firefox.exe logging::LogMessage::~LogMessage security/sandbox/chromium-shim/base/logging.cpp:120
1 firefox.exe sandbox::ResultCode sandbox::InterceptionManager::PatchNtdll security/sandbox/chromium/sandbox/win/src/interception.cc:424
2 firefox.exe sandbox::InterceptionManager::InitializeInterceptions security/sandbox/chromium/sandbox/win/src/interception.cc:150
3 firefox.exe sandbox::ResultCode sandbox::PolicyBase::SetupAllInterceptions security/sandbox/chromium/sandbox/win/src/sandbox_policy_base.cc:676
4 firefox.exe sandbox::PolicyBase::AddTarget security/sandbox/chromium/sandbox/win/src/sandbox_policy_base.cc:543
5 firefox.exe sandbox::ResultCode sandbox::BrokerServicesBase::SpawnTarget security/sandbox/chromium/sandbox/win/src/broker_services.cc:474
6 xul.dll mozilla::SandboxBroker::LaunchApp security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp:244
7 xul.dll bool mozilla::ipc::GeckoChildProcessHost::PerformAsyncLaunch ipc/glue/GeckoChildProcessHost.cpp:1216
8 xul.dll static void mozilla::ipc::GeckoChildProcessHost::RunPerformAsyncLaunch::<unnamed-tag>::operator ipc/glue/GeckoChildProcessHost.cpp:601
9 xul.dll nsresult mozilla::detail::RunnableFunction<`lambda at z:/task_1558609157/build/src/ipc/glue/GeckoChildProcessHost.cpp:600:24'>::Run xpcom/threads/nsThreadUtils.h:562
this windows crash signature is spiking up during the 68 cycle - currently it's accounting for 0.6% of browser crashes in 68.0b. the first affected nightly build was 20190410100020.
Comment 1•5 years ago
|
||
Sorry, was on PTO.
mjf: looks like this is due to the handles being closed before the process is launched and they are inherited.
Specifically here:
https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/media/ipc/RDDProcessHost.cpp#59
The launch is done async below, so my guess is we have a race at some point, possibly just at timeout (although I'd assume that would be a fairly rare thing).
jld: you appear to have code to prevent a similar thing on the content process launch, could you help mjf on this?
https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/ipc/ContentParent.cpp#2128-2132
I wonder if we should duplicate the handle in GeckoChildProcessHost::AddHandleToShare, so GeckoChildProcessHost has its own copy and can close it when it knows the process has launched or the launch has failed.
A similar thing has been added in Fx69 for many of the other process types, so they may well hit the same problem.
Comment 2•5 years ago
|
||
Just checked and the first crashes like this on Fx68 Nightly were build id 20190410100020, which matches when bug 1539029 landed.
Updated•5 years ago
|
Comment 3•5 years ago
|
||
[Tracking Requested - why for this release]:
AV1 + RDD process startup crash issue.
Assignee | ||
Comment 4•5 years ago
•
|
||
(In reply to Bob Owen (:bobowen) from comment #1)
Sorry, was on PTO.
mjf: looks like this is due to the handles being closed before the process is launched and they are inherited.
Specifically here:
https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/media/ipc/RDDProcessHost.cpp#59The launch is done async below, so my guess is we have a race at some point, possibly just at timeout (although I'd assume that would be a fairly rare thing).
jld: you appear to have code to prevent a similar thing on the content process launch, could you help mjf on this?
https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/ipc/ContentParent.cpp#2128-2132I wonder if we should duplicate the handle in GeckoChildProcessHost::AddHandleToShare, so GeckoChildProcessHost has its own copy and can close it when it knows the process has launched or the launch has failed.
A similar thing has been added in Fx69 for many of the other process types, so they may well hit the same problem.
bobowen: you mention handles being closed, but I'm not closing them specifically anywhere. The handle is created here[1] and added to the RDDProcessHost (a GeckoChildProcessHost) immediately afterwards. The shared mem pointed to by that handle belongs to mPrefSerializer, correct? I thought that was the reason for keeping the SharedPreferenceSerializer around until after the process has launched, but possibly something else is going on that is destroying the shmem earlier than I expect. SharedPreferenceSerializer is only destroyed here[2] if AsyncOpen returns an error, or here[3] in InitAfterConnect.
[1] https://searchfox.org/mozilla-central/source/ipc/glue/ProcessUtils_common.cpp#67
[2] https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/media/ipc/RDDProcessHost.cpp#70
[3] https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/media/ipc/RDDProcessHost.cpp#150
Comment 5•5 years ago
|
||
(In reply to Michael Froman [:mjf] from comment #4)
(In reply to Bob Owen (:bobowen) from comment #1)
...
bobowen: you mention handles being closed, but I'm not closing them specifically anywhere. The handle is created here[1] and added to the RDDProcessHost (a GeckoChildProcessHost) immediately afterwards. The shared mem pointed to by that handle belongs to mPrefSerializer, correct? I thought that was the reason for keeping the SharedPreferenceSerializer around until after the process has launched, but possibly something else is going on that is destroying the shmem earlier than I expect. SharedPreferenceSerializer is only destroyed here[2] if AsyncOpen returns an error, or here[3] in InitAfterConnect.[1] https://searchfox.org/mozilla-central/source/ipc/glue/ProcessUtils_common.cpp#67
[2] https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/media/ipc/RDDProcessHost.cpp#70
[3] https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/media/ipc/RDDProcessHost.cpp#150
Yeah, I couldn't obviously see how this was happening other than [3], where the timeout could race with the launch of the process.
Updated•5 years ago
|
Comment 6•5 years ago
|
||
Bob can I ask to be the owner for now, until we have an idea what exactly is causing this?
Comment 7•5 years ago
|
||
I agree that the timeout case in WaitForLaunch
→ InitAfterConnect
is the only thing that seems to be capable of causing it. I'd suggest either not clearing mPrefSerializer
in the error case, or maybe moving it into a callback added to the WhenProcessHandleReady()
promise.
Comment 8•5 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #6)
Bob can I ask to be the owner for now, until we have an idea what exactly is causing this?
I think we're pretty clear on the issue here and that it was caused by bug 1539029.
There certainly appears to be a race in the timeout case, it looks like that's the only one.
jld has a suggestion to fix this.
I think longer term we should possibly look at passing the ownership of the HANDLE into the launching code or duplicating it.
However, I don't have time to look at this at the moment.
Assignee | ||
Comment 9•5 years ago
|
||
I'll land a patch that avoids clearing mPrefSerializer in the error case to see if it helps.
Assignee | ||
Comment 10•5 years ago
|
||
Comment 11•5 years ago
|
||
Pushed by mfroman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b2cb7ee93eef avoid a potential timeout race on Windows RDD launch. r=jld
Comment 12•5 years ago
|
||
bugherder |
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 13•5 years ago
|
||
I want to leave this open until we see a definite change in the crash reports since this was a proposed fix from :jld.
Comment 14•5 years ago
|
||
I imagine this will need to be uplifted before we can be sure it fixes the problem.
Assignee | ||
Comment 15•5 years ago
|
||
We should see results in Nightly in the next day or so. I'll wait to see those results before I request uplift.
Comment 17•5 years ago
|
||
Err.. not true. There are 3 crashes (from a single install) today.
Comment 18•5 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #17)
Err.. not true. There are 3 crashes (from a single install) today.
Unfortunately, the errors that are terminal in ~LogMessage all get picked up by that search. You have to look in the proto signature.
Those 3 are for a different crash not the AddHandleToShare one.
Assignee | ||
Comment 19•5 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #18)
(In reply to Julien Cristau [:jcristau] from comment #17)
Err.. not true. There are 3 crashes (from a single install) today.
Unfortunately, the errors that are terminal in ~LogMessage all get picked up by that search. You have to look in the proto signature.
Those 3 are for a different crash not the AddHandleToShare one.
Bob, how do you see the proto signature in the crash reports?
Assignee | ||
Comment 20•5 years ago
|
||
Or are you talking about looking at the previous call in the back trace?
Assignee | ||
Comment 21•5 years ago
|
||
Ah, never mind. I see the proto signature in the search terms:
https://crash-stats.mozilla.org/signature/?version=69.0a1&proto_signature=AddHandleToShare&signature=logging%3A%3ALogMessage%3A%3A~LogMessage&date=%3E%3D2019-06-11T21%3A08%3A00.000Z&date=%3C2019-06-18T21%3A08%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#reports
Assignee | ||
Comment 22•5 years ago
|
||
We're still holding steady with no crashes showing with this signature starting with the 20190612 build. I'm going to mark this fixed and request uplift.
Assignee | ||
Comment 23•5 years ago
|
||
Comment on attachment 9070622 [details]
Bug 1555076 - avoid a potential timeout race on Windows RDD launch. r?jld!
Beta/Release Uplift Approval Request
- User impact if declined: Without this fix, can crash RDD during timeout scenarios.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This is a very small change that only applies during times when the process launch timeout has occurred.
- String changes made/needed: none
Comment 24•5 years ago
|
||
Comment on attachment 9070622 [details]
Bug 1555076 - avoid a potential timeout race on Windows RDD launch. r?jld!
approved for 68.0b13
Comment 25•5 years ago
|
||
bugherder uplift |
Updated•3 years ago
|
Description
•