crash in mozilla::SandboxBroker::SandboxBroker

RESOLVED FIXED in Firefox 46

Status

()

--
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: mats, Assigned: bobowen)

Tracking

({crash})

unspecified
mozilla48
x86
Windows NT
crash
Points:
---

Firefox Tracking Flags

(firefox46+ fixed, firefox47+ fixed, firefox48+ fixed)

Details

(Whiteboard: [sbwc1], crash signature)

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-346eb15b-b6b1-43cf-a296-339242160312.
=============================================================

This signature has relatively high volume in Firefox 46b1 crash data -
it's currently at #35 in the 46b1 Top Crashers List.  The number of
reported crashes is about an order of magnitude higher than in v45
which seems worrying.

Product	Version	Count	Percentage
Firefox	46.0b1	152	62.3%
Firefox	44.0.2	34	13.9%
Firefox	45.0b99	11	4.5%
Firefox	45.0	10	4.1%
Firefox	43.0b1	3	1.2%
Firefox	38.7.0esr	2	0.8%
[... long tail ...]

Operating System	Count	Percentage
Windows 7	129	86.0%
Windows 8.1	10	6.7%
Windows 10	7	4.7%
Windows 8	2	1.3%
Windows XP	2	1.3%

Architecture	Count	Percentage
x86	150	100.0%


http://hg.mozilla.org/releases/mozilla-beta/annotate/fb3494d06dfb/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp#l34

Stack:
mozilla::SandboxBroker::SandboxBroker()
mozilla::ipc::GeckoChildProcessHost::GeckoChildProcessHost(GeckoProcessType, base::ChildPrivileges)
mozilla::gmp::GMPProcessParent::GMPProcessParent(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
mozilla::gmp::GMPParent::LoadProcess()
mozilla::gmp::GMPParent::EnsureProcessLoaded()
mozilla::gmp::GMPParent::GetGMPContentParent(mozilla::UniquePtr<mozilla::gmp::GetGMPContentParentCallback, mozilla::DefaultDelete<mozilla::gmp::GetGMPContentParentCallback> >&&)
mozilla::gmp::GeckoMediaPluginServiceParent::GetContentParentFrom(nsACString_internal const&, nsCString const&, nsTArray<nsCString> const&, mozilla::UniquePtr<mozilla::gmp::GetGMPContentParentCallback, mozilla::DefaultDelete<mozilla::gmp::GetGMPContentParentCallback> >&&)
mozilla::gmp::GeckoMediaPluginService::GetGMPVideoDecoder(nsTArray<nsCString>*, nsACString_internal const&, mozilla::UniquePtr<GMPVideoGetterCallback<GMPVideoDecoderProxy>, mozilla::DefaultDelete<GMPVideoGetterCallback<GMPVideoDecoderProxy> > >&&)
mozilla::GMPVideoDecoder::Init()
mozilla::MediaDataDecoderProxy::InternalInit()
mozilla::detail::ProxyRunnable<mozilla::MozPromise<mozilla::TrackInfo::TrackType, mozilla::MediaDataDecoder::DecoderFailureReason, 1>, mozilla::MediaDataDecoderProxy>::Run()
nss3.dll@0x28bff
NS_ProcessNextEvent(nsIThread*, bool)
mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)
MessageLoop::RunHandler()

Updated

3 years ago
Whiteboard: [sb?]
(Assignee)

Updated

3 years ago
Flags: needinfo?(bobowen.code)
(Assignee)

Comment 1

3 years ago
Created attachment 8731668 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Assuming that this is some sort of threading issue this should fix it.

If not, then this will prevent it from crashing while we're not sandboxing content.
I didn't bother checking the env var or sandbox level, so as to not complicate things.

I put the initialisation with the existing bits just so they are not scattered around.
As I've mentioned before I think SandboxBroker should acutally live in xul and then we could put the other initialisation into it as well.

Presumably this would make adding it into xpcshell easier.

This could be a cross OS class I think, with per OS implementation.
Linux already has one that lives in xul.
Attachment #8731668 - Flags: review?(aklotz)
(Assignee)

Updated

3 years ago
Assignee: nobody → bobowen.code
Status: NEW → ASSIGNED
(Assignee)

Updated

3 years ago
Flags: needinfo?(bobowen.code)
Whiteboard: [sb?] → [sbwc1]
(Assignee)

Updated

3 years ago
Duplicate of this bug: 1027916
(Assignee)

Comment 3

3 years ago
Looks like I need this in xpcshell as well ... :-)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=71860cfc2d71&selectedJob=18217457
(Assignee)

Comment 4

3 years ago
Created attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

This is looking more promising:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d2e9713e18e0

Not sure about the placement of the Initialize calls and if someone else needs to review as well.

See comment 1 for other notes.
Attachment #8731704 - Flags: review?(aklotz)
(Assignee)

Updated

3 years ago
Attachment #8731668 - Attachment is obsolete: true
Attachment #8731668 - Flags: review?(aklotz)
Comment on attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Review of attachment 8731704 [details] [diff] [review]:
-----------------------------------------------------------------

OK, but conditional on asking an xpcshell peer to review the XPCShellImpl.cpp part.
Attachment #8731704 - Flags: review?(aklotz) → review+
(Assignee)

Comment 6

3 years ago
Comment on attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Thanks Aaron.

Bobby - would you be able to OK this change to XPCShellImpl.cpp?
Attachment #8731704 - Flags: review?(bobbyholley)
Comment on attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Review of attachment 8731704 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me on initializing the sandbox.
Attachment #8731704 - Flags: review?(bobbyholley) → review+
backed out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=24311724&repo=mozilla-inbound
Flags: needinfo?(bobowen.code)
(Assignee)

Comment 11

3 years ago
OK, I've had to add in the initialisation to the gtest code in nsAppRunner.cpp as well.
Try run with this and running all tests (I think):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e54e320fb042

I've also made it set the mPolicy to nullptr if there is no sBrokerService, as on inbound it was crashing in SetSecurityLevelForGMPlugin for the gtests because mPolicy was uninitialised and not null.
Flags: needinfo?(bobowen.code)

Comment 13

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/f2582471724f
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
(Assignee)

Comment 14

3 years ago
Comment on attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Sheriff note: patch that landed was slightly different. Beta doesn't quite apply cleanly. Minor context change, I'll upload a patch.

Might want to wait until this has spent a couple of days on Nightly, but requesting now as I'm off on PTO.

Approval Request Comment
[Feature/regressing bug #]:
Possible threading issue has existed since chromium sandbox code landed.

[User impact if declined]:
Crash if issue occurs.
More instances seemed to be occurring on Beta, possibly due to e10s testing.

[Describe test coverage new/current, TreeHerder]:
GMP, NPAPI 64-bit and e10s test the various launching of sandboxed processes.
The crash can occur even without the sandbox being enabled.

[Risks and why]:
Low to medium: changes are across a few files, but all are fairly simple.

[String/UUID change made/needed]:
None
Attachment #8731704 - Flags: approval-mozilla-beta?
Attachment #8731704 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 15

3 years ago
Created attachment 8734153 [details] [diff] [review]
Beta patch - Initialize Windows sandbox BrokerServices before any child processes are created
Comment on attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Looks like this patch is for aurora. This may help with crashes!
Attachment #8731704 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8734153 [details] [diff] [review]
Beta patch - Initialize Windows sandbox BrokerServices before any child processes are created

[Triage Comment]

Might help with many sorts of crashes. Let's try it on beta, this may land for beta 5 if all goes well, if not, then beta 6.
Attachment #8734153 - Flags: approval-mozilla-beta+

Updated

3 years ago
Depends on: 1260115

Comment 20

3 years ago
This may have caused bug 1260115 on aurora and possibly beta. We should back it out and potentially respin any beta build that includes it.
Flags: needinfo?(lhenry)
Wes can you back this out of aurora and beta? Thanks
Flags: needinfo?(lhenry) → needinfo?(wkocher)
tracking-firefox46: --- → +
Attachment #8734153 - Flags: approval-mozilla-beta+ → approval-mozilla-beta?

Updated

3 years ago
Flags: needinfo?(wkocher)
Switching flags to affected for 46 and 47 since we backed out the patch on those branches. 
Bobby can you take a look or suggest someone who might be able to help here?
status-firefox46: fixed → affected
status-firefox47: fixed → affected
Flags: needinfo?(bobbyholley)
I just rubber-stamped a small change to XPConnect here - aklotz reviewed the meat of the change.
Flags: needinfo?(bobbyholley)

Updated

3 years ago
Flags: needinfo?(aklotz)

Comment 26

3 years ago
bowen will be back this week.
OK, we can wait for bowen, thanks

Updated

3 years ago
Flags: needinfo?(aklotz)
Tracking for 47+
tracking-firefox47: --- → +
tracking-firefox48: --- → +
(Assignee)

Comment 29

3 years ago
OK, so to prevent these crashes (bug 1261261 and bug 1260115) we'd need to add null checks into these other methods.

What is slightly more worrying is that for these problems to occur either Initialize isn't being called soon enough or it's failing.
I'll see if I can reproduce with the patch reapplied locally for Aurora, given that it became the top crash.
We could still take a patch for beta 9 later this week. This is the #35 topcrash in 46.0b5. For now I'm taking off the approval request for beta, but still tracking.
Comment on attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Looks like we don't have a viable patch for beta yet.
Attachment #8731704 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #8734153 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
(Assignee)

Comment 32

3 years ago
I'm going to re-open this and land extra changes here as it will make it easier to uplift to Fx47, if we decide to do that.

One thing I noticed, all of the crash reports that I've looked at have a very large number of threads.
The initialisation for the broker does start a new thread, so maybe there is some problem with that rather than two Init()s happening at the same time.

If the large number of threads is causing an issue, maybe it would be manifesting itself in other ways as well.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 33

3 years ago
Comment on attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Clearing this as it was backed out.
Attachment #8731704 - Flags: approval-mozilla-aurora+
(Assignee)

Comment 34

3 years ago
Created attachment 8738249 [details] [diff] [review]
Part 2: Move SandboxBroker Initialization earlier and add telemetry and extra null checks

MozReview-Commit-ID: Fu05wLn27UG

This moves the initialization into XRE_mainInit, which is probably a better place and works for gtest as well.

Hopefully, the problems when I uplifted to Aurora were because I wasn't initialising early enough.
If not, I've added Telemetry to log the result of SandboxBroker::Initialize.
So we should know if it's failing in some circumstances.

Also added null checks, to prevent the crashes I caused if the Initialize isn't called/fails.
As before this is only the case if content sandboxing isn't compiled in, if it is we crash.
Although we haven't seen that on Nightly as far as I can see.

Do you know who (if) I need to ask for review to get the Telemetry added?

Try push with this patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d69d83e42498e715664a2fc880957786f9c3abd2
Attachment #8738249 - Flags: review?(aklotz)

Updated

3 years ago
Attachment #8738249 - Flags: review?(aklotz) → review+
(In reply to Bob Owen (:bobowen) from comment #34)
> Do you know who (if) I need to ask for review to get the Telemetry added?

Until somebody says otherwise, my r+ should suffice on that given my previous role on the perf team.

Comment 37

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/65d070edff8b
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago3 years ago
Resolution: --- → FIXED
(Assignee)

Comment 38

3 years ago
Comment on attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Early days, but as there are no reports of SandboxBroker::Initialize failing in the telemetry I think we can try this on Aurora again now.
This time there will be the second patch as well.

Approval Request Comment
[Feature/regressing bug #]:
Possible threading issue has existed since chromium sandbox code landed.
It is also possible that another underlying problem with large numbers of threads is causing this to fail. Bug 1262964 has been filed on this.

[User impact if declined]:
Crash if issue occurs.
More instances seemed to be occurring on Beta.

[Describe test coverage new/current, TreeHerder]:
GMP, NPAPI 64-bit and e10s test the various launching of sandboxed processes.
The crash can occur even without the sandbox being enabled.

[Risks and why]:
Low to medium: changes are across a few files, but all are fairly simple. This also caused a top crash on Aurora last time, but those issues should have been addressed.

[String/UUID change made/needed]:
None
Attachment #8731704 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 39

3 years ago
Comment on attachment 8738249 [details] [diff] [review]
Part 2: Move SandboxBroker Initialization earlier and add telemetry and extra null checks

See comment 38 for main questions.

Change to file toolkit/components/telemetry/Histograms.json doesn't quite apply cleanly, but only because other histograms at the end of the file aren't in Aurora.
Attachment #8738249 - Flags: approval-mozilla-aurora?
Comment on attachment 8731704 [details] [diff] [review]
Initialize Windows sandbox BrokerServices before any child processes are created

Taking this in Aurora with the hopes that if all goes well, we can uplift to Beta46.
Attachment #8731704 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

3 years ago
Attachment #8738249 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
has problems to apply to aurora:

grafting 338383:65d070edff8b "Bug 1256992 Part 2: Move SandboxBroker Initialization earlier and add telemetry and extra null checks. r=aklotz"
merging security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp
merging toolkit/components/telemetry/Histograms.json
merging toolkit/xre/nsAppRunner.cpp
warning: conflicts while merging toolkit/components/telemetry/Histograms.json! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use hg resolve and hg graft --continue)
Flags: needinfo?(bobowen.code)
backed out for possible problems in my conflict resolution, Bob can you take a look, thanks!
status-firefox47: fixed → affected
(Assignee)

Comment 44

3 years ago
uplift
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/f9b9691fee9e
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/0b38f4474a28
status-firefox47: affected → fixed
Flags: needinfo?(bobowen.code)
Might this still be good to uplift to beta? Are the crashes mostly e10s related, or not?
Flags: needinfo?(bobowen.code)
(Assignee)

Comment 46

3 years ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #45)
> Might this still be good to uplift to beta? Are the crashes mostly e10s
> related, or not?

Yes I think so, just wanted to make sure I didn't cause a top-crash on Aurora like last time. :-)

The crashes are actually generally non-e10s, because if you have e10s enabled then the broker services are getting initialised early on anyway.
Flags: needinfo?(bobowen.code)
(Assignee)

Comment 47

3 years ago
Created attachment 8741397 [details] [diff] [review]
Beta patch for Part 2: Move SandboxBroker Initialization earlier and add telemetry and extra null checks
Comment on attachment 8741397 [details] [diff] [review]
Beta patch for Part 2: Move SandboxBroker Initialization earlier and add telemetry and extra null checks

[Triage Comment]

Please uplift to beta, we'll try this out in beta 11.
Attachment #8741397 - Flags: approval-mozilla-beta+
Attachment #8734153 - Flags: approval-mozilla-beta- → approval-mozilla-beta+
Part 1 doesn't apply cleanly to beta, is it needed or should I just land part 2?
Flags: needinfo?(bobowen.code)
(Assignee)

Comment 51

3 years ago
(In reply to Wes Kocher (:KWierso) from comment #49)
> Part 1 doesn't apply cleanly to beta, is it needed or should I just land
> part 2?

It needs both, they apply for me I can push if you like:
https://bugzilla.mozilla.org/attachment.cgi?id=8734153
https://bugzilla.mozilla.org/attachment.cgi?id=8741397
Flags: needinfo?(bobowen.code)
(In reply to Bob Owen (:bobowen) from comment #51)
> (In reply to Wes Kocher (:KWierso) from comment #49)
> > Part 1 doesn't apply cleanly to beta, is it needed or should I just land
> > part 2?
> 
> It needs both, they apply for me I can push if you like:
> https://bugzilla.mozilla.org/attachment.cgi?id=8734153
> https://bugzilla.mozilla.org/attachment.cgi?id=8741397

Yeah, helps if I actually pay attention and uplift the correct pair of patches... :)
(Assignee)

Updated

3 years ago
Duplicate of this bug: 1260115
(Assignee)

Updated

3 years ago
Duplicate of this bug: 1261261
You need to log in before you can comment on or make changes to this bug.