Crash in [@ nsNSSComponent::nsNSSComponent]
Categories: Core :: Security: PSM, defect, P1
People: Reporter: marcia, Assigned: keeler
Details: Keywords: crash, regression, Whiteboard: [psm-assigned], qa-not-actionable
Attachments: 2 files
This bug is for crash report bp-6fb4247d-62da-4d7b-8785-116d40190415.
Seen while looking at nightly crash stats, but visible in 67 beta as well: https://bit.ly/2IxblWd
This signature was seen before in Bug 1301407, which was duped to another bug. It looks as if this started in 67.0b3 and has continued through the betas. There are more crashes than installs. No user comments.
Some correlations in beta:
(100.0% in signature vs 00.36% overall) moz_crash_reason = MOZ_RELEASE_ASSERT(NS_IsMainThread())
(62.88% in signature vs 09.56% overall) Module "ntdsapi.dll" = true [94.92% vs 16.83% if platform_version = 6.1.7601 Service Pack 1]
(66.67% in signature vs 26.96% overall) Module "apphelp.dll" = true [100.0% vs 29.80% if platform_version = 6.1.7601 Service Pack 1]
(96.21% in signature vs 17.51% overall) Module "fastprox.dll" = true [96.43% vs 23.47% if platform_version = 10.0.17134]
(96.21% in signature vs 17.73% overall) Module "wbemsvc.dll" = true [96.43% vs 23.56% if platform_version = 10.0.17134]
(96.21% in signature vs 18.09% overall) Module "wbemcomn.dll" = true [96.43% vs 23.86% if platform_version = 10.0.17134]
(96.21% in signature vs 18.19% overall) Module "wbemprox.dll" = true [96.43% vs 23.88% if platform_version = 10.0.17134]
(99.24% in signature vs 47.96% overall) reason = EXCEPTION_BREAKPOINT
(99.24% in signature vs 37.43% overall) Module "winsta.dll" = true [100.0% vs 54.04% if platform_version = 10.0.17134]
(72.73% in signature vs 28.43% overall) Module "urlmon.dll" = true [100.0% vs 55.06% if platform_version = 10.0.17134]
(43.94% in signature vs 86.79% overall) Module "softokn3.dll" = true [37.21% vs 89.06% if platform_pretty_version = Windows 7]
(99.24% in signature vs 46.16% overall) Module "psapi.dll" = true [100.0% vs 59.50% if platform_version = 10.0.17134]
(74.24% in signature vs 29.66% overall) contains_memory_report = null
(49.24% in signature vs 94.59% overall) Addon "webcompat-reporter@mozilla.org" = true
(49.24% in signature vs 94.53% overall) Addon "fxmonitor@mozilla.org" = true
(50.00% in signature vs 94.63% overall) Addon "formautofill@mozilla.org" = true
(50.00% in signature vs 94.63% overall) Addon "webcompat@mozilla.org" = true
(50.00% in signature vs 94.39% overall) Addon "screenshots@mozilla.org" = true
Top 10 frames of crashing thread:
0 xul.dll nsNSSComponent::nsNSSComponent security/manager/ssl/nsNSSComponent.cpp:208
1 xul.dll mozilla::xpcom::CreateInstanceImpl xpcom/components/StaticComponents.cpp:10546
2 xul.dll nsresult nsComponentManagerImpl::GetServiceLocked xpcom/components/nsComponentManager.cpp:1387
3 xul.dll nsComponentManagerImpl::GetServiceByContractID xpcom/components/nsComponentManager.cpp:1574
4 xul.dll static bool mozilla::net::CanEnableSpeculativeConnect netwerk/protocol/http/nsHttpHandler.cpp:2349
5 xul.dll nsresult mozilla::detail::RunnableFunction<`lambda at z:/task_1555273210/build/src/netwerk/protocol/http/nsHttpHandler.cpp:2390:61'>::Run xpcom/threads/nsThreadUtils.h:562
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1180
7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
8 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:303
9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
Comment 1•6 years ago
Dana, could we get an assignee to this bug? The spike on beta is noticeable. Thanks.
Comment 2•6 years ago
This is an assertion failure: MOZ_RELEASE_ASSERT(NS_IsMainThread())
Comment 3•6 years ago
It's crashing on "MOZ_RELEASE_ASSERT(NS_IsMainThread());" so the problem is the caller, not NSS itself.
On the stack is the background-thread check added in Bug 1435141 in Fx67:
https://hg.mozilla.org/releases/mozilla-beta/annotate/13b7d4d4df59f7f2e50f66644daf655bad40bec8/netwerk/protocol/http/nsHttpHandler.cpp#l2346
Either that was always wrong, or something shifted around and changed the relative order of that check and NSS startup. Valentin is out, so adding a ni? for mayhemer.
Comment 4•6 years ago (Assignee)
That should get called through nsHttpHandler::MaybeEnableSpeculativeConnect(), which calls net_EnsurePSMInit() on the main thread [0]. The problem is that net_EnsurePSMInit() doesn't actually enforce that nsINSSComponent is successfully initialized in non-debug builds [1]. If this first call fails on the main thread, subsequent calls off the main thread will hit this release assertion. Note that if we can't successfully initialize nsINSSComponent, something is very wrong and the browser probably shouldn't try to continue.
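A minimal C++ sketch of the sequence described above, for illustration only: `LazyService`, `GetService`, and `Outcome` are hypothetical names, not Firefox code. The release assertion is modeled as a return value so the sequence can be observed, and whether the caller is on the main thread is passed in as a flag rather than detected.

```cpp
#include <cassert>

// Possible results of asking for the service. In Firefox, the
// WrongThreadAssert case would be a MOZ_RELEASE_ASSERT crash instead.
enum class Outcome { Ok, InitFailed, WrongThreadAssert };

// Hypothetical stand-in for a lazily constructed, main-thread-only service.
class LazyService {
 public:
  explicit LazyService(bool initWillSucceed)
      : initWillSucceed_(initWillSucceed) {}

  Outcome GetService(bool onMainThread) {
    if (initialized_) {
      return Outcome::Ok;  // Already constructed: usable from any thread.
    }
    // Construction requires the main thread, mirroring the
    // MOZ_RELEASE_ASSERT(NS_IsMainThread()) seen in the crash.
    if (!onMainThread) {
      return Outcome::WrongThreadAssert;
    }
    if (!initWillSucceed_) {
      // The failure is returned but nothing is cached, so the next
      // caller, possibly on another thread, tries to construct again.
      return Outcome::InitFailed;
    }
    initialized_ = true;
    return Outcome::Ok;
  }

 private:
  bool initWillSucceed_;
  bool initialized_ = false;
};
```

With a failing initializer, the first (main-thread) call only reports `InitFailed`, which a caller that checks success only in debug builds would ignore; the next call from another thread then reaches the assertion instead of the original failure.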
[0] https://searchfox.org/mozilla-beta/rev/22be965751d52a56f4e6920d58283152e0d8bec0/netwerk/protocol/http/nsHttpHandler.cpp#2384
[1] https://searchfox.org/mozilla-beta/rev/22be965751d52a56f4e6920d58283152e0d8bec0/netwerk/base/nsNetUtil.cpp#2440
Comment 6•6 years ago
Dana, I understand we should rightly crash if NSS isn’t available. Any ideas on why we could be landing here? If there’s nothing we can do here, should we close it as an environmental issue?
Comment 7•6 years ago (Assignee)
Probably the best course of action is to land some diagnostic assertions to try and see where in NSS initialization Firefox is failing. It may be something we can fix.
Comment 8•6 years ago (Assignee)
Comment 10•6 years ago
bugherder
Comment 11•6 years ago (Assignee)
Crashes resulting from 2e4a7bcc1a95 indicate that InitializeNSSWithFallbacks is failing. Hopefully this will give us more information about why.
Comment 12•6 years ago
Comment 13•6 years ago
Backed out changeset 814e0d966842 (Bug 1544511) for Linux build bustage at Assertions.h:344:73: error: format '%d' expects argument of type 'int'.
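The error text above is cut off before the argument's actual type, so the following is only a general illustration of this class of bustage, not the actual fix. With GCC's `-Wformat` (fatal under `-Werror`), passing a non-`int` argument to `%d` fails, for example an `int64_t`, which is `long` on 64-bit Linux. The usual remedies are the `<cinttypes>` format macros or an explicit cast; `FormatCode` and `FormatSmallCode` are hypothetical helpers.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit signed value with a matching
// specifier. Using "%d" here would trip -Wformat on LP64 Linux, where
// std::int64_t is 'long', not 'int'.
std::string FormatCode(std::int64_t code) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "code=%" PRId64, code);
  return buf;
}

// Alternative: keep "%d" but cast explicitly. This silently truncates
// values that do not fit in an int.
std::string FormatSmallCode(std::int64_t code) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "code=%d", static_cast<int>(code));
  return buf;
}
```

The macro version keeps the full value; the cast version is only safe when the value is known to fit in an `int`.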
Backout: https://hg.mozilla.org/integration/autoland/rev/6b6600f6781c585371dd334b26ca1a6623c29e28
Push that started the failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&selectedJob=243457594&revision=814e0d96684235d7e6a2024b9da8e8a0acbe6d59
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243457594&repo=autoland&lineNumber=33946
Comment 14•6 years ago
Comment 15•6 years ago
Backed out changeset 5f58e2f5d1f7 (Bug 1544511) for build bustage. CLOSED TREE
Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243730097&repo=autoland&lineNumber=33949
Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=243720594&revision=5f58e2f5d1f75f42edea41b1015a6037fe7764c3
Backout:
https://hg.mozilla.org/integration/autoland/rev/e587cf00dc672e07c00b7a423afb35419d6f8095
Comment 16•6 years ago (Assignee)
OK, I'm pretty sure my try push now actually covers the build that was failing: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef76b0933815e697101e70840ce17bcdf7ffbe9c
Comment 17•6 years ago
Comment 18•6 years ago
bugherder
Comment 19•6 years ago (Assignee)
So far I've found one report that indicates we're getting SEC_ERROR_PKCS11_DEVICE_ERROR upon calling the last-resort NSS_NoDB_Init: https://crash-stats.mozilla.com/report/index/db6dbe89-2ebf-4a0b-9f9c-6eb5f0190506
Comment 20•6 years ago
Dana, should this be kept open? This is tracking for 67.
Comment 21•6 years ago (Assignee)
Those patches don't fix the bug, they just give us more information. At this point, we're still trying to figure out why this is happening.
Comment 22•6 years ago
Setting as fix-optional for 67, as we are not going to have a fix in time for the release. If we get a fix after the release and the patch is safe to uplift, I would probably take it in a dot release.
Comment 23•5 years ago
Hi Dana, are there any more diagnostics or info for this one?
Comment 24•5 years ago (Assignee)
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #19)
So far I've found one report that indicates we're getting SEC_ERROR_PKCS11_DEVICE_ERROR upon calling the last-resort NSS_NoDB_Init: https://crash-stats.mozilla.com/report/index/db6dbe89-2ebf-4a0b-9f9c-6eb5f0190506
Kai, is there a reason you can think of why calling NSS_NoDB_Init would result in the error SEC_ERROR_PKCS11_DEVICE_ERROR?
Comment 25•5 years ago
Does NSS_NoDB_Init() actually still work? Would it make sense to test whether Firefox starts up if you disable the regular calls to initialize NSS in PSM and go straight to the last-resort code?
The error might originate from CKR_DEVICE_ERROR. Some of the PKCS #11 modules are "software devices" from NSS's point of view; maybe NSS has trouble loading them?
IIUC, these crashes were first seen with Firefox 67, so maybe it's a side effect of changes in NSS 3.43?
Looking at https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/NSS_3.43_release_notes there isn't anything obvious.
Comment 26•5 years ago
The leave-open keyword is there and there has been no activity for 6 months.
:keeler, maybe it's time to close this bug?
Comment 27•5 years ago (Assignee)
No, this is still happening. It's unclear how to proceed, unfortunately.
Comment 28•5 years ago
This is 100% reproducible when kernel FIPS mode is enabled with fips=1 on the kernel command line.
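For triage, a sketch that checks whether a Linux system is running with kernel FIPS mode, assuming the kernel exposes the flag at `/proc/sys/crypto/fips_enabled` (present on kernels built with FIPS support, and reading `1` when `fips=1` is in effect). Whether the NSS version involved consults this file is not established in this thread; `ParseFipsFlag` and `KernelFipsEnabled` are hypothetical helper names.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parses the contents of /proc/sys/crypto/fips_enabled: a single
// integer that is non-zero when the kernel runs in FIPS mode.
bool ParseFipsFlag(std::istream& in) {
  int value = 0;
  if (!(in >> value)) {
    return false;  // Unreadable or malformed: treat as "not enabled".
  }
  return value != 0;
}

// Returns true if the running kernel reports FIPS mode. A missing file
// (non-Linux, or a kernel without FIPS support) counts as disabled.
bool KernelFipsEnabled() {
  std::ifstream file("/proc/sys/crypto/fips_enabled");
  if (!file) {
    return false;
  }
  return ParseFipsFlag(file);
}
```

Separating the parsing from the file access keeps the logic testable on machines where the file does not exist.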
Comment 29•5 years ago (Assignee)
Are you using Firefox downloaded from Mozilla, or re-packaged by your Linux distro?
Comment 30•5 years ago
These are the official binaries from mozilla.org.
I originally discovered this when upgrading from 60 ESR to 68.4.1.
I bisected upward from 60 until I hit 67. I then worked with Aryx on IRC to get a crash report submitted.
https://crash-stats.mozilla.org/report/index/f81c27b4-00da-437e-8041-2715e0200116
Comment 31•5 years ago
(In reply to doncuppjr from comment #28)
This is 100% reproducible when kernel FIPS mode is enabled with fips=1 on the kernel command line.
FIPS issues are probably of interest to the Red Hat folks; cc'ing.
Comment 32•4 years ago
The leave-open keyword is there and there has been no activity for 6 months.
:keeler, maybe it's time to close this bug?
Comment 33•4 years ago (Assignee)
No, this is still an issue. We just don't know how to move forward here.
Comment 34•4 years ago
The leave-open keyword is there and there has been no activity for 6 months.
:keeler, maybe it's time to close this bug?
Comment 35•4 years ago (Assignee)
We still don't really have an answer for this. Since this particular crash signature essentially masks where the real problem occurred, perhaps our best approach for now would be to change nsNSSComponent's initialization to crash immediately upon encountering a problem, rather than returning an error and letting some other thread try again later, which hits the "must be on the main thread" release assertion.
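A sketch of that fail-fast idea, under stated assumptions: initialization is modeled as an ordered list of fallback attempts (only the last-resort `NSS_NoDB_Init` step is named in this thread; any other step names are placeholders), and when every attempt fails, the code crashes right away on the calling thread with a reason listing each failed step, instead of returning an error for another thread to trip over. `InitStep`, `InitializeOrCrash`, and the injectable crash handler are hypothetical; in production the handler would abort (for example via something like `MOZ_CRASH_UNSAFE_PRINTF`), and it is a parameter here only so the behavior can be exercised.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One initialization attempt: a human-readable name and a function that
// returns 0 on success or a non-zero error code on failure.
struct InitStep {
  std::string name;
  std::function<int()> attempt;
};

// Tries each step in order and stops at the first success. If all fail,
// reports every failure through `crash` immediately, so the crash reason
// names the real cause. Returns true on success.
bool InitializeOrCrash(const std::vector<InitStep>& steps,
                       const std::function<void(const std::string&)>& crash) {
  std::string failures;
  for (const InitStep& step : steps) {
    int err = step.attempt();
    if (err == 0) {
      return true;
    }
    if (!failures.empty()) {
      failures += "; ";
    }
    failures += step.name + ": error " + std::to_string(err);
  }
  if (failures.empty()) {
    failures = "no initialization steps configured";
  }
  crash("all init attempts failed: " + failures);  // Would not return in production.
  return false;
}
```

Collecting every step's error, rather than only the last one, keeps the crash report useful even when an early fallback fails for a different reason than the last-resort step.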
Comment 36•4 years ago
It's no longer an issue for us, as we have moved to a newer ESR release.
Comment 37•3 years ago
This crash also happens on the Mac, though only in small numbers. Recently, a crash report showed up with interesting mac_crash_info:
bp-34fe2d10-fd78-42ea-a143-c130f0210610
{
  "num_records": 1,
  "records": [
    {
      "message": "\u043d#\u0012\u0001",
      "module": "/System/Library/PrivateFrameworks/CoreUI.framework/Versions/A/CoreUI"
    }
  ]
}
Yes, that "message" does need interpretation. Later I'll try to provide it.
Comment 38•3 years ago
(Following up comment #37)
In the CoreUI private framework, it's an internal _CUILog() method that writes to the "message" field in "mac crash info". I used a disassembler to look through many of that framework's calls to _CUILog(), and none of them comes close to matching the output here. "mac crash info" string fields (like "message") are UTF-8 strings. I suspect this one is in a non-Roman script, and Socorro's stackwalker is unable to interpret it.
So this "message" is useless here.
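For anyone who wants to inspect the record anyway, a small sketch that re-encodes the code points shown in the JSON (U+043D, `#`, U+0012, U+0001) back into UTF-8 bytes, assuming the JSON escapes faithfully reflect what was decoded. This recovers only the bytes, not their meaning; `EncodeUtf8` and `HexBytes` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Encodes one Unicode code point (assumed valid, not a surrogate) as UTF-8.
std::string EncodeUtf8(char32_t cp) {
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Renders bytes as space-separated lowercase hex, e.g. "d0 bd".
std::string HexBytes(const std::string& bytes) {
  std::string out;
  char buf[4];
  for (unsigned char b : bytes) {
    if (!out.empty()) {
      out += ' ';
    }
    std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(b));
    out += buf;
  }
  return out;
}
```

Encoding those four code points and passing the concatenation to `HexBytes` gives `d0 bd 23 12 01`.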
Comment 39•3 years ago
The leave-open keyword is there and there has been no activity for 6 months.
:keeler, maybe it's time to close this bug?
Comment 40•3 years ago (Assignee)
Sure, I guess there really isn't anything we can do here.