Crash in [@ nsNSSComponent::nsNSSComponent]

Status: NEW
Type: defect
Priority: P1
Severity: critical
Opened: 3 months ago
Updated: 20 days ago

People: (Reporter: marcia, Assigned: keeler)
Tracking: ({crash, leave-open, regression})
Version: Trunk
Platform: x86, Windows 7
Points: ---
Firefox Tracking Flags: (firefox67+ wontfix, firefox68 wontfix)
Details: (Whiteboard: [psm-assigned], crash signature)
Attachments: (2 attachments)

This bug is for crash report bp-6fb4247d-62da-4d7b-8785-116d40190415.

Seen while looking at nightly crash stats, but visible in 67 beta as well: https://bit.ly/2IxblWd

This signature has been seen before in Bug 1301407, which was marked as a duplicate of another bug. It looks as if this started in 67.0b3 and has continued through the betas. There are more crashes than installs. None of the crash reports have user comments.

Some correlations in beta:

(100.0% in signature vs 00.36% overall) moz_crash_reason = MOZ_RELEASE_ASSERT(NS_IsMainThread())
(62.88% in signature vs 09.56% overall) Module "ntdsapi.dll" = true [94.92% vs 16.83% if platform_version = 6.1.7601 Service Pack 1]
(66.67% in signature vs 26.96% overall) Module "apphelp.dll" = true [100.0% vs 29.80% if platform_version = 6.1.7601 Service Pack 1]
(96.21% in signature vs 17.51% overall) Module "fastprox.dll" = true [96.43% vs 23.47% if platform_version = 10.0.17134]
(96.21% in signature vs 17.73% overall) Module "wbemsvc.dll" = true [96.43% vs 23.56% if platform_version = 10.0.17134]
(96.21% in signature vs 18.09% overall) Module "wbemcomn.dll" = true [96.43% vs 23.86% if platform_version = 10.0.17134]
(96.21% in signature vs 18.19% overall) Module "wbemprox.dll" = true [96.43% vs 23.88% if platform_version = 10.0.17134]
(99.24% in signature vs 47.96% overall) reason = EXCEPTION_BREAKPOINT
(99.24% in signature vs 37.43% overall) Module "winsta.dll" = true [100.0% vs 54.04% if platform_version = 10.0.17134]
(72.73% in signature vs 28.43% overall) Module "urlmon.dll" = true [100.0% vs 55.06% if platform_version = 10.0.17134]
(43.94% in signature vs 86.79% overall) Module "softokn3.dll" = true [37.21% vs 89.06% if platform_pretty_version = Windows 7]
(99.24% in signature vs 46.16% overall) Module "psapi.dll" = true [100.0% vs 59.50% if platform_version = 10.0.17134]
(74.24% in signature vs 29.66% overall) contains_memory_report = null
(49.24% in signature vs 94.59% overall) Addon "webcompat-reporter@mozilla.org" = true
(49.24% in signature vs 94.53% overall) Addon "fxmonitor@mozilla.org" = true
(50.00% in signature vs 94.63% overall) Addon "formautofill@mozilla.org" = true
(50.00% in signature vs 94.63% overall) Addon "webcompat@mozilla.org" = true
(50.00% in signature vs 94.39% overall) Addon "screenshots@mozilla.org" = true

Top 10 frames of crashing thread:

0 xul.dll nsNSSComponent::nsNSSComponent security/manager/ssl/nsNSSComponent.cpp:208
1 xul.dll mozilla::xpcom::CreateInstanceImpl xpcom/components/StaticComponents.cpp:10546
2 xul.dll nsresult nsComponentManagerImpl::GetServiceLocked xpcom/components/nsComponentManager.cpp:1387
3 xul.dll nsComponentManagerImpl::GetServiceByContractID xpcom/components/nsComponentManager.cpp:1574
4 xul.dll static bool mozilla::net::CanEnableSpeculativeConnect netwerk/protocol/http/nsHttpHandler.cpp:2349
5 xul.dll nsresult mozilla::detail::RunnableFunction<`lambda at z:/task_1555273210/build/src/netwerk/protocol/http/nsHttpHandler.cpp:2390:61'>::Run xpcom/threads/nsThreadUtils.h:562
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1180
7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
8 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:303
9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308

Dana, could we get an assignee for this bug? The spike on beta is noticeable. Thanks.

Flags: needinfo?(dkeeler)

This is an assertion failure: MOZ_RELEASE_ASSERT(NS_IsMainThread())

It's crashing on "MOZ_RELEASE_ASSERT(NS_IsMainThread());" so the problem is the caller, not NSS itself.

https://hg.mozilla.org/releases/mozilla-beta/annotate/13b7d4d4df59f7f2e50f66644daf655bad40bec8/security/manager/ssl/nsNSSComponent.cpp#l208

On the stack is the background thread check added in Bug 1435141 in Fx67
https://hg.mozilla.org/releases/mozilla-beta/annotate/13b7d4d4df59f7f2e50f66644daf655bad40bec8/netwerk/protocol/http/nsHttpHandler.cpp#l2346

Either that was always wrong, or something shifted around to change the relative order between that check and starting NSS. Valentin is out, so adding a needinfo for mayhemer.

Flags: needinfo?(honzab.moz)

That should get called through nsHttpHandler::MaybeEnableSpeculativeConnect(), which calls net_EnsurePSMInit() on the main thread [0]. The problem is that net_EnsurePSMInit() doesn't actually enforce that nsINSSComponent is successfully initialized in non-debug builds [1]. If this first call fails on the main thread, subsequent calls off the main thread will hit this release assertion. Note that if we can't successfully initialize nsINSSComponent, something is very wrong and the browser probably shouldn't try to continue.

[0] https://searchfox.org/mozilla-beta/rev/22be965751d52a56f4e6920d58283152e0d8bec0/netwerk/protocol/http/nsHttpHandler.cpp#2384
[1] https://searchfox.org/mozilla-beta/rev/22be965751d52a56f4e6920d58283152e0d8bec0/netwerk/base/nsNetUtil.cpp#2440
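
The sequence described above can be modelled in a few lines of standalone C++. This is a simplified sketch, not Gecko code: `Component`, `ServiceCache`, `EnsureInit`, and `Scenario` are hypothetical names, the main-thread check is a plain bool, and the release assertion is modelled as an exception so the outcome can be observed. It assumes the service getter does not cache an instance whose initialization failed, which is what the stack in the report suggests.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// All names below are hypothetical stand-ins, not Gecko APIs.
struct Component {
  // Models the MOZ_RELEASE_ASSERT(NS_IsMainThread()) in the constructor:
  // constructing off the main thread is fatal (an exception in this sketch).
  explicit Component(bool onMainThread) {
    if (!onMainThread) throw std::logic_error("constructed off main thread");
  }
  // Models NSS initialization; fails when NSS cannot be brought up.
  bool Init(bool nssAvailable) { return nssAvailable; }
};

class ServiceCache {
 public:
  // Returns the cached service, creating it on first use. A failed Init()
  // leaves the cache empty, so the next caller runs the constructor again.
  Component* GetService(bool onMainThread, bool nssAvailable) {
    if (!mService) {
      std::unique_ptr<Component> candidate(new Component(onMainThread));
      if (!candidate->Init(nssAvailable)) return nullptr;
      mService = std::move(candidate);
    }
    return mService.get();
  }

 private:
  std::unique_ptr<Component> mService;
};

// Models the main-thread "ensure init" call: the result is available, but
// nothing forces a release build to stop when it is false.
bool EnsureInit(ServiceCache& cache, bool nssAvailable) {
  return cache.GetService(/*onMainThread=*/true, nssAvailable) != nullptr;
}

// Returns true if a later background-thread lookup trips the thread check.
bool Scenario(bool nssAvailable) {
  ServiceCache cache;
  EnsureInit(cache, nssAvailable);  // result ignored, as in a release build
  try {
    cache.GetService(/*onMainThread=*/false, nssAvailable);
    return false;
  } catch (const std::logic_error&) {
    return true;
  }
}
```

Under these assumptions, `Scenario(false)` returns true because the background lookup re-runs the constructor and trips the check, while `Scenario(true)` returns false because the cached instance is reused.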

Flags: needinfo?(dkeeler)

Agree with Dana. Without NSS, Firefox can't function.

Flags: needinfo?(honzab.moz)

Dana, I understand we should rightly crash if NSS isn’t available. Any ideas on why we could be landing here? If there’s nothing we can do here, should we close it as an environmental issue?

Flags: needinfo?(dkeeler)

Probably the best course of action is to land some diagnostic assertions to try and see where in NSS initialization Firefox is failing. It may be something we can fix.

Assignee: nobody → dkeeler
Flags: needinfo?(dkeeler)
Priority: -- → P1
Whiteboard: [psm-assigned]

Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2e4a7bcc1a95
add some diagnostic assertions to nsNSSComponent::InitializeNSS to see why it's failing r=KevinJacobs

Crashes as a result of 2e4a7bcc1a95 indicate that InitializeNSSWithFallbacks is failing. Hopefully this will give us more information as to why.

Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/814e0d966842
add some diagnostic crashes to InitializeNSSWithFallbacks to see why it's failing r=KevinJacobs
Flags: needinfo?(dkeeler)

Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5f58e2f5d1f7
add some diagnostic crashes to InitializeNSSWithFallbacks to see why it's failing r=KevinJacobs

OK, I'm pretty sure my try push now actually covers the build that was failing: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef76b0933815e697101e70840ce17bcdf7ffbe9c

Flags: needinfo?(dkeeler)

Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b2ddb30a2d07
add some diagnostic crashes to InitializeNSSWithFallbacks to see why it's failing r=KevinJacobs

So far I've found one report that indicates we're getting SEC_ERROR_PKCS11_DEVICE_ERROR upon calling the last-resort NSS_NoDB_Init: https://crash-stats.mozilla.com/report/index/db6dbe89-2ebf-4a0b-9f9c-6eb5f0190506
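
One way to picture what the diagnostic patches report is a fallback chain that records which step failed last and with which error. The sketch below is an illustration under stated assumptions, not the code in InitializeNSSWithFallbacks: `InitStep`, `InitResult`, `RunWithFallbacks`, `DemoAllStepsFail`, the step labels, and the numeric error codes are all hypothetical placeholders.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical placeholder codes; these are NOT real NSS error values.
const int kOk = 0;
const int kProfileDbError = 1;
const int kDeviceError = 2;  // stands in for SEC_ERROR_PKCS11_DEVICE_ERROR

struct InitStep {
  std::string name;
  std::function<int()> run;  // returns kOk on success, else an error code
};

struct InitResult {
  bool ok;
  std::string failedStep;  // last step attempted when ok is false
  int errorCode;           // error returned by that step
};

// Tries each step in order and stops at the first success. If every step
// fails, the result names the final (last-resort) step and its error.
InitResult RunWithFallbacks(const std::vector<InitStep>& steps) {
  InitResult result{false, "(no steps configured)", -1};
  for (const InitStep& step : steps) {
    const int rv = step.run();
    if (rv == kOk) {
      return InitResult{true, "", kOk};
    }
    result = InitResult{false, step.name, rv};
  }
  return result;
}

// Example chain in which the profile-backed attempt and the last resort
// both fail, mirroring the situation in the crash report above.
InitResult DemoAllStepsFail() {
  std::vector<InitStep> steps;
  steps.push_back(InitStep{"profile database (hypothetical)",
                           [] { return kProfileDbError; }});
  steps.push_back(InitStep{"NSS_NoDB_Init (last resort)",
                           [] { return kDeviceError; }});
  return RunWithFallbacks(steps);
}
```

Recording the step name alongside the error is what lets a single crash report distinguish "the profile database is unusable" from "even the no-database path failed".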

Dana, should this be kept open? This is tracking for 67.

Flags: needinfo?(dkeeler)

Those patches don't fix the bug, they just give us more information. At this point, we're still trying to figure out why this is happening.

Flags: needinfo?(dkeeler)

Setting as fix-optional for 67, as we are not going to have a fix in time for the release. If we get a fix after the release and the patch is safe to uplift, I would probably take it in a dot release.

Hi Dana, are there any more diagnostics or info for this one?

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #19)

> So far I've found one report that indicates we're getting SEC_ERROR_PKCS11_DEVICE_ERROR upon calling the last-resort NSS_NoDB_Init: https://crash-stats.mozilla.com/report/index/db6dbe89-2ebf-4a0b-9f9c-6eb5f0190506

Kai, is there a reason you can think of why calling NSS_NoDB_Init would result in the error SEC_ERROR_PKCS11_DEVICE_ERROR?

Flags: needinfo?(dkeeler) → needinfo?(kaie)

Does NSS_NoDB_Init() actually still work? Would it make sense to test whether Firefox starts up if you disable the regular calls to init NSS in PSM and go straight to the last-resort code?

The error might originate from CKR_DEVICE_ERROR. Some of the PKCS#11 modules are "software devices" from NSS's point of view; maybe NSS has trouble loading them?

IIUC these crashes were first seen with Firefox 67, so maybe it's a side effect of changes in NSS 3.43?
Looking at https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/NSS_3.43_release_notes there isn't anything obvious.

Flags: needinfo?(kaie)