Crash in shutdownhang | NtWaitForAlertByThreadId | RtlSleepConditionVariableSRW | SleepConditionVariableSRW | mozilla::detail::ConditionVariableImpl::wait | mozilla::CondVar::Wait | nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent | nsT...

RESOLVED WORKSFORME

Status

Core
Networking
P1
critical
RESOLVED WORKSFORME
5 months ago
a month ago

People

(Reporter: calixte, Unassigned)

Tracking

({crash, topcrash-thunderbird})

56 Branch
x86
Windows 10
crash, topcrash-thunderbird
Points:
---

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox54 unaffected, firefox55 wontfix, firefox56 wontfix)

Details

(Whiteboard: [necko-active][tbird topcrash], crash signature)

Description

5 months ago
This bug was filed from the Socorro interface and is report bp-8dc5b2ff-2bcd-446f-8552-ea31d0170629.
=============================================================

There are 68 crashes in beta 55 and 9 in nightly 56; they all appeared on 2017-06-29.
:erahm, could you investigate, please?
Flags: needinfo?(erahm)
This looks like a mozilla::net::nsSocketTransportService::Shutdown hang; I'm not sure I can add much here.
Component: XPCOM → Networking
Flags: needinfo?(erahm)
Dragana might have some thoughts?
Flags: needinfo?(dd.mozilla)
Whiteboard: [necko-active]
I think this is our standard shutdown hang.

I looked at a couple of crashes and noticed a lot of NSS and PSM hangs:
(in mozilla::psm::StopSSLServerCertVerificationThreads())
https://crash-stats.mozilla.com/report/index/8dc5b2ff-2bcd-446f-8552-ea31d0170629#allthreads
https://crash-stats.mozilla.com/report/index/75c92d0f-1aaf-484e-9492-313400170711#allthreads
https://crash-stats.mozilla.com/report/index/c719826d-fc6a-4683-b6cd-cf7940170711#allthreads
https://crash-stats.mozilla.com/report/index/64a66245-22c6-48e6-b791-2a5c00170711#allthreads

(more nss hangs):
https://crash-stats.mozilla.com/report/index/a65a1e27-1440-47bf-a0f4-d96ec0170711#allthreads
https://crash-stats.mozilla.com/report/index/80526016-c258-407f-be40-c51970170711#allthreads
https://crash-stats.mozilla.com/report/index/ac858f02-12de-4d4f-b8e1-14df30170711#allthreads


The socket thread is already shut down in these hangs:
https://crash-stats.mozilla.com/report/index/f5e03c42-6861-4bb4-9df4-cb78c0170711#allthreads
https://crash-stats.mozilla.com/report/index/a843582d-7d2f-43ba-99b2-f14a20170711#allthreads
https://crash-stats.mozilla.com/report/index/789873ed-a4bb-4a6e-86a0-14fe80170711#allthreads
https://crash-stats.mozilla.com/report/index/55b46a98-f916-499d-90ee-017f50170711#allthreads
https://crash-stats.mozilla.com/report/index/cd13db6f-8d29-4232-adcf-9e43e0170711#allthreads


ttaubert, keeler, can you please look at some of the PSM and NSS hangs?
Flags: needinfo?(ttaubert)
Flags: needinfo?(dkeeler)
Flags: needinfo?(dd.mozilla)
FWIW, this signature appeared on June 29 because that's when bug 1375511 was deployed on crash-stats. Before then, these issues were grouped under an even more generic signature.
(In reply to Dragana Damjanovic [:dragana] from comment #3)
> I think this is out standard shutdown hangs.
> 
> I looked at couple of crashes and I notice a lot of nss and psm hangs:
> (in mozilla::psm::StopSSLServerCertVerificationThreads())
> https://crash-stats.mozilla.com/report/index/8dc5b2ff-2bcd-446f-8552-ea31d0170629#allthreads
> https://crash-stats.mozilla.com/report/index/75c92d0f-1aaf-484e-9492-313400170711#allthreads
> https://crash-stats.mozilla.com/report/index/c719826d-fc6a-4683-b6cd-cf7940170711#allthreads
> https://crash-stats.mozilla.com/report/index/64a66245-22c6-48e6-b791-2a5c00170711#allthreads
> 

Some of these appear to be hanging in nsNSSHttpRequestSession::internal_send_receive_attempt, much like some of the reports in bug 1375726.

Others seem to be hanging while attempting to acquire a reentrant lock in NSS. Interestingly, these all have loaded the PKCS#11 module "aetpkss1.dll", which I've seen in many of these hangs. Either there's a bug in NSS and/or PSM that's exacerbated by having a PKCS#11 module or this particular module is misbehaving and causing these hangs.

> (more nss hangs):
> https://crash-stats.mozilla.com/report/index/a65a1e27-1440-47bf-a0f4-d96ec0170711#allthreads
> https://crash-stats.mozilla.com/report/index/80526016-c258-407f-be40-c51970170711#allthreads
> https://crash-stats.mozilla.com/report/index/ac858f02-12de-4d4f-b8e1-14df30170711#allthreads

These all have a PKCS#11 module loaded (aetpkss1.dll or bit4xpki.dll, although the latter doesn't directly show up in the stacks).

> socketThread is already shutdown at these hangs:
> https://crash-stats.mozilla.com/report/index/f5e03c42-6861-4bb4-9df4-cb78c0170711#allthreads
> https://crash-stats.mozilla.com/report/index/a843582d-7d2f-43ba-99b2-f14a20170711#allthreads
> https://crash-stats.mozilla.com/report/index/789873ed-a4bb-4a6e-86a0-14fe80170711#allthreads
> https://crash-stats.mozilla.com/report/index/55b46a98-f916-499d-90ee-017f50170711#allthreads
> https://crash-stats.mozilla.com/report/index/cd13db6f-8d29-4232-adcf-9e43e0170711#allthreads

These don't seem to have anything to do with NSS or PSM.
Flags: needinfo?(dkeeler)
(In reply to David Keeler [:keeler] (use needinfo?) from comment #5)
> (In reply to Dragana Damjanovic [:dragana] from comment #3)
> Others seem to be hanging while attempting to acquire a reentrant lock in
> NSS. Interestingly, these all have loaded the PKCS#11 module "aetpkss1.dll",
> which I've seen in many of these hangs. Either there's a bug in NSS and/or
> PSM that's exacerbated by having a PKCS#11 module or this particular module
> is misbehaving and causing these hangs.

Yeah, these seem similar to what I wrote in bug 1372505 comment #13.

The last two have the socket thread and multiple pkix threads hanging at nssSlot_EnterMonitor(). The SmartCard thread looks suspicious, and I think that `mod->refLock` as well as `PK11SlotInfo->sessionLock` and `nssSlot->lock` actually all refer to the same locks.

https://searchfox.org/nss/rev/54740990248e08713f43ce1ea0e0440ed28df2dc/lib/pk11wrap/pk11slot.c#365
https://searchfox.org/nss/rev/54740990248e08713f43ce1ea0e0440ed28df2dc/lib/pk11wrap/dev3hack.c#122

There is probably plenty of potential for deadlock here...

Another thing I wondered about: we seem to call SECMOD_WaitForAnyTokenEvent() but never SECMOD_CancelWait(). Not sure if that's a problem.

https://searchfox.org/nss/search?q=symbol:_Z17SECMOD_CancelWait&redirect=false
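
To illustrate why a wait that is never cancelled could matter at shutdown, here is a rough Python analogy. It is not NSS code: `TokenWatcher`, `cancel_wait` and `shutdown` are hypothetical names, and the assumption is that the waiting thread stays blocked until it is either signalled or explicitly cancelled, as the pairing of SECMOD_WaitForAnyTokenEvent() and SECMOD_CancelWait() suggests.

```python
import threading


class TokenWatcher:
    """Hypothetical stand-in for a thread blocked waiting for a token event."""

    def __init__(self):
        self._event = threading.Event()
        # Daemon thread so this sketch can exit even if the wait never ends.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Blocks until a token event arrives or the wait is cancelled.
        self._event.wait()

    def cancel_wait(self):
        # Analogue of cancelling the blocking wait from another thread.
        self._event.set()

    def shutdown(self, cancel, timeout=0.2):
        """Try to join the watcher; return True if it actually stopped."""
        if cancel:
            self.cancel_wait()
        self._thread.join(timeout)
        return not self._thread.is_alive()


watcher = TokenWatcher()
watcher.start()
stopped_without_cancel = watcher.shutdown(cancel=False)  # watcher is still blocked
stopped_with_cancel = watcher.shutdown(cancel=True)      # watcher wakes up and exits
```

In this sketch the join only completes once the wait is cancelled. Whether the real SmartCard thread behaves this way during shutdown is exactly the open question above.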

> > socketThread is already shutdown at these hangs:
> > https://crash-stats.mozilla.com/report/index/f5e03c42-6861-4bb4-9df4-cb78c0170711#allthreads
> > https://crash-stats.mozilla.com/report/index/a843582d-7d2f-43ba-99b2-f14a20170711#allthreads
> > https://crash-stats.mozilla.com/report/index/789873ed-a4bb-4a6e-86a0-14fe80170711#allthreads
> > https://crash-stats.mozilla.com/report/index/55b46a98-f916-499d-90ee-017f50170711#allthreads
> > https://crash-stats.mozilla.com/report/index/cd13db6f-8d29-4232-adcf-9e43e0170711#allthreads
> 
> These don't seem to have anything to do with NSS or PSM.

I can't find anything that points to NSS/PSM either.
Flags: needinfo?(ttaubert)

Comment 7

5 months ago
This is the #2 crash for Thunderbird 55.0b2. Most users never crashed prior to 55.0b2. Some examples:
bp-4a5221ef-f6f9-49f4-a9a7-cd8670170726
bp-7ee2f376-c54c-49a1-a86f-7080f0170719
bp-8e663371-57c7-4b6e-b964-1241b0170719
bp-ae4e7ffc-69d3-4a18-be8f-fe38a0170726

A few use frontier.  Not sure what to make of it.
Keywords: topcrash-thunderbird
Whiteboard: [necko-active] → [necko-active][tbird topcrash]
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #7)
> #2 crash for Thunderbird 55.0b2.  most of users never crashed prior to
> 55.0b2. Some examples
> bp-4a5221ef-f6f9-49f4-a9a7-cd8670170726
> bp-7ee2f376-c54c-49a1-a86f-7080f0170719
> bp-8e663371-57c7-4b6e-b964-1241b0170719
> bp-ae4e7ffc-69d3-4a18-be8f-fe38a0170726
> 
> A few use frontier.  Not sure what to make of it.

bp-7ee2f376-c54c-49a1-a86f-7080f0170719 and bp-ae4e7ffc-69d3-4a18-be8f-fe38a0170726 are the well-known IMAP password dialog issue: the user doesn't close the password dialog, so shutdown isn't processed.
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1

Comment 10

3 months ago
Please check https://github.com/greasemonkey/greasemonkey/issues/2573

Comment 11

3 months ago
Can I work around this in my (legacy XPCOM/embedded web) extension?  Am I supposed to be calling a shut down in addition to a start up?

This is happening in Greasemonkey 3.12, but not 3.11, where ~the only change is embedding a webext, for migrating data into.

Or is there a way for me to verify that this is the real cause of the failure we're seeing (comment #10)?
I don't think the Greasemonkey issue is related to the NSS issue that's discussed at the top of this bug. The Greasemonkey stacks show shutdown hanging while a synchronous XMLHttpRequest spins the event loop, e.g.: https://crash-stats.mozilla.com/report/index/da1e71e9-3d7d-47c4-a237-7cf3a1170922

Comment 13

3 months ago
A crash report from the Greasemonkey issue: https://crash-stats.mozilla.com/report/index/7c18a99c-428b-40a8-8e30-360e81170921

If you click the "Bugzilla" tab for that report, it lists two related bugs:
bug 1388370 (marked as fixed over a month ago) and this one.
Also, the crash report signature is the same as this bug's; that's why I mentioned it as related in the GitHub issue.
(In reply to Kostas from comment #13)
> Also, the crash report signature is the same as this bug, that's why I
> mentioned it as related in the issue in GitHub.

Yes, the crash signature matches, but from a quick look the underlying cause is probably different. My comment was addressed at comment 11, which sounds like it didn't realize the initial comments are about a problem that's probably not the same as the one greasemonkey is seeing.

(I might also just be wrong and confusing matters for everyone, so I'll shut up until someone more informed can comment instead)

Comment 15

3 months ago
I filed another bug (bug 1402201) for the GM crash.

I don't seem to get the same signature as this one (one of the reports: https://crash-stats.mozilla.com/report/index/01f2d2f5-415b-4c6c-aca3-b9fa30170922).

Comment 16

3 months ago
> I didn't seem to get the same signature as this one (one of report: https://crash-stats.mozilla.com/report/index/01f2d2f5-415b-4c6c-aca3-b9fa30170922).

In that report you use FF 56 beta.
That's why you got a different signature ("shutdownhang | NtWaitForKeyedEvent | RtlSleepConditionVariableCS | SleepConditionVariableCS").

I use FF 55.0.3 stable.
I've recreated this repeatedly, and in no case did I get that signature, 
it's always the same as this bug title:

https://crash-stats.mozilla.com/report/index/bp-ff15a46d-e0e1-4e3e-8293-9eac41170922
https://crash-stats.mozilla.com/report/index/bp-f5292a96-cd71-47e4-82b8-385af0170922
https://crash-stats.mozilla.com/report/index/bp-0e4c18d1-6106-47ee-80c7-063dc1170922
https://crash-stats.mozilla.com/report/index/bp-82af81d5-0122-4455-a9fa-86b7d1170922
https://crash-stats.mozilla.com/report/index/bp-ad897f29-0a40-4f7b-beeb-ffa0d1170922

Comment 17

3 months ago
(In reply to Kostas from comment #16)
> > I didn't seem to get the same signature as this one (one of report: https://crash-stats.mozilla.com/report/index/01f2d2f5-415b-4c6c-aca3-b9fa30170922).
> 
> In that report you use FF 56 beta.
> That's why you got a different signature ("shutdownhang |
> NtWaitForKeyedEvent | RtlSleepConditionVariableCS |
> SleepConditionVariableCS").

I can't get the same signature in stable either.

https://crash-stats.mozilla.com/report/index/dee75707-6a2c-4dca-b6ea-126a41170922
https://crash-stats.mozilla.com/report/index/85e8ed57-45ee-4aba-82ae-630411170922
https://crash-stats.mozilla.com/report/index/efaf3519-1246-4768-9f25-9935c1170922

All GM crashes have the signature "shutdownhang | NtWaitForKeyedEvent | RtlSleepConditionVariableSRW | SleepConditionVariableSRW |  ...".

Comment 18

3 months ago
Clarification: by "can't get the same" I mean even in stable my signature is different from the title here.

Comment 19

3 months ago
> I mean even in stable my signature is different from the title here.

They are almost the same; the only difference is NtWaitForAlertByThreadId vs. NtWaitForKeyedEvent:

shutdownhang | NtWaitForAlertByThreadId | RtlSleepConditionVariableSRW | SleepConditionVariableSRW | mozilla::detail::ConditionVariableImpl::wait | mozilla::CondVar::Wait | nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent | nsThread::Ge..

shutdownhang | NtWaitForKeyedEvent      | RtlSleepConditionVariableSRW | SleepConditionVariableSRW | mozilla::detail::ConditionVariableImpl::wait | mozilla::CondVar::Wait | nsEventQueue::GetEvent | nsThread::nsChainedEventQueue::GetEvent | nsThread::GetEven...
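
The comparison can also be checked mechanically. A minimal sketch, assuming a crash signature is a list of frames separated by `|`; the helper names `split_signature` and `differing_frames` are hypothetical and not part of Socorro. Only the first four frames of each signature are used, because the tails are truncated in the signatures quoted here.

```python
def split_signature(signature):
    """Split a crash signature into frames, ignoring padding around '|'."""
    return [frame.strip() for frame in signature.split("|")]


def differing_frames(sig_a, sig_b):
    """Return (index, frame_a, frame_b) for each position where the frames differ."""
    frames_a = split_signature(sig_a)
    frames_b = split_signature(sig_b)
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(frames_a, frames_b))
        if a != b
    ]


# First four frames of the signature in this bug's title.
sig_title = ("shutdownhang | NtWaitForAlertByThreadId | "
             "RtlSleepConditionVariableSRW | SleepConditionVariableSRW")
# First four frames of the signature from the GM reports in comment 17.
sig_gm = ("shutdownhang | NtWaitForKeyedEvent      | "
          "RtlSleepConditionVariableSRW | SleepConditionVariableSRW")

for index, frame_a, frame_b in differing_frames(sig_title, sig_gm):
    print(f"frame {index}: {frame_a} != {frame_b}")
```

Only frame 1, the lowest-level wait call, differs between the two.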



I see in your reports that you use Windows 7. Maybe that's why it's different (I use Windows 10).

Or maybe you reused the same profile from FF 56 in FF 55?
If so, do you have a backup of your Firefox profile from FF 55 that you could restore to try to recreate the issue?

Comment 20

3 months ago
I test with a brand-new profile containing only these files:

extensions\{e4a8a97b-f2ed-450b-b12d-ee082ba24781}.xpi (GM)
better_better_booru.user.js (the script with which I encountered the problem)
config.xml (GM config)
extensions.json (so I don't need to reinstall it every time)
I don't see any crashes with this signature in the last month. 
Jason, can we close this bug?
status-firefox55: affected → wontfix
status-firefox56: affected → wontfix
Flags: needinfo?(jduell.mcbugs)
Most probably the signature has changed. We can close this bug.
Status: NEW → RESOLVED
Last Resolved: a month ago
Flags: needinfo?(jduell.mcbugs)
Resolution: --- → WORKSFORME