Closed Bug 1514028 Opened 8 months ago Closed 8 months ago
Crash in shutdownhang | ntdll
This bug was filed from the Socorro interface and is report bp-dbfb8fc5-2517-40b6-9060-9d97b0181213.
=============================================================
#3 top overall crash in early 64 data: https://bit.ly/2EwAvmw. This Win 10 shutdown hang has been around in other versions as well: https://bit.ly/2LfCYT7.

Some comments mention it occurring during an update from 63.0.3. The URLs don't seem to show a particular trend, but there are a few that are the first run pages:
* https://www.mozilla.org/ru/firefox/64.0/whatsnew/?oldversion=63.0.3
* https://www.mozilla.org/pl/firefox/64.0/whatsnew/?oldversion=63.0.3

Top 10 frames of crashing thread:

0 ntdll.dll ntdll.dll@0x9e294
1 ntdll.dll ntdll.dll@0x25f28
2 kernelbase.dll SleepConditionVariableSRW
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:58
4 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::GetEvent xpcom/threads/ThreadEventQueue.cpp:168
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1172
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:530
7 xul.dll static bool mozilla::SpinEventLoopUntil<mozilla::ProcessFailureBehavior::ReportToCaller, `lambda at z:/build/build/src/netwerk/protocol/http/nsHttpConnectionMgr.cpp:256:24'> xpcom/threads/nsThreadUtils.h:347
8 xul.dll mozilla::net::nsHttpConnectionMgr::Shutdown netwerk/protocol/http/nsHttpConnectionMgr.cpp:256
9 xul.dll void mozilla::net::nsHttpHandler::ShutdownConnectionManager netwerk/protocol/http/nsHttpHandler.cpp:2743
=============================================================
> #3 top overall crash in early 64 data:

It is now #1. Quite similar are shutdownhang | ntdll.dll@0x6c55c and shutdownhang | ntdll.dll@0x90b80, both on Windows 10, at #25 and #26 respectively for 64.0.
I dug into the main signature a little. Although it doesn't appear to show in the correlations for all the reports, IPSEng64.dll Version 184.108.40.206 is showing up in a few of the reports I sampled. According to the Symantec website (https://support.symantec.com/en_US/article.TECH174537.html) they show support for Firefox only up to Version 61. Probably worth us doing some outreach to see why. But I also see other antivirus programs, such as 360 Safe Guard, so not all of the reports have the Endpoint Protection noted above.
This is the #3 overall top crash in 64, adding keyword. Maybe someone can help me bucket this in a better component? Based on the stack, moving it into XPCOM.
Component: General → XPCOM
Adding ni on Nathan for triage.
It looks like the main issue here is this dll thing being in the signature. At least 43% of the crashes look like bug 1420736, and at least 36% of them look like bug 1411908. Maybe the switch to Clang on Windows caused problems with signatures? Maybe we can add ntdll.dll to the skip list?
Bug 1435962 is a better bug for the http shutdown hang and bug 1487194 is a better bug for the quota manager hang.
Will, any ideas what the right thing to do is for a generic dll like this junking up the signatures? Adding something this generic to the irrelevant signature list seems a little dicey, but maybe that's okay.
Flags: needinfo?(nfroyd) → needinfo?(willkg)
Andrew and I mid-aired, here's my comment for posterity with some links:

This particular shutdown hang is a necko shutdown issue. We send an event to process on the STS thread [1] during HttpConnectionManager shutdown and then spin waiting for it [2] to set a bool. All the reports I've looked at have the STS thread sitting in some sort of SSL handshake [3], which prevents running the shutdown event.

We could probably use a better signature though, as this just lumps together all shutdown hangs caused by event loops waiting on that CV. A fair amount are also going through QuotaManager (e.g. bp-52b3fb6e-a19f-47c5-aa38-26d970181218). For stacks like this it's really frame 8 that helps us differentiate. Is there a way to split those out?

[1] https://hg.mozilla.org/releases/mozilla-release/annotate/2085f7f22a53096e032c9c0a4931d282ed3f0d43/netwerk/protocol/http/nsHttpConnectionMgr.cpp#l240
[2] https://hg.mozilla.org/releases/mozilla-release/annotate/2085f7f22a53096e032c9c0a4931d282ed3f0d43/netwerk/protocol/http/nsHttpConnectionMgr.cpp#l256
[3] https://hg.mozilla.org/releases/mozilla-release/annotate/2085f7f22a53096e032c9c0a4931d282ed3f0d43/security/manager/ssl/nsNSSIOLayer.cpp#l433
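For readers less familiar with this pattern, here is a rough illustration of the hang described in the comment above: a worker thread that is busy in a blocking call cannot run the shutdown event, so the thread spinning on the flag never sees it set. This is only a sketch in Python, not the Gecko code. The function name `shutdown_completes`, the queue-based event loop, and the use of `time.sleep` as a stand-in for the stuck SSL handshake are all assumptions made for the example.

```python
import queue
import threading
import time


def shutdown_completes(handshake_seconds, spin_timeout):
    """Return True if the shutdown event ran before the spin timed out.

    Hypothetical model: one worker thread plays the socket (STS) thread,
    the caller plays the main thread spinning during shutdown.
    """
    events = queue.Queue()
    shutdown_done = threading.Event()  # stands in for the bool the event sets

    def socket_thread():
        # Minimal event loop: run queued tasks in order until shutdown ran.
        while True:
            task = events.get()
            task()
            if shutdown_done.is_set():
                return

    # Queue a blocking task first, standing in for a stuck SSL handshake.
    events.put(lambda: time.sleep(handshake_seconds))
    threading.Thread(target=socket_thread, daemon=True).start()

    # Shutdown: post the event, then spin until the flag is set.
    events.put(shutdown_done.set)
    deadline = time.monotonic() + spin_timeout
    while not shutdown_done.is_set():
        if time.monotonic() >= deadline:
            return False  # in this sketch, where a hang detector would fire
        time.sleep(0.01)  # placeholder for processing main-thread events
    return True
```

Calling `shutdown_completes(0.0, 2.0)` models a healthy shutdown, while `shutdown_completes(1.0, 0.1)` models a handshake that outlasts the time the spinning thread is allowed to wait. The timeout exists only so the sketch terminates; the comment above describes a spin with no such escape.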
Bug 1491721 says "according to the proto signature this is likely the same issue as in bug 1487194, but we are missing symbols for some system libraries again after a windows patchday..." Maybe this is the same thing? eg Windows updated but we haven't gotten symbols yet so our nice signature stuff isn't kicking in.
We could add ntdll.dll to the prefix list which would tell signature generation to continue to the next frame of the stack. I can fiddle with that and show some examples tomorrow. Having said that, this is probably a "figure out which symbols we're missing, get them added, reprocess" kind of thing.
Adding ntdll.dll to the prefix list seems promising to at least break up the group into better buckets:

app@73b0aa89b441:/app$ socorro-cmd signature dbfb8fc5-2517-40b6-9060-9d97b0181213
Crash id: dbfb8fc5-2517-40b6-9060-9d97b0181213
Original: shutdownhang | ntdll.dll@0x9e294
New:      shutdownhang | ntdll.dll@0x9e294 | ntdll.dll@0x25f28 | mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown
Same?:    False

I wrote up bug #1515487 to cover that.
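To make the prefix-list behaviour discussed above concrete, here is a much simplified sketch of the idea: frames on an "irrelevant" list are skipped, frames on the prefix list are added to the signature and generation continues, and the first frame on neither list ends the signature. This is not Socorro's implementation. The `generate_signature` function, the short `IRRELEVANT` and `PREFIX` lists, and the already-normalized frame names (templates collapsed to `<T>`) are assumptions chosen so the example lines up with the Original and New signatures quoted in the comment above.

```python
import re

# Assumed, heavily shortened rule lists; the real lists are far longer.
IRRELEVANT = [
    r"SleepConditionVariableSRW",
    r"mozilla::detail::ConditionVariableImpl::wait",
    r"mozilla::ThreadEventQueue<T>::GetEvent",
    r"nsThread::ProcessNextEvent",
    r"NS_ProcessNextEvent",
]
PREFIX = [r"mozilla::SpinEventLoopUntil<T>"]

# Frames from comment 0, written in an assumed normalized form.
FRAMES = [
    "ntdll.dll@0x9e294",
    "ntdll.dll@0x25f28",
    "SleepConditionVariableSRW",
    "mozilla::detail::ConditionVariableImpl::wait",
    "mozilla::ThreadEventQueue<T>::GetEvent",
    "nsThread::ProcessNextEvent",
    "NS_ProcessNextEvent",
    "mozilla::SpinEventLoopUntil<T>",
    "mozilla::net::nsHttpConnectionMgr::Shutdown",
    "mozilla::net::nsHttpHandler::ShutdownConnectionManager",
]


def matches(frame, patterns):
    """True if the frame starts with any of the regex patterns."""
    return any(re.match(p, frame) for p in patterns)


def generate_signature(frames, prefix_patterns, irrelevant_patterns=IRRELEVANT):
    """Build a shutdownhang signature from normalized frame names."""
    parts = []
    for frame in frames:
        if matches(frame, irrelevant_patterns):
            continue  # skipped entirely, never part of the signature
        parts.append(frame)
        if not matches(frame, prefix_patterns):
            break  # first frame that is not a prefix ends the signature
    return "shutdownhang | " + " | ".join(parts)
```

With these assumed lists, `generate_signature(FRAMES, PREFIX)` stops at the first ntdll.dll frame, while `generate_signature(FRAMES, PREFIX + [r"ntdll\.dll@"])` walks past both ntdll.dll frames and ends at mozilla::net::nsHttpConnectionMgr::Shutdown, which is the frame that distinguishes the necko hang from the QuotaManager one.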
There have been no crashes with this signature in the last 3 months. The report in comment 0 is coming from here: https://hg.mozilla.org/releases/mozilla-release/annotate/8337ebb86a425a1c65467fc68eb7c26b9046159e/security/nss/lib/ssl/ssl3con.c#l11118, which is engaged when SSLKEYLOGFILE is set to save the SSL key log file. -> NSS + WFM (maybe even WONTFIX)

In general, I believe this just belongs to the group of Windows I/O hang bugs.
Assignee: nobody → nobody
Status: NEW → RESOLVED
Closed: 8 months ago
Component: Networking → Libraries
Product: Core → NSS
QA Contact: jjones
Resolution: --- → WORKSFORME
Version: Trunk → other