Closed Bug 1514028 Opened 8 months ago Closed 8 months ago

Crash in shutdownhang | ntdll.dll@0x9e294

Categories

(NSS :: Libraries, defect, critical)

Unspecified
Windows 10
defect
Not set
critical

Tracking

(firefox64 affected, firefox65 affected, firefox66 affected)

RESOLVED WORKSFORME

People

(Reporter: marcia, Unassigned)

Details

(Keywords: crash, regression, topcrash)

Crash Data

This bug was filed from the Socorro interface and is
report bp-dbfb8fc5-2517-40b6-9060-9d97b0181213.
=============================================================

#3 top overall crash in early 64 data: https://bit.ly/2EwAvmw. This Win 10 shutdown hang has been around in other versions as well: https://bit.ly/2LfCYT7. Some comments mention it occurring during an update from 63.0.3. The URLs don't seem to show a particular trend, but there are a few that are the first run pages:

*https://www.mozilla.org/ru/firefox/64.0/whatsnew/?oldversion=63.0.3 
*https://www.mozilla.org/pl/firefox/64.0/whatsnew/?oldversion=63.0.3 

Top 10 frames of crashing thread:

0 ntdll.dll ntdll.dll@0x9e294 
1 ntdll.dll ntdll.dll@0x25f28 
2 kernelbase.dll SleepConditionVariableSRW 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:58
4 xul.dll mozilla::ThreadEventQueue<mozilla::PrioritizedEventQueue<mozilla::EventQueue> >::GetEvent xpcom/threads/ThreadEventQueue.cpp:168
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1172
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:530
7 xul.dll static bool mozilla::SpinEventLoopUntil<mozilla::ProcessFailureBehavior::ReportToCaller, `lambda at z:/build/build/src/netwerk/protocol/http/nsHttpConnectionMgr.cpp:256:24'> xpcom/threads/nsThreadUtils.h:347
8 xul.dll mozilla::net::nsHttpConnectionMgr::Shutdown netwerk/protocol/http/nsHttpConnectionMgr.cpp:256
9 xul.dll void mozilla::net::nsHttpHandler::ShutdownConnectionManager netwerk/protocol/http/nsHttpHandler.cpp:2743

=============================================================
> #3 top overall crash in early 64 data:

It is now #1

Quite similar are shutdownhang | ntdll.dll@0x6c55c and shutdownhang | ntdll.dll@0x90b80, both on Windows 10, and #25 and #26 respectively for 64.0.
I dug into the main signature a little. Although it doesn't appear to show in the correlations for all the reports, IPSEng64.dll Version 16.2.1.22 is showing up in a few of the reports I sampled. According to the Symantec website (https://support.symantec.com/en_US/article.TECH174537.html), they support Firefox only up to Version 61. Probably worth us doing some outreach to see why.

But I also see other antivirus programs, such as 360 Safe Guard, so not all of the reports have the Endpoint Protection noted above.
See Also: → 1515133
This is the #3 overall top crash in 64, adding keyword. Maybe someone can help me bucket this in a better component? Based on the stack, moving it into XPCOM.
Component: General → XPCOM
Keywords: topcrash
Adding ni on Nathan for triage.
Flags: needinfo?(nfroyd)
It looks like the main issue here is this dll thing being in the signature. At least 43% of the crashes look like bug 1420736, and at least 36% of them look like bug 1411908. Maybe the switch to Clang on Windows caused problems with signatures? Maybe we can add ntdll.dll to the skip list?
Bug 1435962 is a better bug for the http shutdown hang and bug 1487194 is a better bug for the quota manager hang.
Will, any ideas what the right thing to do is for a generic dll like this junking up the signatures? Adding something this generic to the irrelevant signature list seems a little dicey, but maybe that's okay.
Flags: needinfo?(nfroyd) → needinfo?(willkg)
Andrew and I mid-aired, here's my comment for posterity with some links:

This particular shutdown hang is a necko shutdown issue. We send an event to process on the STS thread [1] during HttpConnectionManager shutdown and then spin waiting for it [2] to set a bool. All the reports I've looked at have the STS thread sitting in some sort of SSL handshake [3], which prevents running the shutdown event.

We could probably use a better signature though, as this just lumps together all shutdown hangs caused by event loops waiting on that CV. A fair amount are also going through QuotaManager (e.g. bp-52b3fb6e-a19f-47c5-aa38-26d970181218). For stacks like this it's really frame 8 that helps us differentiate. Is there a way to split those out?

[1] https://hg.mozilla.org/releases/mozilla-release/annotate/2085f7f22a53096e032c9c0a4931d282ed3f0d43/netwerk/protocol/http/nsHttpConnectionMgr.cpp#l240
[2] https://hg.mozilla.org/releases/mozilla-release/annotate/2085f7f22a53096e032c9c0a4931d282ed3f0d43/netwerk/protocol/http/nsHttpConnectionMgr.cpp#l256
[3] https://hg.mozilla.org/releases/mozilla-release/annotate/2085f7f22a53096e032c9c0a4931d282ed3f0d43/security/manager/ssl/nsNSSIOLayer.cpp#l433
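
To make the pattern above concrete, here is a minimal Python sketch of it. This is not Firefox code: `simulate_shutdown`, `handshake_seconds`, and `watchdog_seconds` are hypothetical names, the worker thread only stands in for the STS thread, and the deadline check only stands in for whatever watchdog turns a stuck shutdown into a shutdownhang report. The real main thread also keeps processing events while it spins, which this sketch replaces with a short sleep.

```python
import queue
import threading
import time


def simulate_shutdown(handshake_seconds, watchdog_seconds):
    """Return True if shutdown completes, False if the watchdog deadline passes.

    Hypothetical model: the worker is busy in a blocking call (standing in
    for an SSL handshake) before it can run the shutdown event that the
    main thread is waiting on.
    """
    events = queue.Queue()
    shutdown_confirmed = threading.Event()  # stands in for the bool being set

    def socket_thread():
        time.sleep(handshake_seconds)  # blocking work, event queue not serviced
        event = events.get()           # only now is the shutdown event picked up
        event()

    worker = threading.Thread(target=socket_thread, daemon=True)
    worker.start()

    # Main thread: dispatch the shutdown event, then spin until it has run.
    events.put(shutdown_confirmed.set)
    deadline = time.monotonic() + watchdog_seconds
    while not shutdown_confirmed.is_set():
        if time.monotonic() >= deadline:
            return False  # still spinning when the watchdog gives up
        time.sleep(0.01)
    return True


if __name__ == "__main__":
    print(simulate_shutdown(0.0, 1.0))  # worker is free, shutdown event runs
    print(simulate_shutdown(0.5, 0.1))  # worker blocked longer than the deadline
```

The second call models the reports here: the event is queued, but the thread that has to run it is stuck elsewhere, so the main thread never sees the flag.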
Bug 1491721 says "according to the proto signature this is likely the same issue as in bug 1487194, but we are missing symbols for some system libraries again after a windows patchday..."

Maybe this is the same thing? E.g. Windows updated but we haven't gotten symbols yet, so our nice signature stuff isn't kicking in.
We could add ntdll.dll to the prefix list which would tell signature generation to continue to the next frame of the stack. I can fiddle with that and show some examples tomorrow.

Having said that, this is probably a "figure out which symbols we're missing, get them added, reprocess" kind of thing.
Flags: needinfo?(willkg)
Adding ntdll.dll to the prefix list seems promising to at least break up the group into better buckets:

app@73b0aa89b441:/app$ socorro-cmd signature dbfb8fc5-2517-40b6-9060-9d97b0181213
Crash id: dbfb8fc5-2517-40b6-9060-9d97b0181213
Original: shutdownhang | ntdll.dll@0x9e294
New:      shutdownhang | ntdll.dll@0x9e294 | ntdll.dll@0x25f28 | mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown
Same?:    False


I wrote up bug #1515487 to cover that.
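
For anyone following along, here is a rough Python sketch of why a prefix-list entry changes the bucket. It is a simplified model, not Socorro's actual implementation: `generate_signature`, the list contents, and the shortened frame names are all assumptions chosen for illustration. The idea is that frames on an irrelevant list are skipped, frames on the prefix list are appended and generation continues, and the first frame on neither list ends the signature.

```python
import re

# Hypothetical lists for illustration only; the real lists are much longer.
IRRELEVANT = [
    r"SleepConditionVariableSRW",
    r"mozilla::detail::ConditionVariableImpl::wait",
    r".*::GetEvent",
    r"nsThread::ProcessNextEvent",
    r"NS_ProcessNextEvent",
]
PREFIX_BEFORE = [r"mozilla::SpinEventLoopUntil.*"]
PREFIX_AFTER = PREFIX_BEFORE + [r"ntdll\.dll@0x[0-9a-f]+"]

# Simplified frame names based on the stack in comment 0.
FRAMES = [
    "ntdll.dll@0x9e294",
    "ntdll.dll@0x25f28",
    "SleepConditionVariableSRW",
    "mozilla::detail::ConditionVariableImpl::wait",
    "mozilla::ThreadEventQueue<T>::GetEvent",
    "nsThread::ProcessNextEvent",
    "NS_ProcessNextEvent",
    "mozilla::SpinEventLoopUntil<T>",
    "mozilla::net::nsHttpConnectionMgr::Shutdown",
    "mozilla::net::nsHttpHandler::ShutdownConnectionManager",
]


def matches_any(patterns, frame):
    """True if the whole frame name matches at least one pattern."""
    return any(re.fullmatch(pattern, frame) for pattern in patterns)


def generate_signature(frames, prefix, irrelevant, kind="shutdownhang"):
    """Toy signature generation: skip irrelevant, continue past prefix frames."""
    parts = [kind]
    for frame in frames:
        if matches_any(irrelevant, frame):
            continue  # noise frame, never part of the signature
        parts.append(frame)
        if not matches_any(prefix, frame):
            break  # first non-prefix frame ends the signature
    return " | ".join(parts)


if __name__ == "__main__":
    print(generate_signature(FRAMES, PREFIX_BEFORE, IRRELEVANT))
    print(generate_signature(FRAMES, PREFIX_AFTER, IRRELEVANT))
```

With these assumed lists, a QuotaManager hang would end on a different final frame than an nsHttpConnectionMgr::Shutdown hang, which is the split asked about earlier.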
Moving to Necko based on comment 8 and comment 11.
Component: XPCOM → Networking
There are no crashes with the signature in the last 3 months. The report in comment 0 is coming from here:

https://hg.mozilla.org/releases/mozilla-release/annotate/8337ebb86a425a1c65467fc68eb7c26b9046159e/security/nss/lib/ssl/ssl3con.c#l11118

which is engaged when SSLKEYLOGFILE is set to save the ssl key log file.
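
If someone wants to check whether an affected environment has that variable set, a minimal sketch follows. `keylog_path` is a hypothetical helper, and treating an empty value the same as unset is an assumption of this sketch, not something taken from the NSS code.

```python
import os


def keylog_path(environ=None):
    """Return the SSLKEYLOGFILE value if set and non-empty, else None.

    Assumption: an empty value is treated the same as an unset variable.
    """
    env = os.environ if environ is None else environ
    path = env.get("SSLKEYLOGFILE", "")
    return path or None


if __name__ == "__main__":
    path = keylog_path()
    if path is None:
        print("SSLKEYLOGFILE is not set")
    else:
        print("SSLKEYLOGFILE is set to", path)
```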

-> NSS + WFM (maybe even WONTFIX)


In general, I believe this just belongs to the group of Windows i/o hang bugs.
Assignee: nobody → nobody
Status: NEW → RESOLVED
Closed: 8 months ago
Component: Networking → Libraries
Product: Core → NSS
QA Contact: jjones
Resolution: --- → WORKSFORME
Version: Trunk → other