Hang when initting NSS

RESOLVED FIXED in Firefox 55

Status


Core
DOM: Content Processes
RESOLVED FIXED
2 months ago
11 days ago

People

(Reporter: mconley, Assigned: dmajor)

Tracking

(Blocks: 4 bugs)

unspecified
mozilla55
Points:
---

Firefox Tracking Flags

(firefox53 unaffected, firefox54 unaffected, firefox55 fixed)

Details

(Whiteboard: [e10s-multi:+])

Attachments

(2 attachments)

(Reporter)

Description

2 months ago
Created attachment 8855868 [details]
WinDbg stacks

I've noticed when opening a new tab with e10s multi, the tab will sometimes just not respond, like the content process didn't start up correctly.

I attached WinDbg and got the attached stacks from the content process during the hang.

If I'm understanding these stacks correctly, it looks like we're trying to init NSS... and then we try to load a DLL, and I guess we're deadlocked or something?

Note that the rest of my tabs seem to behave just fine when I'm in this state.
(Reporter)

Comment 1

2 months ago
Hey dmajor, sorry to keep coming to you with problems like this, but does anything obvious jump out from the stacks I captured?
Flags: needinfo?(dmajor)
(Assignee)

Comment 2

2 months ago
No worries mconley! Your ntdll symbols are a bit off but I think this will be fixed by bug 1349444 comment 8.


*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\SYSTEM32\ntdll.dll - 
0  Id: 434c.406c Suspend: 2 Teb: 0000007b`3fb6e000 Unfrozen
 # Child-SP          RetAddr           Call Site
00 0000007b`3fdfafd0 00007fff`977fe6b7 ntdll!RtlReleasePath+0x105
01 0000007b`3fdfb000 00007fff`977fe046 ntdll!RtlIsProcessorFeaturePresent+0xb7
02 0000007b`3fdfb040 00007fff`977fd949 ntdll!RtlIsCriticalSectionLockedByThread+0x356
03 0000007b`3fdfb0a0 00007fff`977fc84c ntdll!RtlAppendUnicodeStringToString+0x1f9
04 0000007b`3fdfb0e0 00007fff`977fa7cb ntdll!RtlIdnToUnicode+0xf3c
05 0000007b`3fdfb1b0 00007fff`977fa559 ntdll!LdrGetDllHandleEx+0x81b
06 0000007b`3fdfb330 00007fff`9781703e ntdll!LdrGetDllHandleEx+0x5a9
07 0000007b`3fdfb390 00007fff`97816add ntdll!RtlFormatCurrentUserKeyPath+0x75e
08 0000007b`3fdfb410 00007fff`977f9efc ntdll!RtlFormatCurrentUserKeyPath+0x1fd
> 09 0000007b`3fdfb5b0 00007fff`8e8ebd47 ntdll!LdrLoadDll+0x8c
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\System32\KERNELBASE.dll - 
0a 0000007b`3fdfb6b0 00007fff`93e5cd7f mozglue!`anonymous namespace'::patched_LdrLoadDll+0x197 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\build\windowsdllblocklist.cpp @ 706]
0b 0000007b`3fdfb810 00007fff`88205092 KERNELBASE!LoadLibraryExW+0x16f
0c 0000007b`3fdfb880 00007fff`8aa34c85 nss3!pr_LoadLibraryByPathname+0x192 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\nsprpub\pr\src\linking\prlink.c @ 724]
0d 0000007b`3fdfbc40 00007fff`8aa34ce9 softokn3!loader_LoadLibInReferenceDir+0xb5 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\security\nss\lib\freebl\genload.c @ 106]


  48  Id: 434c.3ef8 Suspend: 1 Teb: 0000007b`3fbda000 Unfrozen
 # Child-SP          RetAddr           Call Site
00 0000007b`46816fb8 00007fff`9783b26f ntdll!ZwWaitForAlertByThreadId+0x14
01 0000007b`46816fc0 00007fff`9783ac53 ntdll!RtlLookupFunctionEntry+0x6ef
02 0000007b`46817080 00007fff`8e8ec33e ntdll!RtlLookupFunctionEntry+0xd3
03 0000007b`468170d0 00007fff`8e8ec233 mozglue!WalkStackMain64+0x96 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 406]
04 0000007b`46817620 00007fff`64c6a14d mozglue!MozStackWalk+0x15b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 627]
05 0000007b`4681b740 00007fff`64c6e946 xul!DoNativeBacktrace+0x95 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform.cpp @ 822]
06 0000007b`4681f630 00007fff`64c6e6b0 xul!Tick+0x9a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform.cpp @ 1062]
> 07 0000007b`4681f6c0 00007fff`64c6c321 xul!SamplerThread::SuspendAndSampleAndResumeThread+0xac [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cpp @ 222]
08 0000007b`4681fbd0 00007fff`64c6e8a5 xul!SamplerThread::Run+0x121 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform.cpp @ 1665]
09 0000007b`4681fd60 00007fff`94b0cab0 xul!ThreadEntry+0x9 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cpp @ 98]
0a 0000007b`4681fd90 00007fff`96088364 ucrtbase!o__realloc_base+0x60
0b 0000007b`4681fdc0 00007fff`978570d1 KERNEL32!BaseThreadInitThunk+0x14
0c 0000007b`4681fdf0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Flags: needinfo?(dmajor)

Comment 3

2 months ago
How frequently do you see this, Mike? Is this a full deadlock or "just" a long hang?
Flags: needinfo?(mconley)
Whiteboard: [e10s-multi:?]

Updated

2 months ago
Blocks: 1304547
Whiteboard: [e10s-multi:?] → [e10s-multi:+]
(Reporter)

Comment 4

a month ago
(In reply to Gabor Krizsanits [:krizsa :gabor] (PTO: 19-24) from comment #3)
> How frequently do you see this Mike? Is this a full deadlock or "just" a
> long hang?

I haven't seen this in days and days now. It sounds like this might be a dupe of bug 1349444 though.
Status: NEW → RESOLVED
Last Resolved: a month ago
Flags: needinfo?(mconley)
Resolution: --- → DUPLICATE
Duplicate of bug: 1349444
(Assignee)

Comment 5

a month ago
I've been working on fixes for both LdrLoadDll and LdrUnloadDll in bug 1349444, but I've been re-working that patch for too long. I'm going to split the two APIs into separate bugs and use this bug for LdrLoadDll, so that its fix can land quickly and unblock people: the LdrLoadDll hang is the more common of the two, and its fix is much more straightforward.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(Assignee)

Comment 6

a month ago
Created attachment 8860137 [details] [diff] [review]
Acquire the stack walk workaround lock in LdrLoadDll

Keeping aklotz review from bug 1349444. Already pushed to inbound but pulsebot is asleep.
Assignee: nobody → dmajor
Attachment #8860137 - Flags: review+
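
For readers skimming the thread, here is a minimal, portable sketch of the idea named in the patch title: the DLL-load hook and the sampler take the same lock, so the sampler never "suspends" a thread that is partway through a load. This is an illustration under stated assumptions, not the landed patch. Every name below (gStackWalkLock, FakeLdrLoadDll, PatchedLoadDll, SampleOnce, RunSimulation) is hypothetical, std::mutex stands in for whatever lock Gecko actually uses, and no real thread suspension or Windows loader call takes place.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for the lock shared by the stack walker and the
// DLL-load hook. The real code uses Gecko's own lock, not std::mutex.
static std::mutex gStackWalkLock;

// True while a thread is inside the simulated loader.
static std::atomic<bool> gInsideLoader{false};

// Counts samples taken while another thread was inside the simulated loader.
static std::atomic<int> gUnsafeSamples{0};

// Stand-in for the original LdrLoadDll. It only flips a flag; nothing is loaded.
static void FakeLdrLoadDll(const std::string& /*path*/) {
  gInsideLoader = true;
  std::this_thread::yield();  // widen the window in which a sample could land
  gInsideLoader = false;
}

// Hook sketch: hold the stack-walk lock for the whole duration of the load.
static void PatchedLoadDll(const std::string& path) {
  std::lock_guard<std::mutex> guard(gStackWalkLock);
  FakeLdrLoadDll(path);
}

// Sampler sketch: take the same lock before "suspending and walking".
// A real sampler would suspend the target thread at this point.
static void SampleOnce() {
  std::lock_guard<std::mutex> guard(gStackWalkLock);
  if (gInsideLoader) {
    ++gUnsafeSamples;
  }
}

// Runs one loader thread and one sampler thread concurrently and returns how
// many samples landed while the loader was mid-load.
int RunSimulation(int iterations) {
  assert(iterations >= 0);
  gUnsafeSamples = 0;
  std::thread loader([iterations] {
    for (int i = 0; i < iterations; ++i) PatchedLoadDll("example.dll");
  });
  std::thread sampler([iterations] {
    for (int i = 0; i < iterations; ++i) SampleOnce();
  });
  loader.join();
  sampler.join();
  return gUnsafeSamples.load();
}
```

Because both paths serialize on the same lock, RunSimulation should report zero unsafe samples. Dropping the lock_guard from PatchedLoadDll lets the sampler observe a thread mid-load, which is the situation the stacks in comment 2 appear to show.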
(Assignee)

Comment 7

a month ago
overholt, this ought to fix the hangs you were seeing.

Comment 8

a month ago
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d04a44c98bd9
Acquire the stack walk lock in Win64's LdrLoadDll. r=aklotz

Comment 9

a month ago
dmajor, should we uplift this fix to Beta 54? Or does this hang only affect e10s-multi and/or the Gecko Profiler? We plan to run a Funnelcake experiment in Firefox 54 to test Win64 as the default install.
status-firefox53: --- → wontfix
status-firefox54: --- → ?
status-firefox55: --- → affected
Flags: needinfo?(dmajor)
Blocks: 1340936
(Assignee)

Comment 10

a month ago
This would only come up during stack-walking, for example with the Gecko profiler or the dev tools. Possibly in the future with the BHR stuff but for the time being that's disabled on x64.

According to DXR, AcquireStackWalkWorkaroundLock() doesn't exist on m-b anyway, so I'm guessing that the original bug in this series of hangs wasn't considered to be worth uplifting.
Flags: needinfo?(dmajor)
Thanks.
status-firefox53: wontfix → unaffected
status-firefox54: ? → unaffected
https://hg.mozilla.org/mozilla-central/rev/d04a44c98bd9
Status: REOPENED → RESOLVED
Last Resolved: a month ago
status-firefox55: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(Assignee)

Updated

a month ago
Depends on: 1359507
(Assignee)

Updated

11 days ago
Blocks: 1366030