Closed Bug 1354611 Opened 3 years ago Closed 3 years ago

Hang when initting NSS

Categories

(Core :: DOM: Content Processes, defect)


Tracking

RESOLVED FIXED
mozilla55
Tracking Status
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed

People

(Reporter: mconley, Assigned: dmajor)

References

(Blocks 1 open bug)

Details

(Whiteboard: [e10s-multi:+])

Attachments

(2 files)

Attached file WinDbg stacks
I've noticed when opening a new tab with e10s multi, the tab will sometimes just not respond, like the content process didn't start up correctly.

I attached WinDbg and got the attached stacks from the content process during the hang.

If I'm understanding these stacks correctly, it looks like we're trying to init NSS... and then we try to load a DLL, and I guess we're deadlocked or something?

Note that the rest of my tabs seem to behave just fine when I'm in this state.
Hey dmajor, sorry to keep coming to you with problems like this, but does anything obvious jump out from the stacks I captured?
Flags: needinfo?(dmajor)
No worries mconley! Your ntdll symbols are a bit off but I think this will be fixed by bug 1349444 comment 8.


*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\SYSTEM32\ntdll.dll - 
0  Id: 434c.406c Suspend: 2 Teb: 0000007b`3fb6e000 Unfrozen
 # Child-SP          RetAddr           Call Site
00 0000007b`3fdfafd0 00007fff`977fe6b7 ntdll!RtlReleasePath+0x105
01 0000007b`3fdfb000 00007fff`977fe046 ntdll!RtlIsProcessorFeaturePresent+0xb7
02 0000007b`3fdfb040 00007fff`977fd949 ntdll!RtlIsCriticalSectionLockedByThread+0x356
03 0000007b`3fdfb0a0 00007fff`977fc84c ntdll!RtlAppendUnicodeStringToString+0x1f9
04 0000007b`3fdfb0e0 00007fff`977fa7cb ntdll!RtlIdnToUnicode+0xf3c
05 0000007b`3fdfb1b0 00007fff`977fa559 ntdll!LdrGetDllHandleEx+0x81b
06 0000007b`3fdfb330 00007fff`9781703e ntdll!LdrGetDllHandleEx+0x5a9
07 0000007b`3fdfb390 00007fff`97816add ntdll!RtlFormatCurrentUserKeyPath+0x75e
08 0000007b`3fdfb410 00007fff`977f9efc ntdll!RtlFormatCurrentUserKeyPath+0x1fd
> 09 0000007b`3fdfb5b0 00007fff`8e8ebd47 ntdll!LdrLoadDll+0x8c
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\System32\KERNELBASE.dll - 
0a 0000007b`3fdfb6b0 00007fff`93e5cd7f mozglue!`anonymous namespace'::patched_LdrLoadDll+0x197 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\build\windowsdllblocklist.cpp @ 706]
0b 0000007b`3fdfb810 00007fff`88205092 KERNELBASE!LoadLibraryExW+0x16f
0c 0000007b`3fdfb880 00007fff`8aa34c85 nss3!pr_LoadLibraryByPathname+0x192 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\nsprpub\pr\src\linking\prlink.c @ 724]
0d 0000007b`3fdfbc40 00007fff`8aa34ce9 softokn3!loader_LoadLibInReferenceDir+0xb5 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\security\nss\lib\freebl\genload.c @ 106]


  48  Id: 434c.3ef8 Suspend: 1 Teb: 0000007b`3fbda000 Unfrozen
 # Child-SP          RetAddr           Call Site
00 0000007b`46816fb8 00007fff`9783b26f ntdll!ZwWaitForAlertByThreadId+0x14
01 0000007b`46816fc0 00007fff`9783ac53 ntdll!RtlLookupFunctionEntry+0x6ef
02 0000007b`46817080 00007fff`8e8ec33e ntdll!RtlLookupFunctionEntry+0xd3
03 0000007b`468170d0 00007fff`8e8ec233 mozglue!WalkStackMain64+0x96 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 406]
04 0000007b`46817620 00007fff`64c6a14d mozglue!MozStackWalk+0x15b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 627]
05 0000007b`4681b740 00007fff`64c6e946 xul!DoNativeBacktrace+0x95 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform.cpp @ 822]
06 0000007b`4681f630 00007fff`64c6e6b0 xul!Tick+0x9a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform.cpp @ 1062]
> 07 0000007b`4681f6c0 00007fff`64c6c321 xul!SamplerThread::SuspendAndSampleAndResumeThread+0xac [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cpp @ 222]
08 0000007b`4681fbd0 00007fff`64c6e8a5 xul!SamplerThread::Run+0x121 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform.cpp @ 1665]
09 0000007b`4681fd60 00007fff`94b0cab0 xul!ThreadEntry+0x9 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cpp @ 98]
0a 0000007b`4681fd90 00007fff`96088364 ucrtbase!o__realloc_base+0x60
0b 0000007b`4681fdc0 00007fff`978570d1 KERNEL32!BaseThreadInitThunk+0x14
0c 0000007b`4681fdf0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Flags: needinfo?(dmajor)
How frequently do you see this Mike? Is this a full deadlock or "just" a long hang?
Flags: needinfo?(mconley)
Whiteboard: [e10s-multi:?]
Whiteboard: [e10s-multi:?] → [e10s-multi:+]
(In reply to Gabor Krizsanits [:krizsa :gabor] (PTO: 19-24) from comment #3)
> How frequently do you see this Mike? Is this a full deadlock or "just" a
> long hang?

I haven't seen this in days and days now. It sounds like this might be a dupe of bug 1349444 though.
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(mconley)
Resolution: --- → DUPLICATE
Duplicate of bug: 1349444
So, I've been working on fixes for both LdrLoadDll and LdrUnloadDll in bug 1349444, but I've been reworking that patch for too long. I'm going to split the two APIs into separate bugs and use this bug for LdrLoadDll. That lets the LdrLoadDll fix land quickly to unblock people, since the LdrLoadDll hang is more common and its fix is much more straightforward.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Keeping aklotz's review from bug 1349444. Already pushed to inbound, but pulsebot is asleep.
Assignee: nobody → dmajor
Attachment #8860137 - Flags: review+
overholt, this ought to fix the hangs you were seeing.
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d04a44c98bd9
Acquire the stack walk lock in Win64's LdrLoadDll. r=aklotz
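
For readers following along, here is a rough sketch of the locking idea that the commit message describes. It is not the landed patch. The names gStackWalkLock, PatchedLoad, TrySampleThread and SampleIsSkippedDuringLoad are hypothetical, a portable std::mutex stands in for whatever lock mozglue really uses, and the sampler skipping a sample when the lock is busy is an assumption about how the other side cooperates.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the stack walk workaround lock. The real helpers
// live in mozglue and are not reproduced here.
static std::mutex gStackWalkLock;

// Models the patched loader hook: hold the lock for the whole load so the
// sampler never suspends this thread while it owns loader-internal locks.
template <typename LoadFn>
void PatchedLoad(LoadFn realLoad) {
  std::lock_guard<std::mutex> guard(gStackWalkLock);
  realLoad();
}

// Models the sampler (assumed behavior): sample only if the lock is free,
// otherwise skip this sample instead of blocking.
template <typename WalkFn>
bool TrySampleThread(WalkFn walk) {
  std::unique_lock<std::mutex> guard(gStackWalkLock, std::try_to_lock);
  if (!guard.owns_lock()) {
    return false;  // A load is in progress; skip rather than risk a deadlock.
  }
  walk();
  return true;
}

// Small driver: returns true if a sample attempted in the middle of a load
// was skipped.
bool SampleIsSkippedDuringLoad() {
  std::atomic<bool> inLoad{false};
  std::atomic<bool> finishLoad{false};

  // The "loader" thread stays inside PatchedLoad until told to finish.
  std::thread loader([&] {
    PatchedLoad([&] {
      inLoad = true;
      while (!finishLoad) std::this_thread::yield();
    });
  });

  // Wait until the load is definitely in progress, then try to sample.
  while (!inLoad) std::this_thread::yield();
  bool walked = false;
  bool sampled = TrySampleThread([&] { walked = true; });

  finishLoad = true;
  loader.join();

  assert(sampled == walked);  // The walk runs only when the lock was taken.
  return !sampled;
}
```

Calling SampleIsSkippedDuringLoad() models the two stacks attached above: one thread is inside the loader while the sampler wants to walk it. With the shared lock, the sampler backs off instead of suspending a thread that holds a lock the walk itself needs.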
dmajor, should we uplift this fix to Beta 54? Or does this hang only affect e10s-multi and/or the Gecko Profiler? We plan to run a Funnelcake experiment in Firefox 54 to test Win64 as the default install.
Flags: needinfo?(dmajor)
This would only come up during stack-walking, for example with the Gecko profiler or the dev tools. It could possibly come up in the future with BHR (the Background Hang Reporter), but for the time being that's disabled on x64.

According to DXR, AcquireStackWalkWorkaroundLock() doesn't exist on m-b anyway, so I'm guessing that the original bug in this series of hangs wasn't considered to be worth uplifting.
Flags: needinfo?(dmajor)
https://hg.mozilla.org/mozilla-central/rev/d04a44c98bd9
Status: REOPENED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Depends on: 1359507
Blocks: 1366030