Closed Bug 1323460 Opened 8 years ago Closed 8 years ago

Hang at RtlpLookupDynamicFunctionEntry when stackwalking in Win64 builds

Categories

(Core :: Gecko Profiler, defect)

x86_64
Windows 10
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox-esr45 --- fixed
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- fixed
firefox-esr52 --- affected
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: mconley, Assigned: jandem)

References

(Blocks 1 open bug)

Details

(Keywords: hang, reproducible)

Attachments

(1 file)

Attached file WinDbg output
+++ This bug was initially created as a clone of Bug #1263595 +++ STR: 1) In a 64-bit build of Nightly for Windows, install the Gecko Profiler Add-on 2) Enable the Gecko Profiler Add-on, so that you are profiling. 3) Proceed to surf the internet 4) Wait for your browser to eventually hang and never recover. ER: Should not hang like that, even when profiling. AR: A pretty brutal hang. See the attached list of thread stacks at the hang. We landed patches to fix something similar to this in bug 1263595, but there's still more work to do here.
froydnj isolated the threads involved in the deadlock in bug 1263595 comment 52. Quoting directly: Looks like we have four threads digging around in lookup tables, the hang monitor: 17 Id: 2b8c.1b14 Suspend: 1 Teb: 000007ff`fff90000 Unfrozen Child-SP RetAddr Call Site 00000000`0f42ab78 00000000`77308db8 ntdll!NtWaitForSingleObject+0xa 00000000`0f42ab80 00000000`77308cb4 ntdll!RtlpWaitOnCriticalSection+0xe8 00000000`0f42ac30 00000000`77332498 ntdll!RtlEnterCriticalSection+0xd1 00000000`0f42ac60 00000000`77317c9a ntdll!RtlpLookupDynamicFunctionEntry+0x58 00000000`0f42ac90 000007fe`f99d33be ntdll!RtlLookupFunctionEntry+0xa3 00000000`0f42acc0 000007fe`f99d2e7d mozglue!WalkStackMain64+0x1aa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 391] 00000000`0f42b210 000007fe`dc15a46a mozglue!MozStackWalk+0x17d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 612] 00000000`0f42f330 000007fe`dc1114cd xul!mozilla::HangMonitor::GetChromeHangReport+0xe2 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\xpcom\threads\hangmonitor.cpp @ 187] 00000000`0f42f8e0 000007fe`ef262fbe xul!mozilla::HangMonitor::ThreadMain+0x33ce65 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\xpcom\threads\hangmonitor.cpp @ 254] 00000000`0f42f990 000007fe`ef25671a nss3!_PR_NativeRunThread+0xee [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\nsprpub\pr\src\threads\combined\pruthr.c @ 419] 00000000`0f42f9c0 000007fe`f020cab0 nss3!pr_root+0xa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\nsprpub\pr\src\md\windows\w95thred.c @ 96] 00000000`0f42f9f0 00000000`770b59bd ucrtbase!o__realloc_base+0x60 00000000`0f42fa20 00000000`772ea2e1 kernel32!BaseThreadInitThunk+0xd 00000000`0f42fa50 00000000`00000000 ntdll!RtlUserThreadStart+0x1d and the main thread: 0 Id: 2b8c.3204 Suspend: 3 Teb: 000007ff`fffdd000 Unfrozen Child-SP RetAddr Call Site 00000000`00382d00 00000000`77317c9a ntdll!RtlpLookupDynamicFunctionEntry+0x6c 00000000`00382d30 000007fe`f99d33be ntdll!RtlLookupFunctionEntry+0xa3 00000000`00382d60 000007fe`f99d2e7d mozglue!WalkStackMain64+0x1aa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 391] 00000000`003832b0 000007fe`dd376aef mozglue!MozStackWalk+0x17d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 612] 00000000`003873d0 000007fe`dd373862 xul!GeckoSampler::doNativeBacktrace+0x9b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1000] 00000000`0038b2c0 000007fe`dd372c5f xul!GeckoSampler::InplaceTick+0x8a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1218] 00000000`0038b360 000007fe`dc0ee001 xul!GeckoSampler::GetBacktrace+0xdb [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1305] 00000000`0038b8f0 000007fe`dbcbac5f xul!mozilla_sampler_get_backtrace+0x39a96d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform.cpp @ 1139] 00000000`0038b920 000007fe`db729828 xul!mozilla::PresShell::ScheduleReflow+0x2b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\presshell.cpp @ 9197] 00000000`0038b950 000007fe`dbb6eaba xul!mozilla::PresShell::FrameNeedsReflow+0x1bc [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\presshell.cpp @ 2750] 00000000`0038bb80 000007fe`dbff805b xul!mozilla::PresShell::Initialize+0x21e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\presshell.cpp @ 1795] 00000000`0038bc00 000007fe`dc075144 xul!nsDocumentViewer::InitPresentationStuff+0x6562bf [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\nsdocumentviewer.cpp @ 703] 00000000`0038bc70 000007fe`dbb6dfb8 xul!nsDocumentViewer::Show+0x507400 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\nsdocumentviewer.cpp @ 2109] 00000000`0038bd10 000007fe`dbb6bed8 xul!nsDocShell::SetVisibility+0x48 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\docshell\base\nsdocshell.cpp @ 6421] 00000000`0038bd40 000007fe`dbb6bcea xul!nsFrameLoader::Show+0x1bc [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\base\nsframeloader.cpp @ 1070] 00000000`0038beb0 000007fe`dbb6bc0e xul!nsSubDocumentFrame::ShowViewer+0xb2 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\generic\nssubdocumentframe.cpp @ 191] 00000000`0038bf00 000007fe`db66f7dc xul!AsyncFrameInit::Run+0x32 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\generic\nssubdocumentframe.cpp @ 93] 00000000`0038bf30 000007fe`db8e63b2 xul!nsContentUtils::RemoveScriptBlocker+0xec [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\base\nscontentutils.cpp @ 5223] 00000000`0038bf70 000007fe`db9ce76a xul!mozilla::PresShell::FlushPendingNotifications+0x1b2 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\presshell.cpp @ 4122] 00000000`0038c090 000007fe`db8dac8f xul!nsDocument::FlushPendingNotifications+0x11a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\base\nsdocument.cpp @ 7759] and the sampler thread: 47 Id: 2b8c.c94 Suspend: 1 Teb: 000007ff`fff3c000 Unfrozen Child-SP RetAddr Call Site 00000000`39e46a58 00000000`77308db8 ntdll!NtWaitForSingleObject+0xa 00000000`39e46a60 00000000`77308cb4 ntdll!RtlpWaitOnCriticalSection+0xe8 00000000`39e46b10 00000000`77332498 ntdll!RtlEnterCriticalSection+0xd1 00000000`39e46b40 00000000`77317c9a ntdll!RtlpLookupDynamicFunctionEntry+0x58 00000000`39e46b70 000007fe`f99d33be ntdll!RtlLookupFunctionEntry+0xa3 00000000`39e46ba0 000007fe`f99d2e7d mozglue!WalkStackMain64+0x1aa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 391] 00000000`39e470f0 000007fe`dd376aef mozglue!MozStackWalk+0x17d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 612] 00000000`39e4b210 000007fe`dd373862 xul!GeckoSampler::doNativeBacktrace+0x9b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1000] 00000000`39e4f100 000007fe`dd36f68d xul!GeckoSampler::InplaceTick+0x8a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1218] 00000000`39e4f1a0 000007fe`dd36f49e xul!SamplerThread::SampleContext+0x181 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 250] 00000000`39e4f760 000007fe`dbc473a2 xul!SamplerThread::Run+0xda [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 152] 00000000`39e4f7c0 000007fe`f020cab0 xul!`anonymous namespace'::ThreadFunc+0xa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\ipc\chromium\src\base\platform_thread_win.cc @ 29] 00000000`39e4f7f0 00000000`770b59bd ucrtbase!o__realloc_base+0x60 00000000`39e4f820 00000000`772ea2e1 kernel32!BaseThreadInitThunk+0xd 00000000`39e4f850 00000000`00000000 ntdll!RtlUserThreadStart+0x1d and a JS thread: 55 Id: 2b8c.1d00 Suspend: 1 Teb: 000007ff`fff58000 Unfrozen Child-SP RetAddr Call Site 00000000`3281eae8 00000000`77308db8 ntdll!NtWaitForSingleObject+0xa 00000000`3281eaf0 00000000`77308cb4 ntdll!RtlpWaitOnCriticalSection+0xe8 00000000`3281eba0 00000000`773852f0 ntdll!RtlEnterCriticalSection+0xd1 00000000`3281ebd0 000007fe`dbd42a39 ntdll!RtlDeleteFunctionTable+0x20 00000000`3281ec00 000007fe`db891125 xul!js::jit::DeallocateExecutableMemory+0x25 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jit\executableallocatorwin.cpp @ 243] 00000000`3281ec30 000007fe`db89106e xul!js::jit::ExecutableAllocator::releasePoolPages+0x35 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jit\executableallocator.cpp @ 286] 00000000`3281ec80 000007fe`db88e427 xul!js::jit::ExecutableAllocator::poisonCode+0x10e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jit\executableallocator.cpp @ 407] 00000000`3281ecd0 000007fe`db861a37 xul!js::FreeOp::~FreeOp+0x5b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\vm\runtime.cpp @ 713] 00000000`3281ed00 000007fe`db861112 xul!js::gc::GCRuntime::sweepPhase+0x35f [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 5398] 00000000`3281ee50 000007fe`db9466f7 xul!js::gc::GCRuntime::incrementalCollectSlice+0x1da [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 5907] 00000000`3281efc0 000007fe`db9463c0 xul!js::gc::GCRuntime::gcCycle+0x1ab [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 6193] 00000000`3281f0d0 000007fe`db9f4841 xul!js::gc::GCRuntime::collect+0xc0 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 6339] 00000000`3281f1b0 000007fe`db9f473d xul!js::gc::GCRuntime::gc+0x41 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 6399] 00000000`3281f200 000007fe`db9f4697 xul!mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal+0x9d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerprivate.cpp @ 6269] 00000000`3281f230 000007fe`db9493ad xul!`anonymous namespace'::GarbageCollectRunnable::WorkerRun+0x17 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerprivate.cpp @ 1395] 00000000`3281f260 000007fe`db8a5e2d xul!mozilla::dom::workers::WorkerRunnable::Run+0x129 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerrunnable.cpp @ 376] 00000000`3281f410 000007fe`db8a5a3b xul!mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnablesLocked+0x51 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerprivate.cpp @ 5055] 00000000`3281f450 000007fe`dbc495e7 xul!mozilla::dom::workers::WorkerPrivate::DoRunLoop+0x123 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerprivate.cpp @ 4577] 00000000`3281f580 000007fe`db68c3a3 xul!`anonymous namespace'::WorkerThreadPrimaryRunnable::Run+0x16f [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\runtimeservice.cpp @ 2876] 00000000`3281f8b0 000007fe`db8a6152 xul!nsThread::ProcessNextEvent+0x26b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\xpcom\threads\nsthread.cpp @ 1213] (LazyIdleThread is dealing with locks, but I don't think that particular one concerns us?) So these conditions are kind of like what the patch was originally supposed to solve...except that it's clearly not...
Hey Romain, you're listed as Product Management for the team that's driving us towards 64-bit Windows builds by default at https://wiki.mozilla.org/Firefox/Win64#Team. Is it possible to get somebody to look at this? This is going to bite as more and more, especially as we try to do more performance work on 64-bit Windows builds where we need the profiler.
Flags: needinfo?(rtestard)
Naveed, can you recommend someone from the SpiderMonkey team that could investigate this Gecko Profiler hang on 64-bit Firefox? 64-bit Firefox is the default for new installs starting in Firefox 53, so we'll be seeing more people running 64-bit.
Flags: needinfo?(efaustbmo) → needinfo?(nihsanullah)
(In reply to Mike Conley (:mconley) from comment #2) > Hey Romain, you're listed as Product Management for the team that's driving > us towards 64-bit Windows builds by default at > https://wiki.mozilla.org/Firefox/Win64#Team. Is it possible to get somebody > to look at this? This is going to bite as more and more, especially as we > try to do more performance work on 64-bit Windows builds where we need the > profiler. Now back from PTO, Chris beats me to that! I agree this needs attention.
Flags: needinfo?(rtestard)
Blocks: 1329181
Hannes please take a look.
Assignee: nobody → hv1989
Flags: needinfo?(nihsanullah) → needinfo?(hv1989)
Mike, can you still reproduce this 64-bit hang with the Gecko Profiler? Neither Hannes nor I can reproduce with 64-bit Nightly 54.
Flags: needinfo?(hv1989) → needinfo?(mconley)
I just pushed a patch for another bug that changes the JIT memory allocation. A very nice side effect from that is that there's now only a single RtlAddFunctionTable call per process (when we initialize the JS engine) and a single call to RtlDeleteFunctionTable when the process is destroyed.
We can probably also remove the stackwalk lock now (back out bug 1263595), but it looks like we call profiler_init before we call JS_InitWithFailureDiagnostic, so in theory there could still be a deadlock at process startup. The hang monitor is initialized after we call JS_InitWithFailureDiagnostic, so if we do the same for profiler_init (and reverse order on shutdown), we should be safe. Anyway, if my patch sticks, the JS engine no longer touches Win64 function tables between XPCOM init and shutdown, so the situation in comment 1 is no longer possible.
Depends on: CVE-2017-5400
Thanks, Jan. Bug 1334933 is RESOLVED FIXED, so I assume your fix has stuck so far. I'll file a new bug to back out stackwalk lock bug 1263595, as you suggested.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mconley)
Resolution: --- → FIXED
Jan, will your fix for bug 1334933 be uplifted to Aurora 53 or Beta 52? I filed bug 1336877 for someone to back out the stackwalk lock code from bug 1263595.
Flags: needinfo?(jdemooij)
(In reply to Chris Peterson [:cpeterson] from comment #11) > Jan, will your fix for bug 1334933 be uplifted to Aurora 53 or Beta 52? Yes, very likely to both Aurora and Beta.
Flags: needinfo?(jdemooij)
Blocks: 1337499
Assignee: hv1989 → jdemooij
Target Milestone: --- → mozilla54
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: