Closed
Bug 1323460
Opened 8 years ago
Closed 8 years ago
Hang at RtlpLookupDynamicFunctionEntry when stackwalking in Win64 builds
Categories
(Core :: Gecko Profiler, defect)
Tracking
()
RESOLVED
FIXED
mozilla54
People
(Reporter: mconley, Assigned: jandem)
References
(Blocks 1 open bug)
Details
(Keywords: hang, reproducible)
Attachments
(1 file)
143.98 KB,
text/plain
|
Details |
+++ This bug was initially created as a clone of Bug #1263595 +++
STR:
1) In a 64-bit build of Nightly for Windows, install the Gecko Profiler Add-on
2) Enable the Gecko Profiler Add-on, so that you are profiling.
3) Proceed to surf the internet
4) Wait for your browser to eventually hang and never recover.
ER:
Should not hang like that, even when profiling.
AR:
A pretty brutal hang. See the attached list of thread stacks at the hang.
We landed patches to fix something similar to this in bug 1263595, but there's still more work to do here.
Reporter | ||
Comment 1•8 years ago
|
||
froydnj isolated the threads involved in the deadlock in bug 1263595 comment 52.
Quoting directly:
Looks like we have four threads digging around in lookup tables, the hang monitor:
17 Id: 2b8c.1b14 Suspend: 1 Teb: 000007ff`fff90000 Unfrozen
Child-SP RetAddr Call Site
00000000`0f42ab78 00000000`77308db8 ntdll!NtWaitForSingleObject+0xa
00000000`0f42ab80 00000000`77308cb4 ntdll!RtlpWaitOnCriticalSection+0xe8
00000000`0f42ac30 00000000`77332498 ntdll!RtlEnterCriticalSection+0xd1
00000000`0f42ac60 00000000`77317c9a ntdll!RtlpLookupDynamicFunctionEntry+0x58
00000000`0f42ac90 000007fe`f99d33be ntdll!RtlLookupFunctionEntry+0xa3
00000000`0f42acc0 000007fe`f99d2e7d mozglue!WalkStackMain64+0x1aa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 391]
00000000`0f42b210 000007fe`dc15a46a mozglue!MozStackWalk+0x17d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 612]
00000000`0f42f330 000007fe`dc1114cd xul!mozilla::HangMonitor::GetChromeHangReport+0xe2 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\xpcom\threads\hangmonitor.cpp @ 187]
00000000`0f42f8e0 000007fe`ef262fbe xul!mozilla::HangMonitor::ThreadMain+0x33ce65 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\xpcom\threads\hangmonitor.cpp @ 254]
00000000`0f42f990 000007fe`ef25671a nss3!_PR_NativeRunThread+0xee [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\nsprpub\pr\src\threads\combined\pruthr.c @ 419]
00000000`0f42f9c0 000007fe`f020cab0 nss3!pr_root+0xa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\nsprpub\pr\src\md\windows\w95thred.c @ 96]
00000000`0f42f9f0 00000000`770b59bd ucrtbase!o__realloc_base+0x60
00000000`0f42fa20 00000000`772ea2e1 kernel32!BaseThreadInitThunk+0xd
00000000`0f42fa50 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
and the main thread:
0 Id: 2b8c.3204 Suspend: 3 Teb: 000007ff`fffdd000 Unfrozen
Child-SP RetAddr Call Site
00000000`00382d00 00000000`77317c9a ntdll!RtlpLookupDynamicFunctionEntry+0x6c
00000000`00382d30 000007fe`f99d33be ntdll!RtlLookupFunctionEntry+0xa3
00000000`00382d60 000007fe`f99d2e7d mozglue!WalkStackMain64+0x1aa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 391]
00000000`003832b0 000007fe`dd376aef mozglue!MozStackWalk+0x17d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 612]
00000000`003873d0 000007fe`dd373862 xul!GeckoSampler::doNativeBacktrace+0x9b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1000]
00000000`0038b2c0 000007fe`dd372c5f xul!GeckoSampler::InplaceTick+0x8a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1218]
00000000`0038b360 000007fe`dc0ee001 xul!GeckoSampler::GetBacktrace+0xdb [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1305]
00000000`0038b8f0 000007fe`dbcbac5f xul!mozilla_sampler_get_backtrace+0x39a96d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform.cpp @ 1139]
00000000`0038b920 000007fe`db729828 xul!mozilla::PresShell::ScheduleReflow+0x2b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\presshell.cpp @ 9197]
00000000`0038b950 000007fe`dbb6eaba xul!mozilla::PresShell::FrameNeedsReflow+0x1bc [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\presshell.cpp @ 2750]
00000000`0038bb80 000007fe`dbff805b xul!mozilla::PresShell::Initialize+0x21e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\presshell.cpp @ 1795]
00000000`0038bc00 000007fe`dc075144 xul!nsDocumentViewer::InitPresentationStuff+0x6562bf [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\nsdocumentviewer.cpp @ 703]
00000000`0038bc70 000007fe`dbb6dfb8 xul!nsDocumentViewer::Show+0x507400 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\nsdocumentviewer.cpp @ 2109]
00000000`0038bd10 000007fe`dbb6bed8 xul!nsDocShell::SetVisibility+0x48 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\docshell\base\nsdocshell.cpp @ 6421]
00000000`0038bd40 000007fe`dbb6bcea xul!nsFrameLoader::Show+0x1bc [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\base\nsframeloader.cpp @ 1070]
00000000`0038beb0 000007fe`dbb6bc0e xul!nsSubDocumentFrame::ShowViewer+0xb2 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\generic\nssubdocumentframe.cpp @ 191]
00000000`0038bf00 000007fe`db66f7dc xul!AsyncFrameInit::Run+0x32 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\generic\nssubdocumentframe.cpp @ 93]
00000000`0038bf30 000007fe`db8e63b2 xul!nsContentUtils::RemoveScriptBlocker+0xec [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\base\nscontentutils.cpp @ 5223]
00000000`0038bf70 000007fe`db9ce76a xul!mozilla::PresShell::FlushPendingNotifications+0x1b2 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\layout\base\presshell.cpp @ 4122]
00000000`0038c090 000007fe`db8dac8f xul!nsDocument::FlushPendingNotifications+0x11a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\base\nsdocument.cpp @ 7759]
and the sampler thread:
47 Id: 2b8c.c94 Suspend: 1 Teb: 000007ff`fff3c000 Unfrozen
Child-SP RetAddr Call Site
00000000`39e46a58 00000000`77308db8 ntdll!NtWaitForSingleObject+0xa
00000000`39e46a60 00000000`77308cb4 ntdll!RtlpWaitOnCriticalSection+0xe8
00000000`39e46b10 00000000`77332498 ntdll!RtlEnterCriticalSection+0xd1
00000000`39e46b40 00000000`77317c9a ntdll!RtlpLookupDynamicFunctionEntry+0x58
00000000`39e46b70 000007fe`f99d33be ntdll!RtlLookupFunctionEntry+0xa3
00000000`39e46ba0 000007fe`f99d2e7d mozglue!WalkStackMain64+0x1aa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 391]
00000000`39e470f0 000007fe`dd376aef mozglue!MozStackWalk+0x17d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 612]
00000000`39e4b210 000007fe`dd373862 xul!GeckoSampler::doNativeBacktrace+0x9b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1000]
00000000`39e4f100 000007fe`dd36f68d xul!GeckoSampler::InplaceTick+0x8a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1218]
00000000`39e4f1a0 000007fe`dd36f49e xul!SamplerThread::SampleContext+0x181 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 250]
00000000`39e4f760 000007fe`dbc473a2 xul!SamplerThread::Run+0xda [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 152]
00000000`39e4f7c0 000007fe`f020cab0 xul!`anonymous namespace'::ThreadFunc+0xa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\ipc\chromium\src\base\platform_thread_win.cc @ 29]
00000000`39e4f7f0 00000000`770b59bd ucrtbase!o__realloc_base+0x60
00000000`39e4f820 00000000`772ea2e1 kernel32!BaseThreadInitThunk+0xd
00000000`39e4f850 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
and a JS thread:
55 Id: 2b8c.1d00 Suspend: 1 Teb: 000007ff`fff58000 Unfrozen
Child-SP RetAddr Call Site
00000000`3281eae8 00000000`77308db8 ntdll!NtWaitForSingleObject+0xa
00000000`3281eaf0 00000000`77308cb4 ntdll!RtlpWaitOnCriticalSection+0xe8
00000000`3281eba0 00000000`773852f0 ntdll!RtlEnterCriticalSection+0xd1
00000000`3281ebd0 000007fe`dbd42a39 ntdll!RtlDeleteFunctionTable+0x20
00000000`3281ec00 000007fe`db891125 xul!js::jit::DeallocateExecutableMemory+0x25 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jit\executableallocatorwin.cpp @ 243]
00000000`3281ec30 000007fe`db89106e xul!js::jit::ExecutableAllocator::releasePoolPages+0x35 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jit\executableallocator.cpp @ 286]
00000000`3281ec80 000007fe`db88e427 xul!js::jit::ExecutableAllocator::poisonCode+0x10e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jit\executableallocator.cpp @ 407]
00000000`3281ecd0 000007fe`db861a37 xul!js::FreeOp::~FreeOp+0x5b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\vm\runtime.cpp @ 713]
00000000`3281ed00 000007fe`db861112 xul!js::gc::GCRuntime::sweepPhase+0x35f [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 5398]
00000000`3281ee50 000007fe`db9466f7 xul!js::gc::GCRuntime::incrementalCollectSlice+0x1da [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 5907]
00000000`3281efc0 000007fe`db9463c0 xul!js::gc::GCRuntime::gcCycle+0x1ab [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 6193]
00000000`3281f0d0 000007fe`db9f4841 xul!js::gc::GCRuntime::collect+0xc0 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 6339]
00000000`3281f1b0 000007fe`db9f473d xul!js::gc::GCRuntime::gc+0x41 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\js\src\jsgc.cpp @ 6399]
00000000`3281f200 000007fe`db9f4697 xul!mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal+0x9d [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerprivate.cpp @ 6269]
00000000`3281f230 000007fe`db9493ad xul!`anonymous namespace'::GarbageCollectRunnable::WorkerRun+0x17 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerprivate.cpp @ 1395]
00000000`3281f260 000007fe`db8a5e2d xul!mozilla::dom::workers::WorkerRunnable::Run+0x129 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerrunnable.cpp @ 376]
00000000`3281f410 000007fe`db8a5a3b xul!mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnablesLocked+0x51 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerprivate.cpp @ 5055]
00000000`3281f450 000007fe`dbc495e7 xul!mozilla::dom::workers::WorkerPrivate::DoRunLoop+0x123 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\workerprivate.cpp @ 4577]
00000000`3281f580 000007fe`db68c3a3 xul!`anonymous namespace'::WorkerThreadPrimaryRunnable::Run+0x16f [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\workers\runtimeservice.cpp @ 2876]
00000000`3281f8b0 000007fe`db8a6152 xul!nsThread::ProcessNextEvent+0x26b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\xpcom\threads\nsthread.cpp @ 1213]
(LazyIdleThread is dealing with locks, but I don't think that particular one concerns us?)
So these conditions are kind of like what the patch was originally supposed to solve...except that it's clearly not...
Reporter | ||
Comment 2•8 years ago
|
||
Hey Romain, you're listed as Product Management for the team that's driving us towards 64-bit Windows builds by default at https://wiki.mozilla.org/Firefox/Win64#Team. Is it possible to get somebody to look at this? This is going to bite as more and more, especially as we try to do more performance work on 64-bit Windows builds where we need the profiler.
Flags: needinfo?(rtestard)
Comment hidden (obsolete) |
Comment 4•8 years ago
|
||
Naveed, can you recommend someone from the SpiderMonkey team that could investigate this Gecko Profiler hang on 64-bit Firefox?
64-bit Firefox is the default for new installs starting in Firefox 53, so we'll be seeing more people running 64-bit.
Flags: needinfo?(efaustbmo) → needinfo?(nihsanullah)
Comment 5•8 years ago
|
||
(In reply to Mike Conley (:mconley) from comment #2)
> Hey Romain, you're listed as Product Management for the team that's driving
> us towards 64-bit Windows builds by default at
> https://wiki.mozilla.org/Firefox/Win64#Team. Is it possible to get somebody
> to look at this? This is going to bite as more and more, especially as we
> try to do more performance work on 64-bit Windows builds where we need the
> profiler.
Now back from PTO, Chris beats me to that! I agree this needs attention.
Flags: needinfo?(rtestard)
Comment 6•8 years ago
|
||
Hannes please take a look.
Assignee: nobody → hv1989
Flags: needinfo?(nihsanullah) → needinfo?(hv1989)
Comment 7•8 years ago
|
||
Mike, can you still reproduce this 64-bit hang with the Gecko Profiler? Neither Hannes nor I can reproduce with 64-bit Nightly 54.
status-firefox54:
--- → ?
Flags: needinfo?(hv1989) → needinfo?(mconley)
Assignee | ||
Comment 8•8 years ago
|
||
I just pushed a patch for another bug that changes the JIT memory allocation. A very nice side effect from that is that there's now only a single RtlAddFunctionTable call per process (when we initialize the JS engine) and a single call to RtlDeleteFunctionTable when the process is destroyed.
Assignee | ||
Comment 9•8 years ago
|
||
We can probably also remove the stackwalk lock now (back out bug 1263595), but it looks like we call profiler_init before we call JS_InitWithFailureDiagnostic, so in theory there could still be a deadlock at process startup. The hang monitor is initialized after we call JS_InitWithFailureDiagnostic, so if we do the same for profiler_init (and reverse order on shutdown), we should be safe.
Anyway, if my patch sticks, the JS engine no longer touches Win64 function tables between XPCOM init and shutdown, so the situation in comment 1 is no longer possible.
Depends on: CVE-2017-5400
Comment 10•8 years ago
|
||
Thanks, Jan. Bug 1334933 is RESOLVED FIXED, so I assume your fix has stuck so far. I'll file a new bug to back out stackwalk lock bug 1263595, as you suggested.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mconley)
Resolution: --- → FIXED
Comment 11•8 years ago
|
||
Jan, will your fix for bug 1334933 be uplifted to Aurora 53 or Beta 52?
I filed bug 1336877 for someone to back out the stackwalk lock code from bug 1263595.
status-firefox-esr52:
--- → verified
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 12•8 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #11)
> Jan, will your fix for bug 1334933 be uplifted to Aurora 53 or Beta 52?
Yes, very likely to both Aurora and Beta.
Flags: needinfo?(jdemooij)
Updated•8 years ago
|
Updated•8 years ago
|
Blocks: win64-rollout
You need to log in
before you can comment on or make changes to this bug.
Description
•