Closed
Bug 1241684
Opened 8 years ago
Closed 7 years ago
Reduce timeout in Windows stackwalk initialization code to speed up DMD start-up
Categories
(Core :: mozglue, defect)
Core
mozglue
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(2 files, 1 obsolete file)
1.39 KB,
patch
|
ehsan.akhgari
:
feedback-
|
Details | Diff | Splinter Review |
10.19 KB,
patch
|
Details | Diff | Splinter Review |
DMD is really slow to start up on Windows. Emanuel identified why more than a year ago (https://blog.mozilla.org/nnethercote/2014/12/11/cumulative-heap-profiling-in-firefox-with-dmd/comment-page-1/#comment-44068) and I just got around to confirming it. In EnsureWalkThreadReady(), we have this code: > // The thread was created. Try to wait an arbitrary amount of time (1 second > // should be enough) for its event loop to start before posting events to it. > DWORD waitRet = ::WaitForSingleObject(readyEvent, 1000); > if (waitRet == WAIT_TIMEOUT) { > // We get a timeout if we're called during static initialization because > // the thread will only start executing after we return so it couldn't > // have signalled the event. If that is the case, give up for now and > // try again next time we're called. > return false; > } I think this code was written with the assumption that stack-walking is a relatively rare operation, but DMD does stack-walks a *lot*. AFAICT, this means that the "timeout occurs during static initialization" case gets hit a lot, and every time it does there's a 1 second penalty. I tried reducing the timeout and it made a huge difference to how long it takes for the first tab to render. Especially when I tell DMD to record every allocation instead of using sampling. The following table has the numbers. > timeout time (sampling) time (no sampling) > ------- --------------- ------------------ > 1000ms 50s ? (too slow) > 100ms 9s 180s > 10ms 5s 39s > 0ms 4s 18s Running without sampling, which is something that is often useful, is basically impossible with the current timeout length. So I'd like to reduce it as much as possible. I guess the possible consequence of this is that some calls to MozStackWalk() might return false when they would otherwise return true with a longer timeout? If that's a problem, maybe we could use 1000 for the first timeout and then reduce it to 0 for any subsequent ones? (Now that I think about it, I suspect EnsureWalkThreadReady() is not threadsafe. It looks like multiple calls to it could cause multiple walker threads to be created, but only one of them end up stored in |stackWalkThread|?)
Assignee | ||
Comment 1•8 years ago
|
||
Attachment #8710745 -
Flags: feedback?(ehsan)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Comment 2•8 years ago
|
||
I am not trying to upstage you, Nick, but I've had a patch sitting around that does away with that wait in its entirety. Unlike with Windows UI messaging, where the message queue needs to be created explicitly (and thus we need to wait for that to happen), APC queues are implicitly created at the same time as the thread. This patch converts MozStackwalk to use APCs instead of UI messages. Furthermore, this allows us to remove the dependency on user32.dll.
Attachment #8710835 -
Flags: feedback?(ehsan)
Assignee | ||
Comment 3•8 years ago
|
||
I'm very happy to be upstaged... except your patch doesn't work for me :( DMD hangs at startup for at least 90 seconds (I gave up after that long). To debug it, follow the "build" and "run" instructions for DMD at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD#Desktop_Firefox_%28Windows%29.
Comment 4•8 years ago
|
||
Comment on attachment 8710745 [details] [diff] [review] Reduce timeout in Windows stackwalk initialization code to speed up DMD start-up Review of attachment 8710745 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/misc/StackWalk.cpp @@ +302,5 @@ > MOZ_ASSERT((stackWalkThread && readyEvent) || > (!stackWalkThread && !readyEvent)); > > + // The thread was created. Spin the event loop before posting events to it. > + DWORD waitRet = ::WaitForSingleObject(readyEvent, 0); This is wrong. Passing 0 as the second argument will change what this function does completely: it will only tell you the current state of the event object (whether it's signaled or not) and will return immediately, always. So essentially you're destroying the wait mechanism! If you want to pursue this, I prefer changing things that we only wait on this event if the first call to PostThreadMessage() below fails. After all, that's the only time when we definitely need to wait. The other thing that would be interesting to figure out is why it is that we're waiting for one second here in the first place. I'm not sure if I understand that, but I would really like to...
Attachment #8710745 -
Flags: feedback?(ehsan) → feedback-
Comment 5•8 years ago
|
||
Comment on attachment 8710835 [details] [diff] [review] Convert MozStackwalk to use APCs Review of attachment 8710835 [details] [diff] [review]: ----------------------------------------------------------------- I like this approach a lot better. But let's see why this doesn't work for Nick first. f+ for now!
Attachment #8710835 -
Flags: feedback?(ehsan) → feedback+
Comment 6•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #0) > DMD is really slow to start up on Windows. Emanuel identified why more than a > year ago > (https://blog.mozilla.org/nnethercote/2014/12/11/cumulative-heap-profiling- > in-firefox-with-dmd/comment-page-1/#comment-44068) > and I just got around to confirming it. I don't understand what's happening at all, FWIW. :-) > In EnsureWalkThreadReady(), we have this code: > > > // The thread was created. Try to wait an arbitrary amount of time (1 second > > // should be enough) for its event loop to start before posting events to it. > > DWORD waitRet = ::WaitForSingleObject(readyEvent, 1000); > > if (waitRet == WAIT_TIMEOUT) { > > // We get a timeout if we're called during static initialization because > > // the thread will only start executing after we return so it couldn't > > // have signalled the event. If that is the case, give up for now and > > // try again next time we're called. > > return false; > > } > > I think this code was written with the assumption that stack-walking is a > relatively rare operation, but DMD does stack-walks a *lot*. AFAICT, this > means > that the "timeout occurs during static initialization" case gets hit a lot, > and > every time it does there's a 1 second penalty. Which "timeout occurs during static initialization" case? > (Now that I think about it, I suspect EnsureWalkThreadReady() is not > threadsafe. It looks like multiple calls to it could cause multiple walker > threads to be created, but only one of them end up stored in > |stackWalkThread|?) Any usage of NS_StackWalk from anything other than the main thread is unsafe. Is that what DMD does?
Flags: needinfo?(n.nethercote)
Assignee | ||
Comment 7•8 years ago
|
||
> > > // The thread was created. Try to wait an arbitrary amount of time (1 second > > > // should be enough) for its event loop to start before posting events to it. > > > DWORD waitRet = ::WaitForSingleObject(readyEvent, 1000); > > > if (waitRet == WAIT_TIMEOUT) { > > > // We get a timeout if we're called during static initialization because > > > // the thread will only start executing after we return so it couldn't > > > // have signalled the event. If that is the case, give up for now and > > > // try again next time we're called. > > > return false; > > > } > > > > I think this code was written with the assumption that stack-walking is a > > relatively rare operation, but DMD does stack-walks a *lot*. AFAICT, this > > means > > that the "timeout occurs during static initialization" case gets hit a lot, > > and > > every time it does there's a 1 second penalty. > > Which "timeout occurs during static initialization" case? The |return false| case with the comment that says "We get a timeout...". DMD does a stack walk for some fraction (possible all) calls to heap allocation functions like malloc. I confirmed that this case is being hit, and the number of times it gets hit correlates perfectly with the delay I get at startup. > Any usage of NS_StackWalk from anything other than the main thread is > unsafe. Is that what DMD does? Yes, and with abandon! If you turn sampling off we call it for every call to malloc/calloc/realloc/etc. This doesn't seem to have caused problems other than the one this bug is about...
Flags: needinfo?(n.nethercote)
Comment 8•8 years ago
|
||
The sampled stacks in bug 1154263 (probably unrelated?) also showed NS_StackWalk being called from both the HangMonitor thread and the (profiler's?) SamplingThread - see attachment 8592207 [details]. I think people very much expect this API to be thread safe.
Assignee | ||
Comment 9•8 years ago
|
||
The comment at the very top of StackWalk.h says: > /* API for getting a stack trace of the C/C++ stack on the current thread */ Also, MozStackWalk has a |aThread| argument, the comment for which says: > The thread for which the stack is to be retrieved. Passing null causes us to walk the stack of the current thread. On Windows, this is a thread HANDLE. It is currently not supported on any other platform." Finally, StackWalk.cpp has stuff like this: > // Debug routines are not threadsafe, so grab the lock. > EnterCriticalSection(&gDbgHelpCS); The whole module certainly gives the appearance of trying to be threadsafe.
Comment 10•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #3) > I'm very happy to be upstaged... except your patch doesn't work for me :( > DMD hangs at startup for at least 90 seconds (I gave up after that long). > > To debug it, follow the "build" and "run" instructions for DMD at > https://developer.mozilla.org/en-US/docs/Mozilla/Performance/ > DMD#Desktop_Firefox_%28Windows%29. During the startup hang, I see this: Main Thread: ntdll!NtSignalAndWaitForSingleObject+0xc KERNELBASE!SignalObjectAndWait+0x92 dmd!MozStackWalk+0x2be dmd!mozilla::dmd::StackTrace::Get+0xef dmd!mozilla::dmd::AllocCallback+0x95 dmd!replace_malloc+0x6e mozglue!malloc_impl+0x32 mozglue!moz_xmalloc+0xd xul!operator new[]+0xd xul!stagefright::List<stagefright::AString>::prep+0x10 xul!stagefright::List<stagefright::AString>::List<stagefright::AString>+0x18 xul!stagefright::splat_type<stagefright::List<stagefright::AString> >+0x47 xul!stagefright::Vector<stagefright::List<stagefright::AString> >::do_splat+0x18 xul!stagefright::VectorImpl::_do_splat+0x20 xul!stagefright::VectorImpl::insertAt+0x50 xul!stagefright::VectorImpl::push+0x1e xul!stagefright::Vector<stagefright::List<stagefright::AString> >::push+0x13 xul!stagefright::AAtomizer::AAtomizer+0x4b xul!`dynamic initializer for 'stagefright::AAtomizer::gAtomizer''+0xd MSVCR120!_initterm+0x29 xul!_CRT_INIT+0x1b0 xul!__DllMainCRTStartup+0x69 xul!_DllMainCRTStartup+0x1c ntdll!LdrxCallInitRoutine+0x16 ntdll!LdrpCallInitRoutine+0x43 ntdll!LdrpInitializeNode+0x101 ntdll!LdrpInitializeGraphRecurse+0x71 ntdll!LdrpPrepareModuleForExecution+0x8b ntdll!LdrpLoadDllInternal+0x121 ntdll!LdrpLoadDll+0x92 ntdll!LdrLoadDll+0xd9 mozglue!`anonymous namespace'::patched_LdrLoadDll+0x73e KERNELBASE!LoadLibraryExW+0x138 firefox!GetLibHandle+0x14 firefox!ReadDependentCB+0x23 firefox!ReadDependentCB+0x43 firefox!XPCOMGlueLoad+0x273 firefox!XPCOMGlueStartup+0x30 firefox!InitXPCOMGlue+0x115 firefox!NS_internal_main+0x7f firefox!wmain+0x12e firefox!__tmainCRTStartup+0xfe KERNEL32!BaseThreadInitThunk+0x24 ntdll!__RtlUserThreadStart+0x2f ntdll!_RtlUserThreadStart+0x1b Stack walker thread: ntdll!NtWaitForSingleObject+0xc ntdll!LdrpDrainWorkQueue+0xbd ntdll!LdrpInitializeThread+0x117 ntdll!_LdrpInitialize+0x30f ntdll!LdrpInitialize+0x25 ntdll!LdrInitializeThunk+0x10 The problem is a deadlock on the loader lock: the first memory allocation is taking place during invocation of static constructors. The loader lock is held while this happens. Then we call MozStackWalk which is waiting on the stack walker thread to initialize. The stack walker thread can't initialize because it can't grab the loader lock! In the case of DMD, we cannot rely on lazy initialization for the stack walker thread. I've modified the patch so that we can *optionally* prime that thread in advance so that it is ready for such scenarios.
Comment 11•8 years ago
|
||
Attachment #8710835 -
Attachment is obsolete: true
Attachment #8716054 -
Flags: review?(ehsan)
Assignee | ||
Comment 12•8 years ago
|
||
Patch r2 still hangs for me at start-up with DMD is enabled :(
Comment 13•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #12) > Patch r2 still hangs for me at start-up with DMD is enabled :( Booo! But it works on my machine! ;-) Can you please post stack traces of the main thread and the stack walker thread?
Flags: needinfo?(n.nethercote)
Comment 14•8 years ago
|
||
Comment on attachment 8716054 [details] [diff] [review] Convert MozStackwalk to use APCs (r2) Review of attachment 8716054 [details] [diff] [review]: ----------------------------------------------------------------- Clearing the review until this works for njn... Let me know if there's value in reviewing before that time.
Attachment #8716054 -
Flags: review?(ehsan)
Assignee | ||
Comment 15•8 years ago
|
||
Sorry for the delayed response. I'm a total n00b at Windows debugging but I've done my best. Note that when I start Firefox with DMD a window pops up, but it stays completely blank and the cursor spins continually. I ran it in the Visual Studio debugger and stopped execution at that point. The "Main Thread" stack is as follows. > [External Code] > xul.dll!mozilla::widget::WinUtils::WaitForMessage(unsigned long aTimeoutMs) Line 725 C++ > xul.dll!nsAppShell::ProcessNextNativeEvent(bool mayWait) Line 380 C++ > xul.dll!nsBaseAppShell::DoProcessNextNativeEvent(bool mayWait) Line 139 C++ > xul.dll!nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal * thr, bool mayWait) Line 289 C++ > xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 990 C++ > xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 297 C++ > xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 128 C++ > xul.dll!MessageLoop::RunHandler() Line 228 C++ > xul.dll!MessageLoop::Run() Line 202 C++ > xul.dll!nsBaseAppShell::Run() Line 158 C++ > xul.dll!nsAppShell::Run() Line 259 C++ > xul.dll!nsAppStartup::Run() Line 282 C++ > xul.dll!XREMain::XRE_mainRun() Line 4291 C++ > xul.dll!XREMain::XRE_main(int argc, char * * argv, const nsXREAppData * aAppData) Line 4380 C++ > xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData, unsigned int aFlags) Line 4482 C++ > firefox.exe!do_main(int argc, char * * argv, char * * envp, nsIFile * xreDirectory) Line 220 C++ > firefox.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 360 C++ > firefox.exe!wmain(int argc, wchar_t * * argv) Line 138 C++ > [External Code] The stack walker thread is called "msvcr120.dll" (I worked that out by luck). Its stack is as follows. > [External Code] > dmd.dll!WalkStackThread(void * aData) Line 480 C++ > [External Code] Line 480 is the loop in WalkStackThread(). All that was with a 64-bit build.
Flags: needinfo?(n.nethercote)
Comment 16•8 years ago
|
||
That's a normal stack for the main thread -- it's just waiting for input. Based on those stacks, I'd say that everything looks fine. Could you try again, but this time grab traces for both the WalkStackThread and whichever thread has called into MozStackWalk?
Assignee | ||
Comment 17•8 years ago
|
||
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #16) > Could you try > again, but this time grab traces for both the WalkStackThread and whichever > thread has called into MozStackWalk? I looked at every thread, and AFAICT none of them are in MozStackWalk when I interrupt the hang. The fact that the hang occurs only after a (blank) window has appeared indicates that execution has proceeded well past the first few allocations. So something else must be going wrong, but I don't know what it is :(
Assignee | ||
Comment 18•8 years ago
|
||
I have a partially-working browser. I can create new tabs and type into the address bar, but no websites will load. Visiting about:memory and taking a measurement works. This is weird.
Comment 19•8 years ago
|
||
Did you check what the content process is doing? It sounds like that's what's being handging.
Assignee | ||
Comment 20•8 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #19) > Did you check what the content process is doing? It sounds like that's > what's being handging. I... really should have thought of that myself. Thank you for the suggestion.
Assignee | ||
Comment 21•8 years ago
|
||
This is interesting: if I disable e10s then everything runs fine. But with e10s enabled the hang is in the content process. When it hangs the main thread in the content process looks like this: > [External Code] > dmd.dll!MozStackWalk(void (unsigned int, void *, void *, void *) * aCallback, unsigned int aSkipFrames, unsigned int aMaxFrames, void * aClosure, unsigned __int64 aThread, void * aPlatformData) Line 576C++ > dmd.dll!mozilla::dmd::StackTrace::Get(mozilla::dmd::Thread * aT) Line 787C++ > dmd.dll!mozilla::dmd::AllocCallback(void * aPtr, unsigned __int64 aReqSize, mozilla::dmd::Thread * aT) Line 1176C++ > dmd.dll!replace_malloc(unsigned __int64 aSize) Line 1257C++ > mozglue.dll!moz_xmalloc(unsigned __int64 size) Line 84C++ > xul.dll!stagefright::AAtomizer::AAtomizer() Line 33C++ > xul.dll!`dynamic initializer for 'stagefright::AAtomizer::gAtomizer''() Line 24C++ > [External Code] > plugin-container.exe!__delayLoadHelper2(const ImgDelayDescr * pidd, __int64 (void) * * ppfnIATEntry) Line 289C++ > [External Code] > plugin-container.exe!content_process_main(int argc, char * * argv) Line 180C++ > plugin-container.exe!wmain(int argc, wchar_t * * argv) Line 138C++ > [External Code] Line 576 is this one: > if (walkerReturn != WAIT_OBJECT_0 && data.walkCallingThread) { There's only one other thread in the content process. It's called "msvcr120.dll" (which is the same name as the stack walking thread in the parent process) and its stack just contains this: > [External Code] When I dig more deeply I just get a stack trace whose entries are all in ntdll.dll.
Comment 22•8 years ago
|
||
Do you have Microsoft symbols turned on in your debugger? That should allow Visual Studio symbolicate the frames coming from Windows DLLs which may help us figure out what's going on.
Comment 23•8 years ago
|
||
(In reply to :Ehsan Akhgari from comment #22) > Do you have Microsoft symbols turned on in your debugger? That should allow > Visual Studio symbolicate the frames coming from Windows DLLs which may help > us figure out what's going on. https://developer.mozilla.org/en-US/docs/Mozilla/Using_the_Mozilla_symbol_server For reference
Assignee | ||
Comment 25•7 years ago
|
||
I found a different way to make DMD work on Win32 in bug 1382076 -- I changed it to use FramePointerStackWalk() instead of MozStackWalk().
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•