Open
Bug 1204682
Opened 9 years ago
Updated 2 years ago
Add Wait Chain Traversal to hang reports
Categories
(Toolkit :: Crash Reporting, enhancement, P3)
Toolkit
Crash Reporting
Tracking
()
NEW
People
(Reporter: bugzilla, Unassigned)
References
(Depends on 1 open bug)
Details
Attachments
(3 files, 2 obsolete files)
24.14 KB,
patch
|
ehsan.akhgari
:
review+
|
Details | Diff | Splinter Review |
7.90 KB,
patch
|
jimm
:
review+
|
Details | Diff | Splinter Review |
2.43 KB,
patch
|
vladan
:
review+
|
Details | Diff | Splinter Review |
I think we should add wait chain traversal to the Windows crash reporter for hang reports (for Vista and newer). It should be able to give us additional information about hung threads, in particular who they are blocked on and why. Since WCT groks COM, GUI messaging, ALPC, and other things, I think it could give us some insight that we wouldn't otherwise have access to. See also: https://msdn.microsoft.com/en-us/library/windows/desktop/ms681622%28v=vs.85%29.aspx
Reporter | ||
Updated•9 years ago
|
Reporter | ||
Comment 1•9 years ago
|
||
Reporter | ||
Comment 2•9 years ago
|
||
Attachment #8700878 -
Attachment is obsolete: true
Attachment #8700880 -
Flags: review?(ehsan)
Reporter | ||
Comment 3•9 years ago
|
||
Bah, saw something I didn't like in moz.build.
Attachment #8700880 -
Attachment is obsolete: true
Attachment #8700880 -
Flags: review?(ehsan)
Attachment #8700882 -
Flags: review?(ehsan)
Reporter | ||
Comment 4•9 years ago
|
||
Attachment #8701130 -
Flags: review?(jmathies)
Reporter | ||
Comment 5•9 years ago
|
||
Attachment #8701131 -
Flags: review?(vladan.bugzilla)
Comment 6•9 years ago
|
||
Comment on attachment 8701131 [details] [diff] [review] Part 3 - Hang Monitor Review of attachment 8701131 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/threads/HangMonitor.cpp @@ +129,5 @@ > + } > + if (GeckoProcessType_Default == XRE_GetProcessType()) { > + // Let's get the main thread Ids of all content processes > + nsTArray<mozilla::dom::ContentParent*> contentParents; > + mozilla::dom::ContentParent::GetAll(contentParents); did you test this patch manually with multiple content processes? @@ +132,5 @@ > + nsTArray<mozilla::dom::ContentParent*> contentParents; > + mozilla::dom::ContentParent::GetAll(contentParents); > + for (uint32_t i = 0, len = contentParents.Length(); i < len; ++i) { > + int32_t pid = contentParents[i]->Pid(); > + nsAutoHandle snapshot(::CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, pid)); couldn't the CreateToolhelp32Snapshot call fail if the process exits? e.g. if it's a thumbnail capture process with a short lifetime
Attachment #8701131 -
Flags: review?(vladan.bugzilla) → review+
Comment 7•8 years ago
|
||
Comment on attachment 8701130 [details] [diff] [review] Part 2 Review of attachment 8701130 [details] [diff] [review]: ----------------------------------------------------------------- sorry for the wait. ::: dom/plugins/ipc/PluginModuleParent.cpp @@ +1248,5 @@ > + if (NS_SUCCEEDED(NS_GetMainThread(&mainThread)) && mainThread && > + NS_SUCCEEDED(mainThread->GetPRThread(&mainPrThread)) && mainPrThread) { > + waitChainTids.AppendElement(PR_GetThreadID(mainPrThread)); > + } > + waitChainTids.AppendElement(crashReporter->ChildMainThreadId()); nit - add a comment explaining what we're doing here please. @@ +1314,5 @@ > // include those minidumps as well. > if (CreatePluginMinidump(mFlashProcess1, 0, pluginDumpFile, > NS_LITERAL_CSTRING("flash1"))) { > additionalDumps.AppendLiteral(",flash1"); > + waitChainTids.AppendElement(GetFirstThreadId(mFlashProcess1)); - ditto @@ +3133,5 @@ > > +#if defined(XP_WIN) > + > +DWORD > +PluginModuleChromeParent::GetFirstThreadId(DWORD aProcessId) nit - comment me
Attachment #8701130 -
Flags: review?(jmathies) → review+
Comment 8•8 years ago
|
||
Comment on attachment 8700882 [details] [diff] [review] Part 1 Review of attachment 8700882 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/tests/windows/TestWaitChainTraversal.cpp @@ +46,5 @@ > + { > + return !mMutex; > + } > + > + CRITICAL_SECTION mCS; Nit: Please use AutoCriticalSection. @@ +59,5 @@ > +{ > + // This function figures out what the value of CRITICAL_SECTION::LockCount > + // should be when the CS is held but not contended. We do it this way because > + // the actual semantics of that field can change between Windows versions. > + CRITICAL_SECTION tempCS; Ditto. ::: xpcom/threads/WaitChainTraversal.h @@ +10,5 @@ > +#if !defined(XP_WIN) > +#error This feature is only supported on Windows builds. > +#endif > + > +#include <windows.h> Including <windows.h> in a header is a Bad Idea. See <https://dxr.mozilla.org/mozilla-central/search?q=%22%23undef+min%22&redirect=false&case=true> for example. :-) Please move the implementation to a .cpp file to avoid this header inclusion. @@ +40,5 @@ > + { > + if (!InitWCTFunctionPtrs()) { > + return; > + } > + mHandle = OpenFnPtr()(0, nullptr); Hmm... What prevents this from crashing if InitWCTFunctionPtrs() fails? @@ +87,5 @@ > + static HMODULE& GetModule() > + { > + static HMODULE sModule = NULL; > + return sModule; > + } Out of curiosity, why not just use a normal static member. This is a strange idiom, and I'm not sure if I understand why it's needed. @@ +118,5 @@ > + if (advApi32 && openFn && closeFn && getFn) { > + return true; > + } > + if (!advApi32) { > + advApi32 = ::LoadLibraryW(L"advapi32.dll"); Please use GetModuleHandleW, so that you don't have to FreeLibrary(). @@ +279,5 @@ > +} > + > +#if defined(MOZ_CRASHREPORTER) && defined(MOZILLA_INTERNAL_API) > +inline void > +TraverseWaitChain(const nsTArray<DWORD>& aTids) I think this is a misleading name. How about calling this AnnotateCrashReportWithWaitChain() or something like that?
Attachment #8700882 -
Flags: review?(ehsan) → review+
Reporter | ||
Comment 9•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=114d698149b3
Reporter | ||
Comment 10•3 years ago
|
||
This might interest you, but feel free to resolve as INVALID/WONTFIX if not...
Assignee: bugzilla → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(gsvelto)
Comment 11•3 years ago
|
||
Oooh, this is very interesting! Definitely not closing it as WONTFIX. Hang reporting needs some love and I think this would be quite useful, especially to diagnose shutdown hangs. For those we rely on reading code and hunches to figure out if we're looking at a complex deadlock scenario.
Type: defect → enhancement
Flags: needinfo?(gsvelto)
Priority: -- → P3
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•