Closed Bug 1536850 Opened 9 months ago Closed 9 months ago

Crash in [@ shutdownhang | NtGetContextThread]

Categories

(Toolkit :: Crash Reporting, defect, critical)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox66 --- unaffected
firefox67 --- fixed
firefox68 --- fixed

People

(Reporter: calixte, Assigned: gsvelto)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug is for crash report bp-0800ebfb-4dc3-4cec-88d6-ef1770190320.

Top 10 frames of crashing thread:

0 ntdll.dll NtGetContextThread 
1 kernelbase.dll GetThreadContext 
2 dbgcore.dll Win32LiveSystemProvider::GetThreadContextEx 
3 dbgcore.dll NtWin32LiveSystemProvider::GetThreadContextEx 
4 dbgcore.dll GenAllocateThreadObject 
5 dbgcore.dll GenGetProcessInfo 
6 dbgcore.dll MiniDumpProvideDump 
7 dbgcore.dll MiniDumpWriteDump 
8 xul.dll bool google_breakpad::ExceptionHandler::WriteMinidumpWithExceptionForProcess toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc:950
9 xul.dll google_breakpad::ExceptionHandler::WriteMinidumpForChild toolkit/crashreporter/breakpad-client/windows/handler/exception_handler.cc:814

There are 4 crashes starting with buildid 20190319095054.
The pushlog for this build is:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=fe798624cda0&tochange=880331515823
It appears that it could be related to patch for bug 1533842.
:gsvelto, could you investigate please ?

Flags: needinfo?(gsvelto)
Component: General → Crash Reporting
Product: Core → Toolkit

This is very odd, it looks as if bug 1498942 has come back but the change in bug 1533842 should not have changed the behavior at all. It's late now but I'll check again tomorrow first thing in the morning.

Investigating this issue turned out to be more complex than I had anticipated. As it turns out my fix for bug 1498942 worked purely by chance which explains quite a few things. First of all since we were not initializing the mShuttingDown variable when creating a ContentParent object the variable ended up holding garbage, but since we use a non-zero poison pattern for freed memory the variable evaluated to true most of the time. This in turn prevented us from taking minidumps of hung content processes even when we weren't shutting down which is bad because it must have artificially lowered our content process hang rate on crash-stats.

That being said since we weren't taking minidumps of content process anymore on shutdown we didn't hit this particular crash. When I fixed the variable initialization the patch started behaving like it should except that there was a second issue in the way: we set the mShuttingDown variable inside the profile-before-change/xpcom-shutdown observer but - depending on the state of the content process - there seem to be a race that causes the observer not to be called and so we never set mShuttingDown. I must decouple the shutdown detection from the rest of the ContentParent logic because it's the only way to prevent this issue from happening.

Flags: needinfo?(gsvelto)

I've identified the race between the process being killed and receiving profile-before-change, the fix should be straightforward.

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED

This code was already present but it had two flaws: the first one was that the
variable used to prevent minidumps from being taken when shutting down was not
being initialized which caused it to be true most of the time and thus
prevented captures even in valid scenarios. The second was that we relied on
the "profile-before-change" change event to detect shutdown but this often
happens after we've already started closing tabs and shutting down processes
thus the minidump generation would often race against it.

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/088e0dd9ed88
Properly detect when the browser is shutting down to avoid taking content process minidumps too late r=Ehsan
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

I don't see any new crashes after buildid 20190329220047 so this should be fixed for good.

Comment on attachment 9053236 [details]
Bug 1536850 - Properly detect when the browser is shutting down to avoid taking content process minidumps too late

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 1498942
  • User impact if declined: Crash reports for hung content processes are never sent
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The changes uses the proper interface to detect shutdown and the patch is small
  • String changes made/needed: None
Attachment #9053236 - Flags: approval-mozilla-beta?

(In reply to Gabriele Svelto [:gsvelto] from comment #8)

  • List of other uplifts needed: None

I believe we'll want to re-land bug 1533842 if this gets approved?

Comment on attachment 9053236 [details]
Bug 1536850 - Properly detect when the browser is shutting down to avoid taking content process minidumps too late

Shutdown crash fix, approved for 67 beta 10, thanks!

Attachment #9053236 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)

(In reply to Gabriele Svelto [:gsvelto] from comment #8)

  • List of other uplifts needed: None

I believe we'll want to re-land bug 1533842 if this gets approved?

Gabriele, should we reland bug 1533842? Thanks

Flags: needinfo?(gsvelto)

No, the patch here replaces bug 1533842 entirely.

Flags: needinfo?(gsvelto)

Tried to uplift to beta, failed at builds/worker/workspace/build/src/dom/ipc/ContentParent.cpp

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&resultStatus=testfailed%2Cbusted%2Cexception&revision=11ddfc9bd642f08e487fb76d23b7626f0b179e58

Backout link: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&resultStatus=testfailed%2Cbusted%2Cexception&revision=92601d63125e1c4f043a665476ab8cea39c9e592

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=239550295&repo=mozilla-beta&lineNumber=23084

[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/dom/ipc'
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - /builds/worker/workspace/build/src/clang/bin/clang++ -o Unified_cpp_dom_ipc0.o -c -I/builds/worker/workspace/build/src/obj-firefox/dist/stl_wrappers -I/builds/worker/workspace/build/src/obj-firefox/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -DDEBUG=1 -DOS_POSIX=1 -DOS_LINUX=1 '-DBIN_SUFFIX=""' -DMOZ_TOOLKIT_SEARCH -DSTATIC_EXPORTABLE_JS_API -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/dom/ipc -I/builds/worker/workspace/build/src/obj-firefox/dom/ipc -I/builds/worker/workspace/build/src/obj-firefox/ipc/ipdl/ipdlheaders -I/builds/worker/workspace/build/src/ipc/chromium/src -I/builds/worker/workspace/build/src/ipc/glue -I/builds/worker/workspace/build/src/caps -I/builds/worker/workspace/build/src/chrome -I/builds/worker/workspace/build/src/docshell/base -I/builds/worker/workspace/build/src/dom/base -I/builds/worker/workspace/build/src/dom/bindings -I/builds/worker/workspace/build/src/dom/events -I/builds/worker/workspace/build/src/dom/filesystem -I/builds/worker/workspace/build/src/dom/geolocation -I/builds/worker/workspace/build/src/dom/media/webspeech/synth/ipc -I/builds/worker/workspace/build/src/dom/security -I/builds/worker/workspace/build/src/dom/storage -I/builds/worker/workspace/build/src/extensions/cookie -I/builds/worker/workspace/build/src/extensions/spellcheck/src -I/builds/worker/workspace/build/src/gfx/2d -I/builds/worker/workspace/build/src/hal/sandbox -I/builds/worker/workspace/build/src/js/xpconnect/loader -I/builds/worker/workspace/build/src/js/xpconnect/src -I/builds/worker/workspace/build/src/layout/base -I/builds/worker/workspace/build/src/media/webrtc -I/builds/worker/workspace/build/src/netwerk/base -I/builds/worker/workspace/build/src/netwerk/protocol/http -I/builds/worker/workspace/build/src/toolkit/components/printingui/ipc -I/builds/worker/workspace/build/src/toolkit/crashreporter -I/builds/worker/workspace/build/src/toolkit/xre -I/builds/worker/workspace/build/src/uriloader/exthandler -I/builds/worker/workspace/build/src/widget -I/builds/worker/workspace/build/src/xpcom/base -I/builds/worker/workspace/build/src/xpcom/threads -I/builds/worker/workspace/build/src/modules/libjar -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -Qunused-arguments -U_FORTIFY_SOURCE -fno-common -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wc++1z-compat -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Werror=non-literal-null-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=return-std-move -Wno-error=atomic-alignment -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-unknown-warning-option -Wno-return-type-c-linkage -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fsanitize=address -fcrash-diagnostics-dir=/builds/worker/artifacts -U_FORTIFY_SOURCE -fno-common -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -Xclang -load -Xclang /builds/worker/workspace/build/src/obj-firefox/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -O1 -fno-omit-frame-pointer -funwind-tables -Werror -I/builds/worker/workspace/build/src/widget/gtk/compat-gtk3 -pthread -I/usr/include/gtk-3.0 -I/usr/include/atk-1.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/pango-1.0 -I/usr/include/gio-unix-2.0/ -I/usr/include/cairo -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/harfbuzz -I/usr/include/freetype2 -I/usr/include/pixman-1 -I/usr/include/libpng12 -I/usr/include/gtk-3.0/unix-print -Wno-error=shadow -MD -MP -MF .deps/Unified_cpp_dom_ipc0.o.pp /builds/worker/workspace/build/src/obj-firefox/dom/ipc/Unified_cpp_dom_ipc0.cpp
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - In file included from /builds/worker/workspace/build/src/obj-firefox/dom/ipc/Unified_cpp_dom_ipc0.cpp:56:
[task 2019-04-11T03:06:34.423Z] 03:06:34 ERROR - /builds/worker/workspace/build/src/dom/ipc/ContentParent.cpp:3380:54: error: too few arguments to function call, single argument 'aShuttingDown' was not specified
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - if (mCrashReporter && !appStartup->GetShuttingDown() &&
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - ~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/nsIAppStartup.h:78:25: note: 'GetShuttingDown' declared here
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - JS_HAZ_CAN_RUN_SCRIPT NS_IMETHOD GetShuttingDown(bool *aShuttingDown) = 0;
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - ^
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/nscore.h:134:20: note: expanded from macro 'NS_IMETHOD'
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - #define NS_IMETHOD NS_IMETHOD
(nsresult)
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - ^
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - /builds/worker/workspace/build/src/obj-firefox/dist/include/nscore.h:125:29: note: expanded from macro 'NS_IMETHOD_'
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - # define NS_IMETHOD_(type) virtual type
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - ^
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - 1 error generated.
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - /builds/worker/workspace/build/src/config/rules.mk:805: recipe for target 'Unified_cpp_dom_ipc0.o' failed
[task 2019-04-11T03:06:34.423Z] 03:06:34 ERROR - make[4]: *** [Unified_cpp_dom_ipc0.o] Error 1
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/dom/ipc'
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - make[4]: *** Waiting for unfinished jobs....
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/dom/worklet'
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - mkdir -p '.deps/'

Flags: needinfo?(pascalc)

(In reply to Daniel Varga [:dvarga] from comment #13)

[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - In file included from /builds/worker/workspace/build/src/obj-firefox/dom/ipc/Unified_cpp_dom_ipc0.cpp:56:
[task 2019-04-11T03:06:34.423Z] 03:06:34 ERROR - /builds/worker/workspace/build/src/dom/ipc/ContentParent.cpp:3380:54: error: too few arguments to function call, single argument 'aShuttingDown' was not specified
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - if (mCrashReporter && !appStartup->GetShuttingDown() &&
[task 2019-04-11T03:06:34.423Z] 03:06:34 INFO - ~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^

Ugh, there's a problem with my original patch I hadn't spotted... and more importantly this error is present in a few other places in the code which fooled me into thinking I was doing the right thing. GetShuttingDown() will always return true here which is not what we wanted. Ignore this patch for now, I'll file a follow-up.

OK, scratch part of my previous comment: the patch is working on central, it's failing on beta because it implicitly relied on bug 1523609 (which changed how GetShuttingDown() behaves by adding an inlined-version which does what you'd expect it to do). I'll write a beta-specific patch and attach it here.

Attached patch Patch for betaSplinter Review
Flags: needinfo?(pascalc)
You need to log in before you can comment on or make changes to this bug.