Closed Bug 1043112 Opened 11 years ago Closed 11 years ago

[B2G] Add support for core dumps in custom builds

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

FxOS-S3 (24Jul)

People

(Reporter: aosmond, Assigned: aosmond)

References

Details

Attachments

(3 files, 12 obsolete files)

Pull request on B2G 11 years ago Andrew Osmond [:aosmond] (he/him) 44 bytes, text/x-github-pull-request		Details \| Review
Pull request on platform_system_core 11 years ago Andrew Osmond [:aosmond] (he/him) 59 bytes, text/x-github-pull-request		Details \| Review
Pull request on gonk-misc 11 years ago Andrew Osmond [:aosmond] (he/him) 50 bytes, text/x-github-pull-request	aosmond : feedback+	Details \| Review
Pull request on device-flame 11 years ago Andrew Osmond [:aosmond] (he/him) 52 bytes, text/x-github-pull-request		Details \| Review
Force b2g process to generate core dump 11 years ago Andrew Osmond [:aosmond] (he/him) 877 bytes, patch	benjamin : review-	Details \| Diff \| Splinter Review
Force b2g process to generate core dump, v2 11 years ago Andrew Osmond [:aosmond] (he/him) 1.60 KB, patch		Details \| Diff \| Splinter Review
post_sigsegv_respawn_core_backtrace.log 11 years ago Andrew Osmond [:aosmond] (he/him) 12.89 KB, text/x-log		Details
Force b2g process to generate core dump, v3 11 years ago Andrew Osmond [:aosmond] (he/him) 2.67 KB, patch	jld : review+	Details \| Diff \| Splinter Review
Force b2g process to generate core dump, v3.1 11 years ago Andrew Osmond [:aosmond] (he/him) 1.71 KB, patch	aosmond : review+	Details \| Diff \| Splinter Review
Pull request on device-flame, v1.1 11 years ago Andrew Osmond [:aosmond] (he/him) 52 bytes, text/x-github-pull-request		Details \| Review
Pull request on B2G, v2 11 years ago Andrew Osmond [:aosmond] (he/him) 44 bytes, text/x-github-pull-request	dhylands : review+	Details \| Review
Pull request on gonk-misc, v1.1 11 years ago Andrew Osmond [:aosmond] (he/him) 50 bytes, text/x-github-pull-request		Details \| Review
Pull request on gonk-misc, v3 11 years ago Andrew Osmond [:aosmond] (he/him) 50 bytes, text/x-github-pull-request	mwu : review+	Details \| Review
Pull request on gonk-misc, v4 11 years ago Andrew Osmond [:aosmond] (he/him) 50 bytes, text/x-github-pull-request	aosmond : review+	Details \| Review
Force b2g process to generate core dump, v3.2 [carries r=jld] 10 years ago Andrew Osmond [:aosmond] (he/him) 1.84 KB, patch	aosmond : review+	Details \| Diff \| Splinter Review

Andrew Osmond [:aosmond] (he/him)

Assignee

Description

•

11 years ago

Presently when a process crashes on B2G devices, the only trace generated is in the logcat. The build process and run-gdb scripts should be updated to permit enabling core dumping of processes for development purposes. This should not be enabled by default on production builds.

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

11 years ago

Status: NEW → ASSIGNED

QA Contact: aosmond

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 1

•

11 years ago

Attached file Pull request on B2G (obsolete) — Details

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 2

•

11 years ago

Attached file Pull request on platform_system_core (obsolete) — Details

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 3

•

11 years ago

Attached file Pull request on gonk-misc (obsolete) — Details

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

11 years ago

Assignee: nobody → aosmond

QA Contact: aosmond

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 4

•

11 years ago

Comment on attachment 8461289 [details] [review] Pull request on gonk-misc mwu: I have been able to generate coredumps by manually changing system/core/rootfiles/init.rc to import init.b2g.debug.rc, however I am having trouble figuring out how to automate the patching of said file. There is an existing mechanism to patch init.rc via the TARGET_PROVIDES_B2G_INIT_RC flag. However if I try to follow the same method and simply always patch init.rc, I see that it does do it, but the file is then overwritten by the original, presumably because there is no way for make to know there is a dependency between gonk-misc/init.b2g.rc and system/core/rootfiles/init.rc. Ideally I would toggle coredumps based on some developer controlled flag (i.e. say B2G_DEBUG=1 or B2G_DEBUG_COREDUMP=1 in .userconfig) but even doing it always is challenging enough :). Any suggestions would be appreciated! I'm starting to think the best way to handle this is to just put in a .patch file, and say please manually patch system/core/rootfiles/init.rc using this file and build again...

Attachment #8461289 - Flags: feedback?(mwu)

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 5

•

11 years ago

Would it work to have b2g.sh enable core dumps, instead of init.rc?

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 6

•

11 years ago

(In reply to Jed Davis [:jld] from comment #5) > Would it work to have b2g.sh enable core dumps, instead of init.rc? We might be able to move some of it but the following bit is the crucial part: on post-fs-data # Make processes dumpable setrlimit 4 -1 -1 The coredump file size limit is inherited by child processes, so if we moved the equivalent of that into the b2g.sh (ulimit -c unlimited), then only processes spawned by that shell script will get cores. That gets some of the value, but not all the way for services (at least) I care about, such as mediaserver.

Michael Wu [:mwu]

Comment 7

•

11 years ago

We don't need to have this in a separate init.rc file. It should be part of init.b2g.rc and trigger on some property.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 8

•

11 years ago

(In reply to Michael Wu [:mwu] from comment #7) > We don't need to have this in a separate init.rc file. It should be part of > init.b2g.rc and trigger on some property. I agree that would be ideal, but: 1) It looks like property events are triggered after the services are started, and so I don't think they won't inherit the correct rlimit 4 setting. 2) init.b2g.rc is not always used. On flame for example, it doesn't import it and so it isn't executed as far as I can tell.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 9

•

11 years ago

> I agree that would be ideal, but: > > 1) It looks like property events are triggered after the services are > started, and so I don't think they won't inherit the correct rlimit 4 > setting. > 2) init.b2g.rc is not always used. On flame for example, it doesn't import > it and so it isn't executed as far as I can tell. Err 1), that is to say I don't think they *will* inherit the correct rlimit 4 setting. Processes started after should however.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 10

•

11 years ago

(In reply to Michael Wu [:mwu] from comment #7) > We don't need to have this in a separate init.rc file. It should be part of > init.b2g.rc and trigger on some property. Hm, there is one possibility that came to mind although testing today has revealed there are some significant roadblocks. In theory if we trigger on a property, we can update the rlimit so that future processes are set, and *then* execute a script which goes through each existing process and sets the max core size using the prlimit API (supported by our kernel). (The only alternative to prlimit is to write to /proc/<pid>/limits but support for this is not in the mainline or Android kernels.) However our bionic libc version does not support prlimit. It was added early this year although it does seem to have some limitations which may or may not be relevant: https://github.com/android/platform_bionic/commit/0f461e35f63200641fc53bba222845a84589c024 I'm not sure how easy it would be to get our partners to push this change into their versions of libc since we pull that from their repositories. And with all that said, we still need a way to shove some changes into the init.rc/init.b2g.rc which isn't clear to me as I pointed out in comment 8. I would be happy if we had a less dynamic solution requiring a rebuild today since we have no simple means to get core files, and could look at improving it in the future (maybe the move to KK will bring some of the necessary changes??).

Michael Wu [:mwu]

Comment 11

•

11 years ago

(In reply to Andrew Osmond [:aosmond] from comment #8) > 2) init.b2g.rc is not always used. On flame for example, it doesn't import > it and so it isn't executed as far as I can tell. That's not a big deal. init.b2g.rc is the thing we support. We can port the change to devices which don't use it.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 12

•

11 years ago

Attached file Pull request on device-flame (obsolete) — Details

Turn on using the modified gonk-misc version of init.rc for flame.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 13

•

11 years ago

(In reply to Michael Wu [:mwu] from comment #11) > (In reply to Andrew Osmond [:aosmond] from comment #8) > > 2) init.b2g.rc is not always used. On flame for example, it doesn't import > > it and so it isn't executed as far as I can tell. > > That's not a big deal. init.b2g.rc is the thing we support. We can port the > change to devices which don't use it. I see what you mean now. There are a number of places where TARGET_PROVIDES_B2G_INIT_RC and TARGET_PROVIDES_INIT_RC are set, and I didn't realize *we* controlled the makefile which had the final setting. I've added a new pull request to turn change TARGET_PROVIDES_INIT_RC to true on flame to avoid trampling our updated version :). However since the vendor init.rc under system/core/rootdir itself already includes the bits we need from init.b2g.rc (specifically the "service b2g" definition), TARGET_PROVIDES_B2G_INIT_RC should remain false. As such, I modified gonk-misc/Android.mk to generate init.rc when TARGET_PROVIDES_INIT_RC=true. It will import init.b2g.rc if TARGET_PROVIDES_B2G_INIT_RC=false. It will import init.b2g.debug.rc if B2G_DEBUG_CORE is set; perhaps this should be B2G_DEBUG? The core files can be *quite* large and contain personal information depending on the process dumped, so I'm not sure what you think about turning it on in debug builds (based on B2G_DEBUG) by default. I'm willing to look at this again once we get the libc support to make a runtime switch based on a property possible. Another thing to consider is that if debug symbols are not created (optimized/release build?), the cores are probably useless so we probably only want to support it when B2G_DEBUG is set anyways.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 14

•

11 years ago

Comment on attachment 8461286 [details] [review] Pull request on platform_system_core Obsoleted by gonk-misc/device-flame pull requests

Attachment #8461286 - Attachment is obsolete: true

Wander Lairson Costa

Comment 15

•

11 years ago

This won't core dump assertion failures by default because crash report catches SIGABRT.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 16

•

11 years ago

(In reply to Wander Lairson Costa :wcosta from comment #15) > This won't core dump assertion failures by default because crash report > catches SIGABRT. If I run "kill -s SIGABRT <process>", I see the crash reporter pops up an indicator on screen, but the core dump is still generated. It also captures SIGSEGV with the same effect. Are there other corner cases I'm missing? I was lucky (?) that I hit an abort error when playing with the camera app, although oddly it claims it is SIGSEGV which brought it down, despite the trace being clear it is an abort: Program terminated with signal 11, Segmentation fault. #0 0xb603db36 in ?? () warning: Breakpoint address adjusted from 0xb6f3db89 to 0xb6f3db88. warning: Dynamic linker breakpoint function now found. Pending breakpoints in dynamically-loaded libraries will now work. (gdb) bt #0 0xb603db36 in mozalloc_abort (msg=<optimized out>) at ../../../../B2G_HG/b2g-inbound/memory/mozalloc/mozalloc_abort.cpp:30 #1 0xb489d270 in Abort ( aMsg=0xbed66434 "[Child 1107] ###!!! ABORT: Element::UnbindFromTree should have destroyed the element transitions object: 'collection->mElement->GetCrossShadowCurrentDoc() == mPresContext->Document()', file ../../../."...) at ../../../../B2G_HG/b2g-inbound/xpcom/base/nsDebugImpl.cpp:482

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

11 years ago

Attachment #8461285 - Flags: review?(mwu)

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 17

•

11 years ago

Comment on attachment 8461289 [details] [review] Pull request on gonk-misc Updated the pull request to trigger turning on core dumping based on a property update, which is not set by default (so no cores). Added utility to expose the kernel system call that is missing from libc.

Attachment #8461289 - Flags: review?(mwu)

Attachment #8461289 - Flags: feedback?(mwu)

Attachment #8461289 - Flags: feedback+

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

11 years ago

Attachment #8464745 - Flags: review?(mwu)

Wander Lairson Costa

Comment 18

•

11 years ago

(In reply to Andrew Osmond [:aosmond] from comment #16) > (In reply to Wander Lairson Costa :wcosta from comment #15) > > This won't core dump assertion failures by default because crash report > > catches SIGABRT. > > If I run "kill -s SIGABRT <process>", I see the crash reporter pops up an > indicator on screen, but the core dump is still generated. It also captures > SIGSEGV with the same effect. Are there other corner cases I'm missing? > > I was lucky (?) that I hit an abort error when playing with the camera app, > although oddly it claims it is SIGSEGV which brought it down, despite the > trace being clear it is an abort: > > Program terminated with signal 11, Segmentation fault. > #0 0xb603db36 in ?? () > warning: Breakpoint address adjusted from 0xb6f3db89 to 0xb6f3db88. > warning: Dynamic linker breakpoint function now found. Pending > breakpoints in dynamically-loaded libraries will now work. > (gdb) bt > #0 0xb603db36 in mozalloc_abort (msg=<optimized out>) at > ../../../../B2G_HG/b2g-inbound/memory/mozalloc/mozalloc_abort.cpp:30 > #1 0xb489d270 in Abort ( > aMsg=0xbed66434 "[Child 1107] ###!!! ABORT: Element::UnbindFromTree > should have destroyed the element transitions object: > 'collection->mElement->GetCrossShadowCurrentDoc() == > mPresContext->Document()', file ../../../."...) at > ../../../../B2G_HG/b2g-inbound/xpcom/base/nsDebugImpl.cpp:482 This is really weird, I tried sending SIGABRT to the b2g process and I got no dumps. When I send SIGQUIT, it core dumps.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 19

•

11 years ago

(In reply to Wander Lairson Costa :wcosta from comment #18) > This is really weird, I tried sending SIGABRT to the b2g process and I got > no dumps. When I send SIGQUIT, it core dumps. Ah, this seems to be the difference between b2g and its child processes plugin-container. I can reproduce what you are referring to now. It looks like that signal (along with SIGSEGV and SIGILL) is captured in toolkit/xre/nsSigHandlers.cpp and it doesn't pass the signal along to the kernel when it shuts down. It should not a problem to work around however; I'll see if I can whip together a patch today.

Wander Lairson Costa

Comment 20

•

11 years ago

(In reply to Andrew Osmond [:aosmond] from comment #19) > (In reply to Wander Lairson Costa :wcosta from comment #18) > > This is really weird, I tried sending SIGABRT to the b2g process and I got > > no dumps. When I send SIGQUIT, it core dumps. > > Ah, this seems to be the difference between b2g and its child processes > plugin-container. I can reproduce what you are referring to now. > > It looks like that signal (along with SIGSEGV and SIGILL) is captured in > toolkit/xre/nsSigHandlers.cpp and it doesn't pass the signal along to the > kernel when it shuts down. Yes, this is exactly what I observed. > It should not a problem to work around however; > I'll see if I can whip together a patch today. This would be great, because currently we have no core dumps for assertion failures on debug builds.

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 21

•

11 years ago

Attached patch Force b2g process to generate core dump (obsolete) — Details — Splinter Review

(In reply to Wander Lairson Costa :wcosta from comment #20) > (In reply to Andrew Osmond [:aosmond] from comment #19) > > (In reply to Wander Lairson Costa :wcosta from comment #18) > > > This is really weird, I tried sending SIGABRT to the b2g process and I got > > > no dumps. When I send SIGQUIT, it core dumps. > > > > Ah, this seems to be the difference between b2g and its child processes > > plugin-container. I can reproduce what you are referring to now. > > > > It looks like that signal (along with SIGSEGV and SIGILL) is captured in > > toolkit/xre/nsSigHandlers.cpp and it doesn't pass the signal along to the > > kernel when it shuts down. > > Yes, this is exactly what I observed. > > > It should not a problem to work around however; > > I'll see if I can whip together a patch today. > > This would be great, because currently we have no core dumps for assertion > failures on debug builds. This patch should cause it to generate core dumps as expected now. Reading the comments, it seems like it *should* already be, but if we call exit, it seems to swallow any pending signals. I'm not sure if it behaves differently in other circumstances, so I just changed the behaviour on MOZ_B2G builds.

Attachment #8471845 - Flags: review?(mwu)

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

11 years ago

Attachment #8471845 - Flags: review?(mwu) → review?(benjamin)

Benjamin Smedberg

Comment 22

•

11 years ago

Comment on attachment 8471845 [details] [diff] [review] Force b2g process to generate core dump I am about 99% certain that this will break release crashreporting, which also uses a signal handler. If crash reporting is enabled, we must use that instead of saving a core.

Attachment #8471845 - Flags: review?(benjamin) → review-

Wander Lairson Costa

Comment 23

•

11 years ago

(In reply to Andrew Osmond [:aosmond] from comment #21) > Created attachment 8471845 [details] [diff] [review] > Force b2g process to generate core dump > > (In reply to Wander Lairson Costa :wcosta from comment #20) > > (In reply to Andrew Osmond [:aosmond] from comment #19) > > > (In reply to Wander Lairson Costa :wcosta from comment #18) > > > > This is really weird, I tried sending SIGABRT to the b2g process and I got > > > > no dumps. When I send SIGQUIT, it core dumps. > > > > > > Ah, this seems to be the difference between b2g and its child processes > > > plugin-container. I can reproduce what you are referring to now. > > > > > > It looks like that signal (along with SIGSEGV and SIGILL) is captured in > > > toolkit/xre/nsSigHandlers.cpp and it doesn't pass the signal along to the > > > kernel when it shuts down. > > > > Yes, this is exactly what I observed. > > > > > It should not a problem to work around however; > > > I'll see if I can whip together a patch today. > > > > This would be great, because currently we have no core dumps for assertion > > failures on debug builds. > > This patch should cause it to generate core dumps as expected now. Reading > the comments, it seems like it *should* already be, but if we call exit, it > seems to swallow any pending signals. I'm not sure if it behaves differently > in other circumstances, so I just changed the behaviour on MOZ_B2G builds. hmm, not exactly as expected. It indeed generates a core dump, but the stack trace recurs from the kill(getpid(), signo) call: #0 kill () at bionic/libc/arch-arm/bionic/kill.S:45 #1 0xb55738e4 in nsProfileLock::FatalSignalHandler (signo=11, info=0xbea2e008, context=0xbea2e088) at /home/wander/work/gecko-dev/profile/dirserviceprovider/nsProfileLock.cpp:192 #2 0xb58785a8 in AsmJSFaultHandler (context=0xbea2e088, info=0xbea2e008, signum=11) at ../../../../gecko-dev/js/src/asmjs/AsmJSSignalHandlers.cpp:975 #3 AsmJSFaultHandler (signum=11, info=0xbea2e008, context=0xbea2e088) at ../../../../gecko-dev/js/src/asmjs/AsmJSSignalHandlers.cpp:957 #4 0xb6f0965c in ?? () #5 0xb6f0965c in ?? ()

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 24

•

11 years ago

(In reply to Wander Lairson Costa [:wcosta] from comment #23) > hmm, not exactly as expected. It indeed generates a core dump, but the stack > trace recurs from the > kill(getpid(), signo) call: That's probably a SIGSYS from the sandbox. You should call raise(3), which uses tgkill(2) to post the signal to the calling thread, rather than an unspecified thread in the same process as kill(2) does.

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 25

•

11 years ago

(In reply to Wander Lairson Costa [:wcosta] from comment #23) > #3 AsmJSFaultHandler (signum=11, info=0xbea2e008, context=0xbea2e088) at > ../../../../gecko-dev/js/src/asmjs/AsmJSSignalHandlers.cpp:957 > #4 0xb6f0965c in ?? () > #5 0xb6f0965c in ?? () I commented a bit too quickly earlier. There are several things going on here: 1. As seen in the quoted text above, gdb doesn't know how to unwind out of the signal handler on this platform. So if you want to re-raise a SIGSEGV/SIGBUS that was caused by an actual fault (rather than by kill/tgkill; this is available in the si_code field of the siginfo), you have to return from the signal handler after resetting the signal handling. 2. kill(getpid(), ...) shouldn't be used in a multithreaded process, as it can signal a thread other than the caller; see above about raise(). 3. kill(...) is forbidden in sandboxed content processes and will raise SIGSYS; tgkill is allowed only if the tgid is the calling process.

Wander Lairson Costa

Comment 26

•

11 years ago

(In reply to Jed Davis [:jld] from comment #25) > (In reply to Wander Lairson Costa [:wcosta] from comment #23) > > #3 AsmJSFaultHandler (signum=11, info=0xbea2e008, context=0xbea2e088) at > > ../../../../gecko-dev/js/src/asmjs/AsmJSSignalHandlers.cpp:957 > > #4 0xb6f0965c in ?? () > > #5 0xb6f0965c in ?? () > > I commented a bit too quickly earlier. There are several things going on > here: > > 1. As seen in the quoted text above, gdb doesn't know how to unwind out of > the signal handler on this platform. So if you want to re-raise a > SIGSEGV/SIGBUS that was caused by an actual fault (rather than by > kill/tgkill; this is available in the si_code field of the siginfo), you > have to return from the signal handler after resetting the signal handling. > > 2. kill(getpid(), ...) shouldn't be used in a multithreaded process, as it > can signal a thread other than the caller; see above about raise(). > > 3. kill(...) is forbidden in sandboxed content processes and will raise > SIGSYS; tgkill is allowed only if the tgid is the calling process. Returning from the handler to re-raise the signal sounds really complex in some common scenarios. How about adding a build switch to not handle faulty signals at all? I guess would be more work, but less tricky.

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 27

•

11 years ago

(In reply to Wander Lairson Costa [:wcosta] from comment #26) > Returning from the handler to re-raise the signal sounds really complex in > some common scenarios. How about adding a build switch to not handle faulty > signals at all? I guess would be more work, but less tricky. It's not that complicated; Breakpad already has code for it: https://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc#324

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 28

•

11 years ago

Attached patch Force b2g process to generate core dump, v2 (obsolete) — Details — Splinter Review

(In reply to Benjamin Smedberg [:bsmedberg] from comment #22) > Comment on attachment 8471845 [details] [diff] [review] > Force b2g process to generate core dump > > I am about 99% certain that this will break release crashreporting, which > also uses a signal handler. If crash reporting is enabled, we must use that > instead of saving a core. I'm not quite sure how this is possible. The signal handler originally calls _exit at the end of the function, which will bring the process down without triggering further handlers. If the crash reporter runs, it must have already done so. But in any event, at least with his new patch (with or without the MOZ_CRASHREPORTER define), if you look at the backtrace produced from "kill -s SIGSEGV <pid>": (gdb) bt #0 syscall () at bionic/libc/arch-arm/bionic/syscall.S:41 #1 0xb5564bfe in tgkill (sig=11, tid=<optimized out>, tgid=1909) at ../../../../../../../../../B2G_HG/b2g-inbound/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc:105 #2 google_breakpad::ExceptionHandler::SignalHandler (sig=11, info=0xbe907210, uc=0xbe907290) at ../../../../../../../../../B2G_HG/b2g-inbound/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc:346 #3 0xb555245c in nsProfileLock::FatalSignalHandler (signo=11, info=0xbe907210, context=0xbe907290) at /home/aosmond/dev/mozilla/B2G_HG/b2g-inbound/profile/dirserviceprovider/nsProfileLock.cpp:189 #4 0xb58594d4 in AsmJSFaultHandler (context=0xbe907290, info=0xbe907210, signum=11) at ../../../../B2G_HG/b2g-inbound/js/src/asmjs/AsmJSSignalHandlers.cpp:978 #5 AsmJSFaultHandler (signum=11, info=0xbe907210, context=0xbe907290) at ../../../../B2G_HG/b2g-inbound/js/src/asmjs/AsmJSSignalHandlers.cpp:960 #6 0xb6f48388 in ?? () #7 0xb6f48388 in ?? () It seems to me crash reporter ran just fine :). (In reply to Jed Davis [:jld] from comment #27) > It's not that complicated; Breakpad already has code for it: > > https://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google- > breakpad/src/client/linux/handler/exception_handler.cc#324 I've been looking at this closely, reading up on _exit, details on signals and I think I may have over complicated the solution :). It *would* be more or less a copy-paste situation except for the fact that if crash reporter is present, it is doing what we need anyways! We just need to stop calling _exit which prevents the core. Now there is something I should note in case somebody has a solution (or if this is a known limitation?). If I crash b2g with kill -s SIGSEGV, it will attempt to respawn. But until the device is rebooted, all of these respawn attempts generate a core. If this happens in normal crashes, we will quickly create a lot of core files. I could change the core_pattern definition so that it overwrites the same file each time, but that wouldn't stop the cycle, just the wasted disk space / write cycles. I will attach a stack trace in case anybody wishes to look at it. (Note the continuous respawning happens even without any changes, so it has nothing to do with the signal handler.)

Attachment #8480274 - Flags: review?(benjamin)

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

11 years ago

Attachment #8471845 - Attachment is obsolete: true

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 29

•

11 years ago

Attached file post_sigsegv_respawn_core_backtrace.log (obsolete) — Details

Michael Wu [:mwu]

Comment 30

•

11 years ago

If you have a crash loop you want to investigate, start b2g manually: adb shell stop b2g adb shell b2g.sh

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 31

•

11 years ago

Attached patch Force b2g process to generate core dump, v3 (obsolete) — Details — Splinter Review

(In reply to Michael Wu [:mwu] from comment #30) > If you have a crash loop you want to investigate, start b2g manually: > > adb shell stop b2g > adb shell b2g.sh Thanks Michael, that was useful :). I have resolved the crash loop and b2g is able to restart itself once more without issue. If there is a crash reporter waiting to be processed when it starts up, it would crash because we hadn't finished initializing everything yet; it wouldn't normally initialize until the first gecko child process was spawned which would be too late. I am very unfamiliar with the startup code for gecko so if there is a better place to put this, by all means make a suggestion (I was considering XRE_main in toolkit/xre/nsAppRunner.cpp).

Attachment #8480274 - Attachment is obsolete: true

Attachment #8480275 - Attachment is obsolete: true

Attachment #8480274 - Flags: review?(benjamin)

Attachment #8480854 - Flags: review?(benjamin)

Benjamin Smedberg

Comment 32

•

11 years ago

Comment on attachment 8480854 [details] [diff] [review] Force b2g process to generate core dump, v3 I don't understand the OOPInit addition. It seems like you're initializing OOP crash reporting unconditionally on startup, and that's not what we want in most cases (at least on desktop builds). Why was that added/why is it necessary? The other bits here need to be reveiwed by somebody who understands linux syscalls, which is not really me. I'm going to pick jld out of a hat!

Attachment #8480854 - Flags: review?(jld)

Flags: needinfo?(aosmond)

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 33

•

11 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #32) > Comment on attachment 8480854 [details] [diff] [review] > Force b2g process to generate core dump, v3 > > I don't understand the OOPInit addition. It seems like you're initializing > OOP crash reporting unconditionally on startup, and that's not what we want > in most cases (at least on desktop builds). Why was that added/why is it > necessary? > > The other bits here need to be reveiwed by somebody who understands linux > syscalls, which is not really me. I'm going to pick jld out of a hat! It shouldn't be unconditionally. CheckForLastRunCrash is always executed at startup. If there is an unprocessed crash report (e.g. in the scenario where I crash the main b2g process with kill -s SIGSEGV), it wants to move it to the pending directory so that the crash report can be eventually submitted (at least as far as I understand the process). It will *not* have reached the init code I added unless it finds that minidump file and successfully opens/extracts info from it. The critical call path follows immediately after where I added the init check: CheckForLastRunCrash calls MoveToPending which calls GetPendingDir. GetPendingDir requires that OOP should be initialized in order to continue and checks this with a MOZ_ASSERT; this check fails without the above patch, and that causes the process to go down in my debug build. b2g gets restarted by the init process (?) and the cycle starts again because the crash report has yet to be moved to the pending folder. Note the phone won't recover without deleting the crash report files or a reflash (reboots do not help). This can get very frustrating. Either the MOZ_ASSERT is superfluous in which case we can remove it, or we need OOPInit to be called (or redesign!). Maybe the test case I am using is somehow flawed, using an external signal to bring it down? I do most of my work in the plugin-container context and those crash reports / core dumps are generated just fine, so I'm not too sure where the best place to try this out would be :). wcosta: Did you experience a process crash loop due to an assert in GetPending after crashing the process using the aborts mentioned earlier in the thread?

Flags: needinfo?(aosmond) → needinfo?(wcosta)

Wander Lairson Costa

Comment 34

•

11 years ago

(In reply to Andrew Osmond [:aosmond] from comment #33) [snip] > wcosta: Did you experience a process crash loop due to an assert in > GetPending after crashing the process using the aborts mentioned earlier in > the thread? Yes I did, details on how to reproduce it at Bug 1048402 (it was closed as "won't fix").

Flags: needinfo?(wcosta)

Benjamin Smedberg

Comment 35

•

11 years ago

The GetPendingDir change should be moved to a separate bug to avoid confusing this one. The basic assumption here is that GetPendingDir is only called by OOP crash reporting code. It relies on OOPInit in order to initialize `pendingDir` and so that's why we're asserting the relationship. It seems that CheckForLastRunCrash violates that assumption because it uses GetPendingDir in an entirely separate context. I don't think that initializing all of OOP crash reporting to solve this initialization issue is the correct solution, but we should continue in the new bug so Ted can help sort this through.

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 36

•

11 years ago

Comment on attachment 8480854 [details] [diff] [review] Force b2g process to generate core dump, v3 Review of attachment 8480854 [details] [diff] [review]: ----------------------------------------------------------------- ::: profile/dirserviceprovider/nsProfileLock.cpp @@ +33,5 @@ > #include <rmsdef.h> > #endif > > +#if defined(MOZ_B2G) && !defined(MOZ_CRASHREPORTER) > +#include <sys/syscall.h> See below about raise(). @@ +200,5 @@ > + case SIGQUIT: > + case SIGILL: > + case SIGABRT: > + case SIGSEGV: > +#ifndef MOZ_CRASHREPORTER This could use a comment to the effect that, if MOZ_CRASHREPORTER *is* defined, then its handler already did this (when called as `oldact->sa_sigaction`, above), so if that handler returned instead of terminating the process, then this handler should also return. @@ +204,5 @@ > +#ifndef MOZ_CRASHREPORTER > + // Retrigger the signal for those that can generate a core dump > + signal(signo, SIG_DFL); > + if (info->si_code <= 0) { > + if (syscall(__NR_tgkill, getpid(), syscall(__NR_gettid), signo) < 0) { This can just be `raise(signo)`, I think. In particular, on B2G/Android we use the definition of `raise` in mozglue/build/BionicGlue.cpp to work around Android bugs, and it's exactly the above collection of syscalls.

Attachment #8480854 - Flags: review?(jld) → review+

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 37

•

11 years ago

Comment on attachment 8480854 [details] [diff] [review] Force b2g process to generate core dump, v3 Review of attachment 8480854 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/crashreporter/nsExceptionHandler.cpp @@ +2778,5 @@ > > + if (!OOPInitialized()) { > + MOZ_ASSERT(NS_IsMainThread()); > + OOPInit(); > + } Is this a fix for bug 820716?

Benjamin Smedberg

Comment 38

•

11 years ago

Comment on attachment 8480854 [details] [diff] [review] Force b2g process to generate core dump, v3 Please move the nsExceptionHandler.cpp changes to a different bug. This one can land with jld's review.

Attachment #8480854 - Flags: review?(benjamin)

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 39

•

11 years ago

Attached patch Force b2g process to generate core dump, v3.1 (obsolete) — Details — Splinter Review

(In reply to Jed Davis [:jld] from comment #37) > Comment on attachment 8480854 [details] [diff] [review] > Force b2g process to generate core dump, v3 > > Review of attachment 8480854 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: toolkit/crashreporter/nsExceptionHandler.cpp > @@ +2778,5 @@ > > > > + if (!OOPInitialized()) { > > + MOZ_ASSERT(NS_IsMainThread()); > > + OOPInit(); > > + } > > Is this a fix for bug 820716? Yes. Well, an attempt ;). (In reply to Benjamin Smedberg [:bsmedberg] from comment #38) > Comment on attachment 8480854 [details] [diff] [review] > Force b2g process to generate core dump, v3 > > Please move the nsExceptionHandler.cpp changes to a different bug. This one > can land with jld's review. Removed; just has the signal handler changes now.

Attachment #8480854 - Attachment is obsolete: true

Attachment #8491972 - Flags: review+

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 40

•

11 years ago

Attached file Pull request on device-flame, v1.1 (obsolete) — Details

Collapse changes into single commit.

Attachment #8464745 - Attachment is obsolete: true

Attachment #8464745 - Flags: review?(mwu)

Attachment #8491995 - Flags: review?(mwu)

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 41

•

11 years ago

Attached file Pull request on B2G, v2 — Details

Collapse changes into single commit.

Attachment #8461285 - Attachment is obsolete: true

Attachment #8461285 - Flags: review?(mwu)

Attachment #8491996 - Flags: review?(mwu)

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 42

•

11 years ago

Attached file Pull request on gonk-misc, v1.1 (obsolete) — Details

Collapse changes into single commit. Note that system/core/init/builtins.c stubbed out the exec keyword for the init process on flame-kk, so you need to run b2g-prlimit manually at present if testing (but it otherwise functions as expected).

Attachment #8461289 - Attachment is obsolete: true

Attachment #8461289 - Flags: review?(mwu)

Attachment #8491998 - Flags: review?(mwu)

Michael Wu [:mwu]

Comment 43

•

11 years ago

Comment on attachment 8491996 [details] [review] Pull request on B2G, v2 I think Dave might know this script better than I do at this point. Feel free to pass it back to me if it doesn't make sense.

Attachment #8491996 - Flags: review?(mwu) → review?(dhylands)

Dave Hylands [:dhylands]

Comment 44

•

11 years ago

Comment on attachment 8491996 [details] [review] Pull request on B2G, v2 I'd like to see error handling for required arguments to core. What if the extra arguments aren't provided? If they point to a file, they should error if the file doesn't exist. Also, I should be able to use ./run-ddd.sh or ./run-gdb.sh with a core file (ddd is a graphical front-end for gdb).

Attachment #8491996 - Flags: review?(dhylands)

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 45

•

11 years ago

Comment on attachment 8491996 [details] [review] Pull request on B2G, v2 (In reply to Dave Hylands [:dhylands] from comment #44) > Comment on attachment 8491996 [details] [review] > Pull request on B2G, v1.1 > > I'd like to see error handling for required arguments to core. What if the > extra arguments aren't provided? If they point to a file, they should error > if the file doesn't exist. > > Also, I should be able to use ./run-ddd.sh or ./run-gdb.sh with a core file > (ddd is a graphical front-end for gdb). Updated.

Attachment #8491996 - Attachment description: Pull request on B2G, v1.1 → Pull request on B2G, v2

Attachment #8491996 - Flags: review?(dhylands)

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 46

•

11 years ago

Attached file Pull request on gonk-misc, v3 (obsolete) — Details

Update patch to avoid dependencies on init.rc. - This makes it easier to work with all platforms (not just flame) at the cost of losing dynamic property updates. - Set persist.debug.coredump to either "all" (same as before, all processes now will dump cores) or "b2g" (only b2g processes will dump cores), followed by a device reboot, after which it will remain persistent and automatically turn on. To disable coredumps simply clear the persist.debug.coredump property and reboot. - Can only enable core dumps on userdebug or eng builds.

Attachment #8491995 - Attachment is obsolete: true

Attachment #8491998 - Attachment is obsolete: true

Attachment #8491995 - Flags: review?(mwu)

Attachment #8491998 - Flags: review?(mwu)

Attachment #8499950 - Flags: review?(mwu)

Michael Wu [:mwu]

Comment 47

•

11 years ago

Comment on attachment 8499950 [details] [review] Pull request on gonk-misc, v3 Review on PR. Looks good overall - just a few things need adjusting.

Attachment #8499950 - Flags: review?(mwu)

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 48

•

11 years ago

Comment on attachment 8499950 [details] [review] Pull request on gonk-misc, v3 Updated based on comments.

Attachment #8499950 - Attachment description: Pull request on gonk-misc, v2 → Pull request on gonk-misc, v3

Attachment #8499950 - Flags: review?(mwu)

Michael Wu [:mwu]

Comment 49

•

11 years ago

Comment on attachment 8499950 [details] [review] Pull request on gonk-misc, v3 Looks good. Might want to fold the commits together before merging.

Attachment #8499950 - Flags: review?(mwu) → review+

Dave Hylands [:dhylands]

Comment 50

•

11 years ago

Comment on attachment 8491996 [details] [review] Pull request on B2G, v2 Looks reasonable - just a couple minor nits.

Attachment #8491996 - Flags: review?(dhylands) → review+

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 51

•

11 years ago

Fixed nits and squashed pull requests.

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 52

•

11 years ago

https://github.com/mozilla-b2g/B2G/commit/9a12f3fe3cea54a2dd8f4343a21c52e2365d5a91 https://github.com/mozilla-b2g/gonk-misc/commit/8ae27b558791b8b47559a5336e797c6ca8d0f017

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Keywords: checkin-needed

Resolution: --- → FIXED

Target Milestone: --- → 2.1 S6 (10oct)

Carsten Book [:Tomcat]

Comment 53

•

11 years ago

(In reply to Carsten Book [:Tomcat] from comment #52) > https://github.com/mozilla-b2g/gonk-misc/commit/ > 8ae27b558791b8b47559a5336e797c6ca8d0f017 had to revert this change for bustage like https://treeherder.mozilla.org/ui/logviewer.html#?job_id=605782&repo=b2g-inbound

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 54

•

11 years ago

Attached file Pull request on gonk-misc, v4 — Details

New pull request which should fix the issue on hamachi (and other devices with older kernels).

Attachment #8499950 - Attachment is obsolete: true

Attachment #8501683 - Flags: review+

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

11 years ago

Keywords: checkin-needed

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 55

•

11 years ago

Documentation may be found/updated at: https://developer.mozilla.org/en-US/Firefox_OS/Debugging/Debugging_B2G_using_gdb#Debugging_core_files

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 56

•

11 years ago

(In reply to Andrew Osmond [:aosmond] from comment #54) > Created attachment 8501683 [details] [review] > Pull request on gonk-misc, v4 > > New pull request which should fix the issue on hamachi (and other devices > with older kernels). Nit: The actual kernel probably does support it, but the header files in that version of the Android SDK don't. The prlimit64 syscall (at least on ARM) landed in Linux 2.6.36; every physical B2G device is at least 3.0.x, but the ICS B2G emulator is 2.6.29. So, for future reference if we ever want this feature on older devices, something like this should work (or else fail cleanly at runtime with ENOSYS if the kernel is actually too old): #ifndef __NR_prlimit64 #ifdef __ARM_EABI__ #define __NR_prlimit64 369 #endif #endif

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 57

•

11 years ago

(In reply to Jed Davis [:jld] from comment #56) > (In reply to Andrew Osmond [:aosmond] from comment #54) > > Created attachment 8501683 [details] [review] > > Pull request on gonk-misc, v4 > > > > New pull request which should fix the issue on hamachi (and other devices > > with older kernels). > > Nit: The actual kernel probably does support it, but the header files in > that version of the Android SDK don't. The prlimit64 syscall (at least on > ARM) landed in Linux 2.6.36; every physical B2G device is at least 3.0.x, > but the ICS B2G emulator is 2.6.29. > > So, for future reference if we ever want this feature on older devices, > something like this should work (or else fail cleanly at runtime with ENOSYS > if the kernel is actually too old): > > #ifndef __NR_prlimit64 > #ifdef __ARM_EABI__ > #define __NR_prlimit64 369 > #endif > #endif Huh, obviously I didn't look at it closely enough :). I vaguely recalled looking at the versions that introduced the API ages ago, and just assumed that's what the symbol missing was referring to. Especially without a hamachi device at hand to do testing with. Thanks for the tips! If b2g-prlimit doesn't work, you can still set persist.debug.coredump=b2g and get core dumps for b2g/content processes only, even on the ICS B2G emulator. That depends on setting ulimit for the shell process before it spawns b2g rather than prlimit for arbitrary ones.

Ryan VanderMeulen [:RyanVM]

Comment 58

•

11 years ago

Master: https://github.com/mozilla-b2g/gonk-misc/commit/cc1f362ce43dce92ac786187ff4abf39060094bd

Keywords: checkin-needed

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

•

10 years ago

Blocks: 1181715

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 59

•

10 years ago

Attached patch Force b2g process to generate core dump, v3.2 [carries r=jld] — Details — Splinter Review

Somehow this was missed when the rest of the patches were landed. Until recently I was able to get core dumps from b2g anyways (but probably not the parent process). This should fix that in addition to letting the debugger capture signals properly. try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4372d927f282

Attachment #8491972 - Attachment is obsolete: true

Attachment #8633699 - Flags: review+

Pulsebot

Comment 60

•

10 years ago

https://hg.mozilla.org/integration/b2g-inbound/rev/1bc523435edd

Ryan VanderMeulen [:RyanVM]

Comment 61

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/1bc523435edd

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

Target Milestone: 2.1 S6 (10oct) → FxOS-S3 (24Jul)

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Updated

•

10 years ago

Depends on: 1190965

You need to log in before you can comment on or make changes to this bug.