Closed Bug 866937 Opened 11 years ago Closed 8 years ago

B2G crash stacks are missing symbol information

Categories: Release Engineering :: General, defect
Platform: x86, Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking: Not tracked

Status: RESOLVED WONTFIX

People: Reporter: jrmuizel; Assignee: Unassigned

Details: Whiteboard: [leave-open]

Attachments: 1 file

If you look here, you'll see we're missing symbols for libui and libc.
Blocks: 818103
No longer blocks: 818103
Blocks: 818103
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
Jeff, do you have an example crash and, ideally, some details about the build it came from?

This came from a recent unagi nightly. The symbol manifest at
http://symbols.mozilla.org/b2g/b2g-18.0-Android-20130429070204-arm-symbols.txt
contains libui.so and libc.so.

For kicks: both m-b2g18 and mozilla-b2g18_v1_0_1 are posting manifests with exactly the same filename. They'll overwrite each other, but we'll end up with the union of the two symbol sets.
OS: Mac OS X → Gonk (Firefox OS)
I think the "here" that comment 0 meant was bug 818103, where emulator test runs crash (or hang; it has never been entirely clear which) all day long, always at libc.so + 0xdc04, as in https://tbpl.mozilla.org/php/getParsedLog.php?id=22381622&tree=Mozilla-Central
The emulator images are prebuilt, AIUI, so we don't have symbols for them in the symbol packages that get uploaded with the builds. We only have symbols for Gecko. This basically boils down to bug 528231. :-/
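To see why the frame stays at "libc.so + 0xdc04", it helps to look at what a symbolizer does with a module-relative offset: it finds the function record in that module's Breakpad .sym file whose address range covers the offset. With no .sym file for the emulator's libc.so, there is nothing to look the offset up in. The sketch below parses FUNC records (FUNC [m] <address> <size> <parameter_size> <name>, numbers in hex) and resolves an offset. The helper names and the sample record are hypothetical; the record is made up, not real libc.so data.

```python
import bisect


def parse_func_records(sym_text):
    """Collect (start, size, name) tuples from Breakpad FUNC records, sorted by start."""
    funcs = []
    for line in sym_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "FUNC":
            continue
        if len(fields) > 1 and fields[1] == "m":
            del fields[1]  # optional flag emitted by newer dump_syms versions
        # FUNC <address> <size> <parameter_size> <name...>; numbers are hex.
        start, size = int(fields[1], 16), int(fields[2], 16)
        funcs.append((start, size, " ".join(fields[4:])))
    funcs.sort()
    return funcs


def symbolize(funcs, offset):
    """Map a module-relative offset to 'name+0xN', or None if no FUNC covers it."""
    starts = [start for start, _, _ in funcs]
    i = bisect.bisect_right(starts, offset) - 1  # last function starting at or before offset
    if i >= 0:
        start, size, name = funcs[i]
        if offset < start + size:
            return f"{name}+{offset - start:#x}"
    return None


sample = "FUNC dc00 40 0 example_function\n"  # made-up record for illustration
print(symbolize(parse_func_records(sample), 0xdc04))  # example_function+0x4
```

A real stackwalker also falls back to PUBLIC records and uses CFI to unwind further frames; this only shows the single lookup that is impossible without the module's symbols.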
That's not the whole story, because the call to the test harness has 
  --symbols-path <url to gecko symbols>
and what's missing is the emulator symbols. Those appear to be present inside the emulator package already, for example
  b2g-distro/out/target/product/generic/symbols/system/lib/libc.so

Could we teach the test harness to take more than one --symbols-path ?
Apparently today is not a proofreading day, because ted already said that. I think the point about the non-stripped binaries in b2g-distro/out/target/product/generic/symbols still stands, though, as does the question that follows from it.
We would have to dump those into Breakpad format to make them usable, currently. If we can make that happen I'd be happy to teach the harness how to take an extra symbols path.
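If the harness parsed its options with argparse, accepting --symbols-path more than once could look like the sketch below. This is a hypothetical parser, not the actual harness code: each occurrence is appended to a list, and a local lookup tries the directories in order, so Gecko symbols and emulator symbols can live in separate trees.

```python
import argparse
from pathlib import Path


def build_parser():
    # Hypothetical harness parser; only the option under discussion is shown.
    parser = argparse.ArgumentParser(prog="runtests")
    # action="append" collects every occurrence into a list, so Gecko symbols
    # and emulator (system library) symbols can be passed separately.
    parser.add_argument("--symbols-path", dest="symbols_paths",
                        action="append", default=None, metavar="PATH")
    return parser


def find_sym_file(symbols_paths, relpath):
    """Return the first local symbol directory entry matching relpath, or None."""
    for root in symbols_paths or []:
        candidate = Path(root) / relpath
        if candidate.is_file():
            return candidate
    return None


opts = build_parser().parse_args(
    ["--symbols-path", "gecko-symbols", "--symbols-path", "emulator-symbols"])
print(opts.symbols_paths)  # ['gecko-symbols', 'emulator-symbols']
```

Breakpad's minidump_stackwalk already accepts several symbol directories after the minidump path, so a list like this could also be passed straight through. URL symbol paths, like the Gecko one the harness receives today, would need to be downloaded and unpacked first; the sketch only handles local directories.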
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #6)
> We would have to dump those into Breakpad format to make them usable,
> currently. If we can make that happen I'd be happy to teach the harness how
> to take an extra symbols path.

What's involved in dumping those symbols in breakpad format?
It's mostly just running dump_syms on every binary in $(PRODUCT_OUT)/symbols from the build. If you have a B2G build directory, you can just run "./build.sh buildsymbols" and grab symbols.zip from gecko-objdir/dist.
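Roughly what that step involves can be sketched as follows, assuming a Breakpad dump_syms binary on PATH that prints a .sym file to stdout for each ELF file it understands. Each .sym file is stored at <module>/<debug_id>/<module>.sym, using the IDs from its first (MODULE) line. store_relpath and dump_all are hypothetical helper names, and only Linux/Android-style module names are handled.

```python
import subprocess
from pathlib import Path


def store_relpath(module_line):
    """Symbol-store path for a .sym file, derived from its MODULE line."""
    # MODULE <os> <arch> <debug_id> <debug_file>
    fields = module_line.split(maxsplit=4)
    if len(fields) != 5 or fields[0] != "MODULE":
        raise ValueError(f"not a MODULE line: {module_line!r}")
    debug_id, name = fields[3], fields[4].strip()
    return Path(name) / debug_id / (name + ".sym")


def dump_all(symbols_dir, out_dir, dump_syms="dump_syms"):
    """Run dump_syms on every file under symbols_dir, skipping files it rejects."""
    written = []
    for binary in sorted(Path(symbols_dir).rglob("*")):
        if not binary.is_file():
            continue
        result = subprocess.run([dump_syms, str(binary)],
                                capture_output=True, text=True)
        if result.returncode != 0 or not result.stdout.startswith("MODULE"):
            continue  # not a binary dump_syms could process
        dest = Path(out_dir) / store_relpath(result.stdout.splitlines()[0])
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(result.stdout)
        written.append(dest)
    return written
```

Pointing dump_all at b2g-distro/out/target/product/generic/symbols would produce a tree that a stackwalker can use as one more symbols path.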
In order for this to work with the way that emulators are currently used in buildbot, we'd need to produce the buildsymbols in Jenkins, upload them to tooltool along with the emulator they came from, and download them in the relevant mozharness scripts.
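Uploading the symbols next to the emulator mostly means listing one more file in the tooltool manifest. The sketch below builds manifest records with the fields we understand tooltool manifests to use (filename, size, algorithm, digest). The function names, the example file names, and the idea of putting both files in one manifest are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path


def manifest_entry(filename, data):
    """Describe one file for a tooltool-style manifest (field names assumed)."""
    return {
        "filename": filename,
        "size": len(data),
        "algorithm": "sha512",
        "digest": hashlib.sha512(data).hexdigest(),
    }


def build_manifest(paths):
    """JSON manifest text covering every file in paths, e.g. emulator plus symbols."""
    entries = [manifest_entry(Path(p).name, Path(p).read_bytes()) for p in paths]
    return json.dumps(entries, indent=2)
```

For example, build_manifest(["emulator.zip", "symbols.zip"]) (hypothetical file names) would describe both archives, so the mozharness side could fetch them together.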

I'm fine with that, but I think we may be getting full-stack emulator builds in buildbot soon-ish (bug 807792) which would make that work obsolete, so I'm tempted just to let this slide until that happens.
That doesn't sound like too much work. We already need to copy the emulator to tooltool manually; I don't think copying one extra file at the same time will make things much more complicated than they already are.

I guess it depends on:
1) When will bug 807792 be finished? (catlee's last comment was that he hasn't had a chance to look at it yet)
2) Is this blocking bug 818103? If so, might be worth doing sooner rather than later.
Blocks: 873454
Product: mozilla.org → Release Engineering
All of the debug B2G tests are failing now, and crashing in libc with no stack trace, e.g., https://tbpl.mozilla.org/php/getParsedLog.php?id=29445946&tree=Cedar&full=1

:ahal, do we know what to do to fix this?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> It's mostly just running dump_syms on every binary in
> $(PRODUCT_OUT)/symbols) from the build. If you have a B2G build dir you can
> just "./build.sh buildsymbols" and grab the symbols.zip from
> gecko-objdir/dist.

I don't know any more than what :ted mentioned in comment 6 and comment 8. It doesn't sound like too difficult a problem, though, and if you want me to handle multiple --symbols-path options once we have the binaries in Breakpad format, I can look into it.
(In reply to Andrew Halberstadt [:ahal] from comment #12)
> (In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> > It's mostly just running dump_syms on every binary in
> > $(PRODUCT_OUT)/symbols) from the build. If you have a B2G build dir you can
> > just "./build.sh buildsymbols" and grab the symbols.zip from
> > gecko-objdir/dist.
> 
> I don't know any more than what :ted mentioned in comment 6 and comment 8.
> It doesn't sound too difficult of a problem though, and if you want me to
> handle the multiple --symbols-path options once we have the binaries in
> breakpad format I can look into it.

That would be great, thanks.
Comment 11 doesn't show "missing symbols", it shows a complete lack of a minidump. It's hitting a MOZ_ASSERT, which should be a totally safe reproducible crash. It doesn't look like we're catching it at all, which leads me to believe that Breakpad is not enabled for some reason.

From that log:
10:24:05     INFO -  10-21 17:24:04.690    45    45 F MOZ_Assert: Assertion failure: mState != hal::SWITCH_STATE_UNKNOWN, at ../../../gecko/dom/system/gonk/AudioChannelManager.h:52
10:24:05    ERROR -  10-21 17:24:04.690    45    45 F libc    : Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
10:24:05    ERROR -  This usually indicates the B2G process has crashed
I think breakpad is enabled [1], but it's possible that check_for_crashes isn't getting called anymore for some reason. I added some debugging info [2] to cedar which should help us figure out what's going on.

[1] https://github.com/mozilla/mozbase/blob/master/mozrunner/mozrunner/remote.py#L96
[2] https://hg.mozilla.org/projects/cedar/rev/18f3bc305b50
(In reply to Jonathan Griffin (:jgriffin) from comment #11)
> All of the debug B2G tests are failing now, and crashing in libc with no
> stack trace, e.g.,
> https://tbpl.mozilla.org/php/getParsedLog.php?id=29445946&tree=Cedar&full=1
> 
> :ahal, do we know what to do to fix this?

This is bug 929139.  I've asked dougt to review mchen's patch there in baku's absence.
I did some debugging on Cedar. First, we aren't checking for crashes in all the places we should be, so I fixed that. Though even with that fixed there still aren't any minidumps being found (ctrl-f for "checking for crashes"):
https://tbpl.mozilla.org/php/getParsedLog.php?id=29518599&tree=Cedar&full=1
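"Checking for crashes" essentially means looking for minidump files in the dump directory. Gecko's crash reporter writes each crash as an <id>.dmp minidump, usually alongside an <id>.extra metadata file. The hypothetical helper below works on a directory listing (for example one fetched from the device) and pairs them up. An empty result, as in the log above, means no dump was written at all, which points at the crash reporter not being enabled rather than at missing symbols.

```python
from pathlib import PurePosixPath


def pair_minidumps(names):
    """Pair each <id>.dmp with its <id>.extra file (None if the .extra is missing)."""
    present = set(names)
    pairs = []
    for name in sorted(present):
        path = PurePosixPath(name)
        if path.suffix == ".dmp":
            extra = str(path.with_suffix(".extra"))
            pairs.append((name, extra if extra in present else None))
    return pairs


print(pair_minidumps(["a.dmp", "a.extra", "b.dmp", "log.txt"]))
# [('a.dmp', 'a.extra'), ('b.dmp', None)]
```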

In this case, it looks like the crash is happening before we restart the b2g process with the env variables in comment 15, so ted's assessment that the crashreporter isn't enabled is likely correct. For this particular instance we can either:

1) try to pass in an environment to the initial b2g process on emulator startup (not sure if this is possible)
2) enable it in the builds
3) live without stacks for startup crashes
(In reply to Andrew Halberstadt [:ahal] from comment #17)
> I did some debugging on Cedar. First, we aren't checking for crashes in all
> the places we should be, so I fixed that. Though even with that fixed there
> still aren't any minidumps being found (ctrl-f for "checking for crashes"):
> https://tbpl.mozilla.org/php/getParsedLog.php?id=29518599&tree=Cedar&full=1
> 
> In this case, it looks like the crash is happening before we restart the b2g
> process with the env variables in comment 15, so ted's assessment that the
> crashreporter isn't enabled is likely correct. For this particular instance
> we can either:
> 
> 1) try to pass in an environment to the initial b2g process on emulator
> startup (not sure if this is possible)
> 2) enable it in the builds
> 3) live without stacks for startup crashes

Ouch! We can't do 1) without modifying the build, so 1) and 2) are effectively the same. Would there be any disadvantage to enabling the crash reporter in the build for engineering builds?
Flags: needinfo?(ted)
ahal and I discussed this on IRC. bug 717538 enabled crash reporting by default for all non-debug builds. We left debug builds out because crash reporting interferes with debugging, so we didn't want to make developers' lives harder. I'd be fine with changing that default on B2G if it solves this problem and isn't a big inconvenience to developers. (You can still set MOZ_CRASHREPORTER_DISABLE=1 to turn it off.)
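The default Ted describes, with B2G (gonk) debug builds switched to the non-debug behaviour, can be summarized as a small truth table. The Python model below only illustrates that logic and is not the C++ in nsExceptionHandler.cpp. It assumes, as later in this bug, that other debug builds opt in with MOZ_CRASHREPORTER=1, and that any non-empty value of either variable counts as set.

```python
def crash_reporter_enabled(debug, gonk, env):
    """Model of the proposed default: on unless opted out, except non-gonk debug builds."""
    if not debug or gonk:
        # Enabled by default; MOZ_CRASHREPORTER_DISABLE opts out.
        return not env.get("MOZ_CRASHREPORTER_DISABLE")
    # Other debug builds: disabled unless MOZ_CRASHREPORTER opts in.
    return bool(env.get("MOZ_CRASHREPORTER"))


print(crash_reporter_enabled(debug=True, gonk=True, env={}))  # True
```

The design keeps the opt-out for developers who find the crash reporter in the way while debugging, while startup crashes on the emulator would no longer depend on the harness setting an environment variable in time.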
Flags: needinfo?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #19)
> ahal and I discussed this on IRC. bug 717538 enabled crash reporting by
> default for all non-debug builds. We left debug builds out because crash
> reporting interferes with debugging, so we didn't want to make developers'
> lives harder. I'd be fine with changing that default on B2G if it solves
> this problem and isn't a big inconvenience to developers. (You can still set
> MOZ_CRASHREPORTER_DISABLE=1 to turn it off.)

I think that would be acceptable, as long as we document it well.
What's the status here? Is it possible to get symbols and stacks on debug builds?
(In reply to Gregor Wagner [:gwagner] from comment #21)
> What's the status here? Is it possible to get symbols and stacks on debug
> builds?

Is there a crash that this is holding up investigation for? It's hard to tell what is and isn't working at this point, but I believe we should be getting stacks on debug builds that happen during a test run. It's only if the emulator crashes before the test harness has a chance to manually turn crash reporting on that we wouldn't get symbols. Of course that's in theory.

But yes, we should enable it by default on debug builds either way. I'll figure out how to do that.
nsExceptionHandler seemed like the better place to fix this. Otherwise we'd be overriding an override.
Attachment #8339501 - Flags: review?(ted)
(In reply to Andrew Halberstadt [:ahal] from comment #22)
> (In reply to Gregor Wagner [:gwagner] from comment #21)
> > What's the status here? Is it possible to get symbols and stacks on debug
> > builds?
> 
> Is there a crash that this is holding up investigation for? It's hard to
> tell what is and isn't working at this point, but I believe we should be
> getting stacks on debug builds that happen during a test run. It's only if
> the emulator crashes before the test harness has a chance to manually turn
> crash reporting on that we wouldn't get symbols. Of course that's in theory.
> 
> But yes, we should enable it by default on debug builds either way. I'll
> figure out how to do that.

I am looking for example at the debug emulator with marionette:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31127573&tree=Pine&full=1#error75

It would be nice to have a stack here.
Ah, it's possible that marionette isn't setting MOZ_CRASHREPORTER=1 anywhere. If that's the case, the above patch should fix it.
Comment on attachment 8339501 [details] [diff] [review]
Patch 1.0 - enable crashreporter by default on b2g debug builds

Review of attachment 8339501 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +807,5 @@
>  
> +#if !defined(DEBUG) || defined(MOZ_WIDGET_GONK)
> +  // In non-debug builds, enable the crash reporter by default, and allow
> +  // disabling it with the MOZ_CRASHREPORTER_DISABLE environment variable.
> +  // Also enable it by default in debug b2g builds as it is difficult to

Might want to say "gonk" instead of "b2g" here since that's what you're using.
Attachment #8339501 - Flags: review?(ted) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/4231aceecfe0

Leaving the bug open because of comment 27 and because I'm still not clear which symbols are actually missing.
Whiteboard: [leave-open]
(In reply to Gregor Wagner [:gwagner] from comment #27)
> This didn't do the trick here:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=31190112&tree=Pine&full=1

Are you sure that's a crash? There's no segfault in the logcat anymore.
(In reply to Andrew Halberstadt [:ahal] from comment #29)
> (In reply to Gregor Wagner [:gwagner] from comment #27)
> > This didn't do the trick here:
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=31190112&tree=Pine&full=1
> 
> Are you sure that's a crash? There's no segfault in the logcat anymore.

Grep for MOZ_CRASH. I don't know why it's not linked any more.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Component: General Automation → General