Closed Bug 843296 Opened 11 years ago Closed 11 years ago

No crash information on b2g emulator crashes

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(b2g18 fixed)

RESOLVED FIXED
Tracking Status
b2g18 --- fixed

People

(Reporter: jrmuizel, Assigned: ahal)

References

Details

Attachments

(3 files, 5 obsolete files)

In the following log I have a crash but don't get any information about the where the crash is happening.

https://tbpl.mozilla.org/php/getParsedLog.php?id=19910355&tree=Mozilla-Inbound&full=1

This makes debugging very difficult.
We're not actually building symbols for these.
I don't recall whether there was some issue that prevented us from building+uploading symbols, but that would be a prerequisite for fixing this bug.
Depends on: 843303
I just filed a bug to add build symbols for these.  After that's done, we can modify the harnesses to dump the crash reports when they occur.
I've seen several complaints about how difficult it is to debug these types of crashes on irc and various places since this bug was filed. I think this should probably be a top b2g automation priority.
Blocks: b2g-testing
Blocks: 818103
Blocks: 821420
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Forgot to pass the symbols path into the harnesses
These three patches *may* be all that's needed to get this working. I pushed the mozharness changes to mozharness_ash, watching:
https://tbpl.mozilla.org/?tree=Ash&rev=07e3c1fa05c6
(In reply to Andrew Halberstadt [:ahal] from comment #6)
> Created attachment 720041 [details] [diff] [review]
> Patch 1.0 (gecko) - use mozcrash instead of automationutils
> 
> These three patches *may* be all that's needed to get this working. I pushed
> the mozharness changes to mozharness_ash, watching:
> https://tbpl.mozilla.org/?tree=Ash&rev=07e3c1fa05c6

11:31:20     INFO -  Traceback (most recent call last):
11:31:20     INFO -    File "runreftestb2g.py", line 19, in <module>
11:31:20     INFO -      from b2gautomation import B2GRemoteAutomation
11:31:20     INFO -    File "/home/cltbld/talos-slave/test/build/tests/reftest/b2gautomation.py", line 5, in <module>
11:31:20     INFO -      import mozcrash
11:31:20     INFO -  ImportError: No module named mozcrash
Guessing we need the mozharness equivalent of bug 839569?
We have it for mh desktop unittests, but maybe not for mh b2g unittests.
For b2g unittests, we don't use mozbase modules from tests.zip, but from puppetagain.  So we need to add mozcrash to http://mxr.mozilla.org/mozilla-central/source/b2g/test/b2g-unittest-requirements.txt, and make sure it's available at http://puppetagain.pub.build.mozilla.org/data/python/packages/.
Mozcrash is already in puppetagain, need to add it to requirements.txt:
https://tbpl.mozilla.org/?tree=Ash&rev=dfa95695ef58
Ugh, mozlog isn't even listed as a dependency of mozcrash's setup.py:
https://tbpl.mozilla.org/php/getParsedLog.php?id=20343560&tree=Ash&full=1#error0
Fixed the dependency issue in bug 847970 and did a new push:
https://tbpl.mozilla.org/?tree=Ash&rev=098b9ab83b4b
Depends on: 848102
Depends on: 848124
Depends on: 849270
This is all that's needed on the mozharness side (symbols path is getting passed into harness on ash). This won't pass it in to marionette, but the marionette harness doesn't even have a --symbols-path option yet so I'll hold off on that until the other unittests have landed.

For an update on the gecko side, the crash dumps are being saved and left in the profile, but there seems to be a bug in devicemanagerADB when trying to pull them off the device. Currently investigating.
Attachment #720004 - Attachment is obsolete: true
Attachment #720007 - Attachment is obsolete: true
Attachment #722899 - Flags: review?(aki)
Comment on attachment 722899 [details] [diff] [review]
Patch 2.0 (mozharness) - pass symbols to unittest harness

This still becomes |--download-symbols true| for debug tests, right?
Attachment #722899 - Flags: review?(aki) → review+
(In reply to Aki Sasaki [:aki] from comment #15)
> Comment on attachment 722899 [details] [diff] [review]
> Patch 2.0 (mozharness) - pass symbols to unittest harness
> 
> This still becomes |--download-symbols true| for debug tests, right?

Yes. Actually I noticed that and couldn't figure out why it was happening :p
Depends on: 849822
This works locally, but before it can land the following needs to happen:
1) devicemanager 0.20 released (which includes fix in bug 849822)
2) devicemanager on puppetagain updated to 0.20
3) mozharness patch needs to land
4) set MINIDUMP_STACKWALK env on test slaves? Not sure if this last one is already done or not. If it is I'm not sure how to do it.
Attachment #720041 - Attachment is obsolete: true
Attachment #721254 - Attachment is obsolete: true
Attachment #723637 - Flags: review?(jgriffin)
No longer depends on: 848102
(In reply to Jonathan Griffin (:jgriffin) from comment #10)
> For b2g unittests, we don't use mozbase modules from tests.zip, but from
> puppetagain.  So we need to add mozcrash to
> http://mxr.mozilla.org/mozilla-central/source/b2g/test/b2g-unittest-
> requirements.txt, and make sure it's available at
> http://puppetagain.pub.build.mozilla.org/data/python/packages/.

I'm not wild about us having two different ways of getting mozbase for different unit test runs. Why is this different? We have a mirror of mozbase in m-c, we should use that.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #18)
> (In reply to Jonathan Griffin (:jgriffin) from comment #10)
> > For b2g unittests, we don't use mozbase modules from tests.zip, but from
> > puppetagain.  So we need to add mozcrash to
> > http://mxr.mozilla.org/mozilla-central/source/b2g/test/b2g-unittest-
> > requirements.txt, and make sure it's available at
> > http://puppetagain.pub.build.mozilla.org/data/python/packages/.
> 
> I'm not wild about us having two different ways of getting mozbase for
> different unit test runs. Why is this different? We have a mirror of mozbase
> in m-c, we should use that.

In mozharness, we share a single mozharness script across all of our gecko branches.  The details of what Python packages need to be installed in order to run e.g. Marionette live in this script.

We encountered a problem that different mozbase modules were required to run Marionette in m-c vs mozilla-b2g18...e.g., some packages started depending on some previously unrequired packages (e.g., moznetwork).  We couldn't update the mozharness script to reflect this, since those mozbase changes weren't universal and couldn't easily be uplifted.

Installing via puppetagain lets us call 'pip install /path/to/marionette' and all the deps get installed correctly on all branches.

FWIW, this is not different from how developers run B2G tests locally.  They current either user a helper sh script which runs Marionette's setup.py (installing packages from pypi rather than puppetagain), or they run setup.py themselves directly.

We could potentially change this in both places (mozharness and helper scripts) to use tests.zip again (in the mozharness case, by running setup_development.py in order to install all mozbase packages instead of cherry-picking the ones we actually use), but this isn't completely clean either.  There are test runs that use Marionette but don't need tests.zip, so we'd just be making those different from the unittest runs in favor of not having B2G vs desktop unittest runs install mozbase packages differently.
Comment on attachment 723637 [details] [diff] [review]
Patch 2.0 (gecko) - b2g emulator tests with minidump stackwalking

Review of attachment 723637 [details] [diff] [review]:
-----------------------------------------------------------------

\o/

::: build/mobile/b2gautomation.py
@@ +112,5 @@
>              try:
> +                crashed = mozcrash.check_for_crashes(local_dump_dir, symbolsPath, test_name=self.lastTestSeen)
> +            finally:
> +                shutil.rmtree(local_dump_dir)
> +                self._devicemanager.removeDir(remote_dump_dir)

Can we omit this step for B2G?  For emulators, which is the only config that unit tests work on atm, we don't need to do this cleanup, since none of the profile changes are persistent anyway.  On a device, the test profile would get removed anyway, so this is redundant.
Attachment #723637 - Flags: review?(jgriffin) → review+
(In reply to Jonathan Griffin (:jgriffin) from comment #20)
> Can we omit this step for B2G?  For emulators, which is the only config that
> unit tests work on atm, we don't need to do this cleanup, since none of the
> profile changes are persistent anyway.  On a device, the test profile would
> get removed anyway, so this is redundant.

Are you sure about that? When testing locally I noticed when launching the emulator a second time after a crash there would exist an empty /data/local/tests/minidumps folder and a new entry in "/data/mozilla/b2g/Crash Reports"
Jeff is working on that in bug 838374 and hopes to have it landed by end of week.
Sigh, ignore comment 22 (wrong bug)
(In reply to Andrew Halberstadt [:ahal] from comment #21)
> (In reply to Jonathan Griffin (:jgriffin) from comment #20)
> > Can we omit this step for B2G?  For emulators, which is the only config that
> > unit tests work on atm, we don't need to do this cleanup, since none of the
> > profile changes are persistent anyway.  On a device, the test profile would
> > get removed anyway, so this is redundant.
> 
> Are you sure about that? When testing locally I noticed when launching the
> emulator a second time after a crash there would exist an empty
> /data/local/tests/minidumps folder and a new entry in
> "/data/mozilla/b2g/Crash Reports"

Interesting.  Well, I guess it doesn't hurt to keep this code, it shouldn't cost much time to do the cleanup.
(In reply to Jonathan Griffin (:jgriffin) from comment #24)
> Interesting.  Well, I guess it doesn't hurt to keep this code, it shouldn't
> cost much time to do the cleanup.

Plus it only gets executed if a crash was detected, so should have a negligible impact on slave load.
Depends on: 851091
Comment on attachment 722899 [details] [diff] [review]
Patch 2.0 (mozharness) - pass symbols to unittest harness

Pushed mozharness patch: https://hg.mozilla.org/build/mozharness/rev/f00b96a6eba0
Attachment #722899 - Flags: checked-in+
Blocks: 834016
https://hg.mozilla.org/integration/mozilla-inbound/rev/dadc8416f4df

I'm going to leave open to work on adding support for this in marionette tests as well as in case the test slaves don't have the minidum_stackwalk binary available.
Whiteboard: [leave-open]
checkForCrashes isn't getting run in the case where an exception is thrown (e.g socket timeouts) which covers a lot of the use cases we care about (https://tbpl.mozilla.org/php/getParsedLog.php?id=20780370&tree=Mozilla-Inbound&full=1)

I pushed a patch to try which should fix this: https://tbpl.mozilla.org/?tree=Try&rev=7025d6ae8156
These patches seem to have gotten rid of the "SIGSEGV" messages in the logcat, but not the timeouts. This makes no sense to me.
(In reply to Andrew Halberstadt [:ahal] from comment #31)
> These patches seem to have gotten rid of the "SIGSEGV" messages in the
> logcat, but not the timeouts. This makes no sense to me.

Me either, but I've caught these cases locally, and the B2G process *has* crashed.  I'm not sure why we're no longer seeing it in the log, however.
So either the crashes aren't generating minidumps in the profile or the slaves don't have minidump_stackwalk (or both). This try push should tell us more: https://tbpl.mozilla.org/?tree=Try&rev=6dd63c4f5e9f
Depends on: 853544
That really helped. There are two problems here:
1) no minidump_stackwalk binary (filed bug 853544)
2) mochitest and reftest are still using the in-tree mozdevice which doesn't have the fix for bug 849822 yet. To fix this we can either wait for mozbase to be synced (bug 838374) or we can change them to use the puppetagain packages (which I think we wanted to do anyway)
Depends on: 853558
Depends on: 853591
Attachment #723637 - Flags: checked-in+
Finally, we have crash stacks! https://tbpl.mozilla.org/php/getParsedLog.php?id=21057455&tree=Mozilla-Inbound&full=1#error2

For anyone blocked by this bug: for most purposes you can consider this RESOLVED FIXED. I'm going to leave it open for now to catch a few edge case crashes we've seen during setup (before the test harness is even invoked).
This patch should run check_for_crashes after every time we try to connect to marionette, thus catching all those socket timeout errors we've been seeing intermittently. I haven't actually seen this working in the wild yet, so pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=edfe996a5402
Attachment #728966 - Flags: review?(jgriffin)
Beh, I noticed a bug.. need to use posixpath.dirname() when building the minidump path with 'IsRelative' set to 1. Sorry for the spam

Re-pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=931eac8d9c5c
Attachment #728966 - Attachment is obsolete: true
Attachment #728966 - Flags: review?(jgriffin)
Attachment #728974 - Flags: review?(jgriffin)
Comment on attachment 728974 [details] [diff] [review]
Patch 3.1 (gecko) - add check_for_crashes ability to marionette and run during setup

Review of attachment 728974 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

::: testing/marionette/client/marionette/marionette.py
@@ +394,5 @@
> +        try:
> +            # We are ignoring desired_capabilities, at least for now.
> +            self.session = self._send_message('newSession', 'value')
> +        except:
> +            self.check_for_crash()

We should probably surround this also with try/except/print traceback, so that if an error occurs during check_for_crash, we still report the root cause (the socket.timeout, e.g.).
Attachment #728974 - Flags: review?(jgriffin) → review+
(In reply to Jonathan Griffin (:jgriffin) from comment #39)
> > +        except:
> > +            self.check_for_crash()
> 
> We should probably surround this also with try/except/print traceback, so
> that if an error occurs during check_for_crash, we still report the root
> cause (the socket.timeout, e.g.).

Good catch, I thought I had a "raise" there, but apparently not.
You did, I was just concerned if that self.check_for_crash() also threw, it would obscure the original exception.
Good point. I realized that wouldn't work though since doing the 'raise' afterwards would now refer to the newer exception. I fixed it by just printing the original exception and calling sys.exit() afterwards:
https://hg.mozilla.org/integration/mozilla-inbound/rev/21ce35576177

I'm going to call this bug done and work on marionette unittest crash support in a new bug.
Whiteboard: [leave-open]
Depends on: 854996
Blocks: 855279
https://hg.mozilla.org/mozilla-central/rev/21ce35576177
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Please can we backport this to b2g18? :-)
Yes, I'm working on it. Unfortunately there is a lot of merge mess and no try server.
Depends on: 860764
Depends on: 860773
Depends on: 870742
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: