Last Comment Bug 843296 - No crash information on b2g emulator crashes
: No crash information on b2g emulator crashes
Status: RESOLVED FIXED
:
Product: Release Engineering
Classification: Other
Component: Other (show other bugs)
: other
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: Andrew Halberstadt [:ahal]
:
:
Mentors:
: 839161 (view as bug list)
Depends on: 843303 848124 849270 849822 851091 853544 853558 853591 854996 860764 860773 870742
Blocks: 818103 821420 822399 834016 b2g-testing 853024 855279
  Show dependency treegraph
 
Reported: 2013-02-20 12:25 PST by Jeff Muizelaar [:jrmuizel]
Modified: 2013-08-12 21:54 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
fixed


Attachments
Patch 1.0 (mozharness) - set download_symbols to ondemand (518 bytes, patch)
2013-03-01 09:43 PST, Andrew Halberstadt [:ahal]
no flags Details | Diff | Splinter Review
Patch 1.1 (mozharness) - pass symbols_path to harnesses (2.81 KB, patch)
2013-03-01 09:59 PST, Andrew Halberstadt [:ahal]
no flags Details | Diff | Splinter Review
Patch 1.0 (gecko) - use mozcrash instead of automationutils (1.58 KB, patch)
2013-03-01 10:53 PST, Andrew Halberstadt [:ahal]
no flags Details | Diff | Splinter Review
Patch 1.1 (gecko) - add mozcrash to b2g_requirements.txt (350 bytes, patch)
2013-03-05 07:46 PST, Andrew Halberstadt [:ahal]
no flags Details | Diff | Splinter Review
Patch 2.0 (mozharness) - pass symbols to unittest harness (3.27 KB, patch)
2013-03-08 12:09 PST, Andrew Halberstadt [:ahal]
aki: review+
ahalberstadt: checked‑in+
Details | Diff | Splinter Review
Patch 2.0 (gecko) - b2g emulator tests with minidump stackwalking (3.41 KB, patch)
2013-03-11 14:01 PDT, Andrew Halberstadt [:ahal]
jgriffin: review+
ahalberstadt: checked‑in+
Details | Diff | Splinter Review
Patch 3.0 (gecko) - add check_for_crashes capability to marionette and run it during setup (14.26 KB, patch)
2013-03-25 07:15 PDT, Andrew Halberstadt [:ahal]
no flags Details | Diff | Splinter Review
Patch 3.1 (gecko) - add check_for_crashes ability to marionette and run during setup (14.26 KB, patch)
2013-03-25 07:34 PDT, Andrew Halberstadt [:ahal]
jgriffin: review+
Details | Diff | Splinter Review

Description Jeff Muizelaar [:jrmuizel] 2013-02-20 12:25:46 PST
In the following log I have a crash but don't get any information about the where the crash is happening.

https://tbpl.mozilla.org/php/getParsedLog.php?id=19910355&tree=Mozilla-Inbound&full=1

This makes debugging very difficult.
Comment 1 Aki Sasaki [:aki] 2013-02-20 12:29:05 PST
We're not actually building symbols for these.
I don't recall whether there was some issue that prevented us from building+uploading symbols, but that would be a prerequisite for fixing this bug.
Comment 2 Jonathan Griffin (:jgriffin) 2013-02-20 12:43:05 PST
I just filed a bug to add build symbols for these.  After that's done, we can modify the harnesses to dump the crash reports when they occur.
Comment 3 Andrew Halberstadt [:ahal] 2013-02-27 12:46:10 PST
I've seen several complaints about how difficult it is to debug these types of crashes on irc and various places since this bug was filed. I think this should probably be a top b2g automation priority.
Comment 4 Andrew Halberstadt [:ahal] 2013-03-01 09:43:32 PST
Created attachment 720004 [details] [diff] [review]
Patch 1.0 (mozharness) - set download_symbols to ondemand
Comment 5 Andrew Halberstadt [:ahal] 2013-03-01 09:59:40 PST
Created attachment 720007 [details] [diff] [review]
Patch 1.1 (mozharness) - pass symbols_path to harnesses

Forgot to pass the symbols path into the harnesses
Comment 6 Andrew Halberstadt [:ahal] 2013-03-01 10:53:25 PST
Created attachment 720041 [details] [diff] [review]
Patch 1.0 (gecko) - use mozcrash instead of automationutils

These three patches *may* be all that's needed to get this working. I pushed the mozharness changes to mozharness_ash, watching:
https://tbpl.mozilla.org/?tree=Ash&rev=07e3c1fa05c6
Comment 7 Ed Morley [:emorley] 2013-03-01 16:59:45 PST
(In reply to Andrew Halberstadt [:ahal] from comment #6)
> Created attachment 720041 [details] [diff] [review]
> Patch 1.0 (gecko) - use mozcrash instead of automationutils
> 
> These three patches *may* be all that's needed to get this working. I pushed
> the mozharness changes to mozharness_ash, watching:
> https://tbpl.mozilla.org/?tree=Ash&rev=07e3c1fa05c6

11:31:20     INFO -  Traceback (most recent call last):
11:31:20     INFO -    File "runreftestb2g.py", line 19, in <module>
11:31:20     INFO -      from b2gautomation import B2GRemoteAutomation
11:31:20     INFO -    File "/home/cltbld/talos-slave/test/build/tests/reftest/b2gautomation.py", line 5, in <module>
11:31:20     INFO -      import mozcrash
11:31:20     INFO -  ImportError: No module named mozcrash
Comment 8 Ed Morley [:emorley] 2013-03-01 17:01:12 PST
Guessing we need the mozharness equivalent of bug 839569?
Comment 9 Aki Sasaki [:aki] 2013-03-01 17:03:42 PST
We have it for mh desktop unittests, but maybe not for mh b2g unittests.
Comment 10 Jonathan Griffin (:jgriffin) 2013-03-04 11:31:17 PST
For b2g unittests, we don't use mozbase modules from tests.zip, but from puppetagain.  So we need to add mozcrash to http://mxr.mozilla.org/mozilla-central/source/b2g/test/b2g-unittest-requirements.txt, and make sure it's available at http://puppetagain.pub.build.mozilla.org/data/python/packages/.
Comment 11 Andrew Halberstadt [:ahal] 2013-03-05 07:46:07 PST
Created attachment 721254 [details] [diff] [review]
Patch 1.1 (gecko) - add mozcrash to b2g_requirements.txt

Mozcrash is already in puppetagain, need to add it to requirements.txt:
https://tbpl.mozilla.org/?tree=Ash&rev=dfa95695ef58
Comment 12 Andrew Halberstadt [:ahal] 2013-03-05 08:51:18 PST
Ugh, mozlog isn't even listed as a dependency of mozcrash's setup.py:
https://tbpl.mozilla.org/php/getParsedLog.php?id=20343560&tree=Ash&full=1#error0
Comment 13 Andrew Halberstadt [:ahal] 2013-03-05 09:42:01 PST
Fixed the dependency issue in bug 847970 and did a new push:
https://tbpl.mozilla.org/?tree=Ash&rev=098b9ab83b4b
Comment 14 Andrew Halberstadt [:ahal] 2013-03-08 12:09:37 PST
Created attachment 722899 [details] [diff] [review]
Patch 2.0 (mozharness) - pass symbols to unittest harness

This is all that's needed on the mozharness side (symbols path is getting passed into harness on ash). This won't pass it in to marionette, but the marionette harness doesn't even have a --symbols-path option yet so I'll hold off on that until the other unittests have landed.

For an update on the gecko side, the crash dumps are being saved and left in the profile, but there seems to be a bug in devicemanagerADB when trying to pull them off the device. Currently investigating.
Comment 15 Aki Sasaki [:aki] 2013-03-08 12:41:02 PST
Comment on attachment 722899 [details] [diff] [review]
Patch 2.0 (mozharness) - pass symbols to unittest harness

This still becomes |--download-symbols true| for debug tests, right?
Comment 16 Andrew Halberstadt [:ahal] 2013-03-08 13:12:36 PST
(In reply to Aki Sasaki [:aki] from comment #15)
> Comment on attachment 722899 [details] [diff] [review]
> Patch 2.0 (mozharness) - pass symbols to unittest harness
> 
> This still becomes |--download-symbols true| for debug tests, right?

Yes. Actually I noticed that and couldn't figure out why it was happening :p
Comment 17 Andrew Halberstadt [:ahal] 2013-03-11 14:01:08 PDT
Created attachment 723637 [details] [diff] [review]
Patch 2.0 (gecko) - b2g emulator tests with minidump stackwalking

This works locally, but before it can land the following needs to happen:
1) devicemanager 0.20 released (which includes fix in bug 849822)
2) devicemanager on puppetagain updated to 0.20
3) mozharness patch needs to land
4) set MINIDUMP_STACKWALK env on test slaves? Not sure if this last one is already done or not. If it is I'm not sure how to do it.
Comment 18 Ted Mielczarek [:ted.mielczarek] 2013-03-12 06:51:02 PDT
(In reply to Jonathan Griffin (:jgriffin) from comment #10)
> For b2g unittests, we don't use mozbase modules from tests.zip, but from
> puppetagain.  So we need to add mozcrash to
> http://mxr.mozilla.org/mozilla-central/source/b2g/test/b2g-unittest-
> requirements.txt, and make sure it's available at
> http://puppetagain.pub.build.mozilla.org/data/python/packages/.

I'm not wild about us having two different ways of getting mozbase for different unit test runs. Why is this different? We have a mirror of mozbase in m-c, we should use that.
Comment 19 Jonathan Griffin (:jgriffin) 2013-03-12 18:15:39 PDT
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #18)
> (In reply to Jonathan Griffin (:jgriffin) from comment #10)
> > For b2g unittests, we don't use mozbase modules from tests.zip, but from
> > puppetagain.  So we need to add mozcrash to
> > http://mxr.mozilla.org/mozilla-central/source/b2g/test/b2g-unittest-
> > requirements.txt, and make sure it's available at
> > http://puppetagain.pub.build.mozilla.org/data/python/packages/.
> 
> I'm not wild about us having two different ways of getting mozbase for
> different unit test runs. Why is this different? We have a mirror of mozbase
> in m-c, we should use that.

In mozharness, we share a single mozharness script across all of our gecko branches.  The details of what Python packages need to be installed in order to run e.g. Marionette live in this script.

We encountered a problem that different mozbase modules were required to run Marionette in m-c vs mozilla-b2g18...e.g., some packages started depending on some previously unrequired packages (e.g., moznetwork).  We couldn't update the mozharness script to reflect this, since those mozbase changes weren't universal and couldn't easily be uplifted.

Installing via puppetagain lets us call 'pip install /path/to/marionette' and all the deps get installed correctly on all branches.

FWIW, this is not different from how developers run B2G tests locally.  They current either user a helper sh script which runs Marionette's setup.py (installing packages from pypi rather than puppetagain), or they run setup.py themselves directly.

We could potentially change this in both places (mozharness and helper scripts) to use tests.zip again (in the mozharness case, by running setup_development.py in order to install all mozbase packages instead of cherry-picking the ones we actually use), but this isn't completely clean either.  There are test runs that use Marionette but don't need tests.zip, so we'd just be making those different from the unittest runs in favor of not having B2G vs desktop unittest runs install mozbase packages differently.
Comment 20 Jonathan Griffin (:jgriffin) 2013-03-12 18:20:29 PDT
Comment on attachment 723637 [details] [diff] [review]
Patch 2.0 (gecko) - b2g emulator tests with minidump stackwalking

Review of attachment 723637 [details] [diff] [review]:
-----------------------------------------------------------------

\o/

::: build/mobile/b2gautomation.py
@@ +112,5 @@
>              try:
> +                crashed = mozcrash.check_for_crashes(local_dump_dir, symbolsPath, test_name=self.lastTestSeen)
> +            finally:
> +                shutil.rmtree(local_dump_dir)
> +                self._devicemanager.removeDir(remote_dump_dir)

Can we omit this step for B2G?  For emulators, which is the only config that unit tests work on atm, we don't need to do this cleanup, since none of the profile changes are persistent anyway.  On a device, the test profile would get removed anyway, so this is redundant.
Comment 21 Andrew Halberstadt [:ahal] 2013-03-13 06:53:00 PDT
(In reply to Jonathan Griffin (:jgriffin) from comment #20)
> Can we omit this step for B2G?  For emulators, which is the only config that
> unit tests work on atm, we don't need to do this cleanup, since none of the
> profile changes are persistent anyway.  On a device, the test profile would
> get removed anyway, so this is redundant.

Are you sure about that? When testing locally I noticed when launching the emulator a second time after a crash there would exist an empty /data/local/tests/minidumps folder and a new entry in "/data/mozilla/b2g/Crash Reports"
Comment 22 Andrew Halberstadt [:ahal] 2013-03-13 06:56:25 PDT
Jeff is working on that in bug 838374 and hopes to have it landed by end of week.
Comment 23 Andrew Halberstadt [:ahal] 2013-03-13 06:57:12 PDT
Sigh, ignore comment 22 (wrong bug)
Comment 24 Jonathan Griffin (:jgriffin) 2013-03-13 19:58:36 PDT
(In reply to Andrew Halberstadt [:ahal] from comment #21)
> (In reply to Jonathan Griffin (:jgriffin) from comment #20)
> > Can we omit this step for B2G?  For emulators, which is the only config that
> > unit tests work on atm, we don't need to do this cleanup, since none of the
> > profile changes are persistent anyway.  On a device, the test profile would
> > get removed anyway, so this is redundant.
> 
> Are you sure about that? When testing locally I noticed when launching the
> emulator a second time after a crash there would exist an empty
> /data/local/tests/minidumps folder and a new entry in
> "/data/mozilla/b2g/Crash Reports"

Interesting.  Well, I guess it doesn't hurt to keep this code, it shouldn't cost much time to do the cleanup.
Comment 25 Andrew Halberstadt [:ahal] 2013-03-14 06:28:30 PDT
(In reply to Jonathan Griffin (:jgriffin) from comment #24)
> Interesting.  Well, I guess it doesn't hurt to keep this code, it shouldn't
> cost much time to do the cleanup.

Plus it only gets executed if a crash was detected, so should have a negligible impact on slave load.
Comment 26 Andrew Halberstadt [:ahal] 2013-03-15 07:16:51 PDT
Comment on attachment 722899 [details] [diff] [review]
Patch 2.0 (mozharness) - pass symbols to unittest harness

Pushed mozharness patch: https://hg.mozilla.org/build/mozharness/rev/f00b96a6eba0
Comment 27 Andrew Halberstadt [:ahal] 2013-03-15 07:32:34 PDT
Pushed gecko patch to try: https://tbpl.mozilla.org/?tree=Try&rev=8a978802a99c
Comment 28 Andrew Halberstadt [:ahal] 2013-03-18 07:04:54 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/dadc8416f4df

I'm going to leave open to work on adding support for this in marionette tests as well as in case the test slaves don't have the minidum_stackwalk binary available.
Comment 29 Andrew Halberstadt [:ahal] 2013-03-18 09:00:18 PDT
checkForCrashes isn't getting run in the case where an exception is thrown (e.g socket timeouts) which covers a lot of the use cases we care about (https://tbpl.mozilla.org/php/getParsedLog.php?id=20780370&tree=Mozilla-Inbound&full=1)

I pushed a patch to try which should fix this: https://tbpl.mozilla.org/?tree=Try&rev=7025d6ae8156
Comment 30 Ed Morley [:emorley] 2013-03-18 13:12:50 PDT
https://hg.mozilla.org/mozilla-central/rev/dadc8416f4df
Comment 31 Andrew Halberstadt [:ahal] 2013-03-18 14:19:26 PDT
These patches seem to have gotten rid of the "SIGSEGV" messages in the logcat, but not the timeouts. This makes no sense to me.
Comment 32 Phil Ringnalda (:philor) 2013-03-19 22:31:16 PDT
Suddenly, a wild SIGSEGV! https://tbpl.mozilla.org/php/getParsedLog.php?id=20864738&tree=Mozilla-Inbound
Comment 33 Jonathan Griffin (:jgriffin) 2013-03-20 17:03:41 PDT
(In reply to Andrew Halberstadt [:ahal] from comment #31)
> These patches seem to have gotten rid of the "SIGSEGV" messages in the
> logcat, but not the timeouts. This makes no sense to me.

Me either, but I've caught these cases locally, and the B2G process *has* crashed.  I'm not sure why we're no longer seeing it in the log, however.
Comment 34 Andrew Halberstadt [:ahal] 2013-03-21 08:27:55 PDT
So either the crashes aren't generating minidumps in the profile or the slaves don't have minidump_stackwalk (or both). This try push should tell us more: https://tbpl.mozilla.org/?tree=Try&rev=6dd63c4f5e9f
Comment 35 Andrew Halberstadt [:ahal] 2013-03-21 11:39:31 PDT
That really helped. There are two problems here:
1) no minidump_stackwalk binary (filed bug 853544)
2) mochitest and reftest are still using the in-tree mozdevice which doesn't have the fix for bug 849822 yet. To fix this we can either wait for mozbase to be synced (bug 838374) or we can change them to use the puppetagain packages (which I think we wanted to do anyway)
Comment 36 Andrew Halberstadt [:ahal] 2013-03-25 06:58:56 PDT
Finally, we have crash stacks! https://tbpl.mozilla.org/php/getParsedLog.php?id=21057455&tree=Mozilla-Inbound&full=1#error2

For anyone blocked by this bug: for most purposes you can consider this RESOLVED FIXED. I'm going to leave it open for now to catch a few edge case crashes we've seen during setup (before the test harness is even invoked).
Comment 37 Andrew Halberstadt [:ahal] 2013-03-25 07:15:37 PDT
Created attachment 728966 [details] [diff] [review]
Patch 3.0 (gecko) - add check_for_crashes capability to marionette and run it during setup

This patch should run check_for_crashes after every time we try to connect to marionette, thus catching all those socket timeout errors we've been seeing intermittently. I haven't actually seen this working in the wild yet, so pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=edfe996a5402
Comment 38 Andrew Halberstadt [:ahal] 2013-03-25 07:34:59 PDT
Created attachment 728974 [details] [diff] [review]
Patch 3.1 (gecko) - add check_for_crashes ability to marionette and run during setup

Beh, I noticed a bug.. need to use posixpath.dirname() when building the minidump path with 'IsRelative' set to 1. Sorry for the spam

Re-pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=931eac8d9c5c
Comment 39 Jonathan Griffin (:jgriffin) 2013-03-25 11:02:56 PDT
Comment on attachment 728974 [details] [diff] [review]
Patch 3.1 (gecko) - add check_for_crashes ability to marionette and run during setup

Review of attachment 728974 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

::: testing/marionette/client/marionette/marionette.py
@@ +394,5 @@
> +        try:
> +            # We are ignoring desired_capabilities, at least for now.
> +            self.session = self._send_message('newSession', 'value')
> +        except:
> +            self.check_for_crash()

We should probably surround this also with try/except/print traceback, so that if an error occurs during check_for_crash, we still report the root cause (the socket.timeout, e.g.).
Comment 40 Andrew Halberstadt [:ahal] 2013-03-25 11:07:29 PDT
(In reply to Jonathan Griffin (:jgriffin) from comment #39)
> > +        except:
> > +            self.check_for_crash()
> 
> We should probably surround this also with try/except/print traceback, so
> that if an error occurs during check_for_crash, we still report the root
> cause (the socket.timeout, e.g.).

Good catch, I thought I had a "raise" there, but apparently not.
Comment 41 Jonathan Griffin (:jgriffin) 2013-03-25 11:08:15 PDT
You did, I was just concerned if that self.check_for_crash() also threw, it would obscure the original exception.
Comment 42 Andrew Halberstadt [:ahal] 2013-03-26 06:52:45 PDT
Good point. I realized that wouldn't work though since doing the 'raise' afterwards would now refer to the newer exception. I fixed it by just printing the original exception and calling sys.exit() afterwards:
https://hg.mozilla.org/integration/mozilla-inbound/rev/21ce35576177

I'm going to call this bug done and work on marionette unittest crash support in a new bug.
Comment 43 Andrew Halberstadt [:ahal] 2013-03-27 07:53:23 PDT
*** Bug 839161 has been marked as a duplicate of this bug. ***
Comment 44 Ryan VanderMeulen [:RyanVM] 2013-03-27 10:02:01 PDT
https://hg.mozilla.org/mozilla-central/rev/21ce35576177
Comment 45 Ed Morley [:emorley] 2013-04-01 14:09:02 PDT
Please can we backport this to b2g18? :-)
Comment 46 Andrew Halberstadt [:ahal] 2013-04-02 07:21:06 PDT
Yes, I'm working on it. Unfortunately there is a lot of merge mess and no try server.
Comment 48 Andrew Halberstadt [:ahal] 2013-04-03 09:44:07 PDT
Typo follow-up fix:
https://hg.mozilla.org/releases/mozilla-b2g18/rev/474f14c3894b

Note You need to log in before you can comment on or make changes to this bug.