No crash information on b2g emulator crashes

RESOLVED FIXED

Status

Release Engineering
General
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: jrmuizel, Assigned: ahal)

Tracking

other
x86
Mac OS X
Dependency tree / graph

Firefox Tracking Flags

(b2g18 fixed)

Details

Attachments

(3 attachments, 5 obsolete attachments)

(Reporter)

Description

4 years ago
In the following log I have a crash but don't get any information about the where the crash is happening.

https://tbpl.mozilla.org/php/getParsedLog.php?id=19910355&tree=Mozilla-Inbound&full=1

This makes debugging very difficult.

Comment 1

4 years ago
We're not actually building symbols for these.
I don't recall whether there was some issue that prevented us from building+uploading symbols, but that would be a prerequisite for fixing this bug.
Depends on: 843303
I just filed a bug to add build symbols for these.  After that's done, we can modify the harnesses to dump the crash reports when they occur.
(Assignee)

Comment 3

4 years ago
I've seen several complaints about how difficult it is to debug these types of crashes on irc and various places since this bug was filed. I think this should probably be a top b2g automation priority.
(Assignee)

Updated

4 years ago
Blocks: 846091

Updated

4 years ago
Blocks: 822399
(Assignee)

Updated

4 years ago
Blocks: 818103

Updated

4 years ago
Blocks: 821420
(Assignee)

Comment 4

4 years ago
Created attachment 720004 [details] [diff] [review]
Patch 1.0 (mozharness) - set download_symbols to ondemand
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
(Assignee)

Comment 5

4 years ago
Created attachment 720007 [details] [diff] [review]
Patch 1.1 (mozharness) - pass symbols_path to harnesses

Forgot to pass the symbols path into the harnesses
(Assignee)

Comment 6

4 years ago
Created attachment 720041 [details] [diff] [review]
Patch 1.0 (gecko) - use mozcrash instead of automationutils

These three patches *may* be all that's needed to get this working. I pushed the mozharness changes to mozharness_ash, watching:
https://tbpl.mozilla.org/?tree=Ash&rev=07e3c1fa05c6

Comment 7

4 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #6)
> Created attachment 720041 [details] [diff] [review]
> Patch 1.0 (gecko) - use mozcrash instead of automationutils
> 
> These three patches *may* be all that's needed to get this working. I pushed
> the mozharness changes to mozharness_ash, watching:
> https://tbpl.mozilla.org/?tree=Ash&rev=07e3c1fa05c6

11:31:20     INFO -  Traceback (most recent call last):
11:31:20     INFO -    File "runreftestb2g.py", line 19, in <module>
11:31:20     INFO -      from b2gautomation import B2GRemoteAutomation
11:31:20     INFO -    File "/home/cltbld/talos-slave/test/build/tests/reftest/b2gautomation.py", line 5, in <module>
11:31:20     INFO -      import mozcrash
11:31:20     INFO -  ImportError: No module named mozcrash

Comment 8

4 years ago
Guessing we need the mozharness equivalent of bug 839569?

Comment 9

4 years ago
We have it for mh desktop unittests, but maybe not for mh b2g unittests.
For b2g unittests, we don't use mozbase modules from tests.zip, but from puppetagain.  So we need to add mozcrash to http://mxr.mozilla.org/mozilla-central/source/b2g/test/b2g-unittest-requirements.txt, and make sure it's available at http://puppetagain.pub.build.mozilla.org/data/python/packages/.
(Assignee)

Comment 11

4 years ago
Created attachment 721254 [details] [diff] [review]
Patch 1.1 (gecko) - add mozcrash to b2g_requirements.txt

Mozcrash is already in puppetagain, need to add it to requirements.txt:
https://tbpl.mozilla.org/?tree=Ash&rev=dfa95695ef58
(Assignee)

Comment 12

4 years ago
Ugh, mozlog isn't even listed as a dependency of mozcrash's setup.py:
https://tbpl.mozilla.org/php/getParsedLog.php?id=20343560&tree=Ash&full=1#error0
(Assignee)

Comment 13

4 years ago
Fixed the dependency issue in bug 847970 and did a new push:
https://tbpl.mozilla.org/?tree=Ash&rev=098b9ab83b4b
(Assignee)

Updated

4 years ago
Depends on: 848102
(Assignee)

Updated

4 years ago
Depends on: 848124
(Assignee)

Updated

4 years ago
Depends on: 849270
(Assignee)

Comment 14

4 years ago
Created attachment 722899 [details] [diff] [review]
Patch 2.0 (mozharness) - pass symbols to unittest harness

This is all that's needed on the mozharness side (symbols path is getting passed into harness on ash). This won't pass it in to marionette, but the marionette harness doesn't even have a --symbols-path option yet so I'll hold off on that until the other unittests have landed.

For an update on the gecko side, the crash dumps are being saved and left in the profile, but there seems to be a bug in devicemanagerADB when trying to pull them off the device. Currently investigating.
Attachment #720004 - Attachment is obsolete: true
Attachment #720007 - Attachment is obsolete: true
Attachment #722899 - Flags: review?(aki)

Comment 15

4 years ago
Comment on attachment 722899 [details] [diff] [review]
Patch 2.0 (mozharness) - pass symbols to unittest harness

This still becomes |--download-symbols true| for debug tests, right?
Attachment #722899 - Flags: review?(aki) → review+
(Assignee)

Comment 16

4 years ago
(In reply to Aki Sasaki [:aki] from comment #15)
> Comment on attachment 722899 [details] [diff] [review]
> Patch 2.0 (mozharness) - pass symbols to unittest harness
> 
> This still becomes |--download-symbols true| for debug tests, right?

Yes. Actually I noticed that and couldn't figure out why it was happening :p
(Assignee)

Updated

4 years ago
Depends on: 849822
(Assignee)

Comment 17

4 years ago
Created attachment 723637 [details] [diff] [review]
Patch 2.0 (gecko) - b2g emulator tests with minidump stackwalking

This works locally, but before it can land the following needs to happen:
1) devicemanager 0.20 released (which includes fix in bug 849822)
2) devicemanager on puppetagain updated to 0.20
3) mozharness patch needs to land
4) set MINIDUMP_STACKWALK env on test slaves? Not sure if this last one is already done or not. If it is I'm not sure how to do it.
Attachment #720041 - Attachment is obsolete: true
Attachment #721254 - Attachment is obsolete: true
Attachment #723637 - Flags: review?(jgriffin)
(Assignee)

Updated

4 years ago
No longer depends on: 848102
(In reply to Jonathan Griffin (:jgriffin) from comment #10)
> For b2g unittests, we don't use mozbase modules from tests.zip, but from
> puppetagain.  So we need to add mozcrash to
> http://mxr.mozilla.org/mozilla-central/source/b2g/test/b2g-unittest-
> requirements.txt, and make sure it's available at
> http://puppetagain.pub.build.mozilla.org/data/python/packages/.

I'm not wild about us having two different ways of getting mozbase for different unit test runs. Why is this different? We have a mirror of mozbase in m-c, we should use that.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #18)
> (In reply to Jonathan Griffin (:jgriffin) from comment #10)
> > For b2g unittests, we don't use mozbase modules from tests.zip, but from
> > puppetagain.  So we need to add mozcrash to
> > http://mxr.mozilla.org/mozilla-central/source/b2g/test/b2g-unittest-
> > requirements.txt, and make sure it's available at
> > http://puppetagain.pub.build.mozilla.org/data/python/packages/.
> 
> I'm not wild about us having two different ways of getting mozbase for
> different unit test runs. Why is this different? We have a mirror of mozbase
> in m-c, we should use that.

In mozharness, we share a single mozharness script across all of our gecko branches.  The details of what Python packages need to be installed in order to run e.g. Marionette live in this script.

We encountered a problem that different mozbase modules were required to run Marionette in m-c vs mozilla-b2g18...e.g., some packages started depending on some previously unrequired packages (e.g., moznetwork).  We couldn't update the mozharness script to reflect this, since those mozbase changes weren't universal and couldn't easily be uplifted.

Installing via puppetagain lets us call 'pip install /path/to/marionette' and all the deps get installed correctly on all branches.

FWIW, this is not different from how developers run B2G tests locally.  They current either user a helper sh script which runs Marionette's setup.py (installing packages from pypi rather than puppetagain), or they run setup.py themselves directly.

We could potentially change this in both places (mozharness and helper scripts) to use tests.zip again (in the mozharness case, by running setup_development.py in order to install all mozbase packages instead of cherry-picking the ones we actually use), but this isn't completely clean either.  There are test runs that use Marionette but don't need tests.zip, so we'd just be making those different from the unittest runs in favor of not having B2G vs desktop unittest runs install mozbase packages differently.
Comment on attachment 723637 [details] [diff] [review]
Patch 2.0 (gecko) - b2g emulator tests with minidump stackwalking

Review of attachment 723637 [details] [diff] [review]:
-----------------------------------------------------------------

\o/

::: build/mobile/b2gautomation.py
@@ +112,5 @@
>              try:
> +                crashed = mozcrash.check_for_crashes(local_dump_dir, symbolsPath, test_name=self.lastTestSeen)
> +            finally:
> +                shutil.rmtree(local_dump_dir)
> +                self._devicemanager.removeDir(remote_dump_dir)

Can we omit this step for B2G?  For emulators, which is the only config that unit tests work on atm, we don't need to do this cleanup, since none of the profile changes are persistent anyway.  On a device, the test profile would get removed anyway, so this is redundant.
Attachment #723637 - Flags: review?(jgriffin) → review+
(Assignee)

Comment 21

4 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #20)
> Can we omit this step for B2G?  For emulators, which is the only config that
> unit tests work on atm, we don't need to do this cleanup, since none of the
> profile changes are persistent anyway.  On a device, the test profile would
> get removed anyway, so this is redundant.

Are you sure about that? When testing locally I noticed when launching the emulator a second time after a crash there would exist an empty /data/local/tests/minidumps folder and a new entry in "/data/mozilla/b2g/Crash Reports"
(Assignee)

Comment 22

4 years ago
Jeff is working on that in bug 838374 and hopes to have it landed by end of week.
(Assignee)

Comment 23

4 years ago
Sigh, ignore comment 22 (wrong bug)
(In reply to Andrew Halberstadt [:ahal] from comment #21)
> (In reply to Jonathan Griffin (:jgriffin) from comment #20)
> > Can we omit this step for B2G?  For emulators, which is the only config that
> > unit tests work on atm, we don't need to do this cleanup, since none of the
> > profile changes are persistent anyway.  On a device, the test profile would
> > get removed anyway, so this is redundant.
> 
> Are you sure about that? When testing locally I noticed when launching the
> emulator a second time after a crash there would exist an empty
> /data/local/tests/minidumps folder and a new entry in
> "/data/mozilla/b2g/Crash Reports"

Interesting.  Well, I guess it doesn't hurt to keep this code, it shouldn't cost much time to do the cleanup.
(Assignee)

Comment 25

4 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #24)
> Interesting.  Well, I guess it doesn't hurt to keep this code, it shouldn't
> cost much time to do the cleanup.

Plus it only gets executed if a crash was detected, so should have a negligible impact on slave load.
(Assignee)

Updated

4 years ago
Depends on: 851091
(Assignee)

Comment 26

4 years ago
Comment on attachment 722899 [details] [diff] [review]
Patch 2.0 (mozharness) - pass symbols to unittest harness

Pushed mozharness patch: https://hg.mozilla.org/build/mozharness/rev/f00b96a6eba0
Attachment #722899 - Flags: checked-in+
(Assignee)

Comment 27

4 years ago
Pushed gecko patch to try: https://tbpl.mozilla.org/?tree=Try&rev=8a978802a99c

Updated

4 years ago
Blocks: 834016
(Assignee)

Comment 28

4 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/dadc8416f4df

I'm going to leave open to work on adding support for this in marionette tests as well as in case the test slaves don't have the minidum_stackwalk binary available.
Whiteboard: [leave-open]
(Assignee)

Comment 29

4 years ago
checkForCrashes isn't getting run in the case where an exception is thrown (e.g socket timeouts) which covers a lot of the use cases we care about (https://tbpl.mozilla.org/php/getParsedLog.php?id=20780370&tree=Mozilla-Inbound&full=1)

I pushed a patch to try which should fix this: https://tbpl.mozilla.org/?tree=Try&rev=7025d6ae8156
https://hg.mozilla.org/mozilla-central/rev/dadc8416f4df
(Assignee)

Comment 31

4 years ago
These patches seem to have gotten rid of the "SIGSEGV" messages in the logcat, but not the timeouts. This makes no sense to me.
Suddenly, a wild SIGSEGV! https://tbpl.mozilla.org/php/getParsedLog.php?id=20864738&tree=Mozilla-Inbound
Blocks: 853024
(In reply to Andrew Halberstadt [:ahal] from comment #31)
> These patches seem to have gotten rid of the "SIGSEGV" messages in the
> logcat, but not the timeouts. This makes no sense to me.

Me either, but I've caught these cases locally, and the B2G process *has* crashed.  I'm not sure why we're no longer seeing it in the log, however.
(Assignee)

Comment 34

4 years ago
So either the crashes aren't generating minidumps in the profile or the slaves don't have minidump_stackwalk (or both). This try push should tell us more: https://tbpl.mozilla.org/?tree=Try&rev=6dd63c4f5e9f
(Assignee)

Updated

4 years ago
Depends on: 853544
(Assignee)

Comment 35

4 years ago
That really helped. There are two problems here:
1) no minidump_stackwalk binary (filed bug 853544)
2) mochitest and reftest are still using the in-tree mozdevice which doesn't have the fix for bug 849822 yet. To fix this we can either wait for mozbase to be synced (bug 838374) or we can change them to use the puppetagain packages (which I think we wanted to do anyway)
(Assignee)

Updated

4 years ago
Depends on: 853558
(Assignee)

Updated

4 years ago
Depends on: 853591
(Assignee)

Updated

4 years ago
Attachment #723637 - Flags: checked-in+
(Assignee)

Comment 36

4 years ago
Finally, we have crash stacks! https://tbpl.mozilla.org/php/getParsedLog.php?id=21057455&tree=Mozilla-Inbound&full=1#error2

For anyone blocked by this bug: for most purposes you can consider this RESOLVED FIXED. I'm going to leave it open for now to catch a few edge case crashes we've seen during setup (before the test harness is even invoked).
(Assignee)

Comment 37

4 years ago
Created attachment 728966 [details] [diff] [review]
Patch 3.0 (gecko) - add check_for_crashes capability to marionette and run it during setup

This patch should run check_for_crashes after every time we try to connect to marionette, thus catching all those socket timeout errors we've been seeing intermittently. I haven't actually seen this working in the wild yet, so pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=edfe996a5402
Attachment #728966 - Flags: review?(jgriffin)
(Assignee)

Comment 38

4 years ago
Created attachment 728974 [details] [diff] [review]
Patch 3.1 (gecko) - add check_for_crashes ability to marionette and run during setup

Beh, I noticed a bug.. need to use posixpath.dirname() when building the minidump path with 'IsRelative' set to 1. Sorry for the spam

Re-pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=931eac8d9c5c
Attachment #728966 - Attachment is obsolete: true
Attachment #728966 - Flags: review?(jgriffin)
Attachment #728974 - Flags: review?(jgriffin)
Comment on attachment 728974 [details] [diff] [review]
Patch 3.1 (gecko) - add check_for_crashes ability to marionette and run during setup

Review of attachment 728974 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

::: testing/marionette/client/marionette/marionette.py
@@ +394,5 @@
> +        try:
> +            # We are ignoring desired_capabilities, at least for now.
> +            self.session = self._send_message('newSession', 'value')
> +        except:
> +            self.check_for_crash()

We should probably surround this also with try/except/print traceback, so that if an error occurs during check_for_crash, we still report the root cause (the socket.timeout, e.g.).
Attachment #728974 - Flags: review?(jgriffin) → review+
(Assignee)

Comment 40

4 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #39)
> > +        except:
> > +            self.check_for_crash()
> 
> We should probably surround this also with try/except/print traceback, so
> that if an error occurs during check_for_crash, we still report the root
> cause (the socket.timeout, e.g.).

Good catch, I thought I had a "raise" there, but apparently not.
You did, I was just concerned if that self.check_for_crash() also threw, it would obscure the original exception.
(Assignee)

Comment 42

4 years ago
Good point. I realized that wouldn't work though since doing the 'raise' afterwards would now refer to the newer exception. I fixed it by just printing the original exception and calling sys.exit() afterwards:
https://hg.mozilla.org/integration/mozilla-inbound/rev/21ce35576177

I'm going to call this bug done and work on marionette unittest crash support in a new bug.
Whiteboard: [leave-open]

Updated

4 years ago
Depends on: 854996
(Assignee)

Updated

4 years ago
Blocks: 855279
(Assignee)

Updated

4 years ago
Duplicate of this bug: 839161
https://hg.mozilla.org/mozilla-central/rev/21ce35576177
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Please can we backport this to b2g18? :-)
(Assignee)

Comment 46

4 years ago
Yes, I'm working on it. Unfortunately there is a lot of merge mess and no try server.
(Assignee)

Comment 47

4 years ago
https://hg.mozilla.org/releases/mozilla-b2g18/rev/6f9961ab0f7a
https://hg.mozilla.org/releases/mozilla-b2g18/rev/5c187b17c37c
status-b2g18: --- → fixed
(Assignee)

Comment 48

4 years ago
Typo follow-up fix:
https://hg.mozilla.org/releases/mozilla-b2g18/rev/474f14c3894b
(Assignee)

Updated

4 years ago
Depends on: 860764
(Assignee)

Updated

4 years ago
Depends on: 860773

Updated

4 years ago
Depends on: 870742
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.