Closed Bug 1229765 Opened 8 years ago Closed 8 years ago

Monitor Marionette browser process for start-up crashes

Categories

(Remote Protocol :: Marionette, defect)

defect
Not set
normal

Tracking

(firefox46 fixed)

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: ato, Assigned: impossibus)

References

(Blocks 1 open bug)

Details

(Keywords: pi-marionette-runner)

Attachments

(1 file)

The Marionette test runner (somewhat oddly named marionette-client) does not monitor the browser process it spawns.  When the browser crashes we would like to detect this and possibly collect a crash report, so that cases like bug 1229549 will not occur again.

Currently Marionette produces this when the browser crashes:

    AssertionError: Timed out waiting for port!
Blocks: 1229549
Summary: Monitor browser process for crashes → Monitor Marionette browser process for crashes
Well, the crash analysis is part of mozcrash and would need symbols to actually process the crash. If those are not provided mozcrash cannot do anything.

One example of such a crash, seen in one of the newly added dependency bugs, is:
https://treeherder.allizom.org/logviewer.html#?job_id=2408059&repo=mozilla-central

Here we still fail to get the symbols, but as you can see mozcrash does not print the PROCESS-CRASH lines or anything similar. I get the feeling that this is a problem in mozcrash rather than in Marionette.
Blocks: 1202375, 1222197
Crash processing has worked perfectly for us in firefox-ui-tests since today, now that the right symbols URL (bug 1228644) for installer builds on Windows is used when triggering the mozharness script:

https://treeherder.allizom.org/logviewer.html#?job_id=2444766&repo=mozilla-central

 08:42:35     INFO - PROCESS-CRASH | runner.py | application crashed [None]
 08:42:35     INFO - Crash dump filename: c:\jenkins\workspace\mozilla-central_update\build\tmps7o6ku.mozrunner-1449247284\minidumps\ffc0b75d-f14f-477e-880c-bf8fdf14561f.dmp

So I wonder which jobs are affected.
Looking at the media-test job pointed out in bug 1229549 (https://treeherder.mozilla.org/logviewer.html#?job_id=14306272&repo=try) - the same Marionette error manifests itself. If I understand correctly, the issue is specific to start-up crashes. (Those same media-test jobs have processed other crashes correctly in the past. As far as I can tell, MINIDUMP_STACKWALK is set correctly, as is the --symbols-url option.)
As per conversation with :ted, I will investigate adding some logic to the Marionette runner to look for a minidump file when waiting for a port times out.
Assignee: nobody → mjzffr
Blocks: 1210874
Does minidump stackwalk error out? I have seen some start-up crashes in our update tests that produced invalid minidump files. See bug 474863 for the underlying issue.
I've modified the Marionette class to check for crashes whenever we hit a timeout while waiting for a port. Works locally with a crashy Firefox Desktop. Let's see how that fares on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0a22e5771cd
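For readers following along, here is a rough sketch of the idea described in this comment. It is not the actual patch: the function names (wait_for_port, find_minidumps, start_session) are made up for illustration, and it only assumes that dumps land in a "minidumps" directory under the profile, as in the log excerpt quoted earlier. A real runner would presumably hand the dump directory to mozcrash rather than raise an error itself.

```python
import glob
import os
import socket
import time


def wait_for_port(host, port, timeout=60.0, poll_interval=0.5):
    """Return True once something accepts connections on host:port, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            # Nothing is listening yet; wait a little and try again.
            time.sleep(poll_interval)
    return False


def find_minidumps(profile_dir):
    """List .dmp files under <profile_dir>/minidumps (assumed layout)."""
    return sorted(glob.glob(os.path.join(profile_dir, "minidumps", "*.dmp")))


def start_session(host, port, profile_dir, timeout=60.0):
    """Wait for the port; if that times out, check for a start-up crash."""
    if wait_for_port(host, port, timeout):
        return "connected"
    dumps = find_minidumps(profile_dir)
    if dumps:
        # A minidump exists, so report a crash instead of a bare timeout.
        raise RuntimeError("Process crashed during start-up: %s" % dumps[0])
    raise TimeoutError("Timed out waiting for port!")
```

With something along these lines, a start-up crash would surface as a crash report pointing at the dump file, while a process that is merely slow to start would still produce the familiar "Timed out waiting for port!" error.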
Summary: Monitor Marionette browser process for crashes → Monitor Marionette browser process for start-up crashes
Note that the Mnw job [1] in the try run doesn't log PROCESS-CRASH. I think that may be because the emulator doesn't even manage to start up in the first place before we hit a timeout, which is a known problem. [2] 

This leads me to our conversation about checking for a crash sooner by trying to communicate with the process right after it's started: I think it might actually be better to wait until the port timeout, as in the current patch, so that we don't check too early to catch a start-up crash in the first place. Technically a start-up crash is defined as any crash within the first 60 seconds or so, right? What do you think, David?

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0a22e5771cd&exclusion_profile=false&selectedJob=14708759
[2] logs.glob.uno/?c=mozilla%23ateam&s=26+Nov+2015&e=26+Nov+2015&h=emulator+is+slow#c1000252
Flags: needinfo?(dburns)
Let's leave it for 60s then.
Flags: needinfo?(dburns)
(In reply to Maja Frydrychowicz (:maja_zf) from comment #6)
> I've modified the Marionette class to check for crashes whenever we hit a
> timeout while waiting for a port. Works locally with a crashy Firefox
> Desktop. Let's see how that fares on try:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0a22e5771cd

Now a try run with a non-start-up-crashy build: https://treeherder.mozilla.org/#/jobs?repo=try&revision=120ad4bb6d8d
Comment on attachment 8699599 [details]
MozReview Request: Bug 1229765 - Monitor Marionette browser process for start-up crashes; r?automatedtester

https://reviewboard.mozilla.org/r/28293/#review25583
Attachment #8699599 - Flags: review?(dburns) → review+
https://hg.mozilla.org/mozilla-central/rev/86b32f1e4aa1
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Product: Testing → Remote Protocol