1299216 - [e10s] Improve crash handling for Marionette

Assignee

Description

•

8 years ago

The following job on Treeherder shows it perfectly:

> PROCESS-CRASH | test_profile_management.py TestLog.test_in_app_restart_the_browser | application crashed [@ mozalloc_abort(char const*)]
> 
[..]
> TEST-UNEXPECTED-ERROR | test_profile_management.py TestLog.test_in_app_restart_the_browser | IOError: Process killed because the connection to Marionette server is lost. Check gecko.log for errors (Reason: Connection timed out after 360s) 

Because the crash we are seeing here is a crash in content, Firefox is still up and running. In force_shutdown [1] we actually check if the process is still alive after waiting 65s. If that is the case we kill Firefox. 

As it looks like the behavior might not be the best for e10s. Shall we keep running our tests if there was a crash in content? We may need a solution for bug 1293270 first, to be able to be fully sure about.

Ted, what do you think?

[1] https://dxr.mozilla.org/mozilla-central/rev/26e22af660e543ebb69930f082188b69ec756185/testing/marionette/client/marionette_driver/marionette.py#756

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Flags: needinfo?(ted)

(not currently active) Ted Mielczarek

Comment 1

•

8 years ago

We actually discussed this situation a while back when the e10s work was first starting. I added an environment variable ` MOZ_CRASHREPORTER_SHUTDOWN` which was supposed to tell the application to quit if a content process crashed. It looks like we only actually implemented this for B2G though:
https://dxr.mozilla.org/mozilla-central/rev/26e22af660e543ebb69930f082188b69ec756185/b2g/chrome/content/shell.js#162

Even if we made this work for Firefox I don't know that we'd necessarily want this enabled for every harness, since we have some like browser-chrome that do intentionally cause content process crashes to test that functionality. Having the option for individual harnesses seems like it'd be a good idea, though.

Flags: needinfo?(ted)

Tracy Walker [:tracy]

Updated

•

8 years ago

Blocks: e10s-tests

tracking-e10s: --- → +

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Assignee: nobody → hskupin

Status: NEW → ASSIGNED

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 3

•

8 years ago

I'm currently blocked on bug 1300709 here given that I have to load a remoting URI but this does not work until sessionstore completed. I really don't want to add a sleep to the test.

Depends on: 1300709

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 4

•

8 years ago

Actually I don't need the extra restart in the tests so the missing sessionstore feature is not blocking me.

No longer depends on: 1300709

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Blocks: 1292588

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Blocks: 1297364

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 5

•

8 years ago

A first try push from two days ago was mostly green:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0d088416f21&selectedJob=27320923

There were only failures for ASAN related builds, which after some investigation make sense because the crash reporter is not available for those builds. I pushed a follow-up try build with a fix for that right now. If all works well, I will upload the patch today, just after some clean-up.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5091b92dee2c

Comment hidden (mozreview-request)

Currently check_for_crashes() behaves differently compared to log_crashes(), whereby it only returns a
boolean if a crash has been detected but not the amount of crash reports found. We should make sure that
both methods behave the same. Given that this change might affect consumers, we should have a major version
bump for the new release.

Review commit: https://reviewboard.mozilla.org/r/78950/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/78950/

Comment hidden (mozreview-request)

Adapt check_for_crashes() for latest mozcrash changes, which returns the number of crashes for both log_crashes()
and check_for_crashes() now. Also the runner should accumulate the number of crashes so it will be known at any
time how many times the process has been crashed.

Review commit: https://reviewboard.mozilla.org/r/78952/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/78952/

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/78954/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/78954/

Comment hidden (mozreview-request)

There are currently various issues in Marionette which cause extra and unnecessary logging output
in case of chrome and content crashes. Especially the latter ones are not easy to discover. This
patch makes the following improvements:

* Correctly handle content process crashes and kill the process with a proper failure message. After
  the process has been started again the next test will be executed. Because our installed framescripts
  will no longer be reachable, a restart is necessary to ensure a proper initial state for the next test
  is available.

* Correctly mark chrome process crashes as IOError test failure. Until now we have only used the "Process
  has been closed" message, which does not correctly cover it, especially not if the process shutdown without
  a crash.

* Allow changing the socket_timeout for the socket client. This was mainly necessary to allow unit tests
  to run faster.

* Collect the number of crashes so it's known even later if the process has been crashed during the
  current session. It also fixes the case when check_for_crash() gets called twice, and for the
  second time False is returned which will trigger an invalid code path.

* Reduce code duplication when destroying a session.

* Adding a unit test to verify the correct behavior for chrome and content crashes.

Review commit: https://reviewboard.mozilla.org/r/78956/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/78956/

Comment hidden (mozreview-request)

Removing the always parameter which is kinda useless here. Originally it was added to call
check_for_crash() in case of a MarionetteException happening. But such an exception is never
thrown in case of crashes. It will always be an IOError. So lets get it removed and ensure
we call check_for_crashes() in all the cases, and can feed the crash status into the call to
handle_socket_failure().

Review commit: https://reviewboard.mozilla.org/r/78958/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/78958/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 11

•

8 years ago

Marking as leave open so I can push the releases to PyPI once it got landed.

Keywords: leave-open

Whiteboard: [needs mozcrash and mozrunner release]

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/2-3/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 16

•

8 years ago

With the latest updates the tests on Linux and Windows pass just fine, but we have issues on OSX with mozprocess:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f5bb2a0c48d1&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&selectedJob=27535448&filter-searchStr=mn

13:18:41    ERROR -  TEST-UNEXPECTED-ERROR | test_crash.py TestCrash.test_crash_chrome_process | OSError: [Errno 1] Operation not permitted
13:18:41     INFO -  Traceback (most recent call last):
13:18:41     INFO -    File "/builds/slave/test/build/venv/lib/python2.7/site-packages/marionette/marionette_test/testcases.py", line 161, in run
13:18:41     INFO -      testMethod()
13:18:41     INFO -    File "/builds/slave/test/build/tests/marionette/tests/testing/marionette/harness/marionette/tests/unit/test_crash.py", line 94, in test_crash_chrome_process
13:18:41     INFO -      self.crash, chrome=True)
13:18:41     INFO -    File "/tools/python27/lib/python2.7/unittest/case.py", line 989, in assertRaisesRegexp
13:18:41     INFO -      callable_obj(*args, **kwargs)
13:18:41     INFO -    File "/builds/slave/test/build/tests/marionette/tests/testing/marionette/harness/marionette/tests/unit/test_crash.py", line 84, in crash
13:18:41     INFO -      """, sandbox=sandbox)
13:18:41     INFO -    File "/builds/slave/test/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 1685, in execute_script
13:18:41     INFO -      rv = self._send_message("executeScript", body, key="value")
13:18:41     INFO -    File "/builds/slave/test/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 40, in _
13:18:41     INFO -      m.handle_socket_failure(crashed)
13:18:41     INFO -    File "/builds/slave/test/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 766, in handle_socket_failure
13:18:41     INFO -      self.instance.close()
13:18:41     INFO -    File "/builds/slave/test/build/venv/lib/python2.7/site-packages/marionette_driver/geckoinstance.py", line 172, in close
13:18:41     INFO -      self.runner.stop()
13:18:41     INFO -    File "/builds/slave/test/build/venv/lib/python2.7/site-packages/mozrunner/base/runner.py", line 169, in stop
13:18:41     INFO -      self.process_handler.kill(sig=sig)
13:18:41     INFO -    File "/builds/slave/test/build/venv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 749, in kill
13:18:41     INFO -      self.proc.kill(sig=sig)
13:18:41     INFO -    File "/builds/slave/test/build/venv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 165, in kill
13:18:41     INFO -      send_sig(signal.SIGTERM)
13:18:41     INFO -    File "/builds/slave/test/build/venv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 152, in send_sig
13:18:41     INFO -      os.killpg(pid, sig)
13:18:41     INFO -  TEST-INFO took 1892ms

Looks like the code I wrote for bug 1176758 still has a bad side-effect. I will investigate later today when the setup of my MBP has been done.

and:

> Exception socket.error: (57, 'Socket is not connected') in <bound method TcpTransport.__del__ of <marionette_driver.transport.TcpTransport object at 0x10daed310>> ignored

I have also seen this on Linux before, and the strange part is that this comes through wptserve.

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/80088/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/80088/

Comment hidden (mozreview-request)

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78950/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8791591 [details]
Bug 1299216 - [mozrunner] check_for_crashes() has to return the crash count since its last invocation.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78952/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8791592 [details]
Bug 1299216 - Bump mozrunner version to 6.13.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78954/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/3-4/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/3-4/

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/4-5/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/4-5/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 25

•

8 years ago

The problematic issue here on OS X was that we didn't wait long enough when calling runner.wait(). A second only is too short and even with a crash the process might not be completely gone. Increasing the timeout value fixed the failures on OS X: https://treeherder.mozilla.org/#/jobs?repo=try&revision=61aba68b8a0a

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Attachment #8791590 - Flags: review?(ahalberstadt)

Attachment #8791591 - Flags: review?(ahalberstadt)

Attachment #8791592 - Flags: review?(ahalberstadt)

Attachment #8793329 - Flags: review?(dburns)

Attachment #8791593 - Flags: review?(dburns)

Attachment #8791594 - Flags: review?(dburns)

David Burns :automatedtester

Comment 26

•

8 years ago

mozreview-review

Comment on attachment 8793329 [details]
Bug 1299216 - Don't care about socket not connected for sock.shutdown() call.

https://reviewboard.mozilla.org/r/80088/#review79828

Attachment #8793329 - Flags: review?(dburns) → review+

David Burns :automatedtester

Comment 27

•

8 years ago

mozreview-review

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

https://reviewboard.mozilla.org/r/78956/#review79830

::: testing/marionette/harness/marionette/marionette_test/decorators.py:64
(Diff revision 5)
>          func.metaparameters.append((func_suffix, args, kwargs))
>          return func
>      return wrapped
>  
>  
> +def run_if_e10s(target):

Shouldnt all tests run in e10s mode?

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 28

•

8 years ago

mozreview-review-reply

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

https://reviewboard.mozilla.org/r/78956/#review79830

> Shouldnt all tests run in e10s mode?

No, we still run Marionette tests in non-e10s mode on all branches.

David Burns :automatedtester

Comment 29

•

8 years ago

mozreview-review

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

https://reviewboard.mozilla.org/r/78958/#review79836

Attachment #8791594 - Flags: review?(dburns) → review+

Andrew Halberstadt [:ahal]

Comment 30

•

8 years ago

mozreview-review

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

https://reviewboard.mozilla.org/r/78950/#review79982

Thanks, this behaviour is more useful!

::: testing/mozbase/mozcrash/mozcrash/mozcrash.py:78
(Diff revision 2)
>      filename of the calling function will be used.
>  
>      If `quiet` is set, no PROCESS-CRASH message will be printed to stdout if a
>      crash is detected.
>  
> -    Returns True if any minidumps were found, False otherwise.
> +    Returns Number of minidump files found.

nit: number should be lower case

Attachment #8791590 - Flags: review?(ahalberstadt) → review+

Andrew Halberstadt [:ahal]

Comment 31

•

8 years ago

mozreview-review

Comment on attachment 8791591 [details]
Bug 1299216 - [mozrunner] check_for_crashes() has to return the crash count since its last invocation.

https://reviewboard.mozilla.org/r/78952/#review79984

Lgtm!

Attachment #8791591 - Flags: review?(ahalberstadt) → review+

Andrew Halberstadt [:ahal]

Comment 32

•

8 years ago

mozreview-review

Comment on attachment 8791592 [details]
Bug 1299216 - Bump mozrunner version to 6.13.

https://reviewboard.mozilla.org/r/78954/#review79988

Attachment #8791592 - Flags: review?(ahalberstadt) → review+

Andrew Halberstadt [:ahal]

Comment 33

•

8 years ago

mozreview-review

Comment on attachment 8791592 [details]
Bug 1299216 - Bump mozrunner version to 6.13.

https://reviewboard.mozilla.org/r/78954/#review79990

::: testing/mozbase/mozrunner/setup.py:21
(Diff revision 2)
>          'mozlog >= 3.0',
>          'mozprocess >= 0.23',
>          'mozprofile >= 0.18',
>          ]
>  
>  EXTRAS_REQUIRE = {'crash': ['mozcrash >= 0.14']}

This should get bumped to 1.0

David Burns :automatedtester

Comment 34

•

8 years ago

mozreview-review

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

https://reviewboard.mozilla.org/r/78956/#review79832

Attachment #8791593 - Flags: review?(dburns) → review+

Comment hidden (mozreview-request)

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78950/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8791591 [details]
Bug 1299216 - [mozrunner] check_for_crashes() has to return the crash count since its last invocation.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78952/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8791592 [details]
Bug 1299216 - Bump mozrunner version to 6.13.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78954/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8793329 [details]
Bug 1299216 - Don't care about socket not connected for sock.shutdown() call.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/80088/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/5-6/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/5-6/

Comment hidden (mozreview-request)

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78950/diff/3-4/

Comment hidden (mozreview-request)

Comment on attachment 8791591 [details]
Bug 1299216 - [mozrunner] check_for_crashes() has to return the crash count since its last invocation.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78952/diff/3-4/

Comment hidden (mozreview-request)

Comment on attachment 8791592 [details]
Bug 1299216 - Bump mozrunner version to 6.13.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78954/diff/3-4/

Comment hidden (mozreview-request)

Comment on attachment 8793329 [details]
Bug 1299216 - Don't care about socket not connected for sock.shutdown() call.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/80088/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/6-7/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/6-7/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 47

•

8 years ago

mozreview-review

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

https://reviewboard.mozilla.org/r/78948/#review80748

Updated patch after rebasing to mozilla-central. It includes a move of the line which ensures that the session gets deleted out of _request_in_app_shutdown into quit() and restart(), to ensure we always call it even in the case of using a callback. Given that no change is actually done I assume that I can take over the r+.

For safety I will retrigger a set of tests including Windows XP which seems to be a bit unstable.

Comment hidden (mozreview-request)

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78950/diff/4-5/

Comment hidden (mozreview-request)

Comment on attachment 8791591 [details]
Bug 1299216 - [mozrunner] check_for_crashes() has to return the crash count since its last invocation.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78952/diff/4-5/

Comment hidden (mozreview-request)

Comment on attachment 8791592 [details]
Bug 1299216 - Bump mozrunner version to 6.13.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78954/diff/4-5/

Comment hidden (mozreview-request)

Comment on attachment 8793329 [details]
Bug 1299216 - Don't care about socket not connected for sock.shutdown() call.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/80088/diff/3-4/

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/7-8/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/7-8/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 54

•

8 years ago

mozreview-review

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

https://reviewboard.mozilla.org/r/78948/#review83156

Rebased again due to conflicts with the latest changes on bug 1291687. I triggered another set on Windows XP to check if the last issue is still persistent or not.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 55

•

8 years ago

As reference the last try build which was successful across all platforms except WinXP where we had dozen of crashes of Firefox:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3750a6ed0e55&selectedJob=28315042

New try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca295062ac9e

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 56

•

8 years ago

The hundreds of crashes on XP are still around. All of them are related to accessibility and specifically "mozilla::a11y::PDocAccessibleChild::SendCOMProxy()". I thought that ally is disabled by default when we run Marionette tests?

https://archive.mozilla.org/pub/firefox/try-builds/hskupin@mozilla.com-ca295062ac9e74bf0073941137c2e1a33027a73e/try-win32/try_xp_ix_test-marionette-e10s-bm110-tests1-windows-build15.txt.gz

Aaron, do you have an idea here why we are crashing that often?

Flags: needinfo?(aklotz)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 57

•

8 years ago

Btw. the same crash also happens for other try builds which do not have any change to Marionette:

https://treeherder.mozilla.org/#/jobs?repo=try&filter-searchStr=windows%20xp&fromchange=e7f585576ba4c8377f218dbecb2cfabd77571e57&selectedJob=28888091

The difference to my push is that with my patch we continue the tests and do not abort after the first crash. Maybe we should better keep this behavior also for content crashes.

David, would you agree with the above?

Flags: needinfo?(dburns)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 58

•

8 years ago

I see the same crashes over on bug 1141483. A build from before my vacation didn't show the crashes. So I feel that the following a11y change which landed on Oct 6th caused the massive crashes:

https://hg.mozilla.org/mozilla-central/rev/7cc58d0708e7#l8.10

David Burns :automatedtester

Comment 59

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #57)
> Btw. the same crash also happens for other try builds which do not have any
> change to Marionette:
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&filter-
> searchStr=windows%20xp&fromchange=e7f585576ba4c8377f218dbecb2cfabd77571e57&se
> lectedJob=28888091
> 
> The difference to my push is that with my patch we continue the tests and do
> not abort after the first crash. Maybe we should better keep this behavior
> also for content crashes.
> 
> David, would you agree with the above?

Yup, sounds good.

Flags: needinfo?(dburns)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 60

•

8 years ago

I'm close to a solution for an early abort but I'm blocked on bug 1309250 now. I will see if I can find a workaround for now to not have to raise an exception, or catch it sanely.

Depends on: 1309250

(No longer employed by Mozilla) Aaron Klotz

Comment 61

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #56)
> The hundreds of crashes on XP are still around. All of them are related to
> accessibility and specifically
> "mozilla::a11y::PDocAccessibleChild::SendCOMProxy()". I thought that ally is
> disabled by default when we run Marionette tests?
> 
> https://archive.mozilla.org/pub/firefox/try-builds/hskupin@mozilla.com-
> ca295062ac9e74bf0073941137c2e1a33027a73e/try-win32/try_xp_ix_test-marionette-
> e10s-bm110-tests1-windows-build15.txt.gz
> 
> Aaron, do you have an idea here why we are crashing that often?

It's because that is a11y code that only runs on e10s, but a11y+e10s is not is not a supported configuration on Windows XP at this time. If possible I'd suggest that we not run a11y and e10s together on Windows XP.

Flags: needinfo?(aklotz)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 62

•

8 years ago

(In reply to Aaron Klotz [:aklotz] from comment #61)
> It's because that is a11y code that only runs on e10s, but a11y+e10s is not
> is not a supported configuration on Windows XP at this time. If possible I'd
> suggest that we not run a11y and e10s together on Windows XP.

That is problematic then for us given that Marionette enables a11y or pieces of it, which are needed for specific events to perform. A while ago I already filed bug 1277791 to get this investigated or stopped.

Depends on: 1277791

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 63

•

8 years ago

mozreview-review

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

https://reviewboard.mozilla.org/r/78948/#review83728

Trying to see if using `self.quit()` works better as only calling `self.instance.close()`. We would at least unregister all listeners which might cause lesser issues during a forced shutdown.

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/8-9/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/8-9/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 66

•

8 years ago

(In reply to David Burns :automatedtester from comment #59)
> > The difference to my push is that with my patch we continue the tests and do
> > not abort after the first crash. Maybe we should better keep this behavior
> > also for content crashes.
> > 
> > David, would you agree with the above?
> 
> Yup, sounds good.

Btw the early abort of a test run right after the instance crashed happens at the moment because of a side-effect which is tracked by bug 1292588. I do not have a clear answer yet without investigating it deeper. But it's most likely due to the additional error/exception in the test runner. With that we seem to break out of everything.

While thinking more about early aborts, I feel that we shouldn't do that. Instead we should always continue until all tests have been run. To get a better knowledge of how other test suites are handling this situation I checked Treeherder results for process crashes and was able to find some. As I was able to see other test suites like XPCShell, Robocop, Mochitest Media, RefTest Crash continue after a crash. Here some logs:

https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-aurora-macosx64-debug/1476129263/mozilla-aurora_yosemite_r7-debug_test-xpcshell-bm135-tests1-macosx-build2.txt.gz

https://public-artifacts.taskcluster.net/Mfu_jSTiRlmSZ74PTmNzTA/0/public/logs/live_backing.log

https://archive.mozilla.org/pub/mobile/tinderbox-builds/autoland-android-api-15/1476258107/autoland_ubuntu64_vm_armv7_large_test-mochitest-media-2-bm130-tests1-linux64-build2.txt.gz

https://public-artifacts.taskcluster.net/a6lgsCWlTjS7qmmobS5Yqw/0/public/logs/live_backing.log

Ted, you as test module owner, can you please give an advise how Marionette should handle such a situation? Should the tests continue even when we could end-up with dozen/hundreds of crashes in case of total bustage of a build. Thanks.

Flags: needinfo?(ted)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 67

•

8 years ago

Using quit() doesn't actually change anything for Windows XP. We still crash like hell in the a11y part:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=453572693326&selectedJob=29027853

So we really have to get a11y disabled, or we cannot land this patch until Firefox reaches 53.0a1, which would be sad.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 68

•

8 years ago

Some information from IRC as discussed with Ted over there:

> [14:05:38] @ted AFAIK most suites will not restart a run if a browser crashesr
> [14:05:40] @ted xpcshell tests all run in individual proceses, so one crash doesn't terminate the run
> [14:05:48] @ted i'm not sure what mochitest does with run-by-dir nowadays
> [14:05:53] whimboo you mean continue or restart
> [14:06:38] jmaher ted: for mochitest we do run-by-dir, so if one dir has a crash, we will continue on and get coverage on other dirs with a fresh profile/browser
> [14:06:44] @ted okay, that makes sense
> [14:06:48] @ted that seems like sensible behavior
> [14:07:07] whimboo ted: with marionette we also restart the browser in case of a content crash
> [14:07:09] @ted run as much of the tests as possible, but don't try to run the rest of the tests in a directory if one of them crashes
> [14:07:27] @ted whimboo: i'm honestly not sure how mochitest would handle a content crash
> [14:07:29] whimboo i don't think we create a fresh profile yet
> [14:07:49] whimboo ted: in my patch i implemetned a shutdown and restart of firefox
> [14:07:55] @ted ah
> [14:07:59] @ted that seems sensible
> [14:08:14] whimboo if a content process crash happens we cannot be sure that the installed listeners by Marionette are still active
> [14:08:48] whimboo ted: so right now its just an assumption that we have a content process crash given that a minidump is present and firefox still running
> [14:08:58] whimboo mozcrash doesnt provide the necessary information yet
> [14:09:33] @ted yeah...
> [14:09:46] @ted well if there's a content process crash there is a chrome notification that happens
> [14:12:23] whimboo there is still the question how functional marionette is at this point. But yes, the chrome process should not be affected by that content crash
> [14:23:15] @ted yeah, terminating the browser on a content process crash is totally sensible
> [14:26:39] whimboo ted: so right now we wait 65s before crashing the browser in case if a possible content crash
> [14:26:50] whimboo just in case to be sure its not a slow shutdown
> [14:27:04] whimboo when firefox would kill a hanging thread after 60s
> [14:27:32] whimboo ted: so do you think the notificatino check would be important for the initial patch?

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 69

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #67)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=453572693326&selectedJob=29027853

We should actually never run Mn-e10s for WinXP on try. Those tests are not executed on any of our integration and release branches. So this is not really a blocker anymore.

I will refresh to latest mozilla-central and trigger another final try build.

(not currently active) Ted Mielczarek

Comment 70

•

8 years ago

I think operating the way that Mochitest does would be sensible--Mochitest will continue on to the next directory of tests after a crash, since it starts a fresh browser instance for each directory, but it will not attempt to restart the current directory in which it crashed, even if that means skipping the rest of the tests in that directory.

If the Marionette harness would have started a new browser process at some point then it's fine to let it continue to do so. If not, it should just stop.

> [14:27:32] whimboo ted: so do you think the notificatino check would be important for the initial patch?

I think this would be more reliable than just waiting for a timeout. It's fairly simple, you just listen for the "ipc:content-shutdown" observer service notification, and check if it has an "abnormal" key:
https://dxr.mozilla.org/mozilla-central/rev/7452437b3ab571b1d60aed4e973d82a1471f72b2/browser/modules/ContentCrashHandlers.jsm#79

As I mentioned back in comment 1, maybe we should just implement support for `MOZ_CRASHREPORTER_SHUTDOWN` in Firefox, to provide a way to exit with an error status if a content process crashes? Then harnesses could opt-in to this behavior. Aside from browser-chrome tests that are explicitly testing content process crashes it's unlikely that a test harness is going to be able to continue to run tests usefully after a content process crash.

Flags: needinfo?(ted)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 71

•

8 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #70)
> If the Marionette harness would have started a new browser process at some
> point then it's fine to let it continue to do so. If not, it should just
> stop.

Whatever type of crash happens Marionette definitely brings up a new browser process. But compared to Mochitest Marionette would continue with the next test method (if there is one) from the same file. This is due to bug 1309250, and where I do not see a way to skip all remaining tests from that file. It means we won't loose any test coverage, but on the other side could have more reported crashes in case a test causes a higher amount of crashes.

> I think this would be more reliable than just waiting for a timeout. It's
> fairly simple, you just listen for the "ipc:content-shutdown" observer
> service notification, and check if it has an "abnormal" key:
> https://dxr.mozilla.org/mozilla-central/rev/
> 7452437b3ab571b1d60aed4e973d82a1471f72b2/browser/modules/
> ContentCrashHandlers.jsm#79

Ah, good to know that we have an observer for specifically this situation. I will have a look and will see if I can get this implemented. Would it be ok to get this moved to a follow-up bug? Or do you think we should have this right away? I ask because there will be API changes, and I would have to handle fallback code for older releases of Firefox. 

> As I mentioned back in comment 1, maybe we should just implement support for
> `MOZ_CRASHREPORTER_SHUTDOWN` in Firefox, to provide a way to exit with an
> error status if a content process crashes? Then harnesses could opt-in to
> this behavior. Aside from browser-chrome tests that are explicitly testing
> content process crashes it's unlikely that a test harness is going to be
> able to continue to run tests usefully after a content process crash.

That's something out of my scope. I'm happy to file a bug for you if you cannot do it, but it's nothing I could contribute to.

Flags: needinfo?(ted)

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/9-10/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/9-10/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 74

•

8 years ago

Talked with ted on IRC and here our conversation:

[21:44:15] whimboo ted: re content crashes... would it be fine to do the observer notification check as a follow-up or shall I add it with my initial patch to Marionette?
[21:44:44] @ted i guess doing it as a follow-up would be OK
[21:44:59] whimboo ted: k, there is also the quesiotn of backward compat
[21:45:03] @ted alternately we just get someone to support that env var in firefox in general
[21:45:05] whimboo especially for our update tests
[21:45:32] whimboo ted: should i file a bug for the env var?
[21:45:41] @ted whimboo: yeah, i think so
[21:45:44] @ted we could use it in mochitest etc too
[21:46:03] whimboo k, i will create a bug now
[21:46:16] whimboo ted: thanks. looks like my remaining questions are solved. 

So I'm going to file two new bugs. One bug for the env variable, and another one for handling the observer notification when content crashes happen.

Flags: needinfo?(ted)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 75

•

8 years ago

Actually on ash we run mn-e10s jobs for Windows 8 x64 opt/debug. It means the test failures as can be seen on my last try push need to be fixed.

I did some more investigation on a local Win10 VM and noticed that the minidump files are getting written out with a delay. As result checking for the files right after the content crash would result in an empty list, which means no crash is detected. Exactly that is also the test failure as listed in my try build.

Ted, given that mozcrash doesn't wait for any probably to appear minidump files, a delay here could mean that we detect a crash and erroneously assign it to a maybe following test. Not sure if this is something we can control or wait for.

For this specific test which tests content crashes, I could update the minidump file retrieval method to just wait a bit. I think that this should be fine enough. But for real unwanted crashes we might end-up with the above situation.

Flags: needinfo?(ted)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Depends on: 1311016

(not currently active) Ted Mielczarek

Comment 76

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #75) 
> Ted, given that mozcrash doesn't wait for any probably to appear minidump
> files, a delay here could mean that we detect a crash and erroneously assign
> it to a maybe following test. Not sure if this is something we can control
> or wait for.

I think the only reasonable thing to do here is to wait for the observer service notification, which fires after the minidump is available.

Flags: needinfo?(ted)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 77

•

8 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #76)
> I think the only reasonable thing to do here is to wait for the observer
> service notification, which fires after the minidump is available.

Hm, working on a temporary solution which might go away soon when we have a fix for bug 1311016 landed, I don't see valuable. Instead I picked up bug 1311016 and provided the patch. With the hope that all is fine and we can get it landed soon, I will be able to finalize this patch soon.

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/88540/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/88540/

Comment hidden (mozreview-request)

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78950/diff/5-6/

Comment hidden (mozreview-request)

Comment on attachment 8791591 [details]
Bug 1299216 - [mozrunner] check_for_crashes() has to return the crash count since its last invocation.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78952/diff/5-6/

Comment hidden (mozreview-request)

Comment on attachment 8791592 [details]
Bug 1299216 - Bump mozrunner version to 6.13.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78954/diff/5-6/

Comment hidden (mozreview-request)

Comment on attachment 8793329 [details]
Bug 1299216 - Don't care about socket not connected for sock.shutdown() call.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/80088/diff/4-5/

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/10-11/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/10-11/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 85

•

8 years ago

Given that autolander is currently broken I triggered the try build myself:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3bb855f560525e847c304cb9916f440821fefe60

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/11-12/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/11-12/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 88

•

8 years ago

Even with the new environment variable in place which let Firefox close itself, my test is still failing with no crash being detected:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=908e8d88290a4057414d5391c1d1ddccf6bc7cf3&selectedJob=29943878

Reason is that the minidump files are getting created somewhat delayed and the crash detection code returns false. With that we assume the process has been somehow shutdown with exit code 0.

I think that I will add some debugging code so we can figure out about which delay we are talking about here.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Depends on: 1313604

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 89

•

8 years ago

I talked with Ted and will try one more thing before digging into a loaner. So what could happen here is that we see a race because we check for crashes first and then for the process status. It means that when we close Firefox due to the crash, it shutdown the socket server before the process ends. Ideally the minidump file should have already been written, but who knows what's going on here because content processes are involved. It means that I will flip the checks, doing the process check first, and then checking for crashes. With that we can be sure that if the process is not running anymore, breakpad cannot do anything.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 90

•

8 years ago

I pushed the changes as mentioned above to try and it's still failing on Windows 8 test slaves. This is totally crazy because when Firefox has been shutdown Breakpad can no longer write to disk, but we can see those files to appear a bit later. I wonder if that is some kind of caching when data is written to the disk. 

So I will make use of the Win8 loaner and hopefully figure out more details.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 91

•

8 years ago

So this failure seem to only happen when the test_crash_chrome_process test is run before. I can also stop the failure when forcing another restart of Firefox at the end of the above test. For me it means that something is wrong with the instance of Firefox which gets started after the chrome crash.

I will dig more into all that tomorrow.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 92

•

8 years ago

Ok, so this was a tricky one! But I think that I nailed it down to a specific issue... What happens? Here the excerpt from the chrome crash test:

>     def test_crash_chrome_process(self):
>         self.assertRaisesRegexp(IOError, 'Process crashed',
>                                 self.crash, chrome=True)
[..]
>         self.marionette.start_session()
>
>         self.assertNotEqual(self.marionette.session['processId'], self.pid)
>         self.marionette.get_url()

The first command actually crashes the browser and we start Firefox again via `start_session()`. The next assertion is also working fine. But then something is wrong with the call to `get_url()`. As result the next test for the content crash is always failing. When this line gets removed all is working fine. Leaving this line in and doing a restart afterward, also makes it working as stated in my last comment.

In this case get_url() runs inside content context which means `getCurrentUrl()` in listener.js is getting executed and we end-up here:

https://dxr.mozilla.org/mozilla-central/rev/3e73fd638e687a4d7f46613586e5156b8e2af846/testing/marionette/listener.js#1051

Something seems to block specific messages between listener.js and driver.js afterward and that as long as Firefox is running and not getting restarted. Due to that the call of the content crash script will not be forwarded to the frame script and doesn't get run. As result no crash happens BUT Marionette client is shutting down Firefox due to a socket timeout. This is why we get no minidump files, and the wrong error message.

What can we do? I will leave in the call to get_url() but comment it out for now with a bug reference to a new bug I have to file. Given that this problem only happens on a single platform we can get this investigated later. No need to do it now.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 93

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #92)
> >         self.marionette.start_session()
> >
> >         self.assertNotEqual(self.marionette.session['processId'], self.pid)
> >         self.marionette.get_url()

Just to add, the faulty page here is about:blank, so I assume this issue could be related to bug 1312674.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Depends on: 1314594

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/90060/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/90060/

Comment hidden (mozreview-request)

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78950/diff/6-7/

Comment hidden (mozreview-request)

Comment on attachment 8791591 [details]
Bug 1299216 - [mozrunner] check_for_crashes() has to return the crash count since its last invocation.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78952/diff/6-7/

Comment hidden (mozreview-request)

Comment on attachment 8791592 [details]
Bug 1299216 - Bump mozrunner version to 6.13.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78954/diff/6-7/

Comment hidden (mozreview-request)

Comment on attachment 8793329 [details]
Bug 1299216 - Don't care about socket not connected for sock.shutdown() call.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/80088/diff/5-6/

Comment hidden (mozreview-request)

Comment on attachment 8804621 [details]
Bug 1299216 - Make use of MOZ_CRASHREPORTER_SHUTDOWN environment variable in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/88540/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/12-13/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/12-13/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 102

•

8 years ago

Using the --artifact switch for try doesn't seem to create the Mn jobs for Windows. Also pulse_actions is lagging a lot. So I cannot even reschedule those jobs. I will trigger a new try build specifically for Windows.

Here the current results which look promosing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9418bf09f4a0

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 103

•

8 years ago

mozreview-review

Comment on attachment 8793329 [details]
Bug 1299216 - Don't care about socket not connected for sock.shutdown() call.

https://reviewboard.mozilla.org/r/80088/#review89804

David, beside the 2 new commits in this series there is an existing one which I would like to have another review from you. Reason is that a lot of changes went in there so your original review is obsolete. Please see: https://reviewboard.mozilla.org/r/78956/diff/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Attachment #8804621 - Flags: review?(dburns)

Attachment #8806706 - Flags: review?(dburns)

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/13-14/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/13-14/

Comment hidden (mozreview-request)

Comment on attachment 8806706 [details]
Bug 1299216 - Wait for process exit first before checking for crashes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/90060/diff/1-2/

David Burns :automatedtester

Comment 107

•

8 years ago

mozreview-review

Comment on attachment 8806706 [details]
Bug 1299216 - Wait for process exit first before checking for crashes.

https://reviewboard.mozilla.org/r/90060/#review90532

Attachment #8806706 - Flags: review?(dburns) → review+

David Burns :automatedtester

Comment 108

•

8 years ago

mozreview-review

Comment on attachment 8804621 [details]
Bug 1299216 - Make use of MOZ_CRASHREPORTER_SHUTDOWN environment variable in Marionette.

https://reviewboard.mozilla.org/r/88540/#review90534

Attachment #8804621 - Flags: review?(dburns) → review+

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 109

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #103)
> https://reviewboard.mozilla.org/r/80088/#review89804
> 
> David, beside the 2 new commits in this series there is an existing one
> which I would like to have another review from you. Reason is that a lot of
> changes went in there so your original review is obsolete. Please see:
> https://reviewboard.mozilla.org/r/78956/diff/

Maybe you missed this request to check an already reviewed commit again. Thanks

Flags: needinfo?(dburns)

David Burns :automatedtester

Comment 110

•

8 years ago

looks fine to me

Flags: needinfo?(dburns)

Comment hidden (mozreview-request)

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78950/diff/7-8/

Comment hidden (mozreview-request)

Comment on attachment 8791591 [details]
Bug 1299216 - [mozrunner] check_for_crashes() has to return the crash count since its last invocation.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78952/diff/7-8/

Comment hidden (mozreview-request)

Comment on attachment 8791592 [details]
Bug 1299216 - Bump mozrunner version to 6.13.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78954/diff/7-8/

Comment hidden (mozreview-request)

Comment on attachment 8793329 [details]
Bug 1299216 - Don't care about socket not connected for sock.shutdown() call.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/80088/diff/6-7/

Comment hidden (mozreview-request)

Comment on attachment 8804621 [details]
Bug 1299216 - Make use of MOZ_CRASHREPORTER_SHUTDOWN environment variable in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/88540/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8791593 [details]
Bug 1299216 - Enhance handling of crashes in Marionette.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78956/diff/14-15/

Comment hidden (mozreview-request)

Comment on attachment 8791594 [details]
Bug 1299216 - Remove always parameter from do_process_check decorator.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/78958/diff/14-15/

Comment hidden (mozreview-request)

Comment on attachment 8806706 [details]
Bug 1299216 - Wait for process exit first before checking for crashes.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/90060/diff/2-3/

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 119

•

8 years ago

mozreview-review

Comment on attachment 8791590 [details]
Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0.

https://reviewboard.mozilla.org/r/78948/#review91220

Last update of the patch series to fix the rebase conflict for the unittest manifest file.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Summary: [e10s] Improve handling of content crashes for Marionette → [e10s] Improve crash handling for Marionette

Pulsebot

Comment 120

•

8 years ago

Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d58be05673a2
[mozcrash] Count crash reports in check_for_crashes and bump version to 1.0. r=ahal
https://hg.mozilla.org/integration/autoland/rev/6ec047c4e48e
[mozrunner] check_for_crashes() has to return the crash count since its last invocation. r=ahal
https://hg.mozilla.org/integration/autoland/rev/7c0bdd869d2a
Bump mozrunner version to 6.13. r=ahal
https://hg.mozilla.org/integration/autoland/rev/46fd1d89a918
Don't care about socket not connected for sock.shutdown() call. r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/8463d0e5bf28
Make use of MOZ_CRASHREPORTER_SHUTDOWN environment variable in Marionette. r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/32b908ad1b02
Enhance handling of crashes in Marionette. r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/d57d7ada1080
Remove always parameter from do_process_check decorator. r=automatedtester
https://hg.mozilla.org/integration/autoland/rev/e7aa936d8b6e
Wait for process exit first before checking for crashes. r=automatedtester

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 121

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d58be05673a2
https://hg.mozilla.org/mozilla-central/rev/6ec047c4e48e
https://hg.mozilla.org/mozilla-central/rev/7c0bdd869d2a
https://hg.mozilla.org/mozilla-central/rev/46fd1d89a918
https://hg.mozilla.org/mozilla-central/rev/8463d0e5bf28
https://hg.mozilla.org/mozilla-central/rev/32b908ad1b02
https://hg.mozilla.org/mozilla-central/rev/d57d7ada1080
https://hg.mozilla.org/mozilla-central/rev/e7aa936d8b6e

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 122

•

8 years ago

Pushed to pypi:

mozcrash:
=========
creating build/bdist.macosx-10.11-x86_64/wheel/mozcrash-1.0.dist-info/WHEEL
Uploading distributions to https://upload.pypi.org/legacy/
Uploading mozcrash-1.0-py2-none-any.whl
[================================] 11673/11673 - 00:00:02
Uploading mozcrash-1.0.tar.gz
[================================] 10007/10007 - 00:00:03

mozrunner:
==========
creating build/bdist.macosx-10.11-x86_64/wheel/mozrunner-6.13.dist-info/WHEEL
Uploading distributions to https://upload.pypi.org/legacy/
Uploading mozrunner-6.13-py2-none-any.whl
[================================] 80823/80823 - 00:00:04
Uploading mozrunner-6.13.tar.gz
[================================] 70895/70895 - 00:00:02


Given that this patch contributes a lot of improvements for handling crashes especially for content crashes in e10s, it might be worth uplifting the patches to aurora. Lets wait one more day with the decision, which I will use to check if the patches would apply cleanly.

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox51: --- → ?

status-firefox52: --- → fixed

Keywords: leave-open

Resolution: --- → FIXED

Whiteboard: [needs mozcrash and mozrunner release]

Target Milestone: --- → mozilla52

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 123

•

8 years ago

There are actually too many dependencies which would have to be uplifted. So lets cancel a possible uplift to aurora.

status-firefox51: ? → ---

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

Depends on: 1316552

James Graham [:jgraham]

Updated

•

8 years ago

Blocks: 1320380

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

8 years ago

No longer blocks: 1320380

Depends on: 1320380

BMO Automation

Updated

•

1 year ago

Product: Testing → Remote Protocol

Bug 1299216 - [mozcrash] Count crash reports in check_for_crashes and bump version to 1.0. 8 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 58 bytes, text/x-review-board-request	ahal : review+	Details
Bug 1299216 - [mozrunner] check_for_crashes() has to return the crash count since its last invocation. 8 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 58 bytes, text/x-review-board-request	ahal : review+	Details
Bug 1299216 - Bump mozrunner version to 6.13. 8 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 58 bytes, text/x-review-board-request	ahal : review+	Details
Bug 1299216 - Enhance handling of crashes in Marionette. 8 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 58 bytes, text/x-review-board-request	automatedtester : review+	Details
Bug 1299216 - Remove always parameter from do_process_check decorator. 8 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 58 bytes, text/x-review-board-request	automatedtester : review+	Details
Bug 1299216 - Don't care about socket not connected for sock.shutdown() call. 8 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 58 bytes, text/x-review-board-request	automatedtester : review+	Details
Bug 1299216 - Make use of MOZ_CRASHREPORTER_SHUTDOWN environment variable in Marionette. 8 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 58 bytes, text/x-review-board-request	automatedtester : review+	Details
Bug 1299216 - Wait for process exit first before checking for crashes. 8 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 58 bytes, text/x-review-board-request	automatedtester : review+	Details