Last Comment Bug 802877 - Intermittent B2G emulator "TEST-UNEXPECTED-FAIL | * | EOFError: telnet connection closed"
: Intermittent B2G emulator "TEST-UNEXPECTED-FAIL | * | EOFError: telnet connec...
Status: RESOLVED FIXED
: intermittent-failure
Product: Testing
Classification: Components
Component: Marionette (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla19
Assigned To: Jonathan Griffin (:jgriffin)
:
:
Mentors:
: 817250 823067 824045 824059 824079 826131 (view as bug list)
Depends on: 813292 826182
Blocks: 823076
  Show dependency treegraph
 
Reported: 2012-10-17 16:39 PDT by Jonathan Griffin (:jgriffin)
Modified: 2014-10-15 12:03 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
fixed
fixed


Attachments
Add a delay before setting default emulator geolocation (817 bytes, patch)
2012-10-17 16:47 PDT, Jonathan Griffin (:jgriffin)
no flags Details | Diff | Splinter Review

Description Jonathan Griffin (:jgriffin) 2012-10-17 16:39:02 PDT
Sometimes WebAPI test runs are aborting during newSession in the first test with this stack trace:

Traceback (most recent call last):
  File "/data/jenkins/workspace/webapi-marionette-test/testing/marionette/client/marionette/marionette_test.py", line 78, in setUp
    self.marionette.start_session()
  File "/data/jenkins/workspace/webapi-marionette-test/testing/marionette/client/marionette/marionette.py", line 243, in start_session
    self.session = self._send_message('newSession', 'value')
  File "/data/jenkins/workspace/webapi-marionette-test/testing/marionette/client/marionette/marionette.py", line 162, in _send_message
    port = self.emulator.restart(self.local_port)
  File "/data/jenkins/workspace/webapi-marionette-test/testing/marionette/client/marionette/emulator.py", line 255, in restart
    self.start()
  File "/data/jenkins/workspace/webapi-marionette-test/testing/marionette/client/marionette/emulator.py", line 309, in start
    self.geo.set_default_location()
  File "/data/jenkins/workspace/webapi-marionette-test/testing/marionette/client/marionette/emulator_geo.py", line 13, in set_default_location
    self.set_location(self.lon, self.lat)
  File "/data/jenkins/workspace/webapi-marionette-test/testing/marionette/client/marionette/emulator_geo.py", line 16, in set_location
    self.emulator._run_telnet('geo fix %0.5f %0.5f' % (self.lon, self.lat))
  File "/data/jenkins/workspace/webapi-marionette-test/testing/marionette/client/marionette/emulator.py", line 215, in _run_telnet
    return self._get_telnet_response(command)
  File "/data/jenkins/workspace/webapi-marionette-test/testing/marionette/client/marionette/emulator.py", line 204, in _get_telnet_response
    line = self.telnet.read_until('\n')
  File "/usr/lib/python2.7/telnetlib.py", line 319, in read_until
    return self.read_very_lazy()
  File "/usr/lib/python2.7/telnetlib.py", line 395, in read_very_lazy
    raise EOFError, 'telnet connection closed'
TEST-UNEXPECTED-FAIL : EOFError: telnet connection closed

It appears that setting the default geolocation is sometimes causing the emulator to crash.  I'm going to try and add a delay before this to see if that clears this up; if not, I will disable it.
Comment 1 Jonathan Griffin (:jgriffin) 2012-10-17 16:47:45 PDT
Created attachment 672586 [details] [diff] [review]
Add a delay before setting default emulator geolocation
Comment 2 Jonathan Griffin (:jgriffin) 2012-10-17 16:48:26 PDT
https://hg.mozilla.org/mozilla-central/rev/f549d4510829.

Leaving open for further tweaks.
Comment 3 Ed Morley [:emorley] 2012-10-20 13:39:03 PDT
One of the changesets here turned marionette orange:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=marionette&rev=67f8d99d80fb

In that range, I believe this bug is the most likely contender.
Comment 4 Ed Morley [:emorley] 2012-10-20 13:43:08 PDT
> One of the changesets here turned marionette orange:

Fixed link:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=marionette&rev=67f8d99d80fb&noignore=1
Comment 5 Jonathan Griffin (:jgriffin) 2012-10-31 16:43:56 PDT
I haven't seen this problem recur, I'm closing this.
Comment 6 Jonathan Griffin (:jgriffin) 2012-10-31 16:49:25 PDT
https://hg.mozilla.org/projects/ash/rev/6b8114dced68
Comment 7 Jonathan Griffin (:jgriffin) 2012-11-02 16:58:15 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/647847eb85ce
Comment 8 Andrew Halberstadt [:ahal] 2012-11-20 14:20:14 PST
Still seeing this occasionally on inbound (ignore the mozharness logcat exception):
https://tbpl.mozilla.org/php/getParsedLog.php?id=17217191&tree=Mozilla-Inbound
Comment 9 Jonathan Griffin (:jgriffin) 2012-11-21 10:28:01 PST
This is another weird emulator problem.  I'm thinking the best fix is to add it to the retry code, so if it happens, we just try everything again.
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-11-22 17:00:59 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=17290290&tree=Mozilla-Aurora
Comment 13 Andrew Halberstadt [:ahal] 2012-11-23 07:14:11 PST
Looking at that last log I see:
00:57:25     INFO - F/libc    (   43): Fatal signal 11 (SIGSEGV) at 0x00000047 (code=1)

Maybe this bug is a side effect of bug 814695. Though the tests still seem to run even after that seg fault so maybe it is completely unrelated.
Comment 15 Jonathan Griffin (:jgriffin) 2012-11-26 18:48:03 PST
(In reply to Ryan VanderMeulen from comment #14)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=17358883&tree=Mozilla-Beta

This is an emulator crash, but one that occurred during a non-WebAPI test.
Comment 17 Jonathan Griffin (:jgriffin) 2012-11-27 10:08:47 PST
(In reply to Ed Morley [:edmorley, UTC, email: emorley@moco] from comment #16)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=17374301&tree=Firefox

This is an emulator crash during the test test_execute_async_script.py:test_execute_permission.  We really need better crash detection and messaging.
Comment 18 Ryan VanderMeulen [:RyanVM] 2012-11-27 11:54:06 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=17362640&tree=Mozilla-Beta

(sorry, but gotta post this somewhere...)
Comment 19 Jonathan Griffin (:jgriffin) 2012-11-27 12:35:39 PST
(In reply to Ryan VanderMeulen from comment #18)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=17362640&tree=Mozilla-Beta
> 
> (sorry, but gotta post this somewhere...)

This is a crash during the test test_execute_script.py:test_that_we_can_pass_in_floats.  I'm beginning to think that re-enabling the telephony tests has caused some weird problem in which the emulator crashes after some delay.
Comment 21 Jonathan Griffin (:jgriffin) 2012-11-27 14:27:37 PST
(In reply to Ryan VanderMeulen from comment #20)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=17384033&tree=Mozilla-Beta

Another random emulator crash, the one during test_perf.py:test_perf_basic
Comment 23 Jonathan Griffin (:jgriffin) 2012-11-28 10:51:47 PST
(In reply to Ryan VanderMeulen from comment #22)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=17391237&tree=Mozilla-
> Inbound

Another random emulator crash, this one during test_execute_async_script.py:test_execute_no_return
Comment 27 Jonathan Griffin (:jgriffin) 2012-11-29 18:30:25 PST
This is likely fallout from re-enabling the telephony tests in bug 790463, but it's hard to be sure because this is so infrequent.

One idea that ahal and I had was to cause this error to automatically retry the run, which would effectively hide it.
Comment 32 Andrew Halberstadt [:ahal] 2012-12-03 06:36:39 PST
(In reply to Jonathan Griffin (:jgriffin) from comment #27)
> This is likely fallout from re-enabling the telephony tests in bug 790463,
> but it's hard to be sure because this is so infrequent.
> 
> One idea that ahal and I had was to cause this error to automatically retry
> the run, which would effectively hide it.

Seems to be happening more frequently now. We already have the code to retry the job in mozharness, we'd just need to add a new magic string to it (boo)
Comment 33 Jonathan Griffin (:jgriffin) 2012-12-03 10:43:07 PST
*** Bug 817250 has been marked as a duplicate of this bug. ***
Comment 39 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-12-07 01:25:47 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=17664577&tree=Mozilla-Aurora
Comment 40 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-12-08 01:59:37 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=17741506&tree=Mozilla-Beta
Comment 41 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-12-08 14:01:14 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=17743251&tree=Mozilla-Beta
Comment 51 Ryan VanderMeulen [:RyanVM] 2012-12-30 17:37:57 PST
https://tbpl.mozilla.org/php/getParsedLog.php?id=18359097&tree=Firefox
Comment 57 Jonathan Griffin (:jgriffin) 2013-01-02 16:32:18 PST
*** Bug 824045 has been marked as a duplicate of this bug. ***
Comment 58 Jonathan Griffin (:jgriffin) 2013-01-02 17:31:19 PST
*** Bug 823067 has been marked as a duplicate of this bug. ***
Comment 59 Jonathan Griffin (:jgriffin) 2013-01-02 17:34:47 PST
*** Bug 824079 has been marked as a duplicate of this bug. ***
Comment 61 Ed Morley [:emorley] 2013-01-03 01:26:12 PST
*** Bug 826131 has been marked as a duplicate of this bug. ***
Comment 62 Ed Morley [:emorley] 2013-01-03 01:26:20 PST
*** Bug 824059 has been marked as a duplicate of this bug. ***
Comment 64 Jonathan Griffin (:jgriffin) 2013-01-09 11:35:39 PST
This particular problem has been fixed due to bug 826132.

Note You need to log in before you can comment on or make changes to this bug.