Closed Bug 820739 Opened 7 years ago Closed 6 years ago

Intermittent B2G emulator "command timed out: 1200 seconds without output, attempting to kill" or "command timed out: 3600 seconds without output, attempting to kill" during run-marionette or b2g_emulator_unittest step

Categories

(Testing :: Marionette, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: emorley, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [mozharness])

b2g_ics_armv7a_gecko_emulator mozilla-inbound opt test marionette-webapi on 2012-12-09 19:38:06 PST for push 0770315e3538

slave: talos-r3-fed-046

https://tbpl.mozilla.org/php/getParsedLog.php?id=17769242&tree=Mozilla-Inbound

{
19:40:44     INFO - #####
19:40:44     INFO - ##### Running install step.
19:40:44     INFO - #####
19:40:44     INFO - Getting output from command: ['/home/cltbld/talos-slave/test/build/venv/bin/pip', 'freeze']
19:40:44     INFO - Copy/paste: /home/cltbld/talos-slave/test/build/venv/bin/pip freeze
19:40:44     INFO - Reading from file tmpfile_stdout
19:40:44     INFO - Emulator tests; skipping.
19:40:44     INFO - #####
19:40:44     INFO - ##### Running run-marionette step.
19:40:44     INFO - #####
19:40:44     INFO - Running command: ['/home/cltbld/talos-slave/test/build/venv/bin/python', '-u', '/home/cltbld/talos-slave/test/build/tests/marionette/marionette/runtests.py', '--logcat-dir', '/home/cltbld/talos-slave/test/build', '--emulator', 'arm', '--gecko-path', '/home/cltbld/talos-slave/test/build/gecko/b2g', '--load-early', '--homedir', '/home/cltbld/talos-slave/test/build/emulator/b2g-distro', '--type', 'b2g', '/home/cltbld/talos-slave/test/build/tests/marionette/tests/testing/marionette/client/marionette/tests/unit-tests.ini']
19:40:44     INFO - Copy/paste: /home/cltbld/talos-slave/test/build/venv/bin/python -u /home/cltbld/talos-slave/test/build/tests/marionette/marionette/runtests.py --logcat-dir /home/cltbld/talos-slave/test/build --emulator arm --gecko-path /home/cltbld/talos-slave/test/build/gecko/b2g --load-early --homedir /home/cltbld/talos-slave/test/build/emulator/b2g-distro --type b2g /home/cltbld/talos-slave/test/build/tests/marionette/tests/testing/marionette/client/marionette/tests/unit-tests.ini
19:40:45     INFO -  starting httpd
19:40:45     INFO -  running webserver on http://10.12.49.173:42342/
19:41:00     INFO -  Android Debug Bridge version 1.0.29

command timed out: 1200 seconds without output, attempting to kill
}
Blocks: 823076
https://tbpl.mozilla.org/php/getParsedLog.php?id=19084122&tree=Firefox

01:10:17     INFO - #####
01:10:17     INFO - ##### Running install step.
01:10:17     INFO - #####
01:10:17     INFO - Getting output from command: ['/home/cltbld/talos-slave/test/build/venv/bin/pip', 'freeze']
01:10:17     INFO - Copy/paste: /home/cltbld/talos-slave/test/build/venv/bin/pip freeze
01:10:17     INFO - Reading from file tmpfile_stdout
01:10:17     INFO - Detecting whether we're running mozinstall >=1.0...
01:10:17     INFO - Getting output from command: ['/home/cltbld/talos-slave/test/build/venv/bin/mozinstall', '--app', 'b2g', '-h']
01:10:17     INFO - Copy/paste: /home/cltbld/talos-slave/test/build/venv/bin/mozinstall --app b2g -h
01:10:18     INFO - Reading from file tmpfile_stdout
01:10:18     INFO - Output received:
01:10:18     INFO -  Usage: mozinstall [options] installer
01:10:18     INFO -  Options:
01:10:18     INFO -    -h, --help            show this help message and exit
01:10:18     INFO -    -d DEST, --destination=DEST
01:10:18     INFO -                          Directory to install application into. [default:
01:10:18     INFO -                          "/home/cltbld/talos-slave/test"]
01:10:18     INFO -    --app=APP             Application being installed. [default: firefox]
01:10:18     INFO - mkdir: /home/cltbld/talos-slave/test/build/application
01:10:18     INFO - Getting output from command: ['/home/cltbld/talos-slave/test/build/venv/bin/mozinstall', '--app', 'b2g', '/home/cltbld/talos-slave/test/build/b2g-21.0a1.en-US.android-arm.tar.gz', '--destination', '/home/cltbld/talos-slave/test/build/application']
01:10:18     INFO - Copy/paste: /home/cltbld/talos-slave/test/build/venv/bin/mozinstall --app b2g /home/cltbld/talos-slave/test/build/b2g-21.0a1.en-US.android-arm.tar.gz --destination /home/cltbld/talos-slave/test/build/application
01:10:19     INFO - Reading from file tmpfile_stdout
01:10:19     INFO - Output received:
01:10:19     INFO -  /home/cltbld/talos-slave/test/build/application/b2g/b2g
01:10:19     INFO - #####
01:10:19     INFO - ##### Running run-tests step.
01:10:19     INFO - #####
01:10:19     INFO - Running command: ['/home/cltbld/talos-slave/test/build/venv/bin/python', 'runreftestb2g.py', '--adbpath', '/home/cltbld/talos-slave/test/build/emulator/b2g-distro/out/host/linux-x86/bin/adb', '--b2gpath', '/home/cltbld/talos-slave/test/build/emulator/b2g-distro', '--emulator', 'arm', '--emulator-res', '800x1000', '--ignore-window-size', '--logcat-dir', '/home/cltbld/talos-slave/test/build', '--remote-webserver', '10.0.2.2', '--xre-path', '/home/cltbld/talos-slave/test/build/xre/bin', '--total-chunks', '10', '--this-chunk', '2', '--gecko-path', '/home/cltbld/talos-slave/test/build/application/b2g', '--busybox', '/home/cltbld/talos-slave/test/build/busybox', 'tests/layout/reftests/reftest.list'] in /home/cltbld/talos-slave/test/build/tests/reftest
01:10:19     INFO - Copy/paste: /home/cltbld/talos-slave/test/build/venv/bin/python runreftestb2g.py --adbpath /home/cltbld/talos-slave/test/build/emulator/b2g-distro/out/host/linux-x86/bin/adb --b2gpath /home/cltbld/talos-slave/test/build/emulator/b2g-distro --emulator arm --emulator-res 800x1000 --ignore-window-size --logcat-dir /home/cltbld/talos-slave/test/build --remote-webserver 10.0.2.2 --xre-path /home/cltbld/talos-slave/test/build/xre/bin --total-chunks 10 --this-chunk 2 --gecko-path /home/cltbld/talos-slave/test/build/application/b2g --busybox /home/cltbld/talos-slave/test/build/busybox tests/layout/reftests/reftest.list
01:10:35     INFO -  Android Debug Bridge version 1.0.29

command timed out: 1200 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1364.127121
========= Finished '/tools/buildbot/bin/python scripts/scripts/b2g_emulator_unittest.py ...' failed (results: 2, elapsed: 22 mins, 46 secs) (at 2013-01-24 01:30:37.918921) =========
Summary: Intermittent B2G emulator "command timed out: 1200 seconds without output, attempting to kill" during run-marionette step → Intermittent B2G emulator "command timed out: 1200 seconds without output, attempting to kill" during run-marionette or b2g_emulator_unittest step
That last one timed out while downloading tests.zip. In that case the emulator hasn't even been launched yet. Filed bug 848442 for it.
In the most recent cases, it looks like the emulator isn't starting up successfully, but it's a little hard to tell.  Mozharness might be eating the last bit of log (bug 791406), and we don't have a process list or logcat output.  It will be hard to add additional logging here, since the mozharness script is getting killed by buildbot due to lack of output.
I have a timeout patch in bug 840305.
Oh cool, that will make it much easier to debug this.
Whiteboard: [mozharness]
https://tbpl.mozilla.org/php/getParsedLog.php?id=24326084&tree=Mozilla-Inbound
Summary: Intermittent B2G emulator "command timed out: 1200 seconds without output, attempting to kill" during run-marionette or b2g_emulator_unittest step → Intermittent B2G emulator "command timed out: 1200 seconds without output, attempting to kill" or "command timed out: 3600 seconds without output, attempting to kill" during run-marionette or b2g_emulator_unittest step
This is now one of our top oranges - could you take another look or find someone who has the cycles to? :-)
Flags: needinfo?(jgriffin)
Yes, I'll try.  Looking at the logs, it looks like we could be dealing with several distinct problems; some may be test related, some B2G related, others harness related.

Unfortunately the 1200s timeout kill prevents us from dumping logcat, so the information we have to go on is limited.

Ideally, we could pass a timeout parameter to run_command so that it can timeout within the mozharness script, and still allow us to dump logcat.  I'll open a separate bug to track that.
Flags: needinfo?(jgriffin)
Sounds like a good plan, thank you :-)
Depends on: 892563
Spun off the last 3 failures to bug 899074.
The last many instances of this failure are a hang in one specific crashtest:

12:56:10     INFO -  REFTEST TEST-PASS | http://10.0.2.2:8888/tests/content/media/test/crashtests/878014.html | (LOAD ONLY)
Depends on: 904839
878014.html and 880724.html disabled for frequent timeouts.
https://hg.mozilla.org/integration/mozilla-inbound/rev/9df04b16a655