Closed Bug 989048 Opened 11 years ago Closed 10 years ago

Frequent "marionette.errors.TimeoutException: TimeoutException: timed out waiting for emulator to start" or " mozrunner.errors.TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs

Categories

(Testing :: General, defect)

Platform: ARM Gonk (Firefox OS)
Priority: Not set
Severity: normal

Tracking

(firefox36 fixed, firefox37 fixed, firefox38 fixed, firefox-esr31 unaffected, b2g-v1.4 wontfix, b2g-v2.0 wontfix, b2g-v2.0M wontfix, b2g-v2.1 fixed, b2g-v2.1S fixed, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
mozilla38

People

(Reporter: RyanVM, Assigned: ting)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 6 obsolete files)

I don't know where to file this because I'm not sure where to start looking for a cause. This started within the last few days and could be infra or could be some recent harness change. These are particularly problematic because TBPL can't easily star them, so they usually end up in the "b;r" bucket instead.

https://tbpl.mozilla.org/php/getParsedLog.php?id=36824620&tree=B2g-Inbound
b2g_emulator_vm b2g-inbound opt test mochitest-3 on 2014-03-27 12:31:28 PDT for push d40ff3cec293
slave: tst-linux64-spot-790

12:33:21 INFO - Calling ['/builds/slave/test/build/venv/bin/python', 'runtestsb2g.py', '--adbpath=/builds/slave/test/build/emulator/b2g-distro/out/host/linux-x86/bin/adb', '--b2gpath=/builds/slave/test/build/emulator/b2g-distro', '--console-level=INFO', '--emulator=arm', '--logcat-dir=/builds/slave/test/build', '--remote-webserver=10.0.2.2', '', '--xre-path=/builds/slave/test/build/xre/bin', '--symbols-path=http://pvtbuilds.pvt.build.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/b2g-inbound-emulator/20140327120133/b2g-31.0a1.en-US.android-arm.crashreporter-symbols.zip', '--busybox=/builds/slave/test/build/busybox', '--total-chunks=9', '--this-chunk=3'] with output_timeout 1000
12:33:21 INFO - /builds/slave/test/build/venv/local/lib/python2.7/site-packages/mozrunner/utils.py:19: UserWarning: Module manifestparser was already imported from /builds/slave/test/build/tests/mochitest/manifestparser.py, but /builds/slave/test/build/venv/lib/python2.7/site-packages is being added to sys.path
12:33:21 INFO - import pkg_resources
12:34:26 INFO - Traceback (most recent call last):
12:34:26 INFO - File "runtestsb2g.py", line 389, in <module>
12:34:26 INFO - main()
12:34:26 INFO - File "runtestsb2g.py", line 386, in main
12:34:26 INFO - run_remote_mochitests(parser, options)
12:34:26 INFO - File "runtestsb2g.py", line 310, in run_remote_mochitests
12:34:26 INFO - marionette = Marionette.getMarionetteOrExit(**kwargs)
12:34:26 INFO - File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette.py", line 556, in getMarionetteOrExit
12:34:26 INFO - m = cls(*args, **kwargs)
12:34:26 INFO - File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette.py", line 503, in __init__
12:34:26 INFO - self.emulator.start()
12:34:26 INFO - File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/emulator.py", line 351, in start
12:34:26 INFO - raise Exception('timed out waiting for emulator to start')
12:34:26 INFO - Exception: timed out waiting for emulator to start
12:34:26 ERROR - Return code: 1
Flags: needinfo?(ahalberstadt)
FWIW, I suspect this is from a harness change because it only appears to affect trunk. Aurora and b2g28 do not appear to be hitting it.
I can't think of any harness changes that landed recently. This is what's failing: http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/emulator.py#346 It's looking for a change in adb devices after launching the emulator. So either the emulator isn't starting up or there's some problem with how we are using adb devices. I don't see any problems with the latter after a quick look.
Flags: needinfo?(ahalberstadt)
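For reference, the check that is timing out amounts to polling `adb devices` until a serial that wasn't there before the emulator launch shows up. A rough, illustrative sketch of that logic (not the actual emulator.py code; the function name, timeout, and poll interval here are made up):

import subprocess
import time

def wait_for_new_device(adb_path, devices_before, timeout=300):
    """Illustrative only: wait until `adb devices` lists a serial that was
    not present before the emulator was launched."""
    end = time.time() + timeout
    while time.time() < end:
        out = subprocess.check_output([adb_path, 'devices']).decode()
        current = set()
        for line in out.splitlines()[1:]:  # skip the "List of devices attached" header
            parts = line.split()
            if len(parts) == 2 and parts[1] == 'device':
                current.add(parts[0])
        new = current - devices_before
        if new:
            return new.pop()
        time.sleep(1)
    raise Exception('timed out waiting for emulator to start')

So a hang here means either the emulator process never comes up at all, or it comes up but never reaches the "device" state in adb.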
I'd guess I'm starring in the neighborhood of 40-50 a day (as "b", typing the ";r" is too much trouble).
Not sure if it's related or not, but the qemu.logs that get uploaded to blobber just seem to be garbage. This is true for green runs too though. Having access to this might tell us what is happening.
I'll also land some debug output on Cedar. Unfortunately b2g mochitests are currently b0rked there; I'll need to hunt down who broke them first.
Attached file Good qemu.log —
Turns out the qemu.logs were just gzipped but didn't have their content-type set. This is the qemu.log of a failing job.

For comparison, here is the qemu.log of a successful run (use zcat to view it):
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/21cbf7857cdae4239d1ce842d6138db14021c08563b4f6d1698b40b78d2c8fff584efa6b4eca05764fe6b1ff6bb533a1c68bda12f677fbd857e830d1c32af837

The difference seems to be the first line:
NAND: could not write file /tmp/android-cltbld/emulator-TPKvSG, Resource temporarily unavailable

Seems like something isn't always getting cleaned up properly? Not sure if this is a slave, test harness, or emulator build problem.
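Since the blobs come down gzipped without the right content-type, browsers won't render them directly; besides zcat, a tiny Python 2 helper like the following also works (hypothetical script, pass the blob URL as the first argument):

import gzip
import io
import sys
import urllib2

# Fetch the gzipped blob and print its decompressed contents.
data = urllib2.urlopen(sys.argv[1]).read()
print(gzip.GzipFile(fileobj=io.BytesIO(data)).read())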
Comment on attachment 8399429 [details] Good qemu.log

I accidentally uploaded the good log.
Attachment #8399429 - Attachment description: Bad qemu.log → Good qemu.log
Attached file Bad qemu.log —
Here is the bad one.
We could look and see if any of the emulator repos changed around March 26-27. There don't seem to be any RelEng changes in that timeframe that look like possible culprits: https://wiki.mozilla.org/ReleaseEngineering/Maintenance
There are a couple of changes in this timeframe from non-gecko/gaia sources; not sure if any could be a culprit:

{
  "repo": "http://git.mozilla.org/?p=external/apitrace/apitrace.git",
  "commits": [
    { "commit": "glretrace: Don't rely on kCGLOGLPVersion_GL4_Core.", "author": "Jos\u00e9 Fonseca", "revision": "89c5816399e71bda92a8959b5b771c04d6672ea3" },
    { "commit": "Revert \"qapitrace: fix an inline string FIXME request\"", "author": "Jos\u00e9 Fonseca", "revision": "cb7c76fe29110ac98cdd2677e42370fe7e6b8796" },
    { "commit": "qapitrace: fix an inline string FIXME request", "author": "Lawrence L Love", "revision": "0d86efd38a375c526491c679f4d2185ebe51434e" },
    { "commit": "glstate: Describe a few more transform feedback states.", "author": "Jos\u00e9 Fonseca", "revision": "3d91a979947afe25314882a47db1b2853b23b676" },
    { "commit": "glretrace: avoid glXSwapBuffers failures because of... ", "author": "Brian Paul", "revision": "485846b2a40d8ac7d6c1c5f8af6d15b0c10af19d" }
  ]
},
{
  "repo": "http://git.mozilla.org/?p=b2g/moztt.git",
  "commits": [
    { "commit": "Merge pull request #33 from jfkthame/woff", "author": "Ryan VanderMeulen", "revision": "5b93c7150acac5f657675b91889d828cc2b532e3" },
    { "commit": "If the product defines MOZ_PRODUCT_COMPRESS_FONTS:... ", "author": "Jonathan Kew", "revision": "79b0cbfbded935c8acce4d16dd41f3d7d0dee182" },
    { "commit": "Merge pull request #31 from jfkthame/khmer", "author": "Ryan VanderMeulen", "revision": "cb16958e41105d7c551d9941f522db97b8312538" }
  ]
}

Michael, Gregor, could any of these cause the emulator to periodically fail to start? See the error at the top of the attachment in comment #10.
Actually this is starting to look like an infrastructure issue. Many of the occurrences of this problem today have been on the slave tst-linux64-spot-794. Can someone in rel-eng check this instance and see if /tmp/android-cltbld/ is inaccessible there for some reason?
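A minimal probe someone with shell access to that slave could run to sanity-check the directory the emulator writes its NAND images into (path taken from the qemu.log error above; only a rough first check, since "Resource temporarily unavailable" can also point at resource limits rather than permissions):

import os
import tempfile

path = '/tmp/android-cltbld'
print('exists: %s, writable: %s' % (os.path.isdir(path), os.access(path, os.W_OK)))
try:
    # Try an actual write/remove cycle in the directory the emulator uses.
    fd, name = tempfile.mkstemp(dir=path)
    os.write(fd, b'probe')
    os.close(fd)
    os.remove(name)
    print('test write: ok')
except OSError as e:
    print('test write failed: %s' % e)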
Could we make this a TimeoutException rather than Exception: https://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/emulator.py#351 ...so that TBPL starring works? (Or alternatively, adjust the Exception message.)
Flags: needinfo?(jgriffin)
(In reply to Ed Morley [:edmorley UTC+0] from comment #15)
> Could we make this a TimeoutException rather than Exception:
> https://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/emulator.py#351
>
> ...so that TBPL starring works? (Or alternatively, adjust the Exception message.)

Yep, I'll take care of it today.
Flags: needinfo?(jgriffin)
Assignee: nobody → jgriffin
Attachment #8400082 - Flags: review?(mdas) → review+
Summary: Frequent "Exception: timed out waiting for emulator to start" errors on B2G mochitest runs → Frequent "TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs
Summary: Frequent "TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs → Frequent "marionette.errors.TimeoutException: TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs
Comment on attachment 8400082 [details] [diff] [review]
Use TimeoutException instead of Exception for emulator startup timeouts

Landed on aurora too; a=test-only
remote: https://hg.mozilla.org/releases/mozilla-aurora/rev/797d9a31a0d8
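For anyone following along, the patch is essentially a one-line swap of the exception type raised in emulator.py's start(). A rough sketch (the import form inside the marionette client package is an assumption; the class itself is the marionette.errors.TimeoutException from the updated summary, and the final message can be seen in later tracebacks):

from errors import TimeoutException  # assumed import form inside the marionette client package

def _fail_emulator_start():
    # was: raise Exception('timed out waiting for emulator to start')
    raise TimeoutException('timed out waiting for emulator to start')

With a distinct failure string in the log, the bug summary above could be updated to match it, which is what lets TBPL suggest this bug when starring.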
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
The patch just helps with starring; I meant to add a leave-open.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → NEW
Assignee: jgriffin → nobody
I updated my B2G and tried to run mochitest on my Linux PC. It always reproduces. Here is my log. Can anyone help?

/home/steven/workspace/b2g/emulator2/gecko/python/mach/mach/main.py:246: UserWarning: Module mozprocess was already imported from /home/steven/workspace/b2g/emulator2/gecko/testing/mozbase/mozprocess/mozprocess/__init__.pyc, but /usr/local/lib/python2.7/dist-packages/mozprocess-0.5-py2.7.egg is being added to sys.path
import pkg_resources
From _tests: Kept 12485 existing; Added/updated 0; Removed 0 files and 0 directories.
Error running mach:
['mochitest-remote']
The error occurred in code that was called by the mach command. This is either a bug in the called code itself or in the way that mach is calling it.
You should consider filing a bug for this issue.
If filing a bug, please include the full output of mach, including this error message.
The details of the failure are as follows:
TimeoutException: TimeoutException: timed out waiting for emulator to start
File "/home/steven/workspace/b2g/emulator2/gecko/testing/mochitest/mach_commands.py", line 714, in run_mochitest_remote
  xre_path=self.xre_path, test_paths=test_paths, **kwargs)
File "/home/steven/workspace/b2g/emulator2/gecko/testing/mochitest/mach_commands.py", line 184, in run_b2g_test
  return mochitest.run_remote_mochitests(parser, options)
File "/home/steven/workspace/b2g/emulator2/objdir-gecko/_tests/testing/mochitest/runtestsb2g.py", line 303, in run_remote_mochitests
  marionette = Marionette.getMarionetteOrExit(**kwargs)
File "/home/steven/workspace/b2g/emulator2/gecko/testing/marionette/client/marionette/marionette.py", line 565, in getMarionetteOrExit
  m = cls(*args, **kwargs)
File "/home/steven/workspace/b2g/emulator2/gecko/testing/marionette/client/marionette/marionette.py", line 512, in __init__
  self.emulator.start()
File "/home/steven/workspace/b2g/emulator2/gecko/testing/marionette/client/marionette/emulator.py", line 343, in start
  raise TimeoutException('timed out waiting for emulator to start')
(In reply to StevenLee[:slee] from comment #147)
> I updated my B2G and tried to run mochitest on my Linux PC. It always
> reproduces. Here is my log. Can anyone help?

I found the problem on my PC: I was running mochitest from screen, which does not have an X window. When I run mochitest from the terminal, it works fine.
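Given that, a cheap guard before launching might be to check for a display up front. A hypothetical sketch (the message and placement are made up, and whether the harness should hard-fail here is a separate question; the assumption is that this emulator configuration needs a working X display):

import os

# Fail fast with a clear message instead of timing out later.
if not os.environ.get('DISPLAY'):
    raise RuntimeError('DISPLAY is not set; the emulator window cannot be shown. '
                       'Run from a terminal with X access, or use a no-window configuration.')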
Summary: Frequent "marionette.errors.TimeoutException: TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs → Frequent "marionette.errors.TimeoutException: TimeoutException: timed out waiting for emulator to start" or " mozrunner.errors.TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs
The qemu.logs of many of those failures look the same as the qemu.logs of passing jobs except they don't get as far. The failed logs are also all at various stages of completion. This might just be a timeout that needs increasing.
I ran into the same issue, and adding "-wipe-data" to Emulator's qemu_args [1] does resolve it. But I am not sure whether it should always be set, or be exposed like no_window as an argument to B2GEmulatorRunner.

[1] http://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/mozrunner/devices/emulator.py#72
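For illustration, the idea is just to append the flag to whatever argument list the runner builds for the emulator binary. A minimal, hypothetical sketch (these names are not the actual mozrunner code; the real change is in the attached patch):

def build_emulator_args(binary, base_args, extra_args=None):
    """Hypothetical helper: assemble the emulator command line."""
    args = [binary] + list(base_args)
    # -wipe-data tells the emulator to discard the writable user-data image
    # and re-copy it from the initial image, so state (or a stale temp file)
    # left behind by a previous run cannot interfere with start-up.
    args.append('-wipe-data')
    if extra_args:
        args.extend(extra_args)
    return args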
Attached patch Do not overwrite emulator's user data image (obsolete) —
I made a patch which does not overwrite the emulator's user data image, but I am not sure how this should be tested on Try.
Attachment #8530098 - Flags: review?(ahalberstadt)
Attached patch Do not overwrite emulator's user data image v2 (obsolete) —
Fix typo.
Attachment #8530098 - Attachment is obsolete: true
Attachment #8530098 - Flags: review?(ahalberstadt)
Attachment #8530202 - Flags: review?(ahalberstadt)
Comment on attachment 8530202 [details] [diff] [review]
Do not overwrite emulator's user data image v2

Review of attachment 8530202 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable to me. r+ assuming this doesn't break tests on try. You can use the syntax:

try: -b do -p emulator -u all -t none
Attachment #8530202 - Flags: review?(ahalberstadt) → review+
mdas, zac: Would passing in '-wipe-data' to qemu break anything you guys are working on/maintaining? It sounds kind of disruptive, but I'm not entirely sure what it implies.
Flags: needinfo?(zcampbell)
Flags: needinfo?(mdas)
Nothing I work on is affected by that patch.
Flags: needinfo?(zcampbell)
(In reply to Andrew Halberstadt [:ahal] from comment #609)
> mdas, zac: Would passing in '-wipe-data' to qemu break anything you guys are
> working on/maintaining? It sounds kind of disruptive, but I'm not entirely
> sure what it implies.

Actually it wipes nothing, as the user-data disk image is a temporary file; it does copy the content over from initdata, though.
(In reply to Andrew Halberstadt [:ahal] from comment #608)
> Looks reasonable to me. r+ assuming this doesn't break tests on try. You can
> use the syntax:
>
> try: -b do -p emulator -u all -t none

Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9edbd1b4ac82
(In reply to Ting-Yu Chou [:ting] from comment #612)
> (In reply to Andrew Halberstadt [:ahal] from comment #608)
> > Looks reasonable to me. r+ assuming this doesn't break tests on try. You can
> > use the syntax:
> >
> > try: -b do -p emulator -u all -t none
>
> Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9edbd1b4ac82

Try looks good, but should we let this cook on inbound for a few days before it merges to central, to see if it really fixes the issue?
Flags: needinfo?(ahalberstadt)
Attached patch Do not overwrite emulator's user data image v3 (obsolete) —
Added r=ahal.
Attachment #8530202 - Attachment is obsolete: true
Merges from b2g-inbound to mozilla-central happen multiple times a day. We can't hold them up for one patch.
Assignee: nobody → tchou
Target Milestone: mozilla31 → ---
Flags: needinfo?(ahalberstadt)
Oh, I see. Thanks Ryan.
Keywords: checkin-needed
Sure, wiping user data won't affect any of our tests.
Flags: needinfo?(mdas)
(In reply to TBPL Robot from comment #620) This is from after the landing in comment 619 :(
:( I'll see if I can reproduce locally after I'm back in Taipei.
(In reply to Ting-Yu Chou [:ting] from comment #623)
> I'll see if I can reproduce locally after I'm back in Taipei.

I haven't reproduced it with my locally built emulator or on Try [1]. If I just want to add some debug logging, how can I do that without landing a patch on m-c?

[1] https://tbpl.mozilla.org/?tree=Try&rev=956d5cb11d6b
I just reproduced the issue once on Try with my debug logs [1], so the emulator is not offline; it really does not show up at all. Comparing the qemu.log [2] with a good one [3], it stops after a "Failed to load libGL.so", while the good one has two of the same message. I am not sure whether the message is related to the issue, but it should be fixed IMO. I don't see that message when I run my locally built emulator, and I have libgl1-mesa-dev installed on my Ubuntu 14.04 (x64) machine by following the instructions at [4].

Ryan, do you know how I could check and maybe fix this on the test machines?

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=55331988&tree=Try&full=1
[2] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/b6a3d6687c2394142e2604143e43a6f136f2be8ba9e9a8e0240d2b599af84b20226e0a193b31f2eed4fd247889144595319fa74dab6f9496d93e4c20f0eb0890
[3] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/46d1b66c5564a348d7965c4747665d3708a3b863aa864ae35dc838f67d6c422b7a06bb07b7554219dfefd3256042fc687bc154d076c81002e795819e7dc54e0b
[4] https://developer.mozilla.org/en-US/Firefox_OS/Firefox_OS_build_prerequisites
Flags: needinfo?(ryanvm)
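A quick way to check whether libGL can even be loaded on a given machine is a ctypes probe like the one below. Caveat: it tests the Python interpreter's own architecture, so a 32-bit emulator binary on a 64-bit host could still fail even if this succeeds; it is only a rough signal.

import ctypes

for name in ('libGL.so', 'libGL.so.1'):
    try:
        # dlopen the library the same way a native binary would.
        ctypes.CDLL(name)
        print('%s: loaded OK' % name)
    except OSError as e:
        print('%s: failed to load (%s)' % (name, e))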
(In reply to Ting-Yu Chou [:ting] from comment #915)
> a good one [3], it stops after a "Failed to load libGL.so", and the good one

There's a bug about this: bug 1013634.
See Also: → 1013634
I'll look for the answer in bug 1013634 instead.
Flags: needinfo?(ryanvm)
Depends on: 1117395