Closed Bug 989048 Opened 10 years ago Closed 9 years ago

Frequent "marionette.errors.TimeoutException: TimeoutException: timed out waiting for emulator to start" or " mozrunner.errors.TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs

Categories

(Testing :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(firefox36 fixed, firefox37 fixed, firefox38 fixed, firefox-esr31 unaffected, b2g-v1.4 wontfix, b2g-v2.0 wontfix, b2g-v2.0M wontfix, b2g-v2.1 fixed, b2g-v2.1S fixed, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
mozilla38
Tracking Status
firefox36 --- fixed
firefox37 --- fixed
firefox38 --- fixed
firefox-esr31 --- unaffected
b2g-v1.4 --- wontfix
b2g-v2.0 --- wontfix
b2g-v2.0M --- wontfix
b2g-v2.1 --- fixed
b2g-v2.1S --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: RyanVM, Assigned: ting)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 6 obsolete files)

I don't know where to file this because I'm not sure where to start looking for a cause. This started within the last few days and could be infra or could be some recent harness change. These are particularly problematic because TBPL can't easily star them, so they usually end up in the "b;r" bucket instead.

https://tbpl.mozilla.org/php/getParsedLog.php?id=36824620&tree=B2g-Inbound

b2g_emulator_vm b2g-inbound opt test mochitest-3 on 2014-03-27 12:31:28 PDT for push d40ff3cec293
slave: tst-linux64-spot-790

12:33:21     INFO - Calling ['/builds/slave/test/build/venv/bin/python', 'runtestsb2g.py', '--adbpath=/builds/slave/test/build/emulator/b2g-distro/out/host/linux-x86/bin/adb', '--b2gpath=/builds/slave/test/build/emulator/b2g-distro', '--console-level=INFO', '--emulator=arm', '--logcat-dir=/builds/slave/test/build', '--remote-webserver=10.0.2.2', '', '--xre-path=/builds/slave/test/build/xre/bin', '--symbols-path=http://pvtbuilds.pvt.build.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/b2g-inbound-emulator/20140327120133/b2g-31.0a1.en-US.android-arm.crashreporter-symbols.zip', '--busybox=/builds/slave/test/build/busybox', '--total-chunks=9', '--this-chunk=3'] with output_timeout 1000
12:33:21     INFO -  /builds/slave/test/build/venv/local/lib/python2.7/site-packages/mozrunner/utils.py:19: UserWarning: Module manifestparser was already imported from /builds/slave/test/build/tests/mochitest/manifestparser.py, but /builds/slave/test/build/venv/lib/python2.7/site-packages is being added to sys.path
12:33:21     INFO -    import pkg_resources
12:34:26     INFO -  Traceback (most recent call last):
12:34:26     INFO -    File "runtestsb2g.py", line 389, in <module>
12:34:26     INFO -      main()
12:34:26     INFO -    File "runtestsb2g.py", line 386, in main
12:34:26     INFO -      run_remote_mochitests(parser, options)
12:34:26     INFO -    File "runtestsb2g.py", line 310, in run_remote_mochitests
12:34:26     INFO -      marionette = Marionette.getMarionetteOrExit(**kwargs)
12:34:26     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette.py", line 556, in getMarionetteOrExit
12:34:26     INFO -      m = cls(*args, **kwargs)
12:34:26     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette.py", line 503, in __init__
12:34:26     INFO -      self.emulator.start()
12:34:26     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/emulator.py", line 351, in start
12:34:26     INFO -      raise Exception('timed out waiting for emulator to start')
12:34:26     INFO -  Exception: timed out waiting for emulator to start
12:34:26    ERROR - Return code: 1
Flags: needinfo?(ahalberstadt)
FWIW, I suspect this is from a harness change because it only appears to affect trunk. Aurora and b2g28 do not appear to be hitting it.
I can't think of any harness changes that landed recently. This is what's failing:
http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/emulator.py#346

It's looking for a change in adb devices after launching the emulator. So either the emulator isn't starting up or there's some problem with how we are using adb devices. I don't see any problems with the latter after a quick look.
Flags: needinfo?(ahalberstadt)
I'd guess I'm starring in the neighborhood of 40-50 a day (as "b", typing the ";r" is too much trouble).
Not sure if it's related or not, but the qemu.logs that get uploaded to blobber just seem to be garbage. This is true for green runs too though. Having access to this might tell us what is happening.
I'll also land some debug output on Cedar. Unfortunately b2g mochitests are currently b0rked there, will need to hunt down who broke them first.
Attached file Good qemu.log —
Turns out the qemu.logs were just gzipped but didn't have their content-type set. This is the qemu.log of a failing job. For comparison, here is the qemu.log of a successful run (use zcat to view it):
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/21cbf7857cdae4239d1ce842d6138db14021c08563b4f6d1698b40b78d2c8fff584efa6b4eca05764fe6b1ff6bb533a1c68bda12f677fbd857e830d1c32af837

The difference seems to be the first line:
NAND: could not write file /tmp/android-cltbld/emulator-TPKvSG, Resource temporarily unavailable

Seems like something isn't always getting cleaned up properly? Not sure if this is a slave, test harness or emulator build problem.
Comment on attachment 8399429 [details]
Good qemu.log

I accidentally uploaded the good log.
Attachment #8399429 - Attachment description: Bad qemu.log → Good qemu.log
Attached file Bad qemu.log —
Here is the bad one.
We could look and see if any of the emulator repos changed around March 26-27.  There don't seem to be any RelEng changes in that timeframe that look like possible culprits:  https://wiki.mozilla.org/ReleaseEngineering/Maintenance
There are a couple of changes in this timeframe from non-gecko/gaia sources; not sure if any could be a culprit:

  {
    "repo": "http://git.mozilla.org/?p=external/apitrace/apitrace.git", 
    "commits": [
      {
        "commit": "glretrace: Don't rely on kCGLOGLPVersion_GL4_Core.", 
        "author": "Jos\u00e9 Fonseca", 
        "revision": "89c5816399e71bda92a8959b5b771c04d6672ea3"
      }, 
      {
        "commit": "Revert \"qapitrace: fix an inline string FIXME request\"", 
        "author": "Jos\u00e9 Fonseca", 
        "revision": "cb7c76fe29110ac98cdd2677e42370fe7e6b8796"
      }, 
      {
        "commit": "qapitrace: fix an inline string FIXME request", 
        "author": "Lawrence L Love", 
        "revision": "0d86efd38a375c526491c679f4d2185ebe51434e"
      }, 
      {
        "commit": "glstate: Describe a few more transform feedback states.", 
        "author": "Jos\u00e9 Fonseca", 
        "revision": "3d91a979947afe25314882a47db1b2853b23b676"
      }, 
      {
        "commit": "glretrace: avoid glXSwapBuffers failures because of... ", 
        "author": "Brian Paul", 
        "revision": "485846b2a40d8ac7d6c1c5f8af6d15b0c10af19d"
      }
    ], 
  }, 
  {
    "repo": "http://git.mozilla.org/?p=b2g/moztt.git", 
    "commits": [
      {
        "commit": "Merge pull request #33 from jfkthame/woff", 
        "author": "Ryan VanderMeulen", 
        "revision": "5b93c7150acac5f657675b91889d828cc2b532e3"
      }, 
      {
        "commit": "If the product defines MOZ_PRODUCT_COMPRESS_FONTS:... ", 
        "author": "Jonathan Kew", 
        "revision": "79b0cbfbded935c8acce4d16dd41f3d7d0dee182"
      }, 
      {
        "commit": "Merge pull request #31 from jfkthame/khmer", 
        "author": "Ryan VanderMeulen", 
        "revision": "cb16958e41105d7c551d9941f522db97b8312538"
      }
    ], 

Michael, Gregor, could any of these cause the emulator to periodically fail to start?  See the error at the top of the attachment in comment #10.
Actually this is starting to look like an infrastructure issue.  Many of the occurrences of this problem today have been on the slave tst-linux64-spot-794.  Can someone in rel-eng check this instance and see if /tmp/android-cltbld/ is inaccessible there for some reason?
Could we make this a TimeoutException rather than Exception:
https://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/emulator.py#351

...so that TBPL staring working? (or alternatively adjust the Exception message)
Flags: needinfo?(jgriffin)
(In reply to Ed Morley [:edmorley UTC+0] from comment #15)
> Could we make this a TimeoutException rather than Exception:
> https://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/
> marionette/emulator.py#351
> 
> ...so that TBPL staring working? (or alternatively adjust the Exception
> message)

Yep, I'll take care of it today.
Flags: needinfo?(jgriffin)
Assignee: nobody → jgriffin
Attachment #8400082 - Flags: review?(mdas) → review+
Summary: Frequent "Exception: timed out waiting for emulator to start" errors on B2G mochitest runs → Frequent "TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs
https://tbpl.mozilla.org/php/getParsedLog.php?id=37139085&tree=Mozilla-Inbound
Summary: Frequent "TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs → Frequent "marionette.errors.TimeoutException: TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs
Comment on attachment 8400082 [details] [diff] [review]
Use TimeoutException instead of Exception for emulator startup timeouts,

Landed on aurora too; a=test-only
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/797d9a31a0d8
https://hg.mozilla.org/mozilla-central/rev/d0118120480d
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
The patch just helps with starring, I meant to add a leave-open.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → NEW
Assignee: jgriffin → nobody
I update my B2G and try to run mochitest on my linux PC. It can always reproduce. Here is my log. Can anyone help?

/home/steven/workspace/b2g/emulator2/gecko/python/mach/mach/main.py:246: UserWarning: Module mozprocess was already imported from /home/steven/workspace/b2g/emulator2/gecko/testing/mozbase/mozprocess/mozprocess/__init__.pyc, but /usr/local/lib/python2.7/dist-packages/mozprocess-0.5-py2.7.egg is being added to sys.path
  import pkg_resources
From _tests: Kept 12485 existing; Added/updated 0; Removed 0 files and 0 directories.
Error running mach:

    ['mochitest-remote']

The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.

You should consider filing a bug for this issue.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

TimeoutException: TimeoutException: timed out waiting for emulator to start


  File "/home/steven/workspace/b2g/emulator2/gecko/testing/mochitest/mach_commands.py", line 714, in run_mochitest_remote
    xre_path=self.xre_path, test_paths=test_paths, **kwargs)
  File "/home/steven/workspace/b2g/emulator2/gecko/testing/mochitest/mach_commands.py", line 184, in run_b2g_test
    return mochitest.run_remote_mochitests(parser, options)
  File "/home/steven/workspace/b2g/emulator2/objdir-gecko/_tests/testing/mochitest/runtestsb2g.py", line 303, in run_remote_mochitests
    marionette = Marionette.getMarionetteOrExit(**kwargs)
  File "/home/steven/workspace/b2g/emulator2/gecko/testing/marionette/client/marionette/marionette.py", line 565, in getMarionetteOrExit
    m = cls(*args, **kwargs)
  File "/home/steven/workspace/b2g/emulator2/gecko/testing/marionette/client/marionette/marionette.py", line 512, in __init__
    self.emulator.start()
  File "/home/steven/workspace/b2g/emulator2/gecko/testing/marionette/client/marionette/emulator.py", line 343, in start
    raise TimeoutException('timed out waiting for emulator to start')
(In reply to StevenLee[:slee] from comment #147)
> I update my B2G and try to run mochitest on my linux PC. It can always
> reproduce. Here is my log. Can anyone help?
I found the problem on my PC. It's because I'm running mochitest from screen and it does not have x-window. When I running mochitest from the terminal, it works fine.
https://tbpl.mozilla.org/php/getParsedLog.php?id=42940658&tree=Mozilla-Central
Summary: Frequent "marionette.errors.TimeoutException: TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs → Frequent "marionette.errors.TimeoutException: TimeoutException: timed out waiting for emulator to start" or " mozrunner.errors.TimeoutException: timed out waiting for emulator to start" errors on B2G mochitest runs