Closed Bug 1572563 Opened 8 months ago Closed 8 months ago

Android geckoview-mochitest frequent retries sometimes end in exception

Categories

(Testing :: General, defect, P1)

Version 3
defect

Tracking

(firefox70 fixed)

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: dvarga, Assigned: bc)

References

(Regression)

Details

Attachments

(4 files)

Most of the retries seem to have something like this:

https://taskcluster-artifacts.net/Z2lWhUpDSFKBq8quVvk6BA/1/public/logs/live_backing.log

[task 2019-08-08T16:52:40.620Z] 16:52:40     INFO -  Running manifest: dom/filesystem/tests/mochitest.ini
[task 2019-08-08T16:52:40.845Z] 16:52:40     INFO -  runtests.py | Failed to copy /builds/worker/workspace/build/tests/mochitest/hyphenation to profile
[task 2019-08-08T16:52:40.972Z] 16:52:40     INFO -  pk12util: PKCS12 IMPORT SUCCESSFUL
[task 2019-08-08T16:52:41.028Z] 16:52:41     INFO -  MochitestServer : launching [u'/builds/worker/workspace/build/hostutils/host-utils-68.0a1.en-US.linux-x86_64/xpcshell', '-g', '/builds/worker/workspace/build/hostutils/host-utils-68.0a1.en-US.linux-x86_64', '-f', '/builds/worker/workspace/build/hostutils/host-utils-68.0a1.en-US.linux-x86_64/components/httpd.js', '-e', "const _PROFILE_PATH = '/tmp/tmp8XMVKB.mozrunner'; const _SERVER_PORT = '8854'; const _SERVER_ADDR = '10.0.2.2'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', '/builds/worker/workspace/build/tests/mochitest/server.js']
[task 2019-08-08T16:52:41.028Z] 16:52:41     INFO -  runtests.py | Server pid: 4817
[task 2019-08-08T16:52:41.031Z] 16:52:41     INFO -  runtests.py | Websocket server pid: 4820
[task 2019-08-08T16:52:41.038Z] 16:52:41     INFO -  runtests.py | SSL tunnel pid: 4823
[task 2019-08-08T16:52:41.407Z] 16:52:41     INFO -  runtests.py | Running with scheme: http
[task 2019-08-08T16:52:41.408Z] 16:52:41     INFO -  runtests.py | Running with e10s: True
[task 2019-08-08T16:52:41.408Z] 16:52:41     INFO -  runtests.py | Running with serviceworker_e10s: False
[task 2019-08-08T16:52:41.408Z] 16:52:41     INFO -  runtests.py | Running with socketprocess_e10s: False
[task 2019-08-08T16:52:41.408Z] 16:52:41     INFO -  runtests.py | Running tests: start.
[task 2019-08-08T16:52:41.614Z] 16:52:41     INFO -  Traceback (most recent call last):
[task 2019-08-08T16:52:41.614Z] 16:52:41     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2862, in doTests
[task 2019-08-08T16:52:41.614Z] 16:52:41     INFO -      e10s=options.e10s
[task 2019-08-08T16:52:41.614Z] 16:52:41     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtestsremote.py", line 339, in runApp
[task 2019-08-08T16:52:41.614Z] 16:52:41     INFO -      ret, _ = self.automation.runApp(*args, **kwargs)
[task 2019-08-08T16:52:41.614Z] 16:52:41     INFO -    File "/builds/worker/workspace/build/tests/mochitest/remoteautomation.py", line 47, in runApp
[task 2019-08-08T16:52:41.614Z] 16:52:41     INFO -      self.device.rm(self.remoteLog, root=True)
[task 2019-08-08T16:52:41.614Z] 16:52:41     INFO -    File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 2396, in rm
[task 2019-08-08T16:52:41.615Z] 16:52:41     INFO -      self.shell_output("%s %s" % (cmd, path), timeout=timeout, root=root)
[task 2019-08-08T16:52:41.615Z] 16:52:41     INFO -    File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 1559, in shell_output
[task 2019-08-08T16:52:41.615Z] 16:52:41 CRITICAL -      raise ADBProcessError(adb_process)
[task 2019-08-08T16:52:41.615Z] 16:52:41 CRITICAL -  ADBProcessError: args: adb wait-for-device shell rm /sdcard/tests/logs/mochitest.log, exitcode: 1, stdout: rm: /sdcard/tests/logs/mochitest.log: No such file or directory
[task 2019-08-08T16:52:41.615Z] 16:52:41     INFO -  1922 ERROR Automation Error: Received unexpected exception while running application

This seems to happen on a variety of manifests. Random startup crash?

Assignee: nobody → gbrown
Component: Mochitest → General
Priority: -- → P1
Summary: Android geckoview-mochitest retrigerrs turn in exception → Android geckoview-mochitest frequent retries sometimes end in exception

(In reply to Geoff Brown [:gbrown] from comment #1)

Random startup crash?

I don't see evidence of a crash in associated logcats. Actually, it looks like the process hasn't been launched yet -- we're just ensuring that the log file, if it still exists after the last run, is being deleted:

        if self.device.is_file(self.remoteLog):
            self.device.rm(self.remoteLog, root=True)

(In reply to Daniel Varga [:dvarga] from comment #0)

Started from: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=260657953&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=ae70794d4b51cece91ee79131ce4f66a373f8ad0

I see this happening much earlier. I think it may have started with bug 1571501.

I think there is a similar but less frequent problem in web-platform tests, with exceptions raised at:

https://searchfox.org/mozilla-central/rev/9775cca0a10a9b5c5f4e15c8f7b3eff5bf91bbd0/testing/mozbase/mozrunner/mozrunner/devices/base.py#109

https://taskcluster-artifacts.net/dKdI5bW6S26hwWawUTZHHA/0/public/logs/live_backing.log

[task 2019-08-06T17:45:02.961Z] 17:45:02 CRITICAL - Failure during init Traceback (most recent call last):
[task 2019-08-06T17:45:02.961Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 222, in init
[task 2019-08-06T17:45:02.961Z] 17:45:02 CRITICAL -     self.browser.start(group_metadata=group_metadata, **self.browser_settings)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox_android.py", line 173, in start
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     interactive=self.debug_info and self.debug_info.interactive)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/base/device.py", line 66, in start
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     self.device.setup_profile(self.profile)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 109, in setup_profile
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     self.device.pull(self.app_ctx.remote_profiles_ini, local_profiles_ini.name)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 2334, in pull
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     self.command_output(["pull", remote, local], timeout=timeout)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 1153, in command_output
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     timeout=timeout)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 315, in command_output
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     raise ADBProcessError(adb_process)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL - ADBProcessError: args: /builds/worker/workspace/build/android-sdk-linux/platform-tools/adb wait-for-device pull /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini /tmp/tmp97lZbT, exitcode: 1, stdout: adb: error: remote object '/data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini' does not exist
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL - 
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL - Traceback (most recent call last):
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 222, in init
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     self.browser.start(group_metadata=group_metadata, **self.browser_settings)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox_android.py", line 173, in start
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     interactive=self.debug_info and self.debug_info.interactive)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/base/device.py", line 66, in start
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     self.device.setup_profile(self.profile)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 109, in setup_profile
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     self.device.pull(self.app_ctx.remote_profiles_ini, local_profiles_ini.name)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 2334, in pull
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     self.command_output(["pull", remote, local], timeout=timeout)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 1153, in command_output
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     timeout=timeout)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 315, in command_output
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL -     raise ADBProcessError(adb_process)
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL - ADBProcessError: args: /builds/worker/workspace/build/android-sdk-linux/platform-tools/adb wait-for-device pull /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini /tmp/tmp97lZbT, exitcode: 1, stdout: adb: error: remote object '/data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini' does not exist
[task 2019-08-06T17:45:02.962Z] 17:45:02 CRITICAL - 

(In reply to Geoff Brown [:gbrown] from comment #3)

Watching backfills at:

https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=geckoview%2Cmochitest&tochange=328a9d64b35763e183e7f0f2112c0f5fe372a504&fromchange=cb67d2ac1d73308a7a226bf15da5151da59a89fb

I think that confirms bug 1571501 (most of the mochitest retries in that range have the ADBProcessError shown in comment 1).

:bc - Can you take it from here? (I'm on pto next week).

Flags: needinfo?(bob)
Regressed by: 1571501

Sure. Thanks for the diagnosis.

Assignee: gbrown → bob

Why would bug 1571501 trigger this? Does it mean that the adb exit code intermittently differs from the shell $?, at least for 'test', on emulators?

That's what I am beginning to believe. I just submitted another patch on top of my stack to try that reverts to using echo adb_returncode=$? at the end of the shell command.

Things were a bit better but I still had a couple of issues though most were claim expired or ADBTimeoutErrors. I've got some more ideas on how to proceed.

Duplicate of this bug: 1572979

I ran into a weird instance

[task 2019-08-10T21:05:12.117Z] 21:05:12     INFO -  gtest INFO | adbd not restarted as root
...
[task 2019-08-10T21:05:15.979Z] 21:05:15     INFO -  Traceback (most recent call last):
[task 2019-08-10T21:05:15.979Z] 21:05:15     INFO -    File "/builds/worker/workspace/build/tests/gtest/remotegtests.py", line 396, in main
[task 2019-08-10T21:05:15.979Z] 21:05:15     INFO -      options.symbols_path, options.enable_webrender)
[task 2019-08-10T21:05:15.979Z] 21:05:15     INFO -    File "/builds/worker/workspace/build/tests/gtest/remotegtests.py", line 89, in run_gtest
[task 2019-08-10T21:05:15.979Z] 21:05:15     INFO -      raise Exception("a device with a root shell is required to run Android gtest")
[task 2019-08-10T21:05:15.979Z] 21:05:15     INFO -  Exception: a device with a root shell is required to run Android gtest
[task 2019-08-10T21:05:17.203Z] 21:05:17    ERROR - Return code: 1

which shouldn't be possible with an emulator.

This try run is looking pretty good. I am going do a run from tip then with my patches applied and then compare.

Attachment #9084331 - Attachment description: Bug 1572563 - [mozdevice 3.0.5] - work around intermittent test path failures on emulators, r=gbrown,jmaher. → Bug 1572563 - [mozdevice 3.0.5] - work around intermittent test path failures on emulators, r=jmaher.
Attachment #9084332 - Attachment description: Bug 1572563 - [mozdevice 3.0.5] - execute ls to determine which instance is available, r=gbrown,jmaher. → Bug 1572563 - [mozdevice 3.0.5] - repeatedly attempt to detect ls through execution to determine which instance is available, r=jmaher.

I did a try push from tip followed by a try push with this stack of patches. This isn't perfect as there is one instance where it appears a file/directory detection failed, the situation is much improved.

After

18 ADBTimeoutError
 3 claim expired
 1 bogomips
 1 ADBError: cp: /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini.orig: Not a directory

Before

 1 ADBError: cp: /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini.orig: Not a directory 
 4 ADBProcessError: args: /builds/worker/workspace/build/android-sdk-linux/platform-tools/adb wait-for-device pull /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini /tmp/tmpAYhawt, exitcode: 1, stdout: adb: error: remote object '/data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini' does not exist
 6 ADBProcessError: args: /builds/worker/workspace/build/android-sdk-linux/platform-tools/adb wait-for-device shell cp -R /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini.orig, exitcode: 1, stdout: cp: /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini: No such file or directory
 5 ADBProcessError: args: /builds/worker/workspace/build/android-sdk-linux/platform-tools/adb wait-for-device shell rm -r /sdcard/tests/profile, exitcode: 1, stdout: rm: /sdcard/tests/profile: No such file or directory
19 ADBProcessError: args: adb wait-for-device shell rm /sdcard/tests/logs/mochitest.log, exitcode: 1, stdout: rm: /sdcard/tests/logs/mochitest.log: No such file or directory
16 ADBTimeoutError
 1 Unable to start emulator after 5 attempts
 4 bogomips
Flags: needinfo?(bob)
Pushed by bclary@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/47e1bbd29bcb
[mozdevice 3.0.5] - work around intermittent test path failures on emulators, r=jmaher.
https://hg.mozilla.org/integration/autoland/rev/5600635bd564
[mozdevice 3.0.5] - repeatedly attempt to detect ls through execution to determine which instance is available, r=gbrown,jmaher.
https://hg.mozilla.org/integration/autoland/rev/0ed1545e14f0
[mozdevice 3.0.5] - Continue to use echo adb_returncode=0 hack for emulators regardless of Android version, r=jmaher.
https://hg.mozilla.org/integration/autoland/rev/0281da0505bd
[mozdevice 3.0.5] - sync file system after modifications, r=jmaher.
See Also: → 1573555

We are still getting retries on web-platform-tests due to the cp issues. I have filed bug 1573555 to track the issue. It appears to me that it is a wpt or mozrunner issue and not an emulator or adb.py issue. Otherwise this is much improved from what I can tell.

You need to log in before you can comment on or make changes to this bug.