ADBError: cp: can't stat '/data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini': No such file or directory
Categories
(Testing :: web-platform-tests, defect)
Tracking
(firefox70 fixed)
Tracking | Status | |
---|---|---|
firefox70 | --- | fixed |
People
(Reporter: bc, Assigned: bc)
References
Details
Attachments
(2 files, 2 obsolete files)
We are still getting excessive retries on Anroid emulators due to errors running web-platform-tests. Examples are:
task 2019-08-12T22:59:22.952Z] 22:59:22 CRITICAL - Failure during init Traceback (most recent call last):
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 222, in init
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - self.browser.start(group_metadata=group_metadata, **self.browser_settings)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox_android.py", line 173, in start
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - interactive=self.debug_info and self.debug_info.interactive)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/base/device.py", line 66, in start
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - self.device.setup_profile(self.profile)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 121, in setup_profile
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - self.backup_file(self.app_ctx.remote_profiles_ini)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 175, in backup_file
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - self.device.cp(remote_path, '%s.orig' % remote_path, recursive=True)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 2697, in cp
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - source)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - ADBError: cp: can't stat '/data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini': No such file or directory
The log tells us:
[task 2019-08-08T17:10:06.318Z] 17:10:06 INFO - Using adb 1.0.41
[task 2019-08-08T17:10:08.149Z] 17:10:08 INFO - adbd running as root
[task 2019-08-08T17:10:08.149Z] 17:10:08 INFO - adbd restarted as root
[task 2019-08-08T17:10:08.257Z] 17:10:08 INFO - su -c setenforce 0 exitcode 0, stdout: None
[task 2019-08-08T17:10:08.573Z] 17:10:08 INFO - su 0 supported
[task 2019-08-08T17:10:08.783Z] 17:10:08 INFO - /system/bin/ls -1A supported
[task 2019-08-08T17:10:08.887Z] 17:10:08 INFO - Native cp support: True
[task 2019-08-08T17:10:08.994Z] 17:10:08 INFO - Native chmod -R support: True
[task 2019-08-08T17:10:09.101Z] 17:10:09 INFO - Native chown -R support: True
[task 2019-08-08T17:10:09.207Z] 17:10:09 INFO - Setting SELinux Permissive
[task 2019-08-08T17:10:09.418Z] 17:10:09 INFO - Native pidof support: True
However the location of this raise ADBError
if not self.exists(source, timeout=timeout, root=root):
raise ADBError("cp: can't stat '%s': No such file or directory" %
source)
should never have been executed from what I can tell since it occurs after where the support for native cp is checked
if self._have_cp:
r = '-R' if recursive else ''
self.shell_output('cp %s %s %s' % (r, source, destination),
timeout=timeout, root=root)
self._sync(timeout=timeout)
return
wpt9 retry shows a similar error
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - Traceback (most recent call last):
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 222, in init
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - self.browser.start(group_metadata=group_metadata, **self.browser_settings)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox_android.py", line 173, in start
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - interactive=self.debug_info and self.debug_info.interactive)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/base/device.py", line 66, in start
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - self.device.setup_profile(self.profile)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 121, in setup_profile
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - self.backup_file(self.app_ctx.remote_profiles_ini)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 175, in backup_file
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - self.device.cp(remote_path, '%s.orig' % remote_path, recursive=True)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 2710, in cp
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - raise ADBError('cp: %s: Not a directory' % destination)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - ADBError: cp: /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini.orig: Not a directory
This seems to me to be an issue with either how mozrunner is using the device or with how web-platform-tests are using mozrunner.
In addition, I have noticed that even though these errors cause the job to be retried, the test run continues to run to completion rather than exiting early. I would think that if a retry error were found, we would terminate and not waste the cycles of continuing to run the other tests.
whimboo, jgraham: Can you take a look at this?
I'll take a look
There are two problems here: (1) the intermittent errors encountered when using cp on the device, and (2) the fact that tests continue to run when we've concluded the job status is "retry".
Maybe (1) isn't a major problem?
For (2), it's not just wpt. Here's an example of mochitests continuing to run after we conclude the status should be retry: ADBProcessError happens early on, but browser is restarted and tests continue to run for 20 more minutes.
I think the correct thing for wpt at least is to stop setting the retry status as soon as we see an adb-related error, which is what StructuredOutputParser does by default in the wpt mozharness script. It would make more sense to defer to wptrunner, which attempts to restart the browser, and set a retry status when it logs "Max restarts exceeded".
Here are some try pushes that do just that:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=35407d0ca6a1df0859cebc368f029f7ff4d26f71
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=8b1afb8d427e24a38015661f8512972ddc0fc61a
Assignee | ||
Comment 4•5 years ago
|
||
(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #2)
Maybe (1) isn't a major problem?
It's not that huge a deal but it concerns me that a piece of code which should not be executing when Native cp support: True is actually executing. I was thinking about it and maybe some stale/uninitialized instance of ADBDevice is being used somehow. If you don't have time to look into it I'll see what I can do.
For (2),
I'm not familiar with how wpt works and don't feel comfortable reviewing those changes, but I don't think this will improve things over the current situation.
ADBTimeoutErrors are special in that once it occurs with a device the connection to the device is essentially lost and any further attempts to access it will also time out. The default time out period is 5 minutes so depending on the attempts to access the device after the first ADBTimeoutError occurs you may end up waiting several multiples of 5 minutes for the test to end and if you are automatically restarting the test this could be a very long time indeed.
I see your try push is using the previous mozdevice 3.0.4 version which does not include the latest work arounds for emulator intermittent failures. To get a clearer picture of the current situation, can we do a try push for all web-platform-tests on android emulators from the tip of a recent pull and another with your patch applied so that we can get a clear before and after picture?
I'm also concerned/mystified by the cp behaviour. I have to prioritize other work now, but if you share your findings maybe I can pick it up later.
Good point about ADBTimeoutErrors! What about ADBError and ADBProcessError? Those don't seem to be as special; I see many examples of the harness recovering from those.
Assignee | ||
Comment 6•5 years ago
|
||
ADBError and ADBProcessError mean that a command exited with a non-zero exitcode on the device. It may be an intermittent like some of the emulator issues I've been chasing down or could be a more general logic issue. Typically we catch and handle expected cases and only let the real failures to propagate to the log. Hard to generalize them actually.
(In reply to Bob Clary [:bc:] from comment #4)
I see your try push is using the previous mozdevice 3.0.4 version which does not include the latest work arounds for emulator intermittent failures. To get a clearer picture of the current situation, can we do a try push for all web-platform-tests on android emulators from the tip of a recent pull
Oooh, those mozdevice changes look promising...
https://treeherder.mozilla.org/#/jobs?repo=try&revision=416007270737c4405fe0a68129ae303f2b22c0e3
and another with your patch applied so that we can get a clear before and after picture?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f69eec9abe3abb83369930eb2ab2441bee07ffbf
Updated•5 years ago
|
Updated•5 years ago
|
In any wpt job, I see that an ADBDevice is instantiated twice: first by mozharness in AndroidMixin at the verify-device step, then by wptrunner (call to runtests.py) at browser start-up, which uses FennecContext with device serial "emulator-5554".
Looking at the orange Wr3 with the cp error versus any other green wpt job, I can see that the second ADBDevice init reports that we do not have native cp. This is intermittent, yay.
From AndroidMixin - every wpt job:
task 2019-08-14T21:14:53.606Z] 21:14:53 INFO - Running main action method: verify_device
[task 2019-08-14T21:14:53.606Z] 21:14:53 INFO - >> Check emulator: Attempt #1 of 5
[task 2019-08-14T21:14:53.606Z] 21:14:53 INFO - >> Verify Android boot completed: Attempt #1 of 30
[task 2019-08-14T21:14:53.676Z] 21:14:53 INFO - Using adb 1.0.41
[task 2019-08-14T21:14:55.527Z] 21:14:55 INFO - adbd running as root
[task 2019-08-14T21:14:55.527Z] 21:14:55 INFO - adbd restarted as root
[task 2019-08-14T21:14:55.633Z] 21:14:55 INFO - su -c setenforce 0 exitcode 0, stdout: None
[task 2019-08-14T21:14:55.952Z] 21:14:55 INFO - su 0 supported
[task 2019-08-14T21:14:56.167Z] 21:14:56 INFO - /system/bin/ls -1A supported
[task 2019-08-14T21:14:56.273Z] 21:14:56 INFO - Native cp support: True
[task 2019-08-14T21:14:56.380Z] 21:14:56 INFO - Native chmod -R support: True
[task 2019-08-14T21:14:56.488Z] 21:14:56 INFO - Native chown -R support: True
[task 2019-08-14T21:14:56.591Z] 21:14:56 INFO - Setting SELinux Permissive
[task 2019-08-14T21:14:56.801Z] 21:14:56 INFO - Native pidof support: True
[task 2019-08-14T21:14:57.222Z] 21:14:57 INFO - New mozdevice with adb=/builds/worker/workspace/build/android-sdk-linux/platform-tools/adb, device=emulator-5554
Orange job with cp error:
[task 2019-08-14T21:15:10.595Z] 21:15:10 INFO - adb Using adb 1.0.41
[task 2019-08-14T21:15:10.924Z] 21:15:10 INFO - adb adbd running as root
[task 2019-08-14T21:15:11.032Z] 21:15:11 INFO - adb su -c setenforce 0 exitcode 0, stdout: None
[task 2019-08-14T21:15:11.364Z] 21:15:11 INFO - adb su 0 supported
[task 2019-08-14T21:15:11.584Z] 21:15:11 INFO - adb /system/bin/ls -1A supported
[task 2019-08-14T21:15:11.682Z] 21:15:11 INFO - adb Native cp support: False
[task 2019-08-14T21:15:11.791Z] 21:15:11 INFO - adb Native chmod -R support: True
[task 2019-08-14T21:15:11.897Z] 21:15:11 INFO - adb Native chown -R support: True
[task 2019-08-14T21:15:12.005Z] 21:15:12 INFO - adb Setting SELinux Permissive
[task 2019-08-14T21:15:12.228Z] 21:15:12 INFO - adb Native pidof support: True
[task 2019-08-14T21:15:20.137Z] 21:15:20 INFO - STDOUT: timed out waiting for profiles.ini
[task 2019-08-14T21:15:20.685Z] 21:15:20 INFO - adb Granting important runtime permissions to rg.mozilla.geckoview.test
[task 2019-08-14T21:15:23.535Z] 21:15:23 INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity ....
Green job:
[task 2019-08-14T21:20:03.436Z] 21:20:03 INFO - adb Using adb 1.0.41
[task 2019-08-14T21:20:03.755Z] 21:20:03 INFO - adb adbd running as root
[task 2019-08-14T21:20:03.857Z] 21:20:03 INFO - adb su -c setenforce 0 exitcode 0, stdout: None
[task 2019-08-14T21:20:04.961Z] 21:20:04 INFO - adb su 0 supported
[task 2019-08-14T21:20:05.177Z] 21:20:05 INFO - adb /system/bin/ls -1A supported
[task 2019-08-14T21:20:05.278Z] 21:20:05 INFO - adb Native cp support: True
[task 2019-08-14T21:20:05.382Z] 21:20:05 INFO - adb Native chmod -R support: True
[task 2019-08-14T21:20:05.490Z] 21:20:05 INFO - adb Native chown -R support: True
[task 2019-08-14T21:20:05.592Z] 21:20:05 INFO - adb Setting SELinux Permissive
[task 2019-08-14T21:20:05.808Z] 21:20:05 INFO - adb Native pidof support: True
[task 2019-08-14T21:20:13.333Z] 21:20:13 INFO - STDOUT: timed out waiting for profiles.ini
[task 2019-08-14T21:20:13.863Z] 21:20:13 INFO - adb Granting important runtime permissions to org.mozilla.geckoview.test
[task 2019-08-14T21:20:15.199Z] 21:20:15 INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity ....
Assignee | ||
Comment 10•5 years ago
|
||
maja_zf: Thanks! That's awesome! I worked around intermittent ls detection issues on emulators by actually executing the command. Maybe I can do something similar for cp.
Assignee | ||
Comment 11•5 years ago
|
||
Assignee | ||
Comment 12•5 years ago
|
||
Depends on D42861
Assignee | ||
Comment 13•5 years ago
|
||
I moved to detecting cp and pidof using execution instead of type.
try run appears to be good.
Comment 14•5 years ago
|
||
Pushed by bclary@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8abb96374e66 [mozdevice] detect cp via execution, r=gbrown. https://hg.mozilla.org/integration/autoland/rev/5cd486415e71 [mozdevice] detect pidof via execution, r=gbrown.
Comment 15•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8abb96374e66
https://hg.mozilla.org/mozilla-central/rev/5cd486415e71
Updated•5 years ago
|
Updated•4 years ago
|
Description
•