Closed Bug 1573555 Opened 5 years ago Closed 5 years ago

ADBError: cp: can't stat '/data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini': No such file or directory

Categories

(Testing :: web-platform-tests, defect)

defect
Not set
normal

Tracking

(firefox70 fixed)

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: bc, Assigned: bc)

References

Details

Attachments

(2 files, 2 obsolete files)

We are still getting excessive retries on Anroid emulators due to errors running web-platform-tests. Examples are:

wpt9 retry

task 2019-08-12T22:59:22.952Z] 22:59:22 CRITICAL - Failure during init Traceback (most recent call last):
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 222, in init
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -     self.browser.start(group_metadata=group_metadata, **self.browser_settings)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox_android.py", line 173, in start
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -     interactive=self.debug_info and self.debug_info.interactive)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/base/device.py", line 66, in start
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -     self.device.setup_profile(self.profile)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 121, in setup_profile
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -     self.backup_file(self.app_ctx.remote_profiles_ini)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 175, in backup_file
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -     self.device.cp(remote_path, '%s.orig' % remote_path, recursive=True)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 2697, in cp
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL -     source)
[task 2019-08-12T22:59:22.953Z] 22:59:22 CRITICAL - ADBError: cp: can't stat '/data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini': No such file or directory

The log tells us:

[task 2019-08-08T17:10:06.318Z] 17:10:06     INFO - Using adb 1.0.41
[task 2019-08-08T17:10:08.149Z] 17:10:08     INFO - adbd running as root
[task 2019-08-08T17:10:08.149Z] 17:10:08     INFO - adbd restarted as root
[task 2019-08-08T17:10:08.257Z] 17:10:08     INFO - su -c setenforce 0 exitcode 0, stdout: None
[task 2019-08-08T17:10:08.573Z] 17:10:08     INFO - su 0 supported
[task 2019-08-08T17:10:08.783Z] 17:10:08     INFO - /system/bin/ls -1A supported
[task 2019-08-08T17:10:08.887Z] 17:10:08     INFO - Native cp support: True
[task 2019-08-08T17:10:08.994Z] 17:10:08     INFO - Native chmod -R support: True
[task 2019-08-08T17:10:09.101Z] 17:10:09     INFO - Native chown -R support: True
[task 2019-08-08T17:10:09.207Z] 17:10:09     INFO - Setting SELinux Permissive
[task 2019-08-08T17:10:09.418Z] 17:10:09     INFO - Native pidof support: True

However the location of this raise ADBError

    if not self.exists(source, timeout=timeout, root=root):
        raise ADBError("cp: can't stat '%s': No such file or directory" %
                       source)

should never have been executed from what I can tell since it occurs after where the support for native cp is checked

        if self._have_cp:
            r = '-R' if recursive else ''
            self.shell_output('cp %s %s %s' % (r, source, destination),
                              timeout=timeout, root=root)
            self._sync(timeout=timeout)
            return

wpt9 retry shows a similar error

[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - Traceback (most recent call last):
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 222, in init
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -     self.browser.start(group_metadata=group_metadata, **self.browser_settings)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox_android.py", line 173, in start
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -     interactive=self.debug_info and self.debug_info.interactive)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/base/device.py", line 66, in start
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -     self.device.setup_profile(self.profile)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 121, in setup_profile
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -     self.backup_file(self.app_ctx.remote_profiles_ini)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozrunner/devices/base.py", line 175, in backup_file
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -     self.device.cp(remote_path, '%s.orig' % remote_path, recursive=True)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 2710, in cp
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL -     raise ADBError('cp: %s: Not a directory' % destination)
[task 2019-08-12T22:11:13.730Z] 22:11:13 CRITICAL - ADBError: cp: /data/data/org.mozilla.geckoview.test/files/mozilla/profiles.ini.orig: Not a directory

This seems to me to be an issue with either how mozrunner is using the device or with how web-platform-tests are using mozrunner.

In addition, I have noticed that even though these errors cause the job to be retried, the test run continues to run to completion rather than exiting early. I would think that if a retry error were found, we would terminate and not waste the cycles of continuing to run the other tests.

whimboo, jgraham: Can you take a look at this?

There are two problems here: (1) the intermittent errors encountered when using cp on the device, and (2) the fact that tests continue to run when we've concluded the job status is "retry".

Maybe (1) isn't a major problem?

For (2), it's not just wpt. Here's an example of mochitests continuing to run after we conclude the status should be retry: ADBProcessError happens early on, but browser is restarted and tests continue to run for 20 more minutes.

I think the correct thing for wpt at least is to stop setting the retry status as soon as we see an adb-related error, which is what StructuredOutputParser does by default in the wpt mozharness script. It would make more sense to defer to wptrunner, which attempts to restart the browser, and set a retry status when it logs "Max restarts exceeded".

Here are some try pushes that do just that:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=35407d0ca6a1df0859cebc368f029f7ff4d26f71
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=8b1afb8d427e24a38015661f8512972ddc0fc61a

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #2)

Maybe (1) isn't a major problem?

It's not that huge a deal but it concerns me that a piece of code which should not be executing when Native cp support: True is actually executing. I was thinking about it and maybe some stale/uninitialized instance of ADBDevice is being used somehow. If you don't have time to look into it I'll see what I can do.

For (2),

I'm not familiar with how wpt works and don't feel comfortable reviewing those changes, but I don't think this will improve things over the current situation.

ADBTimeoutErrors are special in that once it occurs with a device the connection to the device is essentially lost and any further attempts to access it will also time out. The default time out period is 5 minutes so depending on the attempts to access the device after the first ADBTimeoutError occurs you may end up waiting several multiples of 5 minutes for the test to end and if you are automatically restarting the test this could be a very long time indeed.

I see your try push is using the previous mozdevice 3.0.4 version which does not include the latest work arounds for emulator intermittent failures. To get a clearer picture of the current situation, can we do a try push for all web-platform-tests on android emulators from the tip of a recent pull and another with your patch applied so that we can get a clear before and after picture?

I'm also concerned/mystified by the cp behaviour. I have to prioritize other work now, but if you share your findings maybe I can pick it up later.

Good point about ADBTimeoutErrors! What about ADBError and ADBProcessError? Those don't seem to be as special; I see many examples of the harness recovering from those.

ADBError and ADBProcessError mean that a command exited with a non-zero exitcode on the device. It may be an intermittent like some of the emulator issues I've been chasing down or could be a more general logic issue. Typically we catch and handle expected cases and only let the real failures to propagate to the log. Hard to generalize them actually.

(In reply to Bob Clary [:bc:] from comment #4)

I see your try push is using the previous mozdevice 3.0.4 version which does not include the latest work arounds for emulator intermittent failures. To get a clearer picture of the current situation, can we do a try push for all web-platform-tests on android emulators from the tip of a recent pull

Oooh, those mozdevice changes look promising...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=416007270737c4405fe0a68129ae303f2b22c0e3

and another with your patch applied so that we can get a clear before and after picture?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f69eec9abe3abb83369930eb2ab2441bee07ffbf

Attachment #9085539 - Attachment is obsolete: true
Attachment #9085484 - Attachment description: Bug 1573555 - Retry web-platform-tests job when max restarts exceeded; r=bc → Bug 1573555 - Retry web-platform-tests job when max restarts exceeded; r=jgraham

In any wpt job, I see that an ADBDevice is instantiated twice: first by mozharness in AndroidMixin at the verify-device step, then by wptrunner (call to runtests.py) at browser start-up, which uses FennecContext with device serial "emulator-5554".

Looking at the orange Wr3 with the cp error versus any other green wpt job, I can see that the second ADBDevice init reports that we do not have native cp. This is intermittent, yay.

From AndroidMixin - every wpt job:

task 2019-08-14T21:14:53.606Z] 21:14:53     INFO - Running main action method: verify_device
[task 2019-08-14T21:14:53.606Z] 21:14:53     INFO - >> Check emulator: Attempt #1 of 5
[task 2019-08-14T21:14:53.606Z] 21:14:53     INFO - >> Verify Android boot completed: Attempt #1 of 30
[task 2019-08-14T21:14:53.676Z] 21:14:53     INFO - Using adb 1.0.41
[task 2019-08-14T21:14:55.527Z] 21:14:55     INFO - adbd running as root
[task 2019-08-14T21:14:55.527Z] 21:14:55     INFO - adbd restarted as root
[task 2019-08-14T21:14:55.633Z] 21:14:55     INFO - su -c setenforce 0 exitcode 0, stdout: None
[task 2019-08-14T21:14:55.952Z] 21:14:55     INFO - su 0 supported
[task 2019-08-14T21:14:56.167Z] 21:14:56     INFO - /system/bin/ls -1A supported
[task 2019-08-14T21:14:56.273Z] 21:14:56     INFO - Native cp support: True
[task 2019-08-14T21:14:56.380Z] 21:14:56     INFO - Native chmod -R support: True
[task 2019-08-14T21:14:56.488Z] 21:14:56     INFO - Native chown -R support: True
[task 2019-08-14T21:14:56.591Z] 21:14:56     INFO - Setting SELinux Permissive
[task 2019-08-14T21:14:56.801Z] 21:14:56     INFO - Native pidof support: True
[task 2019-08-14T21:14:57.222Z] 21:14:57     INFO - New mozdevice with adb=/builds/worker/workspace/build/android-sdk-linux/platform-tools/adb, device=emulator-5554

Orange job with cp error:

[task 2019-08-14T21:15:10.595Z] 21:15:10     INFO - adb Using adb 1.0.41
[task 2019-08-14T21:15:10.924Z] 21:15:10     INFO - adb adbd running as root
[task 2019-08-14T21:15:11.032Z] 21:15:11     INFO - adb su -c setenforce 0 exitcode 0, stdout: None
[task 2019-08-14T21:15:11.364Z] 21:15:11     INFO - adb su 0 supported
[task 2019-08-14T21:15:11.584Z] 21:15:11     INFO - adb /system/bin/ls -1A supported
[task 2019-08-14T21:15:11.682Z] 21:15:11     INFO - adb Native cp support: False
[task 2019-08-14T21:15:11.791Z] 21:15:11     INFO - adb Native chmod -R support: True
[task 2019-08-14T21:15:11.897Z] 21:15:11     INFO - adb Native chown -R support: True
[task 2019-08-14T21:15:12.005Z] 21:15:12     INFO - adb Setting SELinux Permissive
[task 2019-08-14T21:15:12.228Z] 21:15:12     INFO - adb Native pidof support: True
[task 2019-08-14T21:15:20.137Z] 21:15:20     INFO - STDOUT: timed out waiting for profiles.ini
[task 2019-08-14T21:15:20.685Z] 21:15:20     INFO - adb Granting important runtime permissions to rg.mozilla.geckoview.test
[task 2019-08-14T21:15:23.535Z] 21:15:23     INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity ....

Green job:

[task 2019-08-14T21:20:03.436Z] 21:20:03     INFO - adb Using adb 1.0.41
[task 2019-08-14T21:20:03.755Z] 21:20:03     INFO - adb adbd running as root
[task 2019-08-14T21:20:03.857Z] 21:20:03     INFO - adb su -c setenforce 0 exitcode 0, stdout: None
[task 2019-08-14T21:20:04.961Z] 21:20:04     INFO - adb su 0 supported
[task 2019-08-14T21:20:05.177Z] 21:20:05     INFO - adb /system/bin/ls -1A supported
[task 2019-08-14T21:20:05.278Z] 21:20:05     INFO - adb Native cp support: True
[task 2019-08-14T21:20:05.382Z] 21:20:05     INFO - adb Native chmod -R support: True
[task 2019-08-14T21:20:05.490Z] 21:20:05     INFO - adb Native chown -R support: True
[task 2019-08-14T21:20:05.592Z] 21:20:05     INFO - adb Setting SELinux Permissive
[task 2019-08-14T21:20:05.808Z] 21:20:05     INFO - adb Native pidof support: True
[task 2019-08-14T21:20:13.333Z] 21:20:13     INFO - STDOUT: timed out waiting for profiles.ini
[task 2019-08-14T21:20:13.863Z] 21:20:13     INFO - adb Granting important runtime permissions to org.mozilla.geckoview.test
[task 2019-08-14T21:20:15.199Z] 21:20:15     INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity ....

maja_zf: Thanks! That's awesome! I worked around intermittent ls detection issues on emulators by actually executing the command. Maybe I can do something similar for cp.

Flags: needinfo?(bob)

I moved to detecting cp and pidof using execution instead of type.

try run appears to be good.

Flags: needinfo?(bob)
Pushed by bclary@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8abb96374e66
[mozdevice] detect cp via execution, r=gbrown.
https://hg.mozilla.org/integration/autoland/rev/5cd486415e71
[mozdevice] detect pidof via execution, r=gbrown.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Assignee: nobody → bob
Attachment #9085484 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: