Closed Bug 1554375 Opened 5 years ago Closed 5 years ago

Intermittent mozdevice.adb.ADBTimeoutError: args: adb wait-for-device shell rm -r /sdcard/raptor; echo adb_returncode=$?, exitcode: None, stdout:

Categories

(Testing :: Raptor, defect, P5)

Version 3
defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1500266

People

(Reporter: intermittent-bug-filer, Assigned: aerickson)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell disable-recommended])

Filed by: btara [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=248324602&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/PLPsk5rkSZakM4X3u8VLRg/runs/0/artifacts/public/logs/live_backing.log


07:07:43 INFO - Activity: org.mozilla.geckoview_example/.GeckoViewActivity
07:07:43 INFO - ThisTime: 347
07:07:43 INFO - TotalTime: 347
07:07:43 INFO - WaitTime: 352
07:07:43 INFO - Complete
07:07:43 INFO - adb shell_output: adb -s HT83K1A02572 wait-for-device shell pidof org.mozilla.geckoview_example; echo adb_returncode=$?, timeout: None, root: False, timedout: None, exitcode: 0, output: 12142
07:07:44 INFO - raptor-control-server received webext_status: raptor runner.js is loaded!
07:07:44 INFO - raptor-control-server reading test settings from raptor-tp6m-instagram-geckoview-cold.json
07:07:44 INFO - raptor-control-server sent test settings to web ext runner
07:07:44 INFO - raptor-control-server received webext_status: * pausing 30 seconds to let browser settle... *
07:09:19 INFO - raptor-main application timed out after 100 seconds
07:14:19 INFO - raptor-main removing reverse socket connections
07:19:19 INFO - raptor-main removing test folder for raptor: /sdcard/raptor
07:24:19 ERROR - Traceback (most recent call last):
07:24:19 INFO - File "/builds/task_1558767571/workspace/build/tests/raptor/raptor/raptor.py", line 1230, in <module>
07:24:19 INFO - main()
07:24:19 INFO - File "/builds/task_1558767571/workspace/build/tests/raptor/raptor/raptor.py", line 1198, in main
07:24:19 INFO - success = raptor.run_tests(raptor_test_list, raptor_test_names)
07:24:19 INFO - File "/builds/task_1558767571/workspace/build/tests/raptor/raptor/raptor.py", line 960, in run_tests
07:24:19 INFO - return super(RaptorAndroid, self).run_tests(tests, test_names)
07:24:19 INFO - File "/builds/task_1558767571/workspace/build/tests/raptor/raptor/raptor.py", line 191, in run_tests
07:24:19 INFO - self.clean_up()
07:24:19 INFO - File "/builds/task_1558767571/workspace/build/tests/raptor/raptor/raptor.py", line 1141, in clean_up
07:24:19 INFO - self.device.rm(self.remote_test_root, force=True, recursive=True)
07:24:19 INFO - File "/builds/task_1558767571/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 2306, in rm
07:24:19 INFO - self.shell_output("%s %s" % (cmd, path), timeout=timeout, root=root)
07:24:19 INFO - File "/builds/task_1558767571/workspace/build/venv/lib/python2.7/site-packages/mozdevice/adb.py", line 1477, in shell_output
07:24:19 INFO - raise ADBTimeoutError("%s" % adb_process)
07:24:19 INFO - mozdevice.adb.ADBTimeoutError: args: adb wait-for-device shell rm -r /sdcard/raptor; echo adb_returncode=$?, exitcode: None, stdout:
07:24:19 ERROR - Return code: 1
07:24:19 WARNING - setting return code to 1
07:24:19 INFO - Killing logcat pid 469.
07:24:19 CRITICAL - PERFHERDER_DATA was seen 0 times, expected 1.
07:24:19 INFO - copying raptor results to upload dir:
07:24:19 INFO - /builds/task_1558767571/workspace/build/blobber_upload_dir/perfherder-data.json
07:24:19 INFO - copying raptor results from /builds/task_1558767571/workspace/build/raptor.json to /builds/task_1558767571/workspace/build/blobber_upload_dir/perfherder-data.json
07:24:19 CRITICAL - Error copying results /builds/task_1558767571/workspace/build/raptor.json to upload dir /builds/task_1558767571/workspace/build/blobber_upload_dir/perfherder-data.json
07:24:19 INFO - [Errno 2] No such file or directory: u'/builds/task_1558767571/workspace/build/raptor.json'
07:24:19 INFO - Running post-action listener: _package_coverage_data
07:24:19 INFO - Running post-action listener: _resource_record_post_action
07:24:19 INFO - Running post-action listener: process_java_coverage_data
07:24:19 INFO - Running post-action listener: stop_device
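
The traceback above ends in an ADBTimeoutError that reports exitcode: None, which is what you would expect when a wrapper gives up on a child process that never exited. Below is a minimal Python 3 sketch of that pattern. It is not mozdevice's actual implementation; run_with_timeout and the sleeping stand-in command are hypothetical and only illustrate how a hung command ends up with no exit code.

```python
import subprocess
import sys


def run_with_timeout(args, timeout):
    """Run a command; report exitcode None if it did not finish in time.

    Hypothetical helper for illustration, not part of mozdevice.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        stdout, _ = proc.communicate(timeout=timeout)
        return {"timedout": False, "exitcode": proc.returncode, "stdout": stdout.decode()}
    except subprocess.TimeoutExpired:
        # The child never exited on its own, so there is no exit code to report.
        proc.kill()
        stdout, _ = proc.communicate()  # reap the killed child
        return {"timedout": True, "exitcode": None, "stdout": stdout.decode()}


# Stand-in for a hung `adb ... shell rm -r /sdcard/raptor`: a child that sleeps.
hung = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
print(hung)
```

A caller that only needs best-effort cleanup could check the timedout flag and log it rather than raise, but whether that is appropriate for Raptor is not something this bug settles.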

davehunt> rwood: looks like bug 1554375 is mostly affecting two Pixel devices (21, 22). I wonder if there's an issue with them? Looks like we're timing out trying to remove a path on the device's storage.

:bc, do you have any idea why we see this intermittent on these two devices?

Flags: needinfo?(bob)

I've quarantined them and will ask Bitbar to re-image them.

Flags: needinfo?(bob) → needinfo?(aerickson)
Assignee: nobody → aerickson
Status: NEW → ASSIGNED
Flags: needinfo?(aerickson)

Bitbar has reimaged pixel2-21 and pixel2-22. I've removed them from quarantine. I'll watch and see how they behave.

pixel2-21 has failed with the same error.

from https://tools.taskcluster.net/groups/I_bHvmJ0Qv-bHgMFwAsy-A/tasks/XyvekdIUStyDgs4qLb63qw/runs/0:

21:52:57 INFO - mozdevice.adb.ADBTimeoutError: args: adb wait-for-device shell rm -r /sdcard/raptor; echo adb_returncode=$?, exitcode: None, stdout:

I've put it back in quarantine.
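
For checking other task logs for the same signature, here is a rough Python 3 sketch that extracts the adb command from ADBTimeoutError lines. The regular expression is modelled only on the lines quoted in this bug, and find_adb_timeouts is a hypothetical helper, so other log formats may need a different pattern.

```python
import re

# Pattern modelled on the ADBTimeoutError lines quoted in this bug;
# other log formats are not covered.
TIMEOUT_RE = re.compile(
    r"ADBTimeoutError: args: (?P<args>.*?), exitcode: (?P<exitcode>\w+), stdout:"
)


def find_adb_timeouts(log_lines):
    """Return the adb command line of every ADBTimeoutError found."""
    hits = []
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(match.group("args"))
    return hits


sample = [
    "21:52:57 INFO - mozdevice.adb.ADBTimeoutError: args: adb wait-for-device "
    "shell rm -r /sdcard/raptor; echo adb_returncode=$?, exitcode: None, stdout:",
    "07:24:19 ERROR - Return code: 1",
]
print(find_adb_timeouts(sample))
```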

Update: pixel2-21 is still quarantined. pixel2-22 has had 5 successful jobs (plus a number of exceptions caused by superseding jobs, which do not indicate a bad device) since being removed from quarantine this afternoon.

Bitbar is going to try replacing the USB hub that powers 2-21 and 2-22 (pixel2-17 to pixel2-22 and a FireTV stick are on the same cluster/hub). With pixel2-21 in quarantine, the power draw may be low enough for pixel2-22 to work fine if the problem is a bad port.

Pixel2-22 has had 3 failures since being back in service. It also rebooted and didn't come back a few hours ago. Putting it in quarantine until Bitbar replaces the hub and we can test again.

Bitbar has replaced the hub. Removing both devices from quarantine.

I will watch them for a few hours and decide if we should leave them enabled or not.

If they're not performing well, we're going to replace the devices.

Both devices have been running for 3 hours and haven't had any errors. Changing the hub seems to have helped.

Will leave them running and re-evaluate their status on Monday.

There weren't any failures of this type over the weekend.

The devices have continued to operate well. Closing.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

The recent round of errors seems to be focused on motog5-06 and motog5-07. I've quarantined them and will talk with Bitbar.

Status: RESOLVED → REOPENED
Flags: needinfo?(aerickson)
Resolution: FIXED → ---

Bitbar has replaced the hub used by motog5-06 and 07. Putting the hosts back online. Will monitor them.

motog5-06 and 07 have been good since being removed from quarantine.

:bc has quarantined pixel2-26, 27, and 28. Bitbar is replacing the hub.

It seems like a bad hub can also cause errors like in https://bugzilla.mozilla.org/show_bug.cgi?id=1516985. The three devices above also had a few of those errors.

These failures were all correctly classified, and they were all ADBTimeoutErrors. Failure counts per device:

 10 pixel2-27
  7 pixel2-28
  6 pixel2-26
  5 motog5-09
  4 motog5-07
  3 motog5-10
  2 pixel2-25
  2 motog5-06
  1 pixel2-58
  1 pixel2-57
  1 pixel2-56
  1 pixel2-55
  1 pixel2-41
  1 pixel2-24
  1 pixel2-23
  1 motog5-13
  1 motog5-04

I'll follow up with Bitbar later today.
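
To put the per-device counts above in proportion, a small Python 3 sketch can total them and work out how many came from the three quarantined devices. The numbers are copied from the list in this comment; the variable names are only illustrative.

```python
# Failure counts per device, copied from the list above.
failures = {
    "pixel2-27": 10, "pixel2-28": 7, "pixel2-26": 6, "motog5-09": 5,
    "motog5-07": 4, "motog5-10": 3, "pixel2-25": 2, "motog5-06": 2,
    "pixel2-58": 1, "pixel2-57": 1, "pixel2-56": 1, "pixel2-55": 1,
    "pixel2-41": 1, "pixel2-24": 1, "pixel2-23": 1, "motog5-13": 1,
    "motog5-04": 1,
}

# The three devices quarantined while Bitbar replaces their hub.
quarantined = {"pixel2-26", "pixel2-27", "pixel2-28"}

total = sum(failures.values())
on_quarantined = sum(n for dev, n in failures.items() if dev in quarantined)
print(f"{on_quarantined} of {total} failures ({on_quarantined / total:.0%}) "
      "came from the quarantined devices")
```

With the counts listed here, that is 23 of 48 failures on pixel2-26, 27, and 28.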

Flags: needinfo?(bob)

I've determined that pixel2-14 is having problems and have raised the issue with Bitbar.

Flags: needinfo?(bob)
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE