Closed Bug 1175540 Opened 5 years ago Closed 5 years ago

Android 4.3 tests take too long to cleanup when adb is hung

Categories

(Testing :: General, defect)

defect
Not set

Tracking

(firefox41 fixed)

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: gbrown, Assigned: gbrown)

Details

Attachments

(1 file)

If adb is non-responsive, devicemanagerADB calls will timeout, generally after 5 minutes, but between retries, attempts to retrieve diagnostic information from the device, and cleanup operations, many further adb calls will still be attempted; if adb remains non-responsive, it can take over an hour to end the test run.

http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-api-11/1434524689/mozilla-inbound_ubuntu64_vm_armv7_large_test-crashtest-1-bm113-tests1-linux64-build107.txt.gz

01:17:37     INFO -  REFTEST TEST-START | http://10.0.2.2:8854/tests/layout/base/crashtests/479360-1.xhtml
01:17:37     INFO -  REFTEST TEST-LOAD | http://10.0.2.2:8854/tests/layout/base/crashtests/479360-1.xhtml | 1275 / 1429 (89%)
01:23:18     INFO -  mozdevice ERROR | Timeout exceeded for shell call 'adb shell dumpsys window windows; echo $?'
01:28:18     INFO -  mozdevice ERROR | Timeout exceeded for _runCmd call 'adb shell ls -a /sdcard/tests/reftest/reftest.log'
01:33:18     INFO -  mozdevice ERROR | Timeout exceeded for shell call 'adb shell dumpsys window windows; echo $?'
01:33:18     INFO -  INFO | automation.py | Application ran for: 0:42:46.723296
01:33:18     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpb9cTkNpidlog
01:38:18     INFO -  mozdevice ERROR | Timeout exceeded for _runCmd call 'adb shell ls -a /data/anr/traces.txt'
01:38:18     INFO -  /data/anr/traces.txt not found
01:43:18     INFO -  mozdevice ERROR | Timeout exceeded for _runCmd call 'adb shell ls -a /data/tombstones/'
01:48:18     INFO -  mozdevice ERROR | Timeout exceeded for shell call 'adb shell chmod 777 /data/tombstones; echo $?'
01:53:18     INFO -  mozdevice ERROR | Timeout exceeded for shell call 'adb shell rm -r /data/tombstones; echo $?'
01:58:18     INFO -  mozdevice ERROR | Timeout exceeded for shell call 'adb shell /system/bin/logcat -v time -d dalvikvm:I ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S NetworkStateTracker:S; echo $?'
01:58:18     INFO -  getLogcat threw DMError; re-trying just once...
02:03:19     INFO -  mozdevice ERROR | Timeout exceeded for shell call 'adb shell /system/bin/logcat -v time -d dalvikvm:I ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S NetworkStateTracker:S; echo $?'
02:08:19     INFO -  mozdevice ERROR | Timeout exceeded for _runCmd call 'adb shell ls -a /sdcard/tests/reftest/reftest.log'
02:08:19     INFO -  WARNING: Unable to retrieve log file (/sdcard/tests/reftest/reftest.log) from remote device
02:13:20     INFO -  mozdevice ERROR | Timeout exceeded for _runCmd call 'adb shell ls -a /sdcard/tests/reftest/profile/'
02:18:20     INFO -  mozdevice ERROR | Timeout exceeded for _checkCmd call 'adb shell rm -r /sdcard/tests/reftest/profile'
02:23:20     INFO -  mozdevice ERROR | Timeout exceeded for _runCmd call 'adb shell ls -a /sdcard/tests/reftest/'
02:28:20     INFO -  mozdevice ERROR | Timeout exceeded for _checkCmd call 'adb shell rm -r /sdcard/tests/reftest'
02:28:20     INFO -  Automation Error: Exception caught while running tests
02:28:20     INFO -  Traceback (most recent call last):
02:28:20     INFO -    File "/builds/slave/test/build/tests/reftest/remotereftest.py", line 551, in main
02:28:20     INFO -      retVal = reftest.runTests(manifest, options, cmdlineArgs)
02:28:20     INFO -    File "/builds/slave/test/build/tests/reftest/runreftest.py", line 334, in runTests
02:28:20     INFO -      return self.runSerialTests(testPath, options, cmdlineArgs)
02:28:20     INFO -    File "/builds/slave/test/build/tests/reftest/runreftest.py", line 632, in runSerialTests
02:28:20     INFO -      debuggerInfo=debuggerInfo)
02:28:20     INFO -    File "/builds/slave/test/build/tests/reftest/remotereftest.py", line 449, in runApp
02:28:20     INFO -      timeout=timeout)
02:28:20     INFO -    File "/builds/slave/test/build/tests/reftest/automation.py", line 844, in runApp
02:28:20     INFO -      crashed = self.checkForCrashes(os.path.join(profileDir, "minidumps"), symbolsPath)
02:28:20     INFO -    File "/builds/slave/test/build/tests/reftest/remoteautomation.py", line 200, in checkForCrashes
02:28:20     INFO -      logcat = self._devicemanager.getLogcat(filterOutRegexps=fennecLogcatFilters)
02:28:20     INFO -    File "/builds/slave/test/build/tests/reftest/devicemanager.py", line 160, in getLogcat
02:28:20     INFO -      root=self._logcatNeedsRoot)
02:28:20     INFO -    File "/builds/slave/test/build/tests/reftest/devicemanager.py", line 399, in shellCheckOutput
02:28:20     INFO -      raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
02:28:20     INFO -  DMError: Non-zero return code for command: ['/system/bin/logcat', '-v', 'time', '-d', 'dalvikvm:I', 'ConnectivityService:S', 'WifiMonitor:S', 'WifiStateTracker:S', 'wpa_supplicant:S', 'NetworkStateTracker:S'] (output: '', retval: 'None')
02:33:20     INFO -  mozdevice ERROR | Timeout exceeded for shell call 'adb shell /system/bin/logcat -v time -d dalvikvm:I ConnectivityService:S WifiMonitor:S WifiStateTracker:S wpa_supplicant:S NetworkStateTracker:S; echo $?'
02:33:20     INFO -  WARNING: Error getting device information
02:33:20    ERROR - Return code: 1
02:33:20    ERROR - No tests run or test summary not found
Ideas:
 - 5 minutes may be too long to wait for some operations
 - can we recover via adb kill-server/start-server?
 - skip some shutdown operations on timeout/error?
 - abandon shutdown/cleanup if it takes longer than x minutes?
 - abandon adb calls if time-outs persist?
Attached patch reduce timeoutsSplinter Review
This is a long patch, but straight-forward: I am reducing the timeout used for most "simple" adb calls from 300 seconds to 30 seconds. "Simple" commands are ones like "rm" which usually return in less than 1 second; if we cannot remove a file in 30 seconds, it is a good bet that something is amiss! The default 300 second timeout is left in place for potentially longer operations like directory pushes, retrieving logcat, etc. By my calculations, this should allow tests to fail at least 30 minutes faster when adb is hung.

I have not modified devicemanagerSUT commands since I have never seen repeated time-outs on SUT: If the device or network goes down, we detect an error immediately and raise. A few SUT commands are affected because remoteautomation makes some shellCheckOutput() calls that are dm-agnostic.

I also removed a logcat retry: I added that recently but have come to the conclusion that it is useless.

I am considering other changes for this bug but I would like to get this landed now to observe its effect before proceeding.

I have been unsuccessful in reproducing time-outs with this patch on try (a good thing, overall!):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dba1cccef3c4
Attachment #8624713 - Flags: review?(mcote)
Comment on attachment 8624713 [details] [diff] [review]
reduce timeouts

lgtm!
Attachment #8624713 - Flags: review?(mcote) → review+
https://hg.mozilla.org/mozilla-central/rev/2505945b9d43
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
You need to log in before you can comment on or make changes to this bug.