Closed Bug 1172174 Opened 9 years ago Closed 9 years ago

Android 4.3 DMError: Timeout exceeded for shell call


(Testing :: General, defect)

Not set


(firefox41 fixed)

Tracking Status
firefox41 --- fixed


(Reporter: gbrown, Assigned: gbrown)




(1 file)

Bug 1051908 collects intermittent Android failures with "2400 seconds without output running <mozharness script>". There are several different scenarios causing those failures. Many of the recent failures show "DMError: Timeout exceeded for shell call" just before the 2400 second timeout.

repository: mozilla-inbound
start_time: 2015-06-05T08:40:15
who: rvandermeulen[at]mozilla[dot]com
machine: tst-linux64-spot-025
buildname: Android 4.3 armv7 API 11+ mozilla-inbound opt test mochitest-13
revision: ab595285aa97

DMError: Timeout exceeded for shell call
command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/', '--cfg', 'android/', '--test-suite', 'mochitest-13', '--blob-upload-branch', 'mozilla-inbound', '--download-symbols', 'ondemand'], attempting to kill
From the log in comment 0...

09:03:39     INFO -  269 INFO TEST-START | dom/workers/test/test_errorPropagation.html
09:03:39     INFO -  270 INFO TEST-OK | dom/workers/test/test_errorPropagation.html | took 6271ms
09:03:39     INFO -  271 INFO TEST-START | dom/workers/test/test_errorwarning.html
09:09:43     INFO -  Exception in thread ProcessReader:
09:09:43     INFO -  Traceback (most recent call last):
09:09:43     INFO -    File "/usr/lib/python2.7/", line 551, in __bootstrap_inner
09:09:43     INFO -
09:09:43     INFO -    File "/usr/lib/python2.7/", line 504, in run
09:09:43     INFO -      self.__target(*self.__args, **self.__kwargs)
09:09:43     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/mozprocess/", line 929, in _read
09:09:43     INFO -      self.timeout_callback()
09:09:43     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/mozprocess/", line 840, in __call__
09:09:43     INFO -      e(*args, **kwargs)
09:09:43     INFO -    File "/builds/slave/test/build/tests/mochitest/", line 138, in _raise
09:09:43     INFO -      raise DMError("Timeout exceeded for shell call")
09:09:43     INFO -  DMError: Timeout exceeded for shell call

command timed out: 2400 seconds without output running ['/tools/buildbot/bin/python', 'scripts/scripts/', '--cfg', 'android/', '--test-suite', 'mochitest-13', '--blob-upload-branch', 'mozilla-inbound', '--download-symbols', 'ondemand'], attempting to kill
process killed by signal 9
program finished with exit code -1
========= Finished '/tools/buildbot/bin/python scripts/scripts/ ...' failed (results: 2, elapsed: 1 hrs, 9 mins, 8 secs) (at 2015-06-05 09:49:43.243137) =========
Blocks: 1026710 is running the mochitest harness It looks like is using adb to periodically poll the device to retrieve logs and check whether the fennec activity is still running. One of those adb commands times out, calling:

Note that the DMError is raised in the ProcessReader thread. I don't think that is intended. I suspect that kills a ProcessHandler thread and causes the call to proc.wait() to hang, subsequently causing the 2400 second buildbot timeout - yuck!

Instead of raising in _raise, perhaps we could just log a warning?

Armen - thoughts?
Flags: needinfo?(armenzg)
That works for me!
My apologies if this has been causing so much pain.
Flags: needinfo?(armenzg)
I don't think this is really causing pain...this is more about improving our logging when adb hangs.

I changed the raise to an error log; I used error over warning mostly because warnings are not logged by default, and I think timeouts are a fairly serious condition.

Testing locally, I forced a timeout by hijacking adb calls:

mozdevice ERROR | Timeout exceeded for shell call 'adb shell ps; echo $?'
Traceback (most recent call last):
  File "/home/gbrown/src/", line 4, in <module>
  File "/home/gbrown/src/testing/mozbase/mozdevice/mozdevice/", line 399, in shellCheckOutput
    raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
mozdevice.devicemanager.DMError: Non-zero return code for command: ['ps'] (output: '', retval: 'None')

I liked this result, so applied the same logic to _checkCmd().

_runCmd() might profit from the same sort of logging, but that's more complicated -- I filed bug 1173120 instead.
Attachment #8617616 - Flags: review?(armenzg)
Attachment #8617616 - Flags: review?(armenzg) → review+
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
See Also: → 1173120
You need to log in before you can comment on or make changes to this bug.