Closed Bug 833765 Opened 13 years ago Closed 12 years ago

Intermittent DMError: Automation Error: DeviceManager isdir returned null | Exception caught while running tests

Categories

(Testing :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla23

People

(Reporter: RyanVM, Assigned: gbrown)

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=19042299&tree=Mozilla-Inbound Android 4.0 Panda mozilla-inbound opt test mochitest-8 on 2013-01-22 23:56:52 PST for push 2d5b54f00e2e slave: panda-0580 INFO | runtests.py | Installing extension at /builds/panda-0580/test/build/tests/mochitest/extensions/roboextender@mozilla.org to /tmp/tmpuhFNpb. INFO | runtests.py | Installing extension at /builds/panda-0580/test/build/tests/mochitest/extensions/specialpowers to /tmp/tmpuhFNpb. INFO | runtests.py | Installing extension at /builds/panda-0580/test/build/tests/mochitest/extensions/workerbootstrap to /tmp/tmpuhFNpb. INFO | runtests.py | Installing extension at /builds/panda-0580/test/build/tests/mochitest/extensions/worker to /tmp/tmpuhFNpb. Traceback (most recent call last): File "mochitest/runtestsremote.py", line 577, in main File "/builds/panda-0580/test/build/tests/mochitest/runtests.py", line 705, in runTests manifest = self.buildProfile(options) File "mochitest/runtestsremote.py", line 299, in buildProfile File "/builds/panda-0580/test/build/tests/mochitest/devicemanagerSUT.py", line 442, in removeDir if self.dirExists(remoteDir): File "/builds/panda-0580/test/build/tests/mochitest/devicemanagerSUT.py", line 398, in dirExists raise DMError('Automation Error: DeviceManager isdir returned null') DMError: Automation Error: DeviceManager isdir returned null Traceback (most recent call last): File "mochitest/runtestsremote.py", line 595, in <module> Automation Error: Exception caught while running tests main() File "mochitest/runtestsremote.py", line 581, in main mochitest.stopWebServer(options) File "mochitest/runtestsremote.py", line 292, in stopWebServer self.server.stop() AttributeError: 'MochiRemote' object has no attribute 'server' program finished with exit code 1 elapsedTime=6.779581 TinderboxPrint: mochitest-plain<br/><em class="testfail">T-FAIL</em> Unknown Error: command finished with exit code: 1
Almost all of the recent failures occur in robocop's testSystemPages and show a reboot. Many logs have: W/Watchdog( 1403): *** WATCHDOG KILLING SYSTEM PROCESS: null just before the reboot.
Also (Comments 18, 21): *** WATCHDOG KILLING SYSTEM PROCESS: com.android.server.PowerManagerService
I am not sure about the cause of the WATCHDOG messages, but.... Notice that in most of these logs, the first thing that goes wrong is not "isdir returned null" but "Timeout in command exec". Here's a try push that shows the same thing happening during a special long-running test with devicemanagerSUT debug level 4: https://tbpl.mozilla.org/?tree=Try&rev=f4960773a683. Here we see that we issue a sut exec call and wait for response: sent cmd: exec "MOZ_CRASHREPORTER=1,XPCOM_DEBUG_BREAK=stack,MOZ_HIDE_RESULTS_TABLE=1,MOZ_CRASHREPORTER_NO_REPORT=1,NO_EM_RESTART=1,MOZ_PROCESS_LOG=/tmp/tmp0AILfZpidlog,XPCOM_MEM_BLOAT_LOG=/tmp/tmpDYoKnO/runtests_leaks.log" am instrument -w -e deviceroot /mnt/sdcard/tests -e class org.mozilla.fennec.tests.testLong2 org.mozilla.roboexample.test/org.mozilla.fennec.FennecInstrumentationTestRunner recv'ing... ... (about 300 lines) recv'ing... recv'ing... INFO | runtests.py | Received unexpected exception while running application Traceback (most recent call last): File "/builds/panda-0559/test/build/tests/mochitest/runtests.py", line 768, in runTests onLaunch=onLaunch) File "/builds/panda-0559/test/build/tests/mochitest/automation.py", line 1207, in runApp stderr = subprocess.STDOUT) File "/builds/panda-0559/test/build/tests/mochitest/remoteautomation.py", line 175, in Process return self.RProcess(self._devicemanager, cmd, stdout, stderr, env, cwd) File "/builds/panda-0559/test/build/tests/mochitest/remoteautomation.py", line 185, in __init__ self.proc = dm.launchProcess(cmd, stdout, cwd, env, True) File "/builds/panda-0559/test/build/tests/mochitest/devicemanagerSUT.py", line 519, in launchProcess self.fireProcess(cmdline, failIfRunning) File "/builds/panda-0559/test/build/tests/mochitest/devicemanagerSUT.py", line 472, in fireProcess self._runCmds([{ 'cmd': 'exec ' + appname }]) File "/builds/panda-0559/test/build/tests/mochitest/devicemanagerSUT.py", line 151, in _runCmds self._sendCmds(cmdlist, outputfile, timeout, retryLimit=retryLimit) File "/builds/panda-0559/test/build/tests/mochitest/devicemanagerSUT.py", line 132, in _sendCmds raise err DMError: Automation Error: Timeout in command exec This does not always fail this way. In particular, it usually succeeds on tegras, and often fails on pandas. devicemanagerSUT is waiting for the $> prompt response for 300 seconds. sutagent is also waiting 300 s for program completion: https://hg.mozilla.org/mozilla-central/file/40a228f74389/build/mobile/sutagent/android/DoCommand.java#l3734 I'm thinking that increasing the devicemanagerSUT timeout by a few seconds will help us on long-running (more than 5 minutes) tests.
Assignee: nobody → gbrown
Special long-running test, before: https://tbpl.mozilla.org/?tree=Try&rev=f4960773a683 And with devicemanagerSUT timeout bumped up from 300 to 320 seconds: https://tbpl.mozilla.org/?tree=Try&rev=165cb260ff89 And here we check it against the full set of tests: https://tbpl.mozilla.org/?tree=Try&rev=995bc18baa6d There are still panda rc failures here, but no "isdir returned null" or "Timeout in command exec" (so far, but I'll retry some more).
Why 320? Since sutagent is waiting in 10s increments, I want an extra 10s in case there is an extra increment somehow. And then I added another 10s for network delay, etc. It's more than needed, but still only a 7% increase.
Attachment #734077 - Flags: review?(jmaher)
Attachment #734077 - Flags: feedback?(wlachance)
Comment on attachment 734077 [details] [diff] [review] increase dmSUT default timeout from 300 to 320 s Review of attachment 734077 [details] [diff] [review]: ----------------------------------------------------------------- this is only a robocop issue as mochitest/talos/reftest all fire the process and then poll for the process. Robocop (due to the instrumentation interface) does not return from the fireCommand until it is completely done. So this patch makes sense. Could we instead do this inside of runtestsremote.py: if robocop: dm.default_timeout = 320 If we could go that route I would be happier. If it doesn't work, I am fine with this patch. r+ as the solution is valid.
Attachment #734077 - Flags: review?(jmaher) → review+
As suggested, here is a patch that limits the change to robocop tests. It seems to work fine. I am a little concerned that there might be some way to hit the 300 s timeout during other test types, but I think either approach is acceptable.
Attachment #734702 - Flags: review?(jmaher)
Attachment #734702 - Flags: feedback?(wlachance)
Comment on attachment 734702 [details] [diff] [review] increase dmSUT default timeout during robocop tests only Review of attachment 734702 [details] [diff] [review]: ----------------------------------------------------------------- I like this better, something to keep in mind for other harnesses, but I would rather hack it from the harness level than DM.
Attachment #734702 - Flags: review?(jmaher) → review+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Comment on attachment 734077 [details] [diff] [review] increase dmSUT default timeout from 300 to 320 s Makes sense
Attachment #734077 - Flags: feedback?(wlachance) → feedback+
Comment on attachment 734702 [details] [diff] [review] increase dmSUT default timeout during robocop tests only Makes sense
Attachment #734702 - Flags: feedback?(wlachance) → feedback+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: