Closed
Bug 833765
Opened 13 years ago
Closed 12 years ago
Intermittent DMError: Automation Error: DeviceManager isdir returned null | Exception caught while running tests
Categories
(Testing :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla23
People
(Reporter: RyanVM, Assigned: gbrown)
Details
(Keywords: intermittent-failure)
Attachments
(2 files)
|
1.15 KB,
patch
|
jmaher
:
review+
wlach
:
feedback+
|
Details | Diff | Splinter Review |
|
990 bytes,
patch
|
jmaher
:
review+
wlach
:
feedback+
|
Details | Diff | Splinter Review |
https://tbpl.mozilla.org/php/getParsedLog.php?id=19042299&tree=Mozilla-Inbound
Android 4.0 Panda mozilla-inbound opt test mochitest-8 on 2013-01-22 23:56:52 PST for push 2d5b54f00e2e
slave: panda-0580
INFO | runtests.py | Installing extension at /builds/panda-0580/test/build/tests/mochitest/extensions/roboextender@mozilla.org to /tmp/tmpuhFNpb.
INFO | runtests.py | Installing extension at /builds/panda-0580/test/build/tests/mochitest/extensions/specialpowers to /tmp/tmpuhFNpb.
INFO | runtests.py | Installing extension at /builds/panda-0580/test/build/tests/mochitest/extensions/workerbootstrap to /tmp/tmpuhFNpb.
INFO | runtests.py | Installing extension at /builds/panda-0580/test/build/tests/mochitest/extensions/worker to /tmp/tmpuhFNpb.
Traceback (most recent call last):
File "mochitest/runtestsremote.py", line 577, in main
File "/builds/panda-0580/test/build/tests/mochitest/runtests.py", line 705, in runTests
manifest = self.buildProfile(options)
File "mochitest/runtestsremote.py", line 299, in buildProfile
File "/builds/panda-0580/test/build/tests/mochitest/devicemanagerSUT.py", line 442, in removeDir
if self.dirExists(remoteDir):
File "/builds/panda-0580/test/build/tests/mochitest/devicemanagerSUT.py", line 398, in dirExists
raise DMError('Automation Error: DeviceManager isdir returned null')
DMError: Automation Error: DeviceManager isdir returned null
Traceback (most recent call last):
File "mochitest/runtestsremote.py", line 595, in <module>
Automation Error: Exception caught while running tests
main()
File "mochitest/runtestsremote.py", line 581, in main
mochitest.stopWebServer(options)
File "mochitest/runtestsremote.py", line 292, in stopWebServer
self.server.stop()
AttributeError: 'MochiRemote' object has no attribute 'server'
program finished with exit code 1
elapsedTime=6.779581
TinderboxPrint: mochitest-plain<br/><em class="testfail">T-FAIL</em>
Unknown Error: command finished with exit code: 1
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Assignee | ||
Comment 17•12 years ago
|
||
Almost all of the recent failures occur in robocop's testSystemPages and show a reboot. Many logs have:
W/Watchdog( 1403): *** WATCHDOG KILLING SYSTEM PROCESS: null
just before the reboot.
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Assignee | ||
Comment 22•12 years ago
|
||
Also (Comments 18, 21):
*** WATCHDOG KILLING SYSTEM PROCESS: com.android.server.PowerManagerService
| Assignee | ||
Comment 23•12 years ago
|
||
I am not sure about the cause of the WATCHDOG messages, but....
Notice that in most of these logs, the first thing that goes wrong is not "isdir returned null" but "Timeout in command exec".
Here's a try push that shows the same thing happening during a special long-running test with devicemanagerSUT debug level 4: https://tbpl.mozilla.org/?tree=Try&rev=f4960773a683. Here we see that we issue a sut exec call and wait for response:
sent cmd: exec "MOZ_CRASHREPORTER=1,XPCOM_DEBUG_BREAK=stack,MOZ_HIDE_RESULTS_TABLE=1,MOZ_CRASHREPORTER_NO_REPORT=1,NO_EM_RESTART=1,MOZ_PROCESS_LOG=/tmp/tmp0AILfZpidlog,XPCOM_MEM_BLOAT_LOG=/tmp/tmpDYoKnO/runtests_leaks.log" am instrument -w -e deviceroot /mnt/sdcard/tests -e class org.mozilla.fennec.tests.testLong2 org.mozilla.roboexample.test/org.mozilla.fennec.FennecInstrumentationTestRunner
recv'ing...
... (about 300 lines)
recv'ing...
recv'ing...
INFO | runtests.py | Received unexpected exception while running application
Traceback (most recent call last):
File "/builds/panda-0559/test/build/tests/mochitest/runtests.py", line 768, in runTests
onLaunch=onLaunch)
File "/builds/panda-0559/test/build/tests/mochitest/automation.py", line 1207, in runApp
stderr = subprocess.STDOUT)
File "/builds/panda-0559/test/build/tests/mochitest/remoteautomation.py", line 175, in Process
return self.RProcess(self._devicemanager, cmd, stdout, stderr, env, cwd)
File "/builds/panda-0559/test/build/tests/mochitest/remoteautomation.py", line 185, in __init__
self.proc = dm.launchProcess(cmd, stdout, cwd, env, True)
File "/builds/panda-0559/test/build/tests/mochitest/devicemanagerSUT.py", line 519, in launchProcess
self.fireProcess(cmdline, failIfRunning)
File "/builds/panda-0559/test/build/tests/mochitest/devicemanagerSUT.py", line 472, in fireProcess
self._runCmds([{ 'cmd': 'exec ' + appname }])
File "/builds/panda-0559/test/build/tests/mochitest/devicemanagerSUT.py", line 151, in _runCmds
self._sendCmds(cmdlist, outputfile, timeout, retryLimit=retryLimit)
File "/builds/panda-0559/test/build/tests/mochitest/devicemanagerSUT.py", line 132, in _sendCmds
raise err
DMError: Automation Error: Timeout in command exec
This does not always fail this way. In particular, it usually succeeds on tegras, and often fails on pandas.
devicemanagerSUT is waiting for the $> prompt response for 300 seconds. sutagent is also waiting 300 s for program completion: https://hg.mozilla.org/mozilla-central/file/40a228f74389/build/mobile/sutagent/android/DoCommand.java#l3734
I'm thinking that increasing the devicemanagerSUT timeout by a few seconds will help us on long-running (more than 5 minutes) tests.
Assignee: nobody → gbrown
| Assignee | ||
Comment 24•12 years ago
|
||
Special long-running test, before:
https://tbpl.mozilla.org/?tree=Try&rev=f4960773a683
And with devicemanagerSUT timeout bumped up from 300 to 320 seconds:
https://tbpl.mozilla.org/?tree=Try&rev=165cb260ff89
And here we check it against the full set of tests:
https://tbpl.mozilla.org/?tree=Try&rev=995bc18baa6d
There are still panda rc failures here, but no "isdir returned null" or "Timeout in command exec" (so far, but I'll retry some more).
| Assignee | ||
Comment 25•12 years ago
|
||
Why 320? Since sutagent is waiting in 10s increments, I want an extra 10s in case there is an extra increment somehow. And then I added another 10s for network delay, etc. It's more than needed, but still only a 7% increase.
Attachment #734077 -
Flags: review?(jmaher)
Attachment #734077 -
Flags: feedback?(wlachance)
Comment 26•12 years ago
|
||
Comment on attachment 734077 [details] [diff] [review]
increase dmSUT default timeout from 300 to 320 s
Review of attachment 734077 [details] [diff] [review]:
-----------------------------------------------------------------
this is only a robocop issue as mochitest/talos/reftest all fire the process and then poll for the process. Robocop (due to the instrumentation interface) does not return from the fireCommand until it is completely done. So this patch makes sense. Could we instead do this inside of runtestsremote.py:
if robocop:
dm.default_timeout = 320
If we could go that route I would be happier. If it doesn't work, I am fine with this patch. r+ as the solution is valid.
Attachment #734077 -
Flags: review?(jmaher) → review+
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Assignee | ||
Comment 28•12 years ago
|
||
As suggested, here is a patch that limits the change to robocop tests. It seems to work fine.
I am a little concerned that there might be some way to hit the 300 s timeout during other test types, but I think either approach is acceptable.
Attachment #734702 -
Flags: review?(jmaher)
Attachment #734702 -
Flags: feedback?(wlachance)
Comment 29•12 years ago
|
||
Comment on attachment 734702 [details] [diff] [review]
increase dmSUT default timeout during robocop tests only
Review of attachment 734702 [details] [diff] [review]:
-----------------------------------------------------------------
I like this better, something to keep in mind for other harnesses, but I would rather hack it from the harness level than DM.
Attachment #734702 -
Flags: review?(jmaher) → review+
| Assignee | ||
Comment 30•12 years ago
|
||
| Comment hidden (Legacy TBPL/Treeherder Robot) |
| Reporter | ||
Comment 32•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Comment 33•12 years ago
|
||
Comment on attachment 734077 [details] [diff] [review]
increase dmSUT default timeout from 300 to 320 s
Makes sense
Attachment #734077 -
Flags: feedback?(wlachance) → feedback+
Comment 34•12 years ago
|
||
Comment on attachment 734702 [details] [diff] [review]
increase dmSUT default timeout during robocop tests only
Makes sense
Attachment #734702 -
Flags: feedback?(wlachance) → feedback+
| Comment hidden (Legacy TBPL/Treeherder Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•